ParallelLogicToEventDrivenFirmware_Doin

#ESCBOS

From Hw to Sw: Parallel Logic Applied to Event-Driven Firmware

Jonny Doin – GridVortex


From Hardware to Firmware

• Introduction
• Multitasking: the holy grail of computing
• Parallel computing and VHDL
• process() and sequential parallel logic
• Signals and Sensitivity lists in VHDL
• Signals and Sensitivity lists in Firmware
• Bit-banding on Cortex-M
• Event-driven scheduling
• Hardware scheduling and Multicore µC
• Final thoughts


Intro

In this talk we will see:

• Architectural aspects of multitasking
• Some techniques for implementing event-driven firmware
• Concepts of Hardware Design that can be applied to Firmware development


Multitasking

Multitasking is one of the most important concepts of modern computing.

Efficient use of processing bandwidth affects energy and real-time response.

Microcontrollers with over 200 MIPS are becoming very accessible to even the smallest applications.

https://s-media-cache-ak0.pinimg.com/736x/d5/6e/06/d56e06a6441353a405456bbdc29df294.jpg


Multitasking (2)

Multitasking can be described as the simulation of a parallel processing system using a smaller number of sequential processors.

Several multitasking schemes evolved over time for traditional computing systems:

• Priority-based scheduling and multithreading
• Collaborative multitasking
• Interrupt-based real-time systems
• Event-driven multitasking


Multitasking (3)

Multitasking schemes are a compromise:

• Cost of scheduling
• System blocking time
• Effective processing bandwidth
• System response time

[Figure: CPU time divided between user task CPU time and scheduler CPU time]


Parallel processing and VHDL

Truly parallel systems can be implemented in digital hardware.

Languages used to describe and design such systems have dedicated features for expressing parallel logic.

VHDL uses a state-based model to describe parallel processing.


process() and parallel logic

In VHDL, sections of sequential logic that run in parallel with the rest of the system are defined using the process() structure:

    -- Register, sequential logic
    counter: process (clk_i, cnt_clear) is
    begin
        if cnt_clear = '1' then
            cnt_reg <= 0;
        else
            if clk_i'event and clk_i = '1' then
                if cnt_ce = '1' then
                    cnt_reg <= cnt_next;
                end if;
            end if;
        end if;
    end process counter;

    -- Adder, combinational logic
    cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg;


Signals and sensitivity lists

The process() definition includes a list of signals:

    process (clk_i, cnt_clear)

Logic in the process() is only “executed” when any of the signals declared on its sensitivity list changes state.

Any other logic in the circuit can alter the state of these signals, and when that happens, the process is executed.

VHDL signals carry more than a value: each one has a “transaction timeline”, and future transactions can be scheduled on the signal.
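The sensitivity-list idea can be sketched in C as a bit mask per process. This is only an illustrative model, not code from the talk: the names `process_t`, `dispatch`, `SIG_CLK`, `SIG_CLEAR` and `SIG_OTHER` are hypothetical, and the signals are assumed to be packed as single bits in one 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signal assignments: one bit per signal in a 32-bit word. */
enum { SIG_CLK = 1u << 0, SIG_CLEAR = 1u << 1, SIG_OTHER = 1u << 2 };

typedef void (*process_fn)(void);

typedef struct {
    uint32_t   sensitivity;  /* mask of the signals this process reacts to */
    process_fn run;          /* body of the process */
} process_t;

/* Example process body: counts how many times it was "executed". */
static unsigned counter_runs;
static void counter_process(void) { counter_runs++; }

/* Run every process whose sensitivity list contains a signal that
   changed state between two snapshots of the signal word.
   Returns the number of processes executed. */
static unsigned dispatch(const process_t *procs, size_t count,
                         uint32_t previous, uint32_t current)
{
    uint32_t changed = previous ^ current;  /* 1 where a signal changed */
    unsigned executed = 0;

    assert(procs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (changed & procs[i].sensitivity) {
            procs[i].run();
            executed++;
        }
    }
    return executed;
}
```

A process declared with `{ SIG_CLK | SIG_CLEAR, counter_process }` mirrors `process (clk_i, cnt_clear)`: a change on `SIG_OTHER` alone leaves it idle, so no CPU time is spent on it.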


Signals and sensitivity lists (2)

VHDL sensitivity lists:

• Simple state-based, event-driven paradigm
• Simulate parallel hardware logic
• Simulators use processing bandwidth efficiently

The paradigm is based on the delta cycle, a concept similar to an execution pass of the logic. All signals will be assigned their values only at the end of the current delta cycle.
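A delta cycle can be approximated in C with a two-phase update: processes read the current values and write to a separate set of scheduled values, which are committed together at the end of the pass. The sketch below is an assumption about how this might be modeled; `signals_t`, `sig_read`, `sig_assign` and `end_delta` are hypothetical names, not part of VHDL or of the talk.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SIGNALS 4u

typedef struct {
    uint8_t current[NUM_SIGNALS];  /* values visible to every process */
    uint8_t next[NUM_SIGNALS];     /* values scheduled during this pass */
} signals_t;

static void signals_init(signals_t *s)
{
    for (unsigned i = 0; i < NUM_SIGNALS; i++) {
        s->current[i] = 0;
        s->next[i] = 0;
    }
}

/* Read the value the signal had at the start of the delta cycle. */
static uint8_t sig_read(const signals_t *s, unsigned id)
{
    assert(id < NUM_SIGNALS);
    return s->current[id];
}

/* Schedule a new value; it stays invisible until end_delta(). */
static void sig_assign(signals_t *s, unsigned id, uint8_t value)
{
    assert(id < NUM_SIGNALS);
    s->next[id] = value;
}

/* End of the delta cycle: commit all scheduled values at once.
   Returns 1 if any signal changed, 0 if the system is stable. */
static int end_delta(signals_t *s)
{
    int changed = 0;
    for (unsigned i = 0; i < NUM_SIGNALS; i++) {
        if (s->current[i] != s->next[i]) {
            s->current[i] = s->next[i];
            changed = 1;
        }
    }
    return changed;
}
```

With this model, the pair `sig_assign(&s, 0, sig_read(&s, 1)); sig_assign(&s, 1, sig_read(&s, 0));` swaps the two signals after `end_delta()`, just as `a <= b; b <= a;` does in VHDL, because neither assignment sees the other within the same pass.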


Signals and sensitivity lists (3)

The VHDL concepts of process() with sensitivity lists and delta cycles can be implemented in bare-metal firmware to achieve multitasking with low processing cost.

The benefits of these elements of multitasking are:

• Fast event-driven scheduling
• Structural integrity of the logic
• Scalability for multicore systems


Bit-banding on Cortex-M

ARM Cortex-M3 and Cortex-M4 cores can include dedicated memory addressing hardware (bit-banding) that implements atomic bit access in memory without read-modify-write artifacts.

• Bit signals can be used as efficient Inter-Process Communication (IPC)
• Fastest atomic operations in a Cortex-M (faster than STREX/LDREX)
• Bit signals map to a special area in RAM


Bit-banding on Cortex-M (2)

[Figure 3-1 System address map, ARM DDI 0439C, section 3.4. The 4GB map is divided into Code (from 0x00000000, 0.5GB), SRAM (from 0x20000000, 0.5GB), Peripheral (from 0x40000000, 0.5GB), External RAM (from 0x60000000, 1.0GB), External device (from 0xA0000000, 1.0GB), the private peripheral bus (from 0xE0000000) and System (0xE0100000 to 0xFFFFFFFF). In the SRAM region, the 1MB bit-band region starts at 0x20000000 and its 32MB bit-band alias starts at 0x22000000. In the Peripheral region, the 1MB bit-band region starts at 0x40000000 and its 32MB bit-band alias starts at 0x42000000.]

Table 3-3 Memory regions (excerpt): SRAM bit-band: Alias region. Data accesses are aliases. Instruction accesses are not aliases.

•  Hardware remapping of accesses

• Known addresses on any Cortex-M that implements bit-banding
• Atomic writes on individual bits
• Simultaneous reads on all 32 bits

source: ARM DDI 0439C, page 3-14


Bit-banding on Cortex-M (3)

Bit-banding memory remap structure:

• Words (32-bit) in the alias region map to individual bits in the normal SRAM memory
• The remapped writes are guaranteed atomic

• The alias word at 0x2200001C maps to bit [7] of the bit-band byte at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + 7*4.

[Figure 3-2 Bit-band mapping: each 32-bit word of the 32MB alias region (0x22000000 to 0x23FFFFFC) maps to one bit of the 1MB SRAM bit-band region (bytes 0x20000000 to 0x200FFFFF).]

3.7.1 Directly accessing an alias region

Writing to a word in the alias region has the same effect as a read-modify-write operation on the targeted bit in the bit-band region.

Bit [0] of the value written to a word in the alias region determines the value written to the targeted bit in the bit-band region. Writing a value with bit [0] set writes a 1 to the bit-band bit, and writing a value with bit [0] cleared writes a 0 to the bit-band bit.

Bits [31:1] of the alias word have no effect on the bit-band bit. Writing 0x01 has the same effect as writing 0xFF. Writing 0x00 has the same effect as writing 0x0E.

Reading a word in the alias region returns either 0x01 or 0x00. A value of 0x01 indicates that the targeted bit in the bit-band region is set. A value of 0x00 indicates that the targeted bit is clear. Bits [31:1] are zero.

3.7.2 Directly accessing a bit-band region

You can directly access the bit-band region with normal reads and writes to that region.

source: ARM DDI 0439C, page 3-20
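The mapping formula quoted above (alias base + byte offset * 32 + bit number * 4) can be written as a small helper. This is a sketch for the SRAM bit-band region only; the function name `bitband_alias_addr` and the macro names are hypothetical, and the function just computes an address, so it can be checked on a host machine without touching hardware.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE  0x20000000u  /* start of the 1MB bit-band region */
#define SRAM_BITBAND_SIZE  0x00100000u  /* 1MB */
#define SRAM_ALIAS_BASE    0x22000000u  /* start of the 32MB alias region */

/* Compute the alias word address for one bit of one byte in the
   SRAM bit-band region: alias_base + byte_offset * 32 + bit * 4. */
static uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BITBAND_BASE);
    assert(byte_addr - SRAM_BITBAND_BASE < SRAM_BITBAND_SIZE);
    assert(bit < 8u);

    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE;
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

For bit 7 of the byte at 0x20000000 the helper yields 0x2200001C, matching the example from the ARM manual. On a target with bit-banding, storing 1 through a `volatile uint32_t *` to that address would set the bit.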


Event-driven scheduling

Using the concepts from VHDL and the atomic bit-banding from Cortex-M, it is possible to:

• Implement event-driven multitasking
• Have process()-like handlers with light overhead
• Implement state machine logic efficiently
• Use bit signals as efficient IPC


Event-driven scheduling (2)

    #include <stdint.h>

    typedef volatile uint32_t * PFLAGS_T;   // pointee is volatile so every access reaches memory

    typedef volatile struct ipc_flags_t {   // any object of this type is volatile qualified
        PFLAGS_T pflags_bits;               // Ptr to the 'bit bandable' word with 32 ipc bits
        PFLAGS_T pflags_base;               // Ptr to the base of the word alias array
    } IPC_FLAGS_T;

    // for the ipc macros, pass an IPC_FLAGS_T struct
    #define get_bit(flags, bit)      ((flags).pflags_base[(bit)])
    #define set_bit(flags, bit)      ((flags).pflags_base[(bit)] = 1)
    #define clr_bit(flags, bit)      ((flags).pflags_base[(bit)] = 0)
    #define toggle(flags, bit)       ((flags).pflags_base[(bit)] ^= 1)
    #define event(flags, bit)        (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
    #define clr_bits(flags)          (*((flags).pflags_bits) = 0)
    #define get_bits(flags, bitmask) (*((flags).pflags_bits) & (bitmask))

    extern void init_ipc(void);
    extern uint32_t request_ipc_word(IPC_FLAGS_T *pflags);


Event-driven scheduling (3)

    #define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)

so:

    set_bit(myflags, 7);

translates to:

    myflags.pflags_base[7] = 1;

where:

    IPC_FLAGS_T myflags;
    myflags.pflags_base = (PFLAGS_T) 0x22000000;
    myflags.pflags_bits = (PFLAGS_T) 0x20000000;

[Figure: the 32 alias words starting at 0x22000000 (the next word's aliases start at 0x22000080) map to the bits of the bit-band word at 0x20000000. Writing 0x00000001 to alias word 7 sets bit 7 of the bit-band word, which then reads 0x00000080.]


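Event-driven scheduling (4)

    #define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

so:

    if(event(myflags, 7)) { ... }

translates, when the bit is set, to:

    if(((myflags.pflags_base[7] = 0), 1))

which, after evaluation of the side effect, becomes:

    if((1))

The comma operator evaluates its left operand for the side effect (clearing the bit) and yields its right operand, 1, as the result. When the bit is clear, the expression evaluates to 0 and nothing is written.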
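The test-and-clear behaviour of `event()` can be tried on a host machine with a stand-in for the alias words. In the sketch below the macros are the ones from the slides, with the pointee of `PFLAGS_T` declared volatile; everything else (`fake_alias`, `fake_word`, `demo_flags`, `poll_once`, `handled`) is hypothetical. A plain array does not mirror writes into the underlying flags word the way bit-band hardware does, so only the per-bit semantics are modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;

typedef volatile struct ipc_flags_t {
    PFLAGS_T pflags_bits;   /* word holding the 32 flag bits */
    PFLAGS_T pflags_base;   /* base of the 32 alias words */
} IPC_FLAGS_T;

#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit)   (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

/* Host stand-ins: on a Cortex-M target with bit-banding these would be
   addresses in the bit-band region and its alias region. */
static volatile uint32_t fake_word;
static volatile uint32_t fake_alias[32];
static IPC_FLAGS_T demo_flags = { &fake_word, fake_alias };

static unsigned handled;    /* counts how many events were consumed */

/* One pass of a process sensitive to flag 7: the handler body runs
   only if the flag was set, and the flag is cleared as it is consumed. */
static void poll_once(void)
{
    if (event(demo_flags, 7)) {
        handled++;
    }
}
```

Calling `set_bit(demo_flags, 7)` once and then `poll_once()` twice runs the handler body a single time: the first poll consumes the event and the second finds the flag already cleared.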


Event-driven scheduling (5)

    enum keypad_bits_t {
        bit_keypad_value_update = 0,
        bit_keypressed_wait,
        bit_refresh_debounce_tmr,
    };

    void process_keypad(void)
    {
        if(event_refresh_debounce_tmr())
        {
            keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
            keypad_data.state = KEYPAD_DEBOUNCE;
        }
        ...
    }

    static void trigger_keypad_update(void *object)
    {
        keypad_data.latched = read_keypad_value();
        set_bit_refresh_debounce_tmr();
    }
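The slide does not show how `event_refresh_debounce_tmr()` and `set_bit_refresh_debounce_tmr()` are defined. One plausible arrangement, sketched below as an assumption, is a pair of per-flag wrapper macros over the same test-and-clear pattern. The flag storage (`keypad_flags`), the state enum, the value of `KEYPAD_DEBOUNCE_TIME`, `fake_keypad_value` and the stubbed `read_keypad_value()` are all hypothetical, chosen so the producer and consumer flow can run on a host.

```c
#include <assert.h>
#include <stdint.h>

enum keypad_bits_t {
    bit_keypad_value_update = 0,
    bit_keypressed_wait,
    bit_refresh_debounce_tmr,
    keypad_bits_count           /* hypothetical: number of flags */
};

/* Host stand-in for the keypad's alias words, one word per flag. */
static volatile uint32_t keypad_flags[keypad_bits_count];

/* Hypothetical per-flag wrappers: set by the producer, test-and-clear
   by the consumer. */
#define set_bit_refresh_debounce_tmr() \
    (keypad_flags[bit_refresh_debounce_tmr] = 1)
#define event_refresh_debounce_tmr() \
    (keypad_flags[bit_refresh_debounce_tmr] \
        ? ((keypad_flags[bit_refresh_debounce_tmr] = 0), 1) : 0)

enum keypad_state_t { KEYPAD_IDLE = 0, KEYPAD_DEBOUNCE };
#define KEYPAD_DEBOUNCE_TIME 20u    /* assumed value, in timer ticks */

static struct {
    uint32_t latched;
    uint32_t debounce_tmr;
    enum keypad_state_t state;
} keypad_data;

/* Stub for the hardware read; the real function is not shown in the talk. */
static uint32_t fake_keypad_value;
static uint32_t read_keypad_value(void) { return fake_keypad_value; }

/* Producer: latches the keypad value and raises the event. */
static void trigger_keypad_update(void *object)
{
    (void)object;
    keypad_data.latched = read_keypad_value();
    set_bit_refresh_debounce_tmr();
}

/* Consumer: does work only when the event is pending. */
static void process_keypad(void)
{
    if (event_refresh_debounce_tmr()) {
        keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
        keypad_data.state = KEYPAD_DEBOUNCE;
    }
}
```

In a round-robin main loop, `process_keypad()` would be called on every pass and return almost immediately unless `trigger_keypad_update()` has raised the flag since the previous pass.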


Event-driven scheduling (6)

This event-driven architecture:

• Is simple to implement
• Scales well even with multicore Cortex-M systems
• Improves processing granularity
• Can be implemented in hardware on ARM+FPGA systems


Hardware scheduling

The event-driven scheduling can be implemented directly in hardware on an ARM+FPGA system.

Instead of using a round-robin cycle in firmware, the underlying hardware can place a “call” to each process() according to its sensitivity list.

This approach can reduce overhead to a few instruction cycles for a very responsive real-time system.


Multicore Cortex-M devices

The event-driven paradigm can be effectively implemented in a multicore Cortex-M system with common memory.

[Figure: Cortex-M cores connected through a bus matrix to shared RAM and shared flash. Image source: http://hothardware.com/newsimages/Item9563/cortex-m3-arm-cpu.png]

This approach simplifies system partitioning on the processor cores, and can decrease system response time for event-driven bare-metal logic.

Even when no bit-banding is available in the shared memory, atomic events can be used.
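When bit-banding is not available, one way to get atomic events is with C11 atomic read-modify-write operations on a shared flag word. The sketch below is an assumption about how this could look, not the mechanism used in the talk; `event_word_t`, `event_set` and `event_take` are hypothetical names. Whether these operations are atomic across cores depends on the device's memory system and toolchain, which is also an assumption here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One shared word holds up to 32 event flags. */
typedef atomic_uint_fast32_t event_word_t;

/* Producer side: raise one event flag atomically. */
static void event_set(event_word_t *word, unsigned bit)
{
    assert(bit < 32u);
    atomic_fetch_or(word, (uint_fast32_t)1 << bit);
}

/* Consumer side: test and clear one flag in a single atomic step.
   Returns 1 if the event was pending, 0 otherwise. */
static int event_take(event_word_t *word, unsigned bit)
{
    assert(bit < 32u);
    uint_fast32_t mask = (uint_fast32_t)1 << bit;
    uint_fast32_t old = atomic_fetch_and(word, ~mask);
    return (old & mask) != 0;
}
```

Unlike the bit-band version, the test and the clear happen in one atomic operation, so two consumers cannot both take the same event.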


Final Thoughts

The event-driven paradigm is a powerful and scalable architectural structure.

It is being used in bare-metal embedded systems with more than 300 KLOC.

If coupled with hardware scheduling support, it can be used to implement very fast event response systems that are very hard to implement with priority-based schedulers.


Thank you

Jonny Doin [email protected]
