#ESCBOS
From Hw to Sw: Parallel Logic Applied to Event-Driven Firmware
Jonny Doin – GridVortex
From Hardware to Firmware
• Introduction
• Multitasking: the holy grail of computing
• Parallel computing and VHDL
• process() and sequential parallel logic
• Signals and Sensitivity lists in VHDL
• Signals and Sensitivity lists in Firmware
• Bit-banding on Cortex-M
• Event-driven scheduling
• Hardware scheduling and Multicore µC
• Final thoughts
Intro
In this talk we will see:
• Architectural aspects of multitasking
• Some techniques for implementing event-driven firmware
• Concepts of Hardware Design that can be applied to Firmware development
Multitasking
Multitasking is one of the most important concepts of modern computing.
Efficient use of processing bandwidth affects both energy consumption and real-time response.
Microcontrollers with over 200 MIPS are becoming affordable even for the smallest applications.
(image: https://s-media-cache-ak0.pinimg.com/736x/d5/6e/06/d56e06a6441353a405456bbdc29df294.jpg)
Multitasking (2)
Multitasking can be described as the simulation of a parallel processing system using a smaller number of sequential processors.
Several multitasking schemes have evolved over time for traditional computing systems:
• Priority-based scheduling and multithreading
• Cooperative multitasking
• Interrupt-based real-time systems
• Event-driven multitasking
Multitasking (3)
Multitasking schemes are a compromise between:
• Cost of scheduling
• System blocking time
• Effective processing bandwidth
• System response time
(Figure: CPU time split between user tasks and the scheduler)
Parallel processing and VHDL
Truly parallel systems can be implemented in digital hardware.
Hardware description languages used to design such systems have dedicated language features for describing parallel logic.
VHDL uses a state-based model to describe parallel processing.
process() and parallel logic
In VHDL, sections of sequential logic that run in parallel with the rest of the system are defined using the process() structure:

    -- Register, sequential logic
    counter: process (clk_i, cnt_clear) is
    begin
        if cnt_clear = '1' then
            cnt_reg <= 0;
        else
            if clk_i'event and clk_i = '1' then
                if cnt_ce = '1' then
                    cnt_reg <= cnt_next;
                end if;
            end if;
        end if;
    end process counter;

    -- Adder, combinational logic
    cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg;
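For comparison with the firmware approach, here is a minimal C sketch of the same counter (our own illustration, not code from the talk). The struct fields stand in for the VHDL signals, cnt_next() plays the combinational adder, and counter_process() is the process body, which the caller is assumed to invoke only when clk_i or cnt_clear changes, mirroring the sensitivity list. Apart from the VHDL signal names, all identifiers (counter_t, counter_process) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C mirror of the VHDL counter. Each field stands for a
   signal; clk_i keeps its previous value so a rising edge can be seen. */
typedef struct {
    bool clk_i;
    bool cnt_clear;
    bool cnt_ce;
    bool cnt_top;
    uint32_t cnt_reg;
} counter_t;

/* Combinational part (the adder): cnt_reg + 1 when cnt_top = '0' */
static uint32_t cnt_next(const counter_t *c)
{
    return c->cnt_top ? c->cnt_reg : c->cnt_reg + 1u;
}

/* Sequential part (the register): the "process" body. The caller runs it
   only when clk_i or cnt_clear changed, like the sensitivity list. */
static void counter_process(counter_t *c, bool clk_i, bool cnt_clear)
{
    bool rising_edge = clk_i && !c->clk_i;   /* clk_i'event and clk_i = '1' */
    c->clk_i = clk_i;
    c->cnt_clear = cnt_clear;

    if (c->cnt_clear) {
        c->cnt_reg = 0;                       /* asynchronous clear */
    } else if (rising_edge && c->cnt_ce) {
        c->cnt_reg = cnt_next(c);
    }
}
```

Unlike the VHDL process, nothing here runs by itself: some scheduler has to notice the change on clk_i or cnt_clear and make the call, which is the job the sensitivity list does in VHDL.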
Signals and sensitivity lists
The process() definition includes a list of signals:
process (clk_i, cnt_clear)
Logic in the process() is only “executed” when any signal declared in its sensitivity list changes state.
Any other logic in the circuit can alter the state of these signals, and when that happens, the process is executed.
Signals in VHDL have much more to them: each one has a “transaction timeline”, and future transactions can be scheduled on it (for example, a <= '1' after 10 ns;).
Signals and sensitivity lists (2)
VHDL sensitivity lists:
• Simple state-based, event-driven paradigm
• Simulate parallel hardware logic
• Simulators use processing bandwidth efficiently
The paradigm is based on the delta cycle, a concept similar to an execution pass of the logic. Signals are assigned their new values only at the end of the current delta cycle, so every process evaluated in that pass sees the same, consistent set of signal values.
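The delta-cycle rule can be imitated in C with a two-phase signal store. This is a minimal sketch under our own assumptions (signals_t, sig_read, sig_write, delta_begin and delta_commit are hypothetical names): processes read the current values and write into a "next" copy, and nothing becomes visible until the commit, so the pair a <= b; b <= a; swaps the two values instead of overwriting one of them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SIGNALS 4

/* Hypothetical two-phase signal store: cur[] is what processes see during
   a delta cycle, nxt[] collects the values they schedule. */
typedef struct {
    uint32_t cur[NUM_SIGNALS];
    uint32_t nxt[NUM_SIGNALS];
} signals_t;

/* Start of a delta cycle: unwritten signals keep their current value */
static void delta_begin(signals_t *s)
{
    for (size_t i = 0; i < NUM_SIGNALS; i++) {
        s->nxt[i] = s->cur[i];
    }
}

static uint32_t sig_read(const signals_t *s, size_t i) { return s->cur[i]; }

/* Like "sig <= v": the new value is only scheduled, not yet visible */
static void sig_write(signals_t *s, size_t i, uint32_t v) { s->nxt[i] = v; }

/* End of the delta cycle: publish all scheduled values at once.
   Returns true if any signal changed, i.e. another pass is needed. */
static bool delta_commit(signals_t *s)
{
    bool changed = false;
    for (size_t i = 0; i < NUM_SIGNALS; i++) {
        if (s->nxt[i] != s->cur[i]) {
            s->cur[i] = s->nxt[i];
            changed = true;
        }
    }
    return changed;
}
```

A true result from delta_commit() is the cue to run another pass for the processes sensitive to the changed signals; a false result means the logic has settled.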
Signals and sensitivity lists (3)
The VHDL concepts of process() with sensitivity lists and delta cycles can be implemented in bare-metal firmware to achieve multitasking with low processing cost.
The benefits of these elements of multitasking are:
• Fast event-driven scheduling
• Structural integrity of the logic
• Scalability for multicore systems
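A minimal sketch of how a sensitivity list might look in firmware, assuming one 32-bit signal word and a table of process handlers. The names process_t, ipc_signals, scheduler_pass, the SIG_* bits and the example processes are all hypothetical. A process is called only when a bit in its sensitivity mask is set, and it consumes its own events; the scheduler takes one snapshot per pass, so signals raised during a pass are handled on the next one, much like a delta cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical process descriptor: run() is called only when at least one
   bit of its sensitivity mask is set in the shared signal word. */
typedef struct {
    uint32_t sensitivity;
    void (*run)(void);
} process_t;

/* Shared signal word (one bit per signal). */
static volatile uint32_t ipc_signals;

#define SIG_KEYPAD (1u << 0)
#define SIG_TIMER  (1u << 1)

static unsigned keypad_runs, timer_runs;

/* Example processes: each consumes (clears) the signal that woke it.
   The &= is a read-modify-write; an ISR setting another bit in between
   could be lost, which is the hazard bit-banding removes on Cortex-M. */
static void process_keypad(void) { ipc_signals &= ~SIG_KEYPAD; keypad_runs++; }
static void process_timer(void)  { ipc_signals &= ~SIG_TIMER;  timer_runs++;  }

static const process_t process_table[] = {
    { SIG_KEYPAD, process_keypad },
    { SIG_TIMER,  process_timer  },
};

/* One scheduler pass: take a single snapshot of the signals and call every
   process whose sensitivity list intersects it. Returns how many ran. */
static unsigned scheduler_pass(const process_t *table, size_t count)
{
    uint32_t snapshot = ipc_signals;
    unsigned ran = 0;
    for (size_t i = 0; i < count; i++) {
        if (table[i].sensitivity & snapshot) {
            table[i].run();
            ran++;
        }
    }
    return ran;
}
```

In firmware, scheduler_pass() would sit inside the main loop. The cost of a pass is one AND per table entry plus the calls that are actually needed.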
Bit-banding on Cortex-M
ARM Cortex-M3 and Cortex-M4 cores have dedicated memory addressing hardware to implement atomic bit access in memory without read-modify-write artifacts.
• Bit signals can be used as efficient Inter-Process Communication (IPC)
• Fastest atomic operations on a Cortex-M (faster than LDREX/STREX)
• Mapped to a special alias area of the SRAM address space
Bit-banding on Cortex-M (2)
Excerpt from ARM DDI 0439C (ID070610), Programmers Model, page 3-14. Copyright © 2009, 2010 ARM Limited. All rights reserved. Non-Confidential.
3.4 System address map
The processor contains a bus matrix that arbitrates the processor core and optional Debug Access Port (DAP) memory accesses to both the external memory system and to the internal System Control Space (SCS) and debug components.
Priority is always given to the processor to ensure that any debug accesses are as non-intrusive as possible. For a zero wait state system, all debug accesses to system memory, SCS, and debug resources are completely non-intrusive.
Figure 3-1 shows the system address map.
Figure 3-1 System address map (highest address first):
  0xE0100000-0xFFFFFFFF  System
  0xE0040000-0xE00FFFFF  Private peripheral bus - External (TPIU, ETM, External PPB, ROM Table)
  0xE0000000-0xE003FFFF  Private peripheral bus - Internal (ITM, DWT, FPB, SCS)
  0xA0000000-0xDFFFFFFF  External device, 1.0GB
  0x60000000-0x9FFFFFFF  External RAM, 1.0GB
  0x40000000-0x5FFFFFFF  Peripheral, 0.5GB
                         bit-band region 0x40000000-0x400FFFFF (1MB), bit-band alias 0x42000000-0x43FFFFFF (32MB)
  0x20000000-0x3FFFFFFF  SRAM, 0.5GB
                         bit-band region 0x20000000-0x200FFFFF (1MB), bit-band alias 0x22000000-0x23FFFFFF (32MB)
  0x00000000-0x1FFFFFFF  Code, 0.5GB
Table 3-3 shows the processor interfaces that are addressed by the different memory map regions.
Table 3-3 Memory regions
  Memory Map Region
  Code:           Instruction fetches are performed over the ICode bus. Data accesses are performed over the DCode bus.
  SRAM:           Instruction fetches and data accesses are performed over the system bus.
  SRAM bit-band:  Alias region. Data accesses are aliases. Instruction accesses are not aliases.
• Hardware remapping of accesses
• Fixed, documented addresses on every Cortex-M3/M4 device that implements bit-banding
• Atomic writes to individual bits
• Simultaneous read of all 32 bits through the normal word address
source: ARM DDI 0439C, page 3-14
Bit-banding on Cortex-M (3)
Bit-banding memory remap structure:
• Words (32-bit) in the alias region map to individual bits in the normal SRAM memory
• The remapped writes are guaranteed atomic
• The alias word at 0x2200001C maps to bit [7] of the bit-band byte at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + (7*4). In general, alias word address = alias base + (byte_offset*32) + (bit_number*4).
Figure 3-2 Bit-band mapping: each byte of the 1MB SRAM bit-band region (0x20000000-0x200FFFFF) maps to eight consecutive words of the 32MB alias region (0x22000000-0x23FFFFFF). Bits [7:0] of the byte at 0x20000000 map to the alias words 0x22000000-0x2200001C; bits [7:0] of the byte at 0x200FFFFF map to the alias words 0x23FFFFE0-0x23FFFFFC.
3.7.1 Directly accessing an alias region
Writing to a word in the alias region has the same effect as a read-modify-write operation on the targeted bit in the bit-band region.
Bit [0] of the value written to a word in the alias region determines the value written to the targeted bit in the bit-band region. Writing a value with bit [0] set writes a 1 to the bit-band bit, and writing a value with bit [0] cleared writes a 0 to the bit-band bit.
Bits [31:1] of the alias word have no effect on the bit-band bit. Writing 0x01 has the same effect as writing 0xFF. Writing 0x00 has the same effect as writing 0x0E.
Reading a word in the alias region returns either 0x01 or 0x00. A value of 0x01 indicates that the targeted bit in the bit-band region is set. A value of 0x00 indicates that the targeted bit is clear. Bits [31:1] are zero.
3.7.2 Directly accessing a bit-band region
You can directly access the bit-band region with normal reads and writes to that region.
source: ARM DDI 0439C, page 3-20
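The mapping rule above is straightforward to encode. The function below is a sketch (bitband_alias is a hypothetical name, not a CMSIS or vendor API) that applies alias = alias base + byte_offset*32 + bit*4 to the two 1MB bit-band regions of Figure 3-1 and returns 0 for any other address. It only computes addresses, so it can be checked on a host PC.

```c
#include <stdint.h>

/* Hypothetical helper: address of the alias word for bit `bit` (0-7) of
   the byte at `addr`, using
       alias = alias_base + byte_offset * 32 + bit * 4
   for the SRAM (0x20000000) and peripheral (0x40000000) bit-band regions,
   whose alias regions start 32MB (0x02000000) higher.
   Returns 0 if the address is not in a 1MB bit-band region. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t region = addr & 0xF0000000u;        /* 0x20000000 or 0x40000000 */
    uint32_t byte_offset = addr & 0x0FFFFFFFu;   /* offset inside the region */

    if ((region != 0x20000000u && region != 0x40000000u) ||
        byte_offset >= 0x00100000u || bit > 7u) {
        return 0;
    }
    return region + 0x02000000u + (byte_offset << 5) + ((uint32_t)bit << 2);
}
```

On a device that implements bit-banding, *(volatile uint32_t *)bitband_alias(0x20000000u, 7) = 1; would set bit 7 of the byte at 0x20000000 atomically. On a little-endian device, bit n of an aligned 32-bit word at address A is bit n%8 of byte A + n/8, so its alias word is bitband_alias(A, 0) + 4*n: the alias area of a word can be indexed as a uint32_t array by bit number.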
Event-driven scheduling
Using the concepts from VHDL and the atomic bit-banding of Cortex-M, it is possible to:
• Implement event-driven multitasking
• Have process()-like handlers with light overhead
• Implement state machine logic efficiently
• Use bit signals as efficient IPC
Event-driven scheduling (2)

    #include <stdint.h>

    typedef volatile uint32_t *PFLAGS_T;        // pointer to a volatile word: every access reaches memory

    typedef volatile struct ipc_flags_t {       // any object of this type is volatile qualified
        PFLAGS_T pflags_bits;                   // Ptr to the 'bit bandable' word with 32 ipc bits
        PFLAGS_T pflags_base;                   // Ptr to the base of the word alias array
    } IPC_FLAGS_T;

    // for the ipc macros, pass an IPC_FLAGS_T struct
    #define get_bit(flags, bit)       ((flags).pflags_base[(bit)])
    #define set_bit(flags, bit)       ((flags).pflags_base[(bit)] = 1)
    #define clr_bit(flags, bit)       ((flags).pflags_base[(bit)] = 0)
    #define toggle(flags, bit)        ((flags).pflags_base[(bit)] ^= 1)    // separate read and write: not atomic
    #define event(flags, bit)         (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
    #define clr_bits(flags)           (*((flags).pflags_bits) = 0)
    #define get_bits(flags, bitmask)  (*((flags).pflags_bits) & (bitmask))

    extern void init_ipc(void);
    extern uint32_t request_ipc_word(IPC_FLAGS_T *pflags);
Event-driven scheduling (3)
    #define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
so:
    set_bit(my_flags, 7);
translates to:
    my_flags.pflags_base[7] = 1;
where:
    IPC_FLAGS_T my_flags;
    my_flags.pflags_base = (PFLAGS_T) 0x22000000;
    my_flags.pflags_bits = (PFLAGS_T) 0x20000000;
Writing 0x00000001 to pflags_base[7], the alias word at 0x2200001C (the alias words for this word span 0x22000000 up to, but not including, 0x22000080), sets bit 7 of the word at 0x20000000 in the bit-band region; if no other bit was set, that word then reads 0x00000080.
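Rather than hard-coding both addresses, the two pointers can be derived from the word address. The helper below is a hypothetical sketch (ipc_flags_init is not part of the talk's API; something along these lines could sit behind request_ipc_word()). It assumes the word lies in the SRAM bit-band region, is 32-bit aligned, and that the device is little-endian, and it uses the alias rule from the ARM excerpt.

```c
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;

typedef volatile struct ipc_flags_t {
    PFLAGS_T pflags_bits;   /* the bit-bandable 32-bit word */
    PFLAGS_T pflags_base;   /* first of its 32 alias words */
} IPC_FLAGS_T;

/* Hypothetical helper: point an IPC_FLAGS_T at a word in the SRAM bit-band
   region. Bit n of the word is the alias word
   0x22000000 + (word_addr - 0x20000000) * 32 + n * 4, so pflags_base[n]
   addresses it directly. Returns 0 on success, -1 on a bad address. */
static int ipc_flags_init(IPC_FLAGS_T *f, uint32_t word_addr)
{
    if (word_addr < 0x20000000u || word_addr > 0x200FFFFCu || (word_addr & 3u) != 0) {
        return -1;   /* outside the 1MB bit-band region or not word aligned */
    }
    f->pflags_bits = (PFLAGS_T)(uintptr_t)word_addr;
    f->pflags_base = (PFLAGS_T)(uintptr_t)(0x22000000u + ((word_addr - 0x20000000u) << 5));
    return 0;
}
```

For the word at 0x20000000 this yields exactly the values used on the slide, and the next word, 0x20000004, gets its alias array at 0x22000080.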
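Event-driven scheduling (4)
    #define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
so:
    if (event(my_flags, 7)) { ... }
expands to:
    if ((my_flags.pflags_base[7]) ? ((my_flags.pflags_base[7] = 0), 1) : 0) { ... }
When bit 7 is set, the conditional selects the comma-operator expression: its left operand (the side effect) clears the bit, and its right operand (the result) is 1:
    if (((my_flags.pflags_base[7] = 0), 1))
which, after evaluation of the side effect, becomes:
    if ((1))
When bit 7 is clear, the whole expression evaluates to 0 and the bit is left untouched. event() is therefore a test-and-clear: each pending event is reported once and consumed.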
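Because event() is a macro, its arguments can be evaluated twice: with event(my_flags, i++), i is incremented twice when the bit is set, and the wrong bit is cleared. A hypothetical inline-function form (ipc_event is not part of the talk's API) keeps the same test-and-clear behaviour while evaluating each argument once. Here an ordinary array stands in for the alias region so the sketch runs on a host PC; it checks the logic only, not the bit-banding.

```c
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;
typedef struct {
    PFLAGS_T pflags_bits;   /* the bit-bandable word */
    PFLAGS_T pflags_base;   /* its 32 alias words */
} IPC_FLAGS_T;

/* Function form of the event() macro: returns 1 and clears the bit if it
   was set, otherwise returns 0. Arguments are evaluated exactly once. */
static inline int ipc_event(const IPC_FLAGS_T *flags, unsigned bit)
{
    if (flags->pflags_base[bit]) {
        flags->pflags_base[bit] = 0;   /* side effect: consume the event */
        return 1;
    }
    return 0;
}

/* Host-only stand-in for the alias region, so the logic can run on a PC */
static volatile uint32_t fake_alias[32];
static const IPC_FLAGS_T my_flags = { 0, fake_alias };
```

An optimizing compiler will usually inline ipc_event(), so it should cost about the same as the macro.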
Event-driven scheduling (5)

    enum keypad_bits_t {
        bit_keypad_value_update = 0,
        bit_keypressed_wait,
        bit_refresh_debounce_tmr,
    };

    void process_keypad(void)
    {
        if (event_refresh_debounce_tmr()) {     // consume the debounce-refresh event
            keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
            keypad_data.state = KEYPAD_DEBOUNCE;
        }
        ...
    }

    static void trigger_keypad_update(void *object)
    {
        keypad_data.latched = read_keypad_value();
        set_bit_refresh_debounce_tmr();          // signal process_keypad()
    }
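The per-bit helpers event_refresh_debounce_tmr() and set_bit_refresh_debounce_tmr() are not defined on the slide. One plausible way to produce them, sketched here as an assumption, is a token-pasting macro (DEFINE_IPC_BIT and keypad_flags are hypothetical names) that binds the generic set_bit()/event() macros to one IPC_FLAGS_T object. As before, a plain array replaces the alias region so the sketch runs on a host PC.

```c
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;
typedef struct { PFLAGS_T pflags_bits; PFLAGS_T pflags_base; } IPC_FLAGS_T;

/* Generic macros as on the earlier slide */
#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit)   (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

/* Hypothetical generator: for the enumerator bit_<name>, define
   set_bit_<name>() and event_<name>() bound to one IPC_FLAGS_T object. */
#define DEFINE_IPC_BIT(flags, name)                                            \
    static inline void set_bit_##name(void) { set_bit((flags), bit_##name); } \
    static inline int event_##name(void) { return event((flags), bit_##name); }

enum keypad_bits_t {
    bit_keypad_value_update = 0,
    bit_keypressed_wait,
    bit_refresh_debounce_tmr,
};

/* Host stand-in: on the target, pflags_base would point into the alias region */
static volatile uint32_t keypad_alias[32];
static IPC_FLAGS_T keypad_flags = { 0, keypad_alias };

DEFINE_IPC_BIT(keypad_flags, refresh_debounce_tmr)
DEFINE_IPC_BIT(keypad_flags, keypad_value_update)
```

Generating the helpers from the enum keeps the bit names as the single source of truth, so a handler reads like a VHDL process reacting to a named signal.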
Event-driven scheduling (6)
This event-driven architecture:
• Is simple to implement
• Scales well, even on multicore Cortex-M systems
• Improves processing granularity
• Can be implemented in hardware on ARM+FPGA systems
Hardware scheduling
Event-driven scheduling can be implemented directly in hardware on an ARM+FPGA system.
Instead of using a round-robin cycle in firmware, the underlying hardware can place a “call” to each process() according to its sensitivity list.
This approach can reduce the scheduling overhead to a few instruction cycles, for a very responsive real-time system.
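A software model of that idea, as a hypothetical sketch rather than the talk's design: the FPGA logic would compute a "ready" vector (bit n set when a signal on process n's sensitivity list is pending), and the firmware only has to pick the lowest set bit and call the matching handler. model_ready_vector, dispatch_next and the example processes are invented names; model_ready_vector stands in for logic that would live in the FPGA.

```c
#include <stdint.h>

#define NUM_PROCESSES 4u

typedef void (*process_fn)(void);

/* Software model of what FPGA logic could compute: bit n of the ready
   vector is set when a signal on process n's sensitivity list is pending.
   In hardware this is an AND followed by an OR-reduction per process. */
static uint32_t model_ready_vector(const uint32_t sensitivity[], uint32_t signals)
{
    uint32_t ready = 0;
    for (unsigned n = 0; n < NUM_PROCESSES; n++) {
        if (sensitivity[n] & signals) {
            ready |= 1u << n;
        }
    }
    return ready;
}

/* Firmware side: given the ready vector, dispatch is a count-trailing-zeros
   and an indirect call. __builtin_ctz is a GCC/Clang builtin (undefined
   for 0, hence the check). Returns the process index, or -1 if idle. */
static int dispatch_next(uint32_t *ready, const process_fn table[])
{
    if (*ready == 0) {
        return -1;
    }
    unsigned n = (unsigned)__builtin_ctz(*ready);
    *ready &= *ready - 1u;          /* clear the lowest set bit */
    table[n]();
    return (int)n;
}

/* Example processes and their sensitivity lists (hypothetical) */
static unsigned runs[NUM_PROCESSES];
static void p0(void) { runs[0]++; }
static void p1(void) { runs[1]++; }
static void p2(void) { runs[2]++; }
static void p3(void) { runs[3]++; }

static const process_fn process_table[NUM_PROCESSES] = { p0, p1, p2, p3 };
static const uint32_t sensitivity_table[NUM_PROCESSES] = { 0x1u, 0x2u, 0x3u, 0x8u };
```

On Cortex-M3 and later, compilers typically turn __builtin_ctz into an RBIT/CLZ pair, so once the hardware supplies the ready vector, the per-event firmware cost is a handful of instructions plus the call.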
Multicore Cortex-M devices
The event-driven paradigm can be implemented effectively on a multicore Cortex-M system with common memory.
(Figure: Cortex-M cores connected through a bus matrix to shared RAM and shared flash; image: http://hothardware.com/newsimages/Item9563/cortex-m3-arm-cpu.png)
This approach simplifies partitioning the system across the processor cores, and can decrease system response time for event-driven bare-metal logic.
Even when the shared memory has no bit-banding, events can still be signaled atomically, for example with the LDREX/STREX exclusive-access instructions.
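A sketch of that alternative using C11 atomics (post_event, take_event and shared_events are hypothetical names). On Cortex-M3/M4/M7, compilers typically implement atomic_fetch_or and atomic_fetch_and as LDREX/STREX retry loops; whether exclusive accesses are honoured between cores depends on the device's memory system, so this is an assumption to check against the vendor documentation. take_event() keeps the test-and-clear contract of the event() macro.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared event word, e.g. in RAM visible to every core (hypothetical). */
static _Atomic uint32_t shared_events;

/* Producer side (any core or ISR): atomically set one event bit. */
static void post_event(unsigned bit)
{
    atomic_fetch_or_explicit(&shared_events, UINT32_C(1) << bit, memory_order_release);
}

/* Consumer side: atomically clear the bit and report whether it was set,
   the same test-and-clear contract as the event() macro. */
static bool take_event(unsigned bit)
{
    uint32_t mask = UINT32_C(1) << bit;
    uint32_t old = atomic_fetch_and_explicit(&shared_events, ~mask, memory_order_acquire);
    return (old & mask) != 0;
}
```

Unlike the two-access event() macro, the read and the clear here form a single atomic operation, at the cost of a possible retry loop instead of one bit-band store.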
Final Thoughts
The event-‐driven paradigm is a powerful and scalable architectural structure.
It is in use in bare-metal embedded systems of more than 300 KLOC.
Coupled with hardware scheduling support, it can be used to implement systems with very fast event response, which are very hard to achieve with priority-based schedulers.