#ESCBOS

From Hw to Sw: Parallel Logic Applied to Event-Driven Firmware

Jonny Doin – GridVortex

From Hardware to Firmware

• Introduction
• Multitasking: the holy grail of computing
• Parallel computing and VHDL
• process() and sequential parallel logic
• Signals and sensitivity lists in VHDL
• Signals and sensitivity lists in firmware
• Bit-banding on Cortex-M
• Event-driven scheduling
• Hardware scheduling and multicore µC
• Final thoughts

Intro

In this talk we will see:

• Architectural aspects of multitasking

• Some techniques for implementing event-driven firmware

• Concepts of hardware design that can be applied to firmware development

Multitasking

Multitasking is one of the most important concepts of modern computing.

Efficient use of processing bandwidth affects both energy consumption and real-time response.

Microcontrollers delivering over 200 MIPS are becoming accessible even to the smallest applications.

(image: https://s-media-cache-ak0.pinimg.com/736x/d5/6e/06/d56e06a6441353a405456bbdc29df294.jpg)

Multitasking (2)

Multitasking can be described as the simulation of a parallel processing system on a smaller number of sequential processors.

Several multitasking schemes have evolved over time for traditional computing systems:

• Priority-based scheduling and multithreading

• Cooperative multitasking

• Interrupt-based real-time systems

• Event-driven multitasking

Multitasking (3)

Multitasking schemes are a compromise between:

• Cost of scheduling

• System blocking time

• Effective processing bandwidth

• System response time

[Figure: CPU time split between USER TASK CPU TIME and SCHEDULER CPU TIME]
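As an illustrative definition (it is not given on the slide), the split between user-task and scheduler CPU time in the figure can be summarized as the fraction of CPU time that remains for user tasks, written here as $B_{\text{eff}}$:

$$
B_{\text{eff}} = \frac{T_{\text{user}}}{T_{\text{user}} + T_{\text{sched}}}
$$

For example, if the scheduler consumes 1 µs of every 10 µs of CPU time, then $B_{\text{eff}} = 9/(9+1) = 0.9$, so a 200 MIPS core leaves about 180 MIPS for user tasks.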

Parallel processing and VHDL

Truly parallel systems can be implemented in digital hardware.

The languages used to describe and design such systems have specific features for describing parallel logic.

VHDL uses a state-based model to describe parallel processing.

process() and parallel logic

In VHDL, sections of sequential logic that run in parallel with the rest of the system are defined using the process() structure:

    -- Register, sequential logic
    counter: process (clk_i, cnt_clear) is
    begin
        if cnt_clear = '1' then
            cnt_reg <= 0;
        elsif clk_i'event and clk_i = '1' then
            if cnt_ce = '1' then
                cnt_reg <= cnt_next;
            end if;
        end if;
    end process counter;

    -- Adder, combinational logic
    cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg;
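To connect the VHDL counter with the firmware side of the talk, here is a minimal C sketch of the same logic written as a firmware process(). It assumes, hypothetically, that a scheduler passes in a bit mask of pending events, with EV_CLK_RISE standing in for a rising edge on clk_i and EV_CNT_CLEAR for a change on cnt_clear; the struct, event names and functions are illustrative and not from the slides.

```c
#include <stdint.h>

/* Hypothetical C analogue of the VHDL "counter" process and its adder. */
struct counter_signals {
    uint8_t  cnt_clear;   /* asynchronous clear input */
    uint8_t  cnt_ce;      /* count enable */
    uint8_t  cnt_top;     /* stop-counting input */
    uint32_t cnt_reg;     /* the register (sequential logic) */
};

/* Combinational part: cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg; */
static uint32_t cnt_next(const struct counter_signals *s)
{
    return s->cnt_top ? s->cnt_reg : s->cnt_reg + 1u;
}

/* Assumed event bits delivered by the scheduler. */
enum { EV_CLK_RISE = 1u << 0, EV_CNT_CLEAR = 1u << 1 };
#define COUNTER_SENSITIVITY (EV_CLK_RISE | EV_CNT_CLEAR)   /* process (clk_i, cnt_clear) */

/* Sequential part: runs only when an event in its sensitivity list is pending. */
static void process_counter(struct counter_signals *s, uint32_t events)
{
    if ((events & COUNTER_SENSITIVITY) == 0u)
        return;                              /* nothing in the sensitivity list changed */
    if (s->cnt_clear)
        s->cnt_reg = 0u;                     /* clear has priority, as in the VHDL */
    else if ((events & EV_CLK_RISE) && s->cnt_ce)
        s->cnt_reg = cnt_next(s);            /* register loads on the clock edge */
}
```

The early return plays the role of the sensitivity list: a call with no relevant event costs only a mask test.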

Signals and sensitivity lists

The process() definition includes a list of signals:

    process (clk_i, cnt_clear)

Logic in the process() is only “executed” when any of the signals declared in its sensitivity list changes state.

Any other logic in the circuit can alter the state of these signals, and when that happens, the process is executed.

Signals in VHDL have much more to them: they have a “transaction timeline” and support scheduling future transactions on the signal (for example, sig <= '1' after 10 ns;).
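The "execute only on a change" rule can be sketched in C by packing up to 32 one-bit signals into a word and giving each process a sensitivity mask. This is a hypothetical illustration: process_t, run_if_sensitive and the activation counter are assumed names, and the previous and current snapshots would come from whatever scheduler drives the processes.

```c
#include <stdint.h>

/* Hypothetical process descriptor with a bit-mask sensitivity list. */
typedef struct {
    uint32_t sensitivity;               /* bit i set => sensitive to signal i */
    void   (*body)(uint32_t signals);   /* the process logic (may be NULL here) */
    unsigned activations;               /* illustration only: times the body was triggered */
} process_t;

/* Runs p's body only if a signal in its sensitivity list changed state. */
static void run_if_sensitive(process_t *p, uint32_t prev, uint32_t curr)
{
    uint32_t changed = prev ^ curr;     /* 1 for every signal that toggled */
    if ((changed & p->sensitivity) != 0u) {
        p->activations++;
        if (p->body != 0)
            p->body(curr);
    }
}
```

Changes on signals outside the mask, or no change at all, cost a single XOR and AND.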

Signals and sensitivity lists (2)

VHDL sensitivity lists:

• Simple state-based, event-driven paradigm

• Simulate parallel hardware logic

• Simulators use processing bandwidth efficiently

The paradigm is based on the delta cycle, a concept similar to an execution pass of the logic. All signals are assigned their new values only at the end of the current delta cycle.
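A minimal C sketch of the delta-cycle rule, assuming signals packed as bits (a = bit 0, b = bit 1, c = bit 2) and two hypothetical processes, b <= a and c <= b. Processes read only the current values and write only the scheduled ones; all assignments are committed together once per pass, and passes repeat until nothing changes. The names signal_bank_t and settle are illustrative.

```c
#include <stdint.h>

typedef struct {
    uint32_t cur;    /* values visible during this delta cycle */
    uint32_t next;   /* values scheduled for the end of the cycle */
} signal_bank_t;

/* b <= a : reads cur, writes next */
static void proc_b_follows_a(signal_bank_t *s)
{
    s->next = (s->next & ~0x2u) | ((s->cur & 0x1u) << 1);
}

/* c <= b : reads cur, writes next */
static void proc_c_follows_b(signal_bank_t *s)
{
    s->next = (s->next & ~0x4u) | ((s->cur & 0x2u) << 1);
}

/* Runs delta cycles until the signals are stable; returns how many passes ran. */
static unsigned settle(signal_bank_t *s)
{
    unsigned deltas = 0;
    for (;;) {
        s->next = s->cur;             /* start from the current values */
        proc_b_follows_a(s);
        proc_c_follows_b(s);
        deltas++;
        if (s->next == s->cur)
            return deltas;            /* no events: stable */
        s->cur = s->next;             /* commit all assignments at once */
    }
}
```

Because both processes read cur, their call order inside a pass does not matter. Raising a reaches c only after two delta cycles, and a third pass confirms that nothing else changes.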

Signals and sensitivity lists (3)

The VHDL concepts of process() with sensitivity lists and delta cycles can be implemented in bare-metal firmware to achieve multitasking with low processing cost.

The benefits of these elements of multitasking are:

• Fast event-driven scheduling

• Structural integrity of the logic

• Scalability for multicore systems

Bit-banding on Cortex-M

ARM Cortex-M3 and Cortex-M4 cores that implement the optional bit-banding feature have dedicated memory-addressing hardware for atomic access to individual bits in memory, without the hazards of a software read-modify-write.

• Bit signals can be used for efficient Inter-Process Communication (IPC)

• Fastest atomic operations on a Cortex-M (faster than an LDREX/STREX sequence)

• Each bit of a dedicated SRAM region is mapped to a word in a special alias area
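The mapping from a bit to its alias word is plain address arithmetic. Below is a sketch in C using the SRAM bit-band base addresses from ARM DDI 0439C (0x20000000 for the bit-band region, 0x22000000 for its alias); the function names are hypothetical. It only computes addresses, so it can be checked on any host.

```c
#include <stdint.h>

#define BITBAND_SRAM_BASE  0x20000000u  /* start of the 1MB SRAM bit-band region */
#define BITBAND_ALIAS_BASE 0x22000000u  /* start of the 32MB alias region */

/* alias = alias_base + (byte_offset * 32) + (bit_number * 4), bit in 0..7 */
static inline uint32_t bitband_alias_addr(uint32_t byte_addr, uint32_t bit)
{
    uint32_t byte_offset = byte_addr - BITBAND_SRAM_BASE;
    return BITBAND_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}

/* Bit n (0..31) of a little-endian 32-bit word lives in byte n/8, bit n%8,
   so the result simplifies to alias(word) + 4*n: the alias words of one
   flag word form a contiguous array of 32 entries. */
static inline uint32_t bitband_word_bit_alias(uint32_t word_addr, uint32_t n)
{
    return bitband_alias_addr(word_addr + n / 8u, n % 8u);
}
```

That contiguous layout is what allows a flag word's 32 bits to be indexed as an array of alias words. On a target, the computed address would be cast to a pointer to volatile uint32_t before use.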

Bit-banding on Cortex-M (2)


3.4 System address map

The processor contains a bus matrix that arbitrates the processor core and optional Debug Access Port (DAP) memory accesses to both the external memory system and to the internal System Control Space (SCS) and debug components.

Priority is always given to the processor to ensure that any debug accesses are as non-intrusive as possible. For a zero wait state system, all debug accesses to system memory, SCS, and debug resources are completely non-intrusive.

Figure 3-1 shows the system address map.

Figure 3-1 System address map

Table 3-3 shows the processor interfaces that are addressed by the different memory map regions.

[Figure 3-1: Code at 0x00000000 (0.5GB); SRAM at 0x20000000 (0.5GB), with a 1MB bit-band region at 0x20000000 to 0x200FFFFF and its 32MB bit-band alias at 0x22000000 to 0x23FFFFFF; Peripheral at 0x40000000 (0.5GB), with a 1MB bit-band region at 0x40000000 to 0x400FFFFF and its 32MB alias at 0x42000000 to 0x43FFFFFF; External RAM at 0x60000000 (1.0GB); External device at 0xA0000000 (1.0GB); Private peripheral bus, internal at 0xE0000000 and external at 0xE0040000; System at 0xE0100000 to 0xFFFFFFFF.]

Table 3-3 Memory regions (excerpt)

Code: Instruction fetches are performed over the ICode bus. Data accesses are performed over the DCode bus.

SRAM: Instruction fetches and data accesses are performed over the system bus.

SRAM bit-band: Alias region. Data accesses are aliases. Instruction accesses are not aliases.

• Hardware remapping of accesses

• Fixed, known addresses on every Cortex-M3/M4 device that implements bit-banding

• Atomic writes to individual bits

• Simultaneous reads of all 32 bits through the normal word address

source: ARM DDI 0439C, page 3-20

Bit-banding on Cortex-M (3)

Bit-banding memory remap structure:

• Words (32 bits) in the alias region map to individual bits in the normal SRAM memory

• The remapped writes are guaranteed atomic


• The alias word at 0x2200001C maps to bit [7] of the bit-band byte at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + (7*4).

Figure 3-2 Bit-band mapping

3.7.1 Directly accessing an alias region

Writing to a word in the alias region has the same effect as a read-modify-write operation on the targeted bit in the bit-band region.

Bit [0] of the value written to a word in the alias region determines the value written to the targeted bit in the bit-band region. Writing a value with bit [0] set writes a 1 to the bit-band bit, and writing a value with bit [0] cleared writes a 0 to the bit-band bit.

Bits [31:1] of the alias word have no effect on the bit-band bit. Writing 0x01 has the same effect as writing 0xFF. Writing 0x00 has the same effect as writing 0x0E.

Reading a word in the alias region returns either 0x01 or 0x00. A value of 0x01 indicates that the targeted bit in the bit-band region is set. A value of 0x00 indicates that the targeted bit is clear. Bits [31:1] are zero.

3.7.2 Directly accessing a bit-band region

You can directly access the bit-band region with normal reads and writes to that region.

[Figure 3-2: each word of the 32MB alias region, 0x22000000 to 0x23FFFFFC, maps to one bit (bits 7 to 0) of a byte in the 1MB SRAM bit-band region, 0x20000000 to 0x200FFFFF.]

source: ARM DDI 0439C, page 3-20
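The read and write rules of section 3.7.1 can be modelled on a host to see what software observes. This is a hypothetical simulation: a small byte array stands in for the start of the bit-band region, and the two functions mimic alias-word accesses. On a real Cortex-M3/M4 the bus matrix performs the read-modify-write atomically; this model is not atomic and only illustrates the bit [0] rule.

```c
#include <stdint.h>

/* Hypothetical stand-in for the first 16 bytes of the SRAM bit-band region. */
static uint8_t sim_bitband_region[16];

/* Model of a write to the alias word of (byte_offset, bit): only bit [0]
   of the written value matters. Requires byte_offset < 16 and bit < 8. */
static void sim_alias_write(uint32_t byte_offset, uint32_t bit, uint32_t value)
{
    uint8_t mask = (uint8_t)(1u << bit);
    if (value & 1u)
        sim_bitband_region[byte_offset] |= mask;             /* bit [0] set: write 1 */
    else
        sim_bitband_region[byte_offset] &= (uint8_t)~mask;   /* bit [0] clear: write 0 */
}

/* Model of a read from that alias word: returns 0x01 or 0x00, bits [31:1] zero. */
static uint32_t sim_alias_read(uint32_t byte_offset, uint32_t bit)
{
    return ((uint32_t)sim_bitband_region[byte_offset] >> bit) & 1u;
}
```

Writing 0xFF sets the bit and writing 0x0E clears it, matching the equivalences quoted from the manual.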

Event-driven scheduling

Using the concepts from VHDL and the atomic bit-banding of the Cortex-M, it is possible to:

• Implement event-driven multitasking

• Have process()-like handlers with light overhead

• Implement state machine logic efficiently

• Use bit signals as efficient IPC

Event-driven scheduling (2)

    #include <stdint.h>

    typedef volatile uint32_t * PFLAGS_T;   // pointer to a volatile word: every flag access reaches memory

    typedef volatile struct ipc_flags_t {   // any object of this type is volatile qualified
        PFLAGS_T pflags_bits;               // Ptr to the 'bit bandable' word with 32 ipc bits
        PFLAGS_T pflags_base;               // Ptr to the base of the word alias array
    } IPC_FLAGS_T;

    // for the ipc macros, pass an IPC_FLAGS_T struct
    #define get_bit(flags, bit)       ((flags).pflags_base[(bit)])
    #define set_bit(flags, bit)       ((flags).pflags_base[(bit)] = 1)
    #define clr_bit(flags, bit)       ((flags).pflags_base[(bit)] = 0)
    #define toggle(flags, bit)        ((flags).pflags_base[(bit)] ^= 1)   // alias read + alias write: not atomic as a whole
    #define event(flags, bit)         (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
    #define clr_bits(flags)           (*((flags).pflags_bits) = 0)
    #define get_bits(flags, bitmask)  (*((flags).pflags_bits) & (bitmask))

    extern void init_ipc(void);
    extern uint32_t request_ipc_word(IPC_FLAGS_T *pflags);

Event-driven scheduling (3)

    #define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)

so:

    set_bit(my_flags, 7);

translates to:

    my_flags.pflags_base[7] = 1;

where:

    IPC_FLAGS_T my_flags;
    my_flags.pflags_base = (PFLAGS_T) 0x22000000;
    my_flags.pflags_bits = (PFLAGS_T) 0x20000000;

[Figure: writing 0x00000001 to alias word 7 (0x2200001C) of the bit-band alias area, which spans 0x22000000 up to 0x22000080 for this word, makes the word at 0x20000000 in the bit-band region read 0x00000080.]

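Event-driven scheduling (4)

    #define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

so:

    if (event(my_flags, 7)) { ... }

translates to:

    if ((my_flags.pflags_base[7] ? ((my_flags.pflags_base[7] = 0), 1) : 0)) { ... }

When bit 7 is set, the conditional operator evaluates the comma-operator expression ((my_flags.pflags_base[7] = 0), 1). Its left operand is the side-effect part, which clears the flag; its right operand is the result. After evaluation of the side effect, the test becomes:

    if ((1))

When bit 7 is clear, the whole expression evaluates to 0 and no write is made.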
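Because alias words behave like ordinary memory words from the C point of view, the macros can be exercised on a host by pointing pflags_base at a plain array. A sketch under that assumption (sim_alias, handled and process_demo are hypothetical names); on a Cortex-M3/M4, pflags_base would point into the alias region instead, and only there is each individual write atomic.

```c
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;
typedef struct { PFLAGS_T pflags_bits; PFLAGS_T pflags_base; } IPC_FLAGS_T;

#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit)   (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

/* Host stand-in for the 32 alias words of one IPC word (hypothetical). */
static uint32_t sim_alias[32];
static IPC_FLAGS_T my_flags = { 0, sim_alias };   /* pflags_bits unused in this sketch */

/* A process()-style handler: the flag is cleared before the body runs, so a
   set_bit() arriving while the body executes is kept for the next pass. */
static unsigned handled;
static void process_demo(void)
{
    if (event(my_flags, 7)) {
        handled++;
    }
}
```

Calling process_demo() twice after a single set_bit() runs the body once: the first call consumes the event and the second finds the flag clear.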

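Event-driven scheduling (5)

    enum keypad_bits_t {
        bit_keypad_value_update = 0,
        bit_keypressed_wait,
        bit_refresh_debounce_tmr,
    };

    // process()-like handler, called on every scheduler pass.
    // event_refresh_debounce_tmr() and set_bit_refresh_debounce_tmr() are
    // per-bit wrappers around event() and set_bit() (definitions not shown).
    void process_keypad(void)
    {
        if (event_refresh_debounce_tmr()) {      // consumes the debounce-refresh event
            keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
            keypad_data.state = KEYPAD_DEBOUNCE;
        }
        ...
    }

    // Trigger: latch the keypad value and raise the event for process_keypad()
    static void trigger_keypad_update(void *object)
    {
        keypad_data.latched = read_keypad_value();
        set_bit_refresh_debounce_tmr();
    }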
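To see the keypad pieces work together, here is a host-side sketch with a round-robin scheduler pass. Everything not shown on the slide is an assumption: the per-bit wrapper definitions, the keypad states, KEYPAD_DEBOUNCE_TIME, the read_keypad_value() stub, the keypad_alias array standing in for alias words, and scheduler_pass().

```c
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;
typedef struct { PFLAGS_T pflags_bits; PFLAGS_T pflags_base; } IPC_FLAGS_T;

#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit)   (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

enum keypad_bits_t { bit_keypad_value_update = 0, bit_keypressed_wait, bit_refresh_debounce_tmr };
enum keypad_state_t { KEYPAD_IDLE = 0, KEYPAD_DEBOUNCE };   /* assumed states */
#define KEYPAD_DEBOUNCE_TIME 20u                           /* assumed value */

static uint32_t keypad_alias[32];                          /* host stand-in for alias words */
static IPC_FLAGS_T keypad_flags = { 0, keypad_alias };

/* Assumed per-bit wrappers, matching the names used on the slide. */
#define event_refresh_debounce_tmr()   event(keypad_flags, bit_refresh_debounce_tmr)
#define set_bit_refresh_debounce_tmr() set_bit(keypad_flags, bit_refresh_debounce_tmr)

static struct {
    uint32_t debounce_tmr;
    uint32_t latched;
    enum keypad_state_t state;
} keypad_data;

static uint32_t read_keypad_value(void) { return 0x5u; }   /* assumed stub */

void process_keypad(void)
{
    if (event_refresh_debounce_tmr()) {
        keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
        keypad_data.state = KEYPAD_DEBOUNCE;
    }
}

static void trigger_keypad_update(void *object)
{
    (void)object;                                  /* unused in this sketch */
    keypad_data.latched = read_keypad_value();
    set_bit_refresh_debounce_tmr();
}

/* Round-robin pass: every process is called; each returns at once
   unless one of its events is pending. */
typedef void (*process_fn)(void);
static process_fn const processes[] = { process_keypad };

static void scheduler_pass(void)
{
    for (unsigned i = 0; i < sizeof processes / sizeof processes[0]; i++)
        processes[i]();
}
```

The trigger only records that something happened; the state change happens in the process on the next scheduler pass.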

Event-driven scheduling (6)

This event-driven architecture:

• Is simple to implement

• Scales well, even to multicore Cortex-M systems

• Improves processing granularity

• Can be implemented in hardware on ARM+FPGA systems

Hardware scheduling

Event-driven scheduling can be implemented directly in hardware on an ARM+FPGA system.

Instead of using a round-robin cycle in firmware, the underlying hardware can place a “call” to each process() according to its sensitivity list.

This approach can reduce scheduling overhead to a few instruction cycles, for a very responsive real-time system.
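The difference between a firmware round-robin and hardware-assisted dispatch can be sketched in C. This is a hypothetical emulation: hw_pending stands in for a register that FPGA logic could compute from each process's sensitivity list, and the dispatcher jumps straight to the ready processes instead of calling every one. __builtin_ctz is a GCC/Clang builtin (on Cortex-M3/M4 it typically compiles to RBIT plus CLZ); the other names are illustrative.

```c
#include <stdint.h>

typedef void (*process_fn)(void);

/* Stand-in for a hardware "pending processes" register (assumed interface):
   bit i set means process i has an event in its sensitivity list. */
static volatile uint32_t hw_pending;

/* Models a read-to-clear register. Done in software like this, the read and
   the clear are two accesses and could race with an ISR; in hardware it
   would be a single bus access. */
static uint32_t read_and_clear_pending(void)
{
    uint32_t p = hw_pending;
    hw_pending = 0u;
    return p;
}

/* Two hypothetical processes that only count their activations. */
static unsigned count_a, count_b;
static void proc_a(void) { count_a++; }
static void proc_b(void) { count_b++; }

static process_fn const process_table[32] = { [0] = proc_a, [5] = proc_b };

/* Calls only the ready processes, lowest bit first, instead of polling all. */
static void dispatch_ready(void)
{
    uint32_t ready = read_and_clear_pending();
    while (ready != 0u) {
        unsigned idx = (unsigned)__builtin_ctz(ready);  /* index of lowest set bit */
        ready &= ready - 1u;                            /* clear that bit */
        if (process_table[idx] != 0)
            process_table[idx]();
    }
}
```

The cost now scales with the number of ready processes rather than the number of registered ones.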

Multicore Cortex-M devices

The event-driven paradigm can be implemented effectively in a multicore Cortex-M system with common memory.

(image: http://hothardware.com/newsimages/Item9563/cortex-m3-arm-cpu.png)

[Figure: processor cores connected through a BUS MATRIX to SHARED RAM and SHARED FLASH]

This approach simplifies partitioning the system across the processor cores, and can decrease system response time for event-driven bare-metal logic.

Even when no bit-banding is available in the shared memory, atomic events can still be implemented with other atomic operations.
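Where bit-banding is not available, the same set and test-and-clear operations can be built on C11 atomics. A minimal sketch under that assumption (ipc_word_t and the ipc_* names are hypothetical). Whether these operations are atomic across cores depends on the device's shared-memory and exclusive-access support, which should be checked in its reference manual.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint32_t bits;   /* 32 IPC bit signals in shared memory */
} ipc_word_t;

/* Set one bit signal (bit must be 0..31). */
static inline void ipc_set(ipc_word_t *w, unsigned bit)
{
    atomic_fetch_or_explicit(&w->bits, (uint32_t)1u << bit, memory_order_release);
}

/* Clear one bit signal. */
static inline void ipc_clr(ipc_word_t *w, unsigned bit)
{
    atomic_fetch_and_explicit(&w->bits, ~((uint32_t)1u << bit), memory_order_release);
}

/* Atomic test-and-clear: returns 1 only to the caller that saw the bit set. */
static inline int ipc_event(ipc_word_t *w, unsigned bit)
{
    uint32_t mask = (uint32_t)1u << bit;
    return (atomic_fetch_and_explicit(&w->bits, ~mask, memory_order_acq_rel) & mask) != 0u;
}

/* Read several bit signals at once. */
static inline uint32_t ipc_get_bits(ipc_word_t *w, uint32_t bitmask)
{
    return atomic_load_explicit(&w->bits, memory_order_acquire) & bitmask;
}
```

Unlike the bit-band event() macro, where the read and the clear are two separate accesses, ipc_event() performs the test and the clear in a single atomic operation, so two consumers can never both see the same event.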

Final Thoughts

The event-driven paradigm is a powerful and scalable architectural structure.

It is being used in bare-metal embedded systems of over 300 KLOC.

When coupled with hardware scheduling support, it can be used to implement very fast event-response systems that are very hard to implement with priority-based schedulers.

Thank you

Jonny Doin [email protected]