Integrating Safety in Silicon: Failsafe cells for IoT Designs
Jonny Doin, GridVortex
#ESCBOS

Page 2: SiliconFailsafeForIoT_Doin

Agenda

• Safety: What is Safety?
• Failure: What constitutes Failure?
• Failsafe / Failsafe cell Design
• LTspice as a system modeling tool
• Modeling the Firmware/Hardware interfaces
• Simulating Software failure at the interface
• Circuit behavior under failure scenarios
• Final thoughts

http://www.funfix.com/Gallery/Images/lg_Rock-Climbing-in-Talkeetna-Alaska.jpg

Page 3: SiliconFailsafeForIoT_Doin

Safety: What is Safety?

A Safe System is one that exhibits:

• Deterministic responses
• Controlled behaviors for all inputs
• Outputs that are never placed in a hazardous state

http://large.stanford.edu/publications/coal/references/hvistendahl/images/f1big.jpg

Page 4: SiliconFailsafeForIoT_Doin

Safety: What is Safety? (2)

REALITY: ALL SYSTEMS WILL FAIL!

http://stat.ks.kidsklik.com/statics/files/2012/10/13496768121110667387.jpg

Page 5: SiliconFailsafeForIoT_Doin

Safety: What is Safety? (3)

In the real world, systems are always connected to other systems.

Hazardous output states must be qualified from the point of view of the downstream (external) systems.

https://www.engineerjobs.co.uk/images/industry-sectors/img_60_instrumentation.jpg

Page 6: SiliconFailsafeForIoT_Doin

Failure

Failure is a malfunction of the system, or a deviation from its designed behavior.

In any system, such a deviation anywhere in the processing chain can lead to system failure.

http://photos1.blogger.com/blogger/4548/1285/1600/Matrix%20System%20Failure.jpg

Page 7: SiliconFailsafeForIoT_Doin

Failsafe Design

Failsafe design can be “costly” in system resources.

For example, achieving functional safety in microcontrollers may require fully redundant processors running in lockstep mode.

Example: Cortex-R4 in Lockstep
http://img.deusm.com/designnews/2011/09/233762/114610_803509.jpg

Page 8: SiliconFailsafeForIoT_Doin

Failsafe Design (2)

One example where cost is paramount is IoT chips, designed in mature processes (e.g. 180 nm) with mixed-signal circuitry.

These designs usually have small, low-cost processor cores, such as an ARM Cortex-M0.

A hybrid failsafe approach can be beneficial in many of those IoT cases.

[Figure: block diagram with ARM Cortex-M0, GPIOs, and Controlled Subsystem (actuators, power)]

Page 9: SiliconFailsafeForIoT_Doin

Failsafe Design (3)

Designs can handle system failures at the critical interfaces by identifying signal state failures and ensuring a known system state.

This design pattern is recursive, i.e., it can be applied to subsystems down to the smallest modules, to ensure that the whole system fails in a safe mode.

[Figure: block diagram with Complex Control System, Critical Interface, and Controlled Subsystem (actuators, motors)]

Page 10: SiliconFailsafeForIoT_Doin

Failsafe cell design

The design case we’ll look into is a hybrid IoT application chip with an integrated Cortex-M0.

The design goals are:

• Firmware failure detection
• Safe reboot of the CPU
• Safe drive logic for no loss of control

[Figure: block diagram with ARM Cortex-M0, GPIOs, Failsafe Logic, CONTROL I/Os, and Controlled Subsystem (actuators, power)]

Page 11: SiliconFailsafeForIoT_Doin

Failsafe cell design (2)

Failsafe cells use dynamic signals as control commands, or use encoded states.

Signals that are “frozen” at ‘0’ or ‘1’, or that sit in illegal states, indicate a failed software control function.

The failsafe logic then takes over and guarantees failsafe behavior.

[Figure: block diagram with ARM Cortex-M0, GPIOs, Failsafe Logic, CONTROL I/Os, and Controlled Subsystem (actuators, power)]
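The "frozen signal" rule can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the circuit from the talk: the names `ToggleMonitor` and `first_failure_index`, the sampled representation of the signal, and the timeout measured in samples are all hypothetical.

```python
class ToggleMonitor:
    """Hypothetical model of a failsafe cell watching one dynamic control signal.

    The signal counts as healthy only while it keeps toggling. If it holds the
    same level for more than `timeout` consecutive samples, the monitor latches
    a failure, which is the point where failsafe logic would take over.
    """

    def __init__(self, timeout):
        self.timeout = timeout    # max samples allowed without an edge (assumed unit)
        self.last_level = None    # previously sampled level
        self.idle_count = 0       # samples seen since the last edge
        self.failed = False       # latched failure flag

    def sample(self, level):
        """Feed one sampled level (0 or 1); return True while the signal is healthy."""
        if self.last_level is not None and level != self.last_level:
            self.idle_count = 0   # an edge was seen, so the firmware is alive
        else:
            self.idle_count += 1
        self.last_level = level
        if self.idle_count > self.timeout:
            self.failed = True    # signal is frozen at '0' or '1'
        return not self.failed


def first_failure_index(levels, timeout):
    """Return the index of the sample at which the failure latches, or None."""
    monitor = ToggleMonitor(timeout)
    for i, level in enumerate(levels):
        if not monitor.sample(level):
            return i
    return None
```

A toggling sequence never trips the monitor, while a sequence that stops at either level trips it once the timeout is exceeded. The failure is latched on purpose, so a late edge from confused firmware does not silently clear it.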

Page 12: SiliconFailsafeForIoT_Doin

Failsafe cell design (3)

The failsafe cell can be a digital function that validates the control states, or a detector for invalid steady-state control signals.

Failsafe circuitry contains hardwired logic that takes control and guarantees behavior such as a basic control loop and failsafe responses.
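A digital function that validates encoded control states could look like the sketch below. The two-wire complementary encoding, the command names, and `validate_control` are assumptions chosen for illustration; the slides do not specify an encoding.

```python
# Assumed two-wire complementary encoding (hypothetical, for illustration only):
#   (1, 0) -> RUN command
#   (0, 1) -> STOP command
#   (0, 0) and (1, 1) are illegal and indicate a failed software control function.
VALID_STATES = {(1, 0): "RUN", (0, 1): "STOP"}


def validate_control(a, b):
    """Return the decoded command, or "FAILSAFE" for an illegal encoded state."""
    return VALID_STATES.get((a, b), "FAILSAFE")
```

With this encoding, both lines stuck low or both stuck high decode to the failsafe response, so a frozen port cannot be mistaken for a valid command.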

Page 13: SiliconFailsafeForIoT_Doin

LTspice as a System tool

LTspice is a fast and accurate circuit simulation tool.

Used as a circuit simulator, LTspice can predict actual behavior with high precision.

Modeling the interaction of Firmware and Analog hardware at the design stage is a powerful capability.

[Plot: V(adc_val), V(adc_in), V(vip), and V(isr_block) versus time, 130.5 ms to 136.5 ms]

Page 14: SiliconFailsafeForIoT_Doin

LTspice as a System tool (2)

LTspice allows modeling mixed-signal systems, including the interaction of Firmware behavior with Analog hardware:

• Behavioral sources (B)
• Digital Gate primitives (Axxx)
• Hierarchical subcircuits
• Waveform and data file generators

Page 15: SiliconFailsafeForIoT_Doin

Modeling system interfaces

Designing the Fw/Hw interface as a failsafe node has a number of advantages:

• Implementation decoupling of Firmware and Hardware
• Addresses CPU failure
• Lower cost of implementation

Page 16: SiliconFailsafeForIoT_Doin

Modeling system interfaces (2)

Some examples of system interfaces for failsafe functions in control circuitry and at the Firmware/Hw interface:

• Failsafe “Passive” drivers
• AC-coupled commands
• Failsafe “ON” actuators

Page 17: SiliconFailsafeForIoT_Doin

Example: Failsafe “passive”

Output analog drivers can be designed to fail in high-impedance mode.

Page 18: SiliconFailsafeForIoT_Doin

Example: Failsafe “passive” (2)

The 2 analog outputs are buffered with failsafe drivers that go high impedance when VCC is lost.

Page 19: SiliconFailsafeForIoT_Doin

Example: Failsafe “passive” (3)

Each output is buffered and isolated with 2 transistors.

When VCC fails, the transistors cut off and present a very high impedance.

The output current source then sees a 68 kΩ resistor, which drives the output voltage to 6.8 V, bringing the output to 100%.

This failsafe guarantees that the downstream system is ON, even on loss of control.
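As a quick check of the numbers on this slide, Ohm's law gives the current that the output source must supply. The calculation assumes that, once the transistors are cut off, all of the source current flows through the 68K resistor; the variable names are illustrative.

```python
R_FAILSAFE_OHMS = 68e3   # 68K resistor quoted on the slide
V_OUT_FAILSAFE = 6.8     # failsafe output voltage quoted on the slide, in volts

# Ohm's law: current needed to develop 6.8 V across 68 kOhm,
# assuming no current is diverted into the cut-off transistors.
i_source_amps = V_OUT_FAILSAFE / R_FAILSAFE_OHMS
print(f"Implied source current: {i_source_amps * 1e6:.0f} uA")
```

Under that assumption the implied source current is 100 µA.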

Page 20: SiliconFailsafeForIoT_Doin

Example: AC-coupled cmds

On a firmware failure, toggling signals will stop at VCC or GND.

AC-coupled commands can detect such firmware failures.
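The idea can be illustrated with a discrete-time model of a first-order RC high-pass (a series capacitor followed by a shunt resistor). A toggling command keeps producing pulses after the capacitor, while a signal stuck at VCC or GND decays to zero. The component values, sample period, window, and threshold below are assumptions, and `high_pass` and `command_alive` are hypothetical names.

```python
def high_pass(samples, rc, dt):
    """Discretized first-order RC high-pass: blocks DC, passes edges."""
    alpha = rc / (rc + dt)
    out = []
    prev_x = samples[0]
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)   # standard RC high-pass recurrence
        out.append(y)
        prev_x, prev_y = x, y
    return out


def command_alive(samples, rc=1e-3, dt=1e-4, window=50, threshold=0.1):
    """True if the AC-coupled signal still shows activity in the last `window` samples.

    rc, dt, window, and threshold are illustrative values, not taken from the slides.
    """
    coupled = high_pass(samples, rc, dt)
    return max(abs(v) for v in coupled[-window:]) > threshold
```

For example, a 3.3 V square wave is reported as alive, whereas the same wave followed by a long run at 3.3 V or at 0 V is reported as dead, no matter which rail the firmware froze on.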

Page 21: SiliconFailsafeForIoT_Doin

Example: Failsafe “ON”

A firmware failure will keep the actuator ON.

The firmware commands are designed to turn it OFF.

Page 22: SiliconFailsafeForIoT_Doin

Firmware control Loop: Servo DAC

The PWM value is set to 50% when the error is zero.

Positive errors raise the PWM duty cycle above 50%, driving the net integrated voltage “down”.

Negative errors lower the duty cycle below 50%, driving the net integrated voltage “up”.

Delays in the Firmware control loop can adversely affect the output correction.

We can simulate the effects of interrupts causing long control loop latencies.
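The duty-cycle rule on this slide can be sketched as follows. This is an idealized model under stated assumptions: the error is taken as measured voltage minus target, the integrator is ideal and inverting, PWM ripple is ignored, and the gain, slew rate, and time step are arbitrary illustrative values.

```python
def duty_from_error(error, gain=0.5):
    """Map the loop error to a PWM duty cycle centered on 50%, clamped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 + gain * error))


def integrate_step(voltage, duty, slew=100.0, dt=1e-3):
    """Ideal inverting integrator: duty above 50% ramps down, below 50% ramps up.

    `slew` is the assumed ramp rate in V/s at 0% or 100% duty.
    """
    return voltage - slew * (2.0 * duty - 1.0) * dt


def run_servo(target, v0, steps=100):
    """Close the loop for `steps` firmware iterations and return the final voltage."""
    v = v0
    for _ in range(steps):
        error = v - target   # assumed sign convention: positive means "too high"
        v = integrate_step(v, duty_from_error(error))
    return v
```

With these values the error shrinks by about 10% per iteration, so the output settles on the target from either side and the duty cycle returns to 50%.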

Page 23: SiliconFailsafeForIoT_Doin

Detail: firmware interference

• For comparison, we removed the PWM from the control loop: direct interrupt-driven GPIO mode instead of Servo PWM mode
• Simulated perturbation: interrupt blocking time delays the GPIO control loop
• Any firmware latency directly affects the output stability
• Hard realtime requirements for the direct GPIO control loop

[Plot: V(adc_val), V(adc_in), V(vip), and V(isr_block) versus time, 130.5 ms to 136.5 ms]

Page 24: SiliconFailsafeForIoT_Doin

Detail: firmware interference (2)

• Control loop via PWM as Servo drive
• Same delays caused by interrupt blocking time, delaying the PWM error update
• PWM servo maintains the DC voltage, with minor error deviations
• Soft realtime requirements for the PWM Servo control loop
• Accepts soft timing failures from Firmware operation
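The contrast between the two modes can be reproduced with a toy model that stands in for the LTspice simulation. It is an idealized sketch and not the presenter's circuit: the integrator is ideal, the PWM output is treated as its average value, the ISR blocking is modeled as a window of ticks in which the firmware cannot update its output, and every numeric value is an assumption.

```python
def simulate(mode, ticks=400, block=range(200, 250), target=1.0,
             slew=100.0, dt=1e-4, gain=0.5):
    """Return the worst |v - target| seen from the start of the blocked window on.

    mode "gpio": firmware drives the integrator directly, full up or full down.
    mode "pwm":  firmware only updates a duty cycle; hardware keeps generating it.
    """
    v = target + 0.01    # small initial offset (assumed)
    drive = 0.0          # normalized integrator drive in [-1, 1]
    worst = 0.0
    for n in range(ticks):
        if n not in block:                 # firmware runs only when not blocked
            error = v - target
            if mode == "gpio":
                drive = -1.0 if error > 0 else 1.0
            else:
                duty = min(1.0, max(0.0, 0.5 + gain * error))
                drive = -(2.0 * duty - 1.0)
        v += slew * drive * dt             # the hardware integrates every tick
        if n >= block.start:
            worst = max(worst, abs(v - target))
    return worst
```

In this model the GPIO output is held at full drive for the whole blocked window, so the voltage ramps away by roughly `slew * dt` per blocked tick. The PWM output is held near 50% duty, so the voltage barely moves, which matches the hard versus soft realtime distinction drawn on these two slides.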

Page 25: SiliconFailsafeForIoT_Doin

Final thoughts

• Failsafe design is an essential part of Embedded Systems
• On ultralow-cost IoT systems, functional safety can be hard to achieve
• Failsafe cells operate at the interfaces of the control chain
• Implementation cost is very attractive, enabling the use of low-end processors
• Simple Mixed-Signal techniques can be used in failsafe cells

Page 26: SiliconFailsafeForIoT_Doin

Thank you

Jonny Doin [email protected]