THE HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION – AN OVERVIEW | NOVEMBER 12, 2013 | APU13
THE HSA SYSTEM ARCHITECTURE REQUIREMENTS – AN OVERVIEW
PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD
SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION

HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Blinzer

DESCRIPTION

Presentation HC-4015 by Paul Blinzer at the AMD Developer Summit (APU13) November 11-13, 2013.



AGENDA

• What is the HSA Foundation?
• The System Architecture Workgroup and its goals
• What defines HSA platforms and components?
• The Shared Virtual Memory requirements
• The HSA Memory Model Requirements
• The HSA Queuing Architecture
• Some other requirements set by the System Architecture specification
• Where to find further information
• Q & A


WHAT IS THE HSA FOUNDATION?

This is the short version…

• The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices
• It spans multiple host platform architectures and programmable data parallel components (e.g. CPU: x86, ARM, MIPS, …; device types: GPUs, DSPs, …) to work collaboratively within the same HSA system architecture
• It defines a set of specifications that define HW & SW platform requirements to enable applications to target the feature set from high level languages and APIs
• It’s not a replacement for e.g. OpenCL but complementary to it, defining the system level properties “below the API”, leveraged by application and system software
• The System Architecture specification defines the required component and platform features for HSA compliant components
• This presentation is an overview of the current System Architecture definitions and does not represent a complete or “final” state
  ‒ That one is the specification itself when available ☺

[Figure: the HSA specification set. Labels: Platform (Software) System Architecture Specification; Programmer’s Reference Manual; System Runtime Specification; Conformance Tools]


THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION

Who participates and what are the goals?

• The workgroup membership spans a wide variety of IP and platform architecture owners
  ‒ Several host platform architectures are targeted
• The specifications define a common set of platform properties that provide a dependable hardware and system foundation for application software, libraries and runtimes
• The goal is to eliminate “weak points” in the system software and hardware architecture of traditional platforms that lead to unnecessary overhead in the operations of data parallel workloads
• The main deliverables are:
  ‒ A well-defined, consistent and dependable memory model all HSA agents operate in
  ‒ Shared access to process virtual memory between HSA agents (“ptr-is-ptr”)
  ‒ Low-latency workload dispatch contained in user-mode queues
  ‒ Scalability across a wide range of platforms
• These properties are leveraged in the “HSA Programmer’s Reference”, HSAIL and HSA Runtime specifications


WHAT DEFINES HSA PLATFORMS AND COMPONENTS?

• In short, an HSA compatible platform consists of “HSA agents” (hardware components that participate in the HSA memory model) adhering to the various system architecture requirements
• Each HSA agent adheres to the same queuing & dispatch mechanics, low-latency synchronization primitives, memory coherence and data visibility (memory model) requirements
  ‒ Defined mainly in the “(Software) System Architecture” specification
  ‒ The HSAIL and “Programmer’s Reference Manual” specifications define the software execution model
  ‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes in many scenarios
  ‒ Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
  ‒ Fundamental data types are naturally aligned


WHAT DEFINES HSA PLATFORMS AND COMPONENTS?

Properties | Small Machine Model | Large Machine Model
Platform targets | Embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud server, etc. running more demanding workloads
Native pointer size | 32bit | 64bit (+ 32bit ptr if 32bit processes are supported)
Floating point size | Half (FP16*), Single (FP32) precision | Half (FP16*), Single (FP32), Double (FP64) precision
Atomic ops size | 32bit | 32bit, 64bit

*min. load and store on memory

• There are two different machine models (“small” and “large”) that target different functionality levels
  ‒ This takes into account different feature requirements for different platform environments
  ‒ In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
• Applications written to target HSA small model machines will generally work on large model machines, too
  ‒ If the large model platform and host Operating System provide a 32bit process environment
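The two machine models can be written down as plain data, which makes the compatibility rule easy to check. This is only a sketch: the names `MachineModel`, `SMALL`, `LARGE` and `can_run` are hypothetical and do not come from any HSA API, and the rule that a large-model application cannot run on a small-model platform is an assumption derived from the pointer sizes in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    name: str
    pointer_bits: tuple   # native pointer sizes, in bits
    float_bits: tuple     # floating-point sizes (FP16 is at minimum load/store)
    atomic_bits: tuple    # operand sizes supported by atomic operations


# Values taken from the machine model table.
SMALL = MachineModel("small", (32,), (16, 32), (32,))
LARGE = MachineModel("large", (64,), (16, 32, 64), (32, 64))


def can_run(app_model, platform_model, has_32bit_process_env=False):
    """Return True if an app written for app_model should run on platform_model."""
    if app_model == platform_model:
        return True
    if app_model == SMALL and platform_model == LARGE:
        # Small-model apps need a 32-bit process environment on the large platform.
        return has_32bit_process_env
    # Assumption: a large-model app needs 64-bit pointers a small platform lacks.
    return False
```

For example, `can_run(SMALL, LARGE, has_32bit_process_env=True)` is true, while the same call without a 32-bit process environment is false.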


THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (1)

The basis of “ptr-is-ptr”

• Each HSA agent adheres to the same user process address space view as the host CPU
  ‒ HSA operates in a “flat” virtual address space, using 64bit & 32bit ptrs depending on application/machine model
  ‒ A pointer value references the same memory for every HSA agent
  ‒ An HSA agent can “walk” or update linked data structures directly without any assistance from a host CPU
• The process address view is established by the hardware’s page table mappings
  ‒ HSA agent virtual address range matches the host platform (e.g. 48bit, 32bit, …)
  ‒ HSA agents always operate at “user privilege” of the host CPU, policy enforced by system
  ‒ HSA agents observe the same memory page table attributes (cache, read, write, …) and page sizes as the host CPU, policy enforced by system
• HSA agents support page faults, allowing them to operate directly on pageable memory as provided by the Operating System environment
  ‒ For allocated pageable memory, System Software takes page faults, commits memory, loads contents from the backing store and restarts execution like it does for any access from host CPU threads
  ‒ No tedious device buffer copy, explicit page lock or similar is needed for an HSA agent to access data in allocated memory directly!
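A toy model can illustrate what “ptr-is-ptr” buys: the host builds a linked list and an agent walks it using the very same pointer values, with no copy or address conversion in between. Everything here is hypothetical (`FlatAddressSpace`, `host_build_list`, `agent_sum_list`, the address constants); real agents share the process page tables in hardware rather than a Python dictionary.

```python
NULL = 0


class FlatAddressSpace:
    """Toy process address space: one mapping shared by host and agents."""

    def __init__(self):
        self._mem = {}
        self._next = 0x1000   # arbitrary starting address for the sketch

    def alloc(self, value):
        addr = self._next
        self._next += 0x10
        self._mem[addr] = value
        return addr

    def load(self, addr):
        return self._mem[addr]


def host_build_list(space, values):
    """Host CPU builds a linked list; each node is (value, next_pointer)."""
    head = NULL
    for v in reversed(values):
        head = space.alloc((v, head))
    return head


def agent_sum_list(space, head):
    """An agent walks the same list using the host's pointer values unchanged."""
    total, ptr = 0, head
    while ptr != NULL:
        value, ptr = space.load(ptr)
        total += value
    return total
```

The point of the sketch is that `head` is handed to the agent as-is; in a non-shared design the list would first have to be flattened into a device buffer and its pointers rewritten.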


THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (2)

The basis of “ptr-is-ptr”

• On AMD processor-based platforms, the IOMMUv2 device provides the HSA MMU translation services via standard PCI Express™ ATS/PRI protocols to HSA compliant hardware when accessing memory from the HSA agent
  ‒ IOMMUv2 integration into the OS memory manager provides the low-level infrastructure (e.g. in the Linux® kernel)
  ‒ Different host platform architectures may use different detail mechanisms here
• The HSA MMU functionality is provided in addition to the IOMMU functionality used in device virtualization
  ‒ Separate translation levels are used (see block diagram)
• Implementation of shared virtual address space by other vendors on other host platforms may be different
  ‒ As long as it follows the HSA Sysarch requirements, it is ok
  ‒ The implementation detail is not relevant to the application and is dealt with in the system software (e.g. OS)

[Figure: HSA MMU data structures (block diagram). Labels: HSA MMU (IOMMUv2 device); System memory; Device Table and Device Table base register; Command Buffer and Command Buffer base register; Event Log and Event Log base register; Page Service Request Log and Page Req Log base register; Event Counter registers; Perf Counters & RAS Info (opt.); Interrupt Remapping Table; I/O page tables (host translation; guest & host translation); Peripheral Page Requests (PPR) Service; HSA MMU Translation Tables (per process, PASID)]
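The interplay of per-process translation tables and page requests can be sketched as a small simulation: a translation miss is logged, system software commits a page, and the agent retries the same address. This is a toy under stated assumptions, not the IOMMUv2 programming model; `ToyMMU`, `agent_access`, the 4 KiB page size and the frame allocator are all invented for illustration.

```python
PAGE_SIZE = 4096   # assumed page size for the sketch


class ToyMMU:
    def __init__(self):
        self.tables = {}          # pasid -> {virtual page number: physical frame}
        self.page_requests = []   # logged (pasid, vpn) faults awaiting service
        self._next_frame = 0

    def translate(self, pasid, vaddr):
        """Return the physical address, or None after logging a page request."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.tables.setdefault(pasid, {}).get(vpn)
        if frame is None:
            self.page_requests.append((pasid, vpn))
            return None
        return frame * PAGE_SIZE + offset

    def service_page_requests(self):
        """Stand-in for system software: commit a frame for each logged fault."""
        while self.page_requests:
            pasid, vpn = self.page_requests.pop(0)
            table = self.tables.setdefault(pasid, {})
            if vpn not in table:
                table[vpn] = self._next_frame
                self._next_frame += 1


def agent_access(mmu, pasid, vaddr):
    """Agent access: on a fault, wait for service and retry the same address."""
    paddr = mmu.translate(pasid, vaddr)
    if paddr is None:
        mmu.service_page_requests()
        paddr = mmu.translate(pasid, vaddr)
    return paddr
```

Two processes (two PASIDs) touching the same virtual address end up with different physical frames, which is the per-process isolation the translation tables provide.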


THE HSA MEMORY MODEL REQUIREMENTS

What are its key properties?

• A memory model defines how writes by one work item or agent become visible to other work items and agents; these are rules that compilers and application threads need to adhere to
  ‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
  ‒ It is important for defining the scope of performance optimizations in the compiler, e.g. to allow reordering of code in the Finalizer
• At its base, the HSA memory model is a “relaxed” load acquire/store release model
  ‒ Inherently maps to many CPU and device architectures very easily
  ‒ Efficient sequential consistency mechanisms are supported to fit high-level language programming models
• A consistent, full set of atomic operations is available
  ‒ Naturally aligned on size; the small machine model supports 32bit, the large machine model supports 32bit and 64bit
• Cache coherency between HSA agents (& host CPU) is maintained by default
  ‒ A key feature of the HSA system & platform environment

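The visibility rule behind a load acquire/store release model can be shown with a deliberately simplified simulation. Python does not expose memory orderings, so the sketch models them: relaxed stores sit in a private buffer, and a release store publishes everything written before it together with the flag. The classes, method names and addresses are hypothetical, and this toy is not the formal HSA memory model.

```python
DATA, FLAG = 0x100, 0x104   # arbitrary addresses for the sketch


class SharedMemory:
    def __init__(self):
        self.cells = {}


class Agent:
    """Toy agent whose relaxed stores stay private until a release store."""

    def __init__(self, shared):
        self.shared = shared
        self.pending = {}

    def store_relaxed(self, addr, value):
        self.pending[addr] = value   # not yet visible to other agents

    def store_release(self, addr, value):
        # Release: make all earlier stores visible first, then the flag itself.
        self.shared.cells.update(self.pending)
        self.pending.clear()
        self.shared.cells[addr] = value

    def load_acquire(self, addr):
        # Acquire: observe whatever has been made visible in shared memory.
        return self.shared.cells.get(addr, 0)


def consumer_reads(consumer):
    """Return the data if the flag is set, else None."""
    if consumer.load_acquire(FLAG) == 1:
        return consumer.load_acquire(DATA)
    return None
```

A producer that calls `store_relaxed(DATA, 42)` and then `store_release(FLAG, 1)` guarantees that a consumer seeing the flag also sees the data; before the release, the consumer sees neither.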

THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (1)

The basis of the workload dispatch on HSA

• The queue dispatch occurs through architected queue packets (“Architected Queuing Language”, AQL) that reference the work items & parameters
  ‒ Dispatch to HW occurs directly in user mode, eliminating a notable source of latency overhead in traditional architectures!
  ‒ Two architected packet types exist at the moment: dispatch and barrier packets
  ‒ Each queue is defined by several architected parameters (type, base address, size, read index, write index, …) that allow targeting the queue from other HSA agents and the host CPU
  ‒ The design allows an HSA agent on the platform to build & dispatch jobs to a queue using HSA architected interfaces
• Applications and runtime can build different queuing models on top of the infrastructure
  ‒ Single-producer and multi-producer queuing models, lock-free dispatch, … are all options SW can implement on top of the system architecture’s queue definition to fit the use model

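A user-mode queue defined by a size, a read index and a write index behaves like a ring buffer. The following single-producer sketch shows how the two indices interact; `Packet`, `UserModeQueue` and their fields are hypothetical names, the indices are assumed to increase monotonically, and real AQL packets have a binary layout that is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    kind: str                # "dispatch" or "barrier"
    payload: object = None   # e.g. kernel and arguments for a dispatch packet


@dataclass
class UserModeQueue:
    size: int                # number of packet slots (assumed power of two)
    read_index: int = 0      # advanced by the packet processor
    write_index: int = 0     # advanced by the producer
    packets: list = field(default_factory=list)

    def __post_init__(self):
        assert self.size > 0 and self.size & (self.size - 1) == 0
        self.packets = [None] * self.size

    def enqueue(self, packet):
        """Producer side: write the packet into the next slot and publish it."""
        if self.write_index - self.read_index == self.size:
            return False     # queue full
        self.packets[self.write_index % self.size] = packet
        self.write_index += 1
        return True

    def consume(self):
        """Packet-processor side: take the oldest packet, in order."""
        if self.read_index == self.write_index:
            return None      # queue empty
        packet = self.packets[self.read_index % self.size]
        self.read_index += 1
        return packet
```

A multi-producer variant would reserve slots by atomically incrementing the write index before filling the slot; that is one of the queuing models software can layer on the same queue definition.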

THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (2)

The basis of the workload dispatch on HSA

• The HSA System Architecture defines a user mode queue based dispatch mechanism
  ‒ Each queue is only valid within its process context and represents a virtual entity that is scheduled to hardware
  ‒ The job execution occurs at “user privilege” like the rest of the application code, enforced by system architecture
• Each HSA agent allows for multiple queues per application process
  ‒ HSA defines in-order dispatch semantics of work items within queues for efficient HW implementation
  ‒ HW may execute dispatch packets “out-of-order” if no dependencies exist and in-order semantics are followed externally
  ‒ “Out of order” execution applies between queues, with explicit, memory based synchronization mechanisms between them as needed
• It is “cheap” to create queues in HSA, so applications can have one queue per HSA agent for each application thread, or leverage multiple HSA user queues per thread if needed
  ‒ This gives applications a lot of flexibility to structure the queue layout to match the problem instead of trying to fit the problem to work with one or a few queues only

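One way to read “out-of-order execution with in-order semantics followed externally” is that packets may finish in any order internally but only become visible in queue order. The sketch below illustrates that interpretation only; `retire_in_order` is a hypothetical helper and does not describe how any particular hardware tracks completion.

```python
def retire_in_order(completion_order):
    """Return, per completion event, the packet indices that become visible.

    completion_order lists packet indices in the order the hardware happens
    to finish them; a packet is only retired once all earlier ones are done.
    """
    done = set()
    next_to_retire = 0
    visible_per_event = []
    for idx in completion_order:
        done.add(idx)
        newly_visible = []
        while next_to_retire in done:
            newly_visible.append(next_to_retire)
            next_to_retire += 1
        visible_per_event.append(newly_visible)
    return visible_per_event
```

With a completion order of `[2, 0, 1, 3]`, packet 2 finishes first but stays invisible until packets 0 and 1 are done, so an external observer still sees 0, 1, 2, 3.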

OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE

Miscellaneous mention, but nevertheless important to make it work well…

• HSA memory based signaling and synchronization primitives
  ‒ Defines memory based semantics to synchronize with work items processed by HSA agents
  ‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
  ‒ Hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads
  ‒ Allows one-to-one and one-to-many signaling
  ‒ The signaling semantics follow atomicity requirements defined in the memory model
  ‒ Runtime & application SW can use the infrastructure to build mutexes, semaphores and other synchronization primitives
• HSA Cache Coherency Domains
  ‒ Defines the scope of HSA cache coherency and how it relates to other non-HSA system resource operations
  ‒ Associated with the memory model requirements
  ‒ Architected way to interact with non-HSA platform infrastructure (e.g. graphics)

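A memory based signal is essentially a value that can be updated and waited on. The sketch below imitates that with a host-side condition variable so the update, wait and one-to-many wake-up semantics can be tried out; the `Signal` class and its methods are hypothetical, and real HSA signals are hardware-assisted rather than built on a lock.

```python
import threading


class Signal:
    """Toy memory based signal: a value that can be updated and waited on."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    def load(self):
        with self._cond:
            return self._value

    def store(self, value):
        with self._cond:
            self._value = value
            self._cond.notify_all()   # one-to-many: wake every waiter

    def add(self, delta):
        with self._cond:
            self._value += delta
            self._cond.notify_all()
            return self._value

    def wait_until(self, predicate, timeout=None):
        """Block until predicate(value) holds; return the value, or None on timeout."""
        with self._cond:
            if self._cond.wait_for(lambda: predicate(self._value), timeout):
                return self._value
            return None
```

A completion counter is a typical use: initialise the signal to the number of outstanding jobs, let each job call `add(-1)` when it finishes, and have the waiter call `wait_until(lambda v: v == 0)`.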

OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE

Miscellaneous mention, but nevertheless important

• HSA system timestamp requirements
  ‒ Defines a low-overhead mechanism to “determine the passing of time” on an HSA platform
  ‒ Represented by a 64bit timestamp value that does not roll over and is incremented at a constant rate in HW
  ‒ The value can be queried by HSAIL or HSA runtime
  ‒ Applications and tools are able to build a consistent timeline across all HSA agents
• HSA Topology requirements
  ‒ Defines system topology and properties of HSA agents discoverable on an HSA platform by an application to take advantage of platform properties
  ‒ Examples are # of compute units, max. work item dimensions, work group size, work item size, queue properties, …
  ‒ APIs like OpenCL™ and others can leverage HSA system topology data to discover memory layout, compute unit properties and other properties and consistently report the system topology for applications to leverage

[Figure: HSA Platform - Simple. Labels: System Memory; HSA APU with CPU cores, GPU (H-CUs), Mem, HSA MMU]

[Figure: HSA Platform. Labels: System Memory; HSA APU with CPU cores, GPU (H-CUs), Mem, HSA MMU; IOBUS; Add-In GPU (optional): HSA GPU with GPU (H-CUs), Mem, Firmware, Device Local Memory; System Firmware]
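Because the timestamp is a 64bit counter that advances at a constant rate, turning two reads into elapsed time is simple arithmetic. The helpers below are a sketch with hypothetical names; the counter frequency is a parameter because the slides do not state one, and the 1 GHz figure in the usage note is an assumed example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
MASK_64 = 0xFFFFFFFFFFFFFFFF


def ticks_to_ns(start_ticks, end_ticks, frequency_hz):
    """Convert the difference of two 64-bit timestamp reads to nanoseconds."""
    delta = (end_ticks - start_ticks) & MASK_64
    return delta * 1_000_000_000 // frequency_hz


def years_until_rollover(frequency_hz):
    """How long a 64-bit counter incremented at this rate takes to wrap."""
    return (2 ** 64) / frequency_hz / SECONDS_PER_YEAR
```

At an assumed 1 GHz rate, `years_until_rollover(1_000_000_000)` comes out at several centuries, which is why a 64bit value can be treated as never rolling over in practice.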


WHERE TO FIND FURTHER INFORMATION ON SYSTEM ARCHITECTURE?

• HSA Foundation Website: http://www.hsafoundation.com
  ‒ The main location for specs, developer info, tools, publications and many things more
  ‒ HSA Programmer’s Reference Manual v 0.95 has been published
  ‒ HSA Platform Software Systems Architecture Specification is quickly nearing the 0.95 state
  ‒ Will be published after ratification by the HSA Foundation Board of Directors
  ‒ Stay tuned


ANY QUESTIONS?

Of course there are, so go ahead ☺