2014 HSA Forum, ITRI, Taiwan.
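ITRI (Industrial Technology Research Institute)
2014 Heterogeneous System Architecture (HSA) Technology Forum - Industrial Technology Research Institute
HSA System Architecture Overview
王振傑 (Jay Wang)
Embedded Systems and Chip Technology Division - System Architecture Design Department (D200)
Information and Communications Research Laboratories (ICL)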
2014-10-31
HSA Platform Model
In an HSA system, a device that participates in the platform is called an HSA agent; if an HSA agent can also run kernels, it is additionally an HSA component.
[Figure: serial and task-parallel workloads alongside data-parallel workloads (SIMD).]
Three Eras of Processor Performance
Single-Core Era (y-axis: single-thread performance; x-axis: time)
• Enabled by: Moore’s Observation, Voltage Scaling, Micro-Architecture
• Constrained by: Power, Complexity
• Programming: Assembly, C/C++, Java …
Multi-Core Era (y-axis: throughput performance; x-axis: time, # of processors)
• Enabled by: Moore’s Observation, Desire for Throughput, 20 years of SMP architecture
• Constrained by: Power, Parallel SW availability, Scalability
• Programming: pthreads, OpenMP / TBB …
Heterogeneous Systems Era (y-axis: modern application performance; x-axis: time, data-parallel exploitation)
• Enabled by: Moore’s Observation, Abundant data parallelism, Power-efficient data-parallel processing (GPUs)
• Constrained by: Programming models, Communication overheads
• Programming: Shader, CUDA, OpenCL, C++ and Java
Source: HSA Introduction, HSA Foundation (Phil Rogers, AMD)
HSA Intermediate Language (HSAIL)
HSA Foundation members are building a heterogeneous-compute software ecosystem on open, royalty-free industry standards and open-source software: the HSA runtimes and compilation tools are based on open-source technologies such as LLVM and GCC (https://github.com/HSAFoundation).
HSA Memory Consistency Model
(Relaxed Model)
Reordering table (rows: first operation; columns: second operation; yes = the two operations may be reordered).

Column groups (second operation):
(a) ld_rlx, st_rlx
(b) atomic_rlx, atomicNoRet_rlx
(c) atomic_acq, atomicNoRet_acq
(d) fence_acq
(e) atomic_rel, atomicNoRet_rel, fence_rel
(f) atomic_ar, atomicNoRet_ar, fence_ar

First operation                           (a)  (b)  (c)  (d)  (e)  (f)
ld_rlx or st_rlx                          yes  yes  yes  yes  no   no
atomic_rlx, atomicNoRet_rlx               yes  yes  yes  no   no   no
atomic_acq, atomicNoRet_acq, fence_acq    no   no   no   no   no   no
atomic_rel, atomicNoRet_rel               yes  yes  no   no   no   no
fence_rel                                 yes  no   no   no   no   no
atomic_ar, atomicNoRet_ar, fence_ar       no   no   no   no   no   no

Memory-order suffixes: rlx = relaxed, acq = acquire, rel = release, ar = acq_rel (acquire-release).
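The yes/no entries of the relaxed-model table can be encoded as a lookup. This is a sketch, not a memory-model checker: it reads "yes" as "the two operations may be reordered", assumes acquire atomics and fence_acq behave identically as a first operation, assumes atomic_rel and fence_rel behave identically as a second operation, and the op_class_t names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation classes; each groups HSAIL operations sharing a row or column. */
typedef enum {
    OP_LDST_RLX,   /* ld_rlx, st_rlx */
    OP_ATOMIC_RLX, /* atomic_rlx, atomicNoRet_rlx */
    OP_ATOMIC_ACQ, /* atomic_acq, atomicNoRet_acq */
    OP_FENCE_ACQ,  /* fence_acq */
    OP_ATOMIC_REL, /* atomic_rel, atomicNoRet_rel */
    OP_FENCE_REL,  /* fence_rel */
    OP_AR,         /* atomic_ar, atomicNoRet_ar, fence_ar */
    OP_COUNT
} op_class_t;

/* REORDER[first][second] is true where the table says "yes". */
static const bool REORDER[OP_COUNT][OP_COUNT] = {
    /* second:         LDST A_RLX A_ACQ F_ACQ A_REL F_REL AR */
    [OP_LDST_RLX]   = {1,   1,    1,    1,    0,    0,    0},
    [OP_ATOMIC_RLX] = {1,   1,    1,    0,    0,    0,    0},
    [OP_ATOMIC_ACQ] = {0,   0,    0,    0,    0,    0,    0},
    [OP_FENCE_ACQ]  = {0,   0,    0,    0,    0,    0,    0},
    [OP_ATOMIC_REL] = {1,   1,    0,    0,    0,    0,    0},
    [OP_FENCE_REL]  = {1,   0,    0,    0,    0,    0,    0},
    [OP_AR]         = {0,   0,    0,    0,    0,    0,    0},
};

bool can_reorder(op_class_t first, op_class_t second) {
    assert(first < OP_COUNT && second < OP_COUNT);
    return REORDER[first][second];
}
```

For example, a relaxed load followed by a release atomic yields false: the release keeps earlier accesses ahead of it.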
System Arch. Requirements
1. Shared Virtual Memory
2. Cache Coherency Domains
3. Flat Addressing
4. Signaling and Synchronization
5. Atomic Memory Operations
6. HSA System Timestamp
7. User Mode Queuing
8. Architected Queuing Language (AQL)
9. HSA Agent Scheduling
10. HSA Component Context Switching
11. IEEE754-2008 Floating Point Exceptions
12. HSA Component Hardware Debug Infrastructure
13. HSA Platform Topology Discovery
14. Images
Source: HSA Platform System Architecture Specification (2014-09-15)
Legacy GPU Compute
Multiple memory pools and address spaces
Data copies before/after GPU compute
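[Figure: host CPUs (HSA agent) work on System Memory in virtual memory #1; the GPU (HSA agent and HSA component) works on GPU Memory in virtual memory #2; numbered steps 1, 2, and 3 mark the data movement around the GPU computation. © 2014 Jay Wang]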
Shared Virtual Memory (HSA)
[Figure: host CPUs (HSA agent) and the GPU (HSA agent and HSA component) share one virtual address space covering System Memory and GPU Memory; each side translates addresses through an MMU backed by the OS page table.]
• 32-bit HSA system: 32-bit virtual addresses
• 64-bit HSA system: at least 48-bit virtual addresses
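The virtual-address widths translate directly into address-space sizes: 2^32 bytes is 4 GiB and 2^48 bytes is 256 TiB, which is why a 64-bit HSA system needs the wider range to share ordinary 64-bit process pointers. A small sketch of that arithmetic; the helper names are hypothetical, and the range check covers only the lower half of the address space (sign-extended upper-half addresses used by some 64-bit CPUs are out of scope).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bytes addressable with a va_bits-wide virtual address (va_bits < 64). */
uint64_t va_space_bytes(unsigned va_bits) {
    assert(va_bits > 0 && va_bits < 64);
    return UINT64_C(1) << va_bits;
}

/* True if addr lies in the lower va_bits-wide range. */
bool fits_in_va(uint64_t addr, unsigned va_bits) {
    return addr < va_space_bytes(va_bits);
}
```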
2014 異質系統架構 (HSA) 技術論壇 - 工業技術研究院 (2014-10-31)
HSA Memory Hierarchy
1) Global
2) Group
3) Private
4) Kernarg
5) Readonly
6) Spill
7) Arg
[Figure label: Virtual Address Range Reservation (System Memory or Device Local Memory)]
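Flat addressing (requirement 3 in the list of system architecture requirements) lets one pointer reach several of these segments. A common way to do this, assumed here, is to map the group and private segments into reserved "aperture" ranges of the flat virtual address space, so the segment can be decided from the address alone. The aperture bases and sizes below are invented for illustration; real placements are implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SEG_GLOBAL, SEG_GROUP, SEG_PRIVATE } segment_t;

/* Hypothetical aperture placement (4 GiB each). */
#define GROUP_APERTURE_BASE   UINT64_C(0x0000100000000000)
#define PRIVATE_APERTURE_BASE UINT64_C(0x0000200000000000)
#define APERTURE_SIZE         UINT64_C(0x0000000100000000)

segment_t classify_flat(uint64_t flat) {
    /* Unsigned wrap-around makes addresses below a base fail the range test. */
    if (flat - GROUP_APERTURE_BASE < APERTURE_SIZE) return SEG_GROUP;
    if (flat - PRIVATE_APERTURE_BASE < APERTURE_SIZE) return SEG_PRIVATE;
    return SEG_GLOBAL;  /* everything else is treated as global memory */
}

/* Offset inside the group segment for a flat address in the group aperture. */
uint64_t flat_to_group_offset(uint64_t flat) {
    assert(classify_flat(flat) == SEG_GROUP);
    return flat - GROUP_APERTURE_BASE;
}
```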
Cache Coherency Domains
[Figure: several caches in front of System Memory, kept consistent by a coherency mechanism.]
Signaling and Synchronization
The required mechanisms for HSAIL and the HSA runtime are:
Allocate/Destroy an HSA signal
Read the current HSA signal value
Wait on an HSA signal to meet a specified condition (with a maximum wait duration
requested)
Send an HSA signal value
Atomically read-modify-write an HSA signal value
[Figure: an HSA signal is referenced through a signal handle (hsa_signal_t) and carries a signal value (hsa_signal_value_t, Sig32 or Sig64) plus implementation-defined data. The host CPU (HSA agent) operates on it through HSA runtime APIs; the HSA component (HSA agent) uses HSAIL instructions. Analogous POSIX primitives: sem_init(), sem_wait(), sem_post(), sem_destroy(), pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), pthread_mutex_destroy().]
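The required signal mechanisms (create, read, wait on a condition with a maximum duration, send, atomic read-modify-write) can be modeled on a CPU with C11 atomics. This is a hypothetical sketch: model_signal_t and the model_* functions are invented for illustration and are not the HSA runtime API; only the operations and their memory orders follow the list above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef int64_t model_value_t;                 /* plays the role of a Sig64 value */
typedef struct { _Atomic model_value_t value; } model_signal_t;
typedef enum { COND_EQ, COND_NE, COND_LT, COND_GTE } model_cond_t;

void model_signal_create(model_signal_t *s, model_value_t initial) {
    atomic_init(&s->value, initial);
}

model_value_t model_signal_load_acquire(model_signal_t *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire);
}

void model_signal_store_release(model_signal_t *s, model_value_t v) {
    atomic_store_explicit(&s->value, v, memory_order_release);
}

/* Atomic read-modify-write; returns the value before the subtraction. */
model_value_t model_signal_subtract_acq_rel(model_signal_t *s, model_value_t v) {
    return atomic_fetch_sub_explicit(&s->value, v, memory_order_acq_rel);
}

static bool cond_met(model_value_t cur, model_cond_t c, model_value_t cmp) {
    switch (c) {
    case COND_EQ:  return cur == cmp;
    case COND_NE:  return cur != cmp;
    case COND_LT:  return cur < cmp;
    case COND_GTE: return cur >= cmp;
    }
    return false;
}

static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin until the condition holds or timeout_ns elapses; returns the last value read. */
model_value_t model_signal_wait_acquire(model_signal_t *s, model_cond_t c,
                                        model_value_t cmp, int64_t timeout_ns) {
    assert(timeout_ns >= 0);
    const int64_t deadline = now_ns() + timeout_ns;
    for (;;) {
        model_value_t cur = model_signal_load_acquire(s);
        if (cond_met(cur, c, cmp) || now_ns() >= deadline) return cur;
    }
}
```

As with a real timed wait, the caller must re-check the returned value, because a timeout also returns.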
HSA Runtime APIs for Signaling
HSA Runtime APIs ( for HSA application )
• hsa_signal_create ( )
• hsa_signal_destroy ( )
• hsa_signal_load_{acquire, relaxed} ( )
• hsa_signal_store_{relaxed, release} ( )
• hsa_signal_exchange_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_cas_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_add_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_subtract_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_and_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_or_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_xor_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_wait_{acquire, relaxed} ( )
HSA Runtime Programmer’s Reference Manual (v1.00)
2.4 Signals
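The _acquire/_release suffixes matter whenever a signal guards ordinary data. The sketch below uses C11 atomics and POSIX threads rather than the HSA runtime, and the counter, worker, and results array are hypothetical. Each worker writes its result and then decrements a shared completion counter with release ordering; the waiter reads the counter with acquire ordering, so once it observes 0 every worker's result is visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_WORKERS 4

static int results[NUM_WORKERS];          /* ordinary, non-atomic data */
static _Atomic int64_t completion;        /* acts like a completion signal */

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    results[id] = (id + 1) * 10;          /* plain store */
    /* release: the store above is ordered before the decrement */
    atomic_fetch_sub_explicit(&completion, 1, memory_order_release);
    return NULL;
}

int run_and_collect(void) {
    pthread_t threads[NUM_WORKERS];
    atomic_store_explicit(&completion, NUM_WORKERS, memory_order_relaxed);
    for (int i = 0; i < NUM_WORKERS; ++i) {
        int rc = pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    /* acquire: after observing 0, all workers' stores are visible here */
    while (atomic_load_explicit(&completion, memory_order_acquire) != 0) {
    }
    int sum = 0;
    for (int i = 0; i < NUM_WORKERS; ++i) sum += results[i];
    for (int i = 0; i < NUM_WORKERS; ++i) pthread_join(threads[i], NULL);
    return sum;
}
```

With relaxed ordering on both sides, the waiter could see the counter reach 0 and still read stale results.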
HSAIL Instructions for Signaling
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model,
Compiler Writer’s Guide, and Object Format (BRIG) (Version 1.0 Provisional)
6.8 Notification (signal) Operation
Atomic Memory Operations
HSA requires the following standard atomic memory operations to be
supported by HSA Components (other HSA Agents only need to
support the subset of these operations required by their role in the
system):
Load from memory
Store to memory
Fetch from memory, apply a logic operation (bitwise AND/OR/XOR) with one additional operand, and store back.
Fetch from memory, apply an integer arithmetic operation (add, subtract, increment, decrement, minimum, maximum) with one additional operand, and store back.
Exchange memory location with operand.
Compare-and-swap (CAS): load the memory location, compare it with the first operand, and if equal, store the second operand back to the memory location.
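C11 provides most of these operations directly (atomic_fetch_add/sub/and/or/xor, atomic_exchange, atomic_compare_exchange), but not minimum and maximum. A common workaround, sketched here for host code, builds fetch-min and fetch-max from the compare-and-swap operation in the list; the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stores min(*obj, operand) and returns the previous value. */
int32_t atomic_fetch_min_i32(_Atomic int32_t *obj, int32_t operand) {
    int32_t old = atomic_load_explicit(obj, memory_order_relaxed);
    /* On CAS failure, old is refreshed with the current value and re-checked. */
    while (operand < old &&
           !atomic_compare_exchange_weak_explicit(obj, &old, operand,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
    }
    return old;
}

/* Stores max(*obj, operand) and returns the previous value. */
int32_t atomic_fetch_max_i32(_Atomic int32_t *obj, int32_t operand) {
    int32_t old = atomic_load_explicit(obj, memory_order_relaxed);
    while (operand > old &&
           !atomic_compare_exchange_weak_explicit(obj, &old, operand,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
    }
    return old;
}
```

When the operand would not change the stored value, no write happens at all. That keeps the common case cheap, at the cost of not acting as a release in that case.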
HSA System Timestamp
The HSA system provides a low-overhead mechanism for measuring the passage of time.
A system timestamp is required that can be read from HSAIL or through the HSA runtime.
The system timestamp frequency can also be determined through the HSA runtime.
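The timestamp/frequency pair is what turns raw ticks into elapsed time. A host-side analogue, assuming POSIX: CLOCK_MONOTONIC supplies a tick count that never goes backwards, and a fixed tick frequency converts tick differences to seconds. The model_* names and the 1 ns tick are assumptions of this sketch, not part of the HSA runtime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical timestamp: monotonic nanoseconds, so 1 tick = 1 ns. */
uint64_t model_timestamp_ticks(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Frequency of the tick counter above, in ticks per second. */
uint64_t model_timestamp_frequency_hz(void) {
    return 1000000000u;
}

/* Elapsed seconds between two readings. */
double ticks_to_seconds(uint64_t t0, uint64_t t1) {
    return (double)(t1 - t0) / (double)model_timestamp_frequency_hz();
}
```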
User Mode Queuing
• Multiple user-level command queues
• Queues are allocated by the runtime
• Commands use the Architected Queuing Language (AQL)
User Mode Queue Operations
HSA Runtime APIs ( for HSA application )
• hsa_queue_create ( )
• hsa_queue_destroy ( )
• hsa_queue_inactivate ( )
• hsa_queue_load_write_index_{acquire, relaxed} ( )
• hsa_queue_store_write_index_{relaxed, release} ( )
• hsa_queue_cas_write_index_{acq_rel, acquire, relaxed, release} ( )
• hsa_queue_add_write_index_{acq_rel, acquire, relaxed, release} ( )
• hsa_queue_load_read_index_{acquire, relaxed} ( )
• hsa_queue_store_read_index_{relaxed, release} ( )
HSAIL Instructions ( for HSA component )
• agentcount_u32 dest
• agentid_u32 dest
• ldk_uLength dest, kernelName
• queueid_u32 dest
• queueptr_uLength dest
• ldqueuewriteindex_segment_order_u64 dest, address
• stqueuewriteindex_segment_order_u64 address, src
• casqueuewriteindex_segment_order_u64 dest, address, src0, src1
• addqueuewriteindex_segment_order_u64 dest, address, src
• ldqueuereadindex_segment_order_u64 dest, address
• stqueuereadindex_segment_order_u64 address, src
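The write-index operations are what let several producers share one user-mode queue without a system call. A minimal sketch, assuming a power-of-two ring of packet slots, monotonically increasing 64-bit read and write indices, and a packet header written last with release ordering (all names hypothetical). A producer reserves a slot by compare-and-swapping the write index, in the spirit of the cas_write_index operations listed above, and refuses when the ring is full. Unused slots hold format 1 (Invalid), as in the AQL packet format table.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE  8u   /* hypothetical; must be a power of two */
#define HDR_INVALID 1u   /* AQL format 1 = Invalid */

typedef struct {
    _Atomic uint16_t header;  /* written last, with release ordering */
    uint64_t payload;         /* stands in for the rest of the packet */
} model_packet_t;

typedef struct {
    model_packet_t slots[QUEUE_SIZE];
    _Atomic uint64_t write_index;  /* advanced by producers */
    _Atomic uint64_t read_index;   /* advanced by the packet processor */
} model_queue_t;

void model_queue_init(model_queue_t *q) {
    for (unsigned i = 0; i < QUEUE_SIZE; ++i) {
        atomic_init(&q->slots[i].header, HDR_INVALID);
        q->slots[i].payload = 0;
    }
    atomic_init(&q->write_index, 0);
    atomic_init(&q->read_index, 0);
}

/* Map a packet index to a ring slot (cheap modulo for a power of two). */
uint64_t model_queue_slot(uint64_t index) {
    return index & (QUEUE_SIZE - 1u);
}

/* Reserve one packet index; returns false if every slot is still in use. */
bool model_queue_reserve(model_queue_t *q, uint64_t *out_index) {
    uint64_t idx = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    do {
        uint64_t rd = atomic_load_explicit(&q->read_index, memory_order_acquire);
        if (idx - rd >= QUEUE_SIZE) return false;
    } while (!atomic_compare_exchange_weak_explicit(&q->write_index, &idx, idx + 1,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));
    *out_index = idx;
    return true;
}

/* Fill a reserved slot, then publish it by storing the header last. */
void model_queue_publish(model_queue_t *q, uint64_t index, uint16_t header,
                         uint64_t payload) {
    assert(header != HDR_INVALID);
    model_packet_t *p = &q->slots[model_queue_slot(index)];
    p->payload = payload;
    atomic_store_explicit(&p->header, header, memory_order_release);
}
```

A consumer that loads the header with acquire ordering and sees a valid format is guaranteed to see the payload written before it.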
AQL Packet Types
completionSignal: HSA signaling object handle used to indicate completion of the job.
Packet header (16 bits):
• format (8 bits)
• barrier (1 bit)
• acquireFenceScope (2 bits)
• releaseFenceScope (2 bits)
• reserved (3 bits)
Format values:
0 Always_Reserved
1 Invalid
2 Kernel_Dispatch
3 Barrier_AND
4 Agent_Dispatch
5 Barrier_OR
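The header fields add up to 16 bits (8 + 1 + 2 + 2 + 3). The sketch below packs and unpacks them, assuming the fields are laid out from the least significant bit in the listed order: format in bits 0-7, barrier in bit 8, acquireFenceScope in bits 9-10, releaseFenceScope in bits 11-12, reserved in bits 13-15. The bit positions and helper names are assumptions, and the scope values are treated as opaque numbers.

```c
#include <assert.h>
#include <stdint.h>

enum { FMT_ALWAYS_RESERVED = 0, FMT_INVALID = 1, FMT_KERNEL_DISPATCH = 2,
       FMT_BARRIER_AND = 3, FMT_AGENT_DISPATCH = 4, FMT_BARRIER_OR = 5 };

/* Assumed bit positions (LSB first, in the order listed on the slide). */
#define HDR_FORMAT_SHIFT  0
#define HDR_BARRIER_SHIFT 8
#define HDR_ACQ_SHIFT     9
#define HDR_REL_SHIFT     11

uint16_t pack_header(unsigned format, unsigned barrier,
                     unsigned acq_scope, unsigned rel_scope) {
    assert(format <= 0xFFu && barrier <= 1u && acq_scope <= 3u && rel_scope <= 3u);
    return (uint16_t)((format << HDR_FORMAT_SHIFT) | (barrier << HDR_BARRIER_SHIFT) |
                      (acq_scope << HDR_ACQ_SHIFT) | (rel_scope << HDR_REL_SHIFT));
}

unsigned header_format(uint16_t h)        { return (h >> HDR_FORMAT_SHIFT) & 0xFFu; }
unsigned header_barrier(uint16_t h)       { return (h >> HDR_BARRIER_SHIFT) & 0x1u; }
unsigned header_acquire_scope(uint16_t h) { return (h >> HDR_ACQ_SHIFT) & 0x3u; }
unsigned header_release_scope(uint16_t h) { return (h >> HDR_REL_SHIFT) & 0x3u; }
```

Under this layout, a kernel dispatch with the barrier bit set and both scopes equal to 2 packs to 0x1502.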
Kernel Dispatch Packet
Fields shown: work-group size, grid size, segment size, pointer to the kernel, and pointer to the arguments.
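Together, the grid size and work-group size fields determine how many work-groups a dispatch launches. A minimal sketch, assuming a 3-D grid and that a grid size that is not a multiple of the work-group size produces a partial last work-group. The struct is a hypothetical subset of the packet, not its real layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a kernel dispatch packet: only the size fields. */
typedef struct {
    uint16_t workgroup_size[3];  /* work-items per work-group in x, y, z */
    uint32_t grid_size[3];       /* work-items in the whole grid in x, y, z */
} dispatch_dims_t;

/* Work-groups along one dimension, rounding up to count a partial last group. */
static uint64_t groups_in_dim(uint32_t grid, uint16_t wg) {
    assert(wg > 0);
    return ((uint64_t)grid + wg - 1u) / wg;
}

uint64_t total_workgroups(const dispatch_dims_t *d) {
    uint64_t total = 1;
    for (int i = 0; i < 3; ++i)
        total *= groups_in_dim(d->grid_size[i], d->workgroup_size[i]);
    return total;
}
```

For example, a 1920 x 1080 grid with 16 x 16 work-groups needs 120 x 68 = 8160 work-groups; the last row of groups is only half full.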
Agent Dispatch Packet
Fields shown:
• 64-bit direct or indirect arguments
• Pointer to the location in which to store the function's return value(s)
• Type: the function to be performed by the destination HSA agent. The type value is split into the following ranges:
  0x0000 ~ 0x7FFF: Reserved
  0x8000 ~ 0xFFFF: User-registered function
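The type ranges suggest how an agent might look up the function to run: reserved values are rejected, and user-registered values index a table of handlers. A minimal sketch; the table, its size, the four-argument handler signature, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USER_TYPE_FIRST 0x8000u  /* first user-registered function type */
#define MAX_USER_FNS    16u      /* hypothetical table size */
#define NUM_ARGS        4        /* hypothetical number of 64-bit arguments */

typedef uint64_t (*agent_fn_t)(const uint64_t args[NUM_ARGS]);

static agent_fn_t user_table[MAX_USER_FNS];

bool is_user_registered_type(uint16_t type) {
    return type >= USER_TYPE_FIRST;  /* 0x0000 ~ 0x7FFF are reserved */
}

bool register_agent_fn(uint16_t type, agent_fn_t fn) {
    if (!is_user_registered_type(type) || type - USER_TYPE_FIRST >= MAX_USER_FNS)
        return false;
    user_table[type - USER_TYPE_FIRST] = fn;
    return true;
}

/* Run the function selected by type and store its return value in *ret. */
bool dispatch_agent_packet(uint16_t type, const uint64_t args[NUM_ARGS], uint64_t *ret) {
    assert(ret != NULL);
    if (!is_user_registered_type(type)) return false;
    unsigned slot = type - USER_TYPE_FIRST;
    if (slot >= MAX_USER_FNS || user_table[slot] == NULL) return false;
    *ret = user_table[slot](args);
    return true;
}

/* Example handler: returns the sum of the first two arguments. */
uint64_t example_add_fn(const uint64_t args[NUM_ARGS]) {
    return args[0] + args[1];
}
```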
Barrier-AND / Barrier-OR Packet
The Barrier packet defines dependencies for the HSA Packet Processor to monitor.
The HSA Packet Processor will not launch any further packets until the Barrier-AND / Barrier-OR packet is complete.
[Figure label: handles for dependent signaling objects to be evaluated by the packet processor.]
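The difference between the two barrier packets comes down to "all" versus "any". A minimal sketch, assuming a dependency counts as satisfied once its signal value has reached 0 (consistent with completion signals being decremented, as described for the completion phase of packet processing) and that empty (NULL) slots are ignored. The slot count and type names are hypothetical. In this sketch, a Barrier-OR with no dependencies is never satisfied.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPS 5  /* hypothetical number of dependency slots */

typedef struct { _Atomic int64_t value; } dep_signal_t;

/* Assumption: a dependency is satisfied when its signal value is 0. */
static bool dep_done(dep_signal_t *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire) == 0;
}

/* Barrier-AND: every non-empty dependency must be satisfied. */
bool barrier_and_ready(dep_signal_t *deps[MAX_DEPS]) {
    for (size_t i = 0; i < MAX_DEPS; ++i)
        if (deps[i] != NULL && !dep_done(deps[i])) return false;
    return true;
}

/* Barrier-OR: at least one non-empty dependency must be satisfied. */
bool barrier_or_ready(dep_signal_t *deps[MAX_DEPS]) {
    for (size_t i = 0; i < MAX_DEPS; ++i)
        if (deps[i] != NULL && dep_done(deps[i])) return true;
    return false;
}
```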
Packet Process Flow
Launch Phase
• All preceding packets in the queue must have completed their launch phase.
• If the barrier bit in the packet header is set, then all preceding packets in the queue must have completed.
• In the launch phase, an acquire memory fence is applied before the packet enters the active phase.
Active Phase
• Kernel Dispatch packets and Agent Dispatch packets execute on the HSA component/agent, and the active phase ends when the task completes.
• Barrier-AND and Barrier-OR packets remain in the active phase until their condition is met.
Completion Phase
• The first step in the completion phase is the memory release fence.
• After the memory release fence completes, the signal specified by the completionSignal field in the AQL packet is signaled with an atomic decrement.
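The launch-phase rules can be expressed as a predicate over the packets ahead of a given packet in the same queue. This is a simplified, hypothetical simulator: it tracks only a three-state lifecycle and the barrier bit, and leaves out the memory fences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PKT_QUEUED, PKT_ACTIVE, PKT_COMPLETE } pkt_state_t;

typedef struct {
    bool barrier_bit;
    pkt_state_t state;
} sim_packet_t;

/* Packet i may launch if every earlier packet has launched (is ACTIVE or
   COMPLETE) and, when its barrier bit is set, every earlier packet is COMPLETE. */
bool can_launch(const sim_packet_t *q, size_t n, size_t i) {
    assert(i < n);
    if (q[i].state != PKT_QUEUED) return false;
    for (size_t k = 0; k < i; ++k) {
        if (q[k].state == PKT_QUEUED) return false;
        if (q[i].barrier_bit && q[k].state != PKT_COMPLETE) return false;
    }
    return true;
}
```

For example, a packet with the barrier bit set stays queued while the packet ahead of it is still active, even though that packet has already launched.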
Barrier-bit Example
[Figure: AQL packets move through a queue from enqueue to dequeue and through the launch, active, and completion phases; one packet has barrier bit = 1, and completionSignal is also marked.]
HSA Agent Scheduling
[Figure: applications #1, #2, and #3 submit AQL packets (Agent Dispatch packets or Barrier-AND/OR packets) to agent dispatch queues; HSA agent scheduling selects work from these queues and from a non-HSA task pool to run on the HSA agent. Scheduling is triggered by AQL packet submission, task execution complete, and barrier packet complete.]
HSA Component Context Switching
[Figure: the HSA agent scheduling picture (applications #1 to #3, agent dispatch queues, non-HSA task pool) next to an HSA component whose compute units (CU) execute work-groups (WG) from several kernel programs, with HSA agent context switching between them.]
1. Switch (required)
2. Preempt (required, as soon as possible)
3. Terminate and context reset (terminate as fast as possible)
FP Exception Reporting
An HSA Component shall report certain defined exceptions related to
the execution of the HSAIL code to the HSA Runtime.
[Figure: an HSA component (HSA agent) runs a grid of work-groups; each work-group is split into wavefronts of N = 1 ~ 64 work-items, one work-item per lane, executed SIMD-style (Single Instruction, Multiple Data) on a compute unit. Each compute unit keeps status bits and an exception policy (DETECT or BREAK) in its exception module; exceptions are reported through an exception handler and signaling to the HSA runtime on the host CPU (HSA agent).]
HSAIL instructions: cleardetectexcept_u32, getdetectexcept_u32, setdetectexcept_u32
IEEE 754-2008 exception codes:
0 Invalid operation
1 Divide-by-zero
2 Overflow
3 Underflow
4 Inexact
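Under the DETECT policy, exceptions accumulate as sticky status bits that software later reads and clears. The sketch below models that with a bit mask, assuming each exception code is used as a bit position. The model_* functions mirror the names of the HSAIL get/set/clear instructions, but their exact semantics here (set ORs bits in, clear removes them) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Exception codes, used here as bit positions (assumption). */
enum { EXC_INVALID = 0, EXC_DIVIDE_BY_ZERO = 1, EXC_OVERFLOW = 2,
       EXC_UNDERFLOW = 3, EXC_INEXACT = 4 };

typedef struct { uint32_t detect_bits; } model_exception_state_t;

/* Hypothetical: hardware records an exception under the DETECT policy. */
void model_raise(model_exception_state_t *s, unsigned code) {
    assert(code <= EXC_INEXACT);
    s->detect_bits |= 1u << code;  /* sticky until explicitly cleared */
}

uint32_t model_getdetectexcept(const model_exception_state_t *s) {
    return s->detect_bits;
}

void model_setdetectexcept(model_exception_state_t *s, uint32_t mask) {
    s->detect_bits |= mask;
}

void model_cleardetectexcept(model_exception_state_t *s, uint32_t mask) {
    s->detect_bits &= ~mask;
}
```

Raising divide-by-zero and inexact yields the mask 0x12 (bits 1 and 4).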
Debug Infrastructure
The HSA component shall provide mechanisms that allow system software and selected application software (for example, debuggers and profilers) to set breakpoints and collect throughput information for profiling.
[Figure: the same grid, work-group, wavefront (N = 1 ~ 64 work-items, one per lane), and compute-unit structure as on the FP exception slide, with a debug module next to the exception module. The debug module supports instruction breakpoints, conditional breakpoints, and memory breakpoints; debuggers and profilers on the host CPU (HSA agent) reach it through the HSA component debug interface.]
Execution Environment
You have 2 OpenCL platform(s)
----------------------------------------------
Platform[0].Name = NVIDIA CUDA
Platform[0].Vendor = NVIDIA Corporation
Platform[0].Version = OpenCL 1.1 CUDA 4.2.1
Platform[0].Profile = FULL_PROFILE
----------------------------------------------
Platform[1].Name = Intel(R) OpenCL
Platform[1].Vendor = Intel(R) Corporation
Platform[1].Version = OpenCL 1.2
Platform[1].Profile = FULL_PROFILE
----------------------------------------------
Platform[0] has 1 device(s)
----------------------------------------------
Device[0].Type = CL_DEVICE_TYPE_GPU
Device[0].Name = GeForce GT 625
Device[0].Vendor = NVIDIA Corporation
Device[0].Version = OpenCL 1.1 CUDA
Device[0].DriverVersion = 320.49
Device[0].Profile = FULL_PROFILE
Device[0].OpenCL_C = OpenCL C 1.1
Device[0].MaxComputeUnits = 1
Device[0].MaxWiDimensions = 3
Device[0].MaxWiSize = (1024,1024,64)
Device[0].MaxWgSize = 1024
Device[0].MaxClkFrequency = 1747 MHz
Device[0].AddrSpaceSize = 32 bits
Platform[1] has 1 device(s)
----------------------------------------------
Device[0].Type = CL_DEVICE_TYPE_CPU
Device[0].Name = Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz
Device[0].Vendor = Intel(R) Corporation
Device[0].Version = OpenCL 1.2 (Build 80752)
Device[0].DriverVersion = 3.0.1.15216
Device[0].Profile = FULL_PROFILE
Device[0].OpenCL_C = OpenCL C 1.2
Device[0].MaxComputeUnits = 4
Device[0].MaxWiDimensions = 3
Device[0].MaxWiSize = (1024,1024,1024)
Device[0].MaxWgSize = 1024
Device[0].MaxClkFrequency = 3100 MHz
Device[0].AddrSpaceSize = 32 bits
OpenCL APIs
HSA Platform Topology Discovery
HSA platform resources: Agent, Memory, Compute Properties, Caches, and I/O
[Figure: example HSA platform topology with Nodes 0 to 4. Nodes 0 and 1 are HSA APUs (CPU cores plus a GPU built from HSA compute units, H-CU) with system memory (cacheable/coherent and non-cacheable/non-coherent) behind an HSA MMU, connected to each other by a socket interconnect. HSA discrete GPUs with coherent and non-coherent device-local memory attach over PCIe, either on optional add-in boards or as separate nodes (Nodes 2, 3, and 4). Firmware shown: SBIOS/UEFI on the host side and VBIOS/UEFI GOP on the discrete GPUs.]
Images
• A graphics feature that can sometimes be useful in data-parallel computing
• Used to store one-, two-, or three-dimensional images in predefined image formats
• Image memory is accessed through a special kind of memory access
• Dedicated hardware speeds up image operations
Reference: The OpenCL™ Specification, Version 2.0, Section 5.3 Image Objects, http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
Summary
Programming model issues
HSA Intermediate Language (HSAIL) + HSA Runtime
Architected Queuing Language (AQL) + Signaling
Debug infrastructure
Communication overhead issues
Cache coherent shared virtual memory (CC-SVM)
Architected Queuing Language (AQL) for user mode queuing
Hardware-assisted signaling and atomic operations for synchronization
[Figure: CPUs, GPU, DSP, ... sharing unified coherent memory, with HSAIL, the HSA runtime, and AQL layered across them.]