HSA System Architecture Overview

2014 Heterogeneous System Architecture (HSA) Technology Forum, Industrial Technology Research Institute (ITRI), Taiwan

王振傑 (Jay Wang)
Embedded System and Chip Technology Division, System Architecture Design Department (D200)
Information and Communications Research Laboratories (ICL)
[email protected]
2014-10-31

HSA Platform Model

In an HSA system, a regular device is called an HSA agent; an HSA agent that can also run kernels is an HSA component.

Figure labels: Serial and Task Parallel Workloads; Data Parallel Workloads; SIMD.

Three Eras of Processor Performance

Single-Core Era
Chart: single-thread performance over time ("we are here").
Enabled by: Moore’s Observation, Voltage Scaling, Micro-Architecture
Constrained by: Power, Complexity

Multi-Core Era
Chart: throughput performance over time (# of processors) ("we are here").
Enabled by: Moore’s Observation, Desire for Throughput, 20 years of SMP arch
Constrained by: Power, Parallel SW availability, Scalability

Heterogeneous Systems Era
Chart: modern application performance over time (data-parallel exploitation) ("we are here").
Enabled by: Moore’s Observation, Abundant data parallelism, Power efficient data parallel processing (GPUs)
Constrained by: Programming models, Communication overheads

Programming languages and models named on the slide: Assembly, C/C++, Java, …; pthreads, OpenMP / TBB, …; Shader, CUDA, OpenCL; C++ and Java

SOURCE : HSA INTRODUCTION, HSA FOUNDATION (PHIL ROGERS, AMD)

HSA Intermediate Language (HSAIL)

The HSA Foundation members are building a heterogeneous compute software ecosystem built on open, royalty-free industry standards and open-source software: the HSA runtimes and compilation tools are based on open-source technologies such as LLVM and GCC (https://github.com/HSAFoundation).

HSAIL Programming Model

HSA Runtime Stack

Kernel Execution

HSA Memory Consistency Model

(Relaxed Model)

Second-operation columns:
(a) ld_rlx, st_rlx
(b) atomic_rlx, atomicNoRet_rlx
(c) atomic_acq, atomicNoRet_acq, fence_acq
(d) atomic_rel, atomicNoRet_rel
(e) fence_rel
(f) atomic_ar, atomicNoRet_ar, fence_ar

First Operation                             (a)  (b)  (c)  (d)  (e)  (f)
ld_rlx or st_rlx                            yes  yes  yes  yes  no   no
atomic_rlx, atomicNoRet_rlx                 yes  yes  yes  no   no   no
atomic_acq, atomicNoRet_acq, fence_acq      no   no   no   no   no   no
atomic_rel, atomicNoRet_rel                 yes  yes  no   no   no   no
fence_rel                                   yes  no   no   no   no   no
atomic_ar, atomicNoRet_ar, fence_ar         no   no   no   no   no   no

Example sequence: relaxed ; … acquire ; … release ; … acq_rel ; …
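The table above can be kept as a small lookup structure. The sketch below is only an illustration: the short class labels and the function name `table_cell` are made up here, the cell values are copied from the slide, and because the slide does not spell out what "yes" means, the function reports only the cell content.

```python
# Short labels for the six operation classes, in the slide's column order.
# These labels are hypothetical shorthand, not HSAIL syntax.
CLASSES = ["ld_st_rlx", "atomic_rlx", "acq", "atomic_rel", "fence_rel", "ar"]

# One row per first operation; cells copied from the slide, left to right.
ROWS = {
    "ld_st_rlx":  "yes yes yes yes no no",
    "atomic_rlx": "yes yes yes no no no",
    "acq":        "no no no no no no",
    "atomic_rel": "yes yes no no no no",
    "fence_rel":  "yes no no no no no",
    "ar":         "no no no no no no",
}


def table_cell(first, second):
    """Return True when the slide's cell for (first, second) reads 'yes'."""
    return ROWS[first].split()[CLASSES.index(second)] == "yes"


print(table_cell("ld_st_rlx", "acq"))
print(table_cell("acq", "ld_st_rlx"))
```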

System Arch. Requirements

1. Shared Virtual Memory

2. Cache Coherency Domains

3. Flat Addressing

4. Signaling and Synchronization

5. Atomic Memory Operations

6. HSA System Timestamp

7. User Mode Queuing

8. Architected Queuing Language (AQL)

9. HSA Agent Scheduling

10. HSA Component Context Switching

11. IEEE754-2008 Floating Point Exceptions

12. HSA Component Hardware Debug Infrastructure

13. HSA Platform Topology Discovery

14. Images

SOURCE : HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION (2014-09-15)

Legacy GPU Compute

Multiple memory pools and address spaces

Data copies before/after GPU compute

Figure labels: Host CPUs (HSA Agent) with System Memory and Virtual Memory #1; GPU (HSA Agent, HSA Component) with GPU Memory and Virtual Memory #2; numbered steps 1, 2, 3. (© 2014 Jay Wang)

Shared Virtual Memory (HSA)

Figure labels: Host CPUs (HSA Agent) and GPU (HSA Agent, HSA Component), each with an MMU; OS Page Table; Shared Virtual Memory covering System Memory and GPU Memory. (© 2014 Jay Wang)

32-bit HSA System: 32 bits VA
64-bit HSA System: ≥ 48 bits VA

HSA Memory Hierarchy

1) Global
2) Group
3) Private
4) Kernarg
5) Readonly
6) Spill
7) Arg

Figure label: Virtual Address Range Reservation (System Memory or Device Local Memory)

Cache Coherency Domains

Figure labels: System Memory; Cache (three instances); Coherency.


Signaling and Synchronization

The required mechanisms for HSAIL and the HSA runtime are:

• Allocate/Destroy an HSA signal
• Read the current HSA signal value
• Wait on an HSA signal to meet a specified condition (with a maximum wait duration requested)
• Send an HSA signal value
• Atomic read-modify-write an HSA signal value

Figure labels: Host CPU (HSA Agent) using HSA Runtime APIs; HSA Component (HSA Agent) using HSAIL Instructions; Signal Handle (hsa_signal_t); Signal Value (hsa_signal_value_t), Sig32 or Sig64; Implementation-defined data. (© 2014 Jay Wang)

POSIX primitives shown alongside for comparison:

sem_init(), sem_wait(), sem_post(), sem_destroy()
pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), pthread_mutex_destroy()
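The required signal mechanisms can be pictured with a toy model. This is a minimal sketch in Python, not the HSA runtime: the class `Signal` and its method names are hypothetical, and a condition variable stands in for whatever hardware or runtime support a real implementation uses.

```python
import threading


class Signal:
    """Toy stand-in for an HSA signal: one integer guarded by a condition variable."""

    def __init__(self, initial):
        # "Allocate" the signal with an initial value.
        self._value = initial
        self._cond = threading.Condition()

    def load(self):
        # Read the current signal value.
        with self._cond:
            return self._value

    def store(self, value):
        # Send a new signal value and wake any waiters.
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def add(self, delta):
        # Atomic read-modify-write of the signal value.
        with self._cond:
            self._value += delta
            self._cond.notify_all()

    def wait(self, condition, timeout):
        # Wait until condition(value) holds or the maximum wait duration
        # (in seconds) has passed, then return the value last observed.
        with self._cond:
            self._cond.wait_for(lambda: condition(self._value), timeout)
            return self._value


completion = Signal(1)
worker = threading.Thread(target=lambda: completion.add(-1))
worker.start()
observed = completion.wait(lambda v: v == 0, timeout=1.0)
worker.join()
print(observed)
```

The wait returns the observed value rather than a success flag, so the caller can tell a satisfied condition from a timeout by checking the value itself.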

HSA Runtime APIs for Signaling


HSA Runtime APIs ( for HSA application )

• hsa_signal_create ( )

• hsa_signal_destroy ( )

• hsa_signal_load_{acquire, relaxed} ( )

• hsa_signal_store_{relaxed, release} ( )

• hsa_signal_exchange_{acq_rel, acquire, relaxed, release} ( )

• hsa_signal_cas_{acq_rel, acquire, relaxed, release} ( )

• hsa_signal_add_{acq_rel, acquire, relaxed, release} ( )

• hsa_signal_subtract_{acq_rel, acquire, relaxed, release} ( )

• hsa_signal_and_{acq_rel, acquire, relaxed, release} ( )

• hsa_signal_or_{acq_rel, acquire, relaxed, release} ( )

• hsa_signal_xor_{acq_rel, acquire, relaxed, release} ( )

• hsa_signal_wait_{acquire, relaxed} ( )

Reference: HSA Runtime Programmer’s Reference Manual (v1.00), Section 2.4 Signals

HSAIL Instructions for Signaling

Reference: HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and Object Format (BRIG) (Version 1.0 Provisional), Section 6.8 Notification (signal) Operation

Atomic Memory Operations

HSA requires the following standard atomic memory operations to be supported by HSA Components (other HSA Agents only need to support the subset of these operations required by their role in the system):

• Load from memory.
• Store to memory.
• Fetch from memory, apply a logic operation (bitwise AND/OR/XOR) with one additional operand, and store back.
• Fetch from memory, apply an integer arithmetic operation (add, subtract, increment, decrement, minimum, maximum) with one additional operand, and store back.
• Exchange memory location with operand.
• Compare-and-swap (CAS): load memory location, compare with first operand, and if equal then store second operand back to memory location.
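The semantics of these operations can be sketched with a lock-protected cell. This is an illustration only: the class `AtomicCell` and its method names are hypothetical, and a Python lock stands in for the hardware atomicity that HSA requires. Increment and decrement are simply `fetch_add(1)` and `fetch_sub(1)`.

```python
import threading


class AtomicCell:
    """One memory location whose updates are serialized by a lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def store(self, value):
        with self._lock:
            self._value = value

    def _rmw(self, fn):
        # Fetch, apply fn, store back; return the value that was fetched.
        with self._lock:
            old = self._value
            self._value = fn(old)
            return old

    def fetch_and(self, x): return self._rmw(lambda v: v & x)
    def fetch_or(self, x): return self._rmw(lambda v: v | x)
    def fetch_xor(self, x): return self._rmw(lambda v: v ^ x)
    def fetch_add(self, x): return self._rmw(lambda v: v + x)
    def fetch_sub(self, x): return self._rmw(lambda v: v - x)
    def fetch_min(self, x): return self._rmw(lambda v: min(v, x))
    def fetch_max(self, x): return self._rmw(lambda v: max(v, x))
    def exchange(self, x): return self._rmw(lambda _: x)

    def compare_and_swap(self, expected, new):
        # Store `new` only if the location still holds `expected`.
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


counter = AtomicCell(0)


def bump():
    for _ in range(1000):
        counter.fetch_add(1)


threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())
```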

HSA System Timestamp

The HSA system provides a low-overhead mechanism for determining the passing of time.

A system timestamp is required that can be read from HSAIL or through the HSA runtime.

It is also possible to determine the system timestamp frequency through the HSA runtime.
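With a timestamp and its frequency, elapsed time is a simple division. The sketch below assumes the timestamp is a tick counter and the frequency is given in Hz; the function name and the 100 MHz figure are illustrative values, not taken from the specification.

```python
def ticks_to_seconds(start_ticks, end_ticks, frequency_hz):
    """Convert a difference of two timestamp readings into seconds."""
    if frequency_hz <= 0:
        raise ValueError("timestamp frequency must be positive")
    return (end_ticks - start_ticks) / frequency_hz


# Hypothetical readings taken before and after a kernel, at an assumed 100 MHz.
elapsed = ticks_to_seconds(1_000_000, 1_250_000, 100_000_000)
print(elapsed)
```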


User Mode Queuing

• Multiple user-level command queues
• Runtime-allocated
• Architected Queuing Language (AQL)

HSA Packet Processor


User Mode Queue Operations

HSA Runtime APIs ( for HSA application )

• hsa_queue_create ( )

• hsa_queue_destroy ( )

• hsa_queue_inactivate ( )

• hsa_queue_load_write_index_{acquire, relaxed} ( )

• hsa_queue_store_write_index_{relaxed, release} ( )

• hsa_queue_cas_write_index_{acq_rel, acquire, relaxed, release} ( )

• hsa_queue_add_write_index_{acq_rel, acquire, relaxed, release} ( )

• hsa_queue_load_read_index_{acquire, relaxed} ( )

• hsa_queue_store_read_index_{relaxed, release} ( )


HSAIL Instructions ( for HSA component )

• agentcount_u32 dest

• agentid_u32 dest

• ldk_uLength dest, kernelName

• queueid_u32 dest

• queueptr_uLength dest

• ldqueuewriteindex_segment_order_u64 dest, address

• stqueuewriteindex_segment_order_u64 address, src

• casqueuewriteindex_segment_order_u64 dest, address, src0, src1

• addqueuewriteindex_segment_order_u64 dest, address, src

• ldqueuereadindex_segment_order_u64 dest, address

• stqueuereadindex_segment_order_u64 address, src
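The write-index and read-index operations listed above suggest a ring buffer shared by producers and a packet processor. The following single-threaded sketch assumes that both indices only ever increase and that a slot is selected by index modulo queue size; the class `UserModeQueue` and its methods are hypothetical and are not the HSA runtime API.

```python
class UserModeQueue:
    """Ring buffer addressed by ever-increasing write and read indices."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.write_index = 0   # next slot a producer will reserve
        self.read_index = 0    # next slot the packet processor will consume

    def add_write_index(self, count=1):
        # Reserve `count` slots and return the first reserved index.
        first = self.write_index
        self.write_index += count
        return first

    def enqueue(self, packet):
        if self.write_index - self.read_index >= self.size:
            return False                      # queue is full
        index = self.add_write_index(1)
        self.slots[index % self.size] = packet
        return True

    def dequeue(self):
        if self.read_index == self.write_index:
            return None                       # queue is empty
        packet = self.slots[self.read_index % self.size]
        self.read_index += 1
        return packet


queue = UserModeQueue(2)
accepted = [queue.enqueue(name) for name in ("a", "b", "c")]
print(accepted, queue.dequeue(), queue.enqueue("c"))
```

In a real multi-producer setting the reservation step would have to be an atomic add or compare-and-swap on the write index, which is what the `add` and `cas` variants in the lists above are for.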

AQL Packet Types

AQL packet header fields:

• format (8-bit)
• barrier (1-bit)
• acquireFenceScope (2-bit)
• releaseFenceScope (2-bit)
• reserved (3-bit)

Format values:

0  Always_Reserved
1  Invalid
2  Kernel_Dispatch
3  Barrier_AND
4  Agent_Dispatch
5  Barrier_OR

completionSignal: HSA signaling object handle used to indicate completion of the job.
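The header field widths above add up to 16 bits, so the header fits in one 16-bit word. The sketch below packs and unpacks such a word, assuming the fields are laid out in the listed order starting from the least significant bit; the function names and the scope value 2 in the example are illustrative.

```python
# Format values as listed on the slide.
FORMAT = {
    "Always_Reserved": 0,
    "Invalid": 1,
    "Kernel_Dispatch": 2,
    "Barrier_AND": 3,
    "Agent_Dispatch": 4,
    "Barrier_OR": 5,
}


def pack_header(fmt, barrier, acquire_scope, release_scope):
    """Build a 16-bit header: format[0:8], barrier[8], acquire[9:11], release[11:13]."""
    if not 0 <= acquire_scope < 4 or not 0 <= release_scope < 4:
        raise ValueError("fence scopes are 2-bit fields")
    return (FORMAT[fmt]
            | (int(bool(barrier)) << 8)
            | (acquire_scope << 9)
            | (release_scope << 11))   # reserved bits 13..15 stay zero


def unpack_header(header):
    return {
        "format": header & 0xFF,
        "barrier": (header >> 8) & 0x1,
        "acquireFenceScope": (header >> 9) & 0x3,
        "releaseFenceScope": (header >> 11) & 0x3,
    }


header = pack_header("Kernel_Dispatch", barrier=True, acquire_scope=2, release_scope=2)
print(hex(header), unpack_header(header))
```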

Kernel Dispatch Packet

Fields called out on the packet layout:

• Work-group Size
• Grid Size
• Segment Size
• Pointer to the Kernel
• Pointer to the arguments
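The grid size and work-group size together determine how many work-groups a dispatch launches. The sketch below assumes both are given as work-item counts per dimension and that a partial work-group is allowed at the edge of the grid; the function name and the example sizes are illustrative.

```python
def work_group_count(grid_size, work_group_size):
    """Number of work-groups per dimension, rounding up for partial groups."""
    if len(grid_size) != len(work_group_size):
        raise ValueError("grid and work-group sizes need the same dimensions")
    # -(-g // w) is integer ceiling division.
    return tuple(-(-g // w) for g, w in zip(grid_size, work_group_size))


groups = work_group_count((1024, 768, 1), (16, 16, 1))
total = groups[0] * groups[1] * groups[2]
print(groups, total)
```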

Agent Dispatch Packet

Fields called out on the packet layout:

• 64-bit direct or indirect arguments.
• Pointer to the location to store the function return value(s) in.
• The function to be performed by the destination HSA Agent. The type value is split into the following ranges:
    0x0000 ~ 0x7FFF  Reserved
    0x8000 ~ 0xFFFF  User registered function
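The two type ranges can be checked with a single comparison. A small sketch follows, assuming the type is a 16-bit unsigned value as the ranges suggest; the function name is hypothetical.

```python
def classify_agent_dispatch_type(type_value):
    """Map a 16-bit agent dispatch type to the range named on the slide."""
    if not 0 <= type_value <= 0xFFFF:
        raise ValueError("type must fit in 16 bits")
    if type_value <= 0x7FFF:
        return "Reserved"
    return "User registered function"


print(classify_agent_dispatch_type(0x8001))
```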


Barrier-AND / Barrier-OR Packet

The Barrier packet defines dependencies for the HSA Packet Processor to monitor.

The HSA Packet Processor will not launch any further packets until the Barrier-AND / Barrier-OR packet is complete.

Field called out on the packet layout: handles for dependent signaling objects to be evaluated by the packet processor.
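The difference between the two barrier packets comes down to how the dependent signals are combined. The slide does not state the exact condition, so the sketch below assumes that a dependency counts as satisfied when its signal value has reached 0; the function names are hypothetical.

```python
def barrier_and_complete(dependent_values):
    """Barrier-AND: every dependent signal must be satisfied (assumed: value 0)."""
    return all(value == 0 for value in dependent_values)


def barrier_or_complete(dependent_values):
    """Barrier-OR: at least one dependent signal must be satisfied (assumed: value 0)."""
    return any(value == 0 for value in dependent_values)


signals = [0, 1, 0]
print(barrier_and_complete(signals), barrier_or_complete(signals))
```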


Packet Process Flow

Launch Phase

• All preceding packets in the queue must have completed their launch phase.
• If the barrier bit in the packet header is set, then all preceding packets in the queue must have completed.
• In the launch phase an acquire memory fence is applied before the packet enters the active phase.

Active Phase

• Kernel Dispatch packets and Agent Dispatch packets execute on the HSA Component/Agent, and the active phase ends when the task completes.
• Barrier-AND and Barrier-OR packets remain in the active phase until their condition is met.

Completion Phase

• The first step in the completion phase is the memory release fence.
• After the memory release fence completes, the signal specified by the completionSignal field in the AQL packet is signaled with a decrementing atomic operation.
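The launch rules and the completion step can be traced with a small state model. This is a sketch under simplifying assumptions: the launch phase is collapsed into the transition from "queued" to "active", the memory fences appear only as comments, all three packets share one completion signal, and the names `Packet`, `can_launch`, `launch`, and `complete` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    barrier: bool = False   # barrier bit from the packet header
    phase: str = "queued"   # queued -> active -> complete


def can_launch(queue, i):
    earlier = queue[:i]
    # All preceding packets must have completed their launch phase.
    if any(p.phase == "queued" for p in earlier):
        return False
    # With the barrier bit set, all preceding packets must have completed.
    if queue[i].barrier and any(p.phase != "complete" for p in earlier):
        return False
    return True


def launch(queue, i):
    if not can_launch(queue, i):
        raise RuntimeError("launch conditions not met")
    # An acquire memory fence would be applied here.
    queue[i].phase = "active"


def complete(packet, signal):
    # A release memory fence would be applied here, then the signal is decremented.
    packet.phase = "complete"
    signal["value"] -= 1


queue = [Packet("A"), Packet("B"), Packet("C", barrier=True)]
signal = {"value": 3}
launch(queue, 0)
launch(queue, 1)                     # B may launch while A is still active
blocked = not can_launch(queue, 2)   # C has the barrier bit set, so it waits
complete(queue[0], signal)
complete(queue[1], signal)
launch(queue, 2)
complete(queue[2], signal)
print(blocked, signal["value"])
```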


Barrier-bit Example

Figure labels: AQL Packet (Barrier bit = 1, completionSignal); Enqueue; Dequeue; Launch Phase; Active Phase; Completion Phase. (© 2014 Jay Wang)

Barrier-AND Packet Example

HSA Agent Scheduling

Figure labels: Applications #1 to #3; AQL packet (Agent Dispatch packet or Barrier-AND/OR packet); Agent Dispatch Queues; Non-HSA Task Pool; agent tasks (Agt); HSA Agent Scheduling; HSA Agent. (© 2014 Jay Wang)

Trigger: AQL packet submission; Task execution complete; Barrier packet complete.

HSA Component Context Switching

Figure labels: Applications #1 to #3; Agent Dispatch Queues; Non-HSA Task Pool; agent tasks (Agt); HSA Agent Scheduling; HSA Agent; Context Switching; HSA Component with Compute Units (CU), Kernel Programs, and work-groups (WG). (© 2014 Jay Wang)

1. Switch (Required)
2. Preempt (Required as soon as possible)
3. Terminate and context reset (Terminated as fast as possible)


FP Exception Reporting

An HSA Component shall report certain defined exceptions related to the execution of the HSAIL code to the HSA Runtime.

Figure labels: Grid; Work-Group 0, 1, 2, … X; Wavefront 0, 1, 2, 3, … Y; Lane 0 to Lane (N-1), one Work Item per lane; Wavefront Size N = 1 ~ 64; SIMD (Single Instruction, Multiple Data) style; HSA Component (HSA Agent) with Compute Unit, PC, Status bits, Policy, and Exception Module; Signaling; Host CPU (HSA Agent) with HSA Runtime and Exception Handler. (© 2014 Jay Wang)

Exception Policy: DETECT, BREAK

HSAIL Instructions: cleardetectexcept_u32, getdetectexcept_u32, setdetectexcept_u32

IEEE754-2008 exceptions:

Code  Description
0     Invalid operation
1     Divide-by-zero
2     Overflow
3     Underflow
4     Inexact
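The exception codes and the three detect instructions can be modelled as operations on a bit mask. The sketch below assumes that each exception code is a bit position in a 32-bit status word under the DETECT policy; the class `DetectStatus` and its methods are hypothetical stand-ins for the HSAIL instructions, not their definitions.

```python
# Exception codes from the table above, used here as bit positions (assumption).
EXCEPTION_BITS = {
    "invalid_operation": 0,
    "divide_by_zero": 1,
    "overflow": 2,
    "underflow": 3,
    "inexact": 4,
}


def mask_of(*names):
    """Combine exception names into one bit mask."""
    mask = 0
    for name in names:
        mask |= 1 << EXCEPTION_BITS[name]
    return mask


def names_in(mask):
    """List the exception names whose bits are set in mask."""
    return [name for name, bit in EXCEPTION_BITS.items() if mask & (1 << bit)]


class DetectStatus:
    """Toy status word recording which exceptions have been detected."""

    def __init__(self):
        self.bits = 0

    def set(self, mask):      # plays the role of setdetectexcept_u32
        self.bits |= mask & 0xFFFFFFFF

    def clear(self, mask):    # plays the role of cleardetectexcept_u32
        self.bits &= ~mask & 0xFFFFFFFF

    def get(self):            # plays the role of getdetectexcept_u32
        return self.bits


status = DetectStatus()
status.set(mask_of("divide_by_zero", "inexact"))
status.clear(mask_of("inexact"))
print(status.get(), names_in(status.get()))
```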


Debug Infrastructure

The HSA Component shall provide mechanisms to allow system software and some select application software (for example, debuggers and profilers) to set breakpoints and collect throughput information for profiling.

Figure labels: Grid; Work-Group 0, 1, 2, … X; Wavefront 0, 1, 2, 3, … Y; Lane 0 to Lane (N-1), one Work Item per lane; Wavefront Size N = 1 ~ 64; SIMD (Single Instruction, Multiple Data) style; HSA Component (HSA Agent) with Compute Unit, PC, Status bits, Policy, Exception Module, and Debug Module (Instruction Breakpoint, Conditional Breakpoint, Memory Breakpoint); HSA Component Debug Interface; Host CPU (HSA Agent) with Debuggers and Profilers. (© 2014 Jay Wang)


Execution Environment


You have 2 OpenCL platform(s)

----------------------------------------------

Platform[0].Name = NVIDIA CUDA

Platform[0].Vendor = NVIDIA Corporation

Platform[0].Version = OpenCL 1.1 CUDA 4.2.1

Platform[0].Profile = FULL_PROFILE

----------------------------------------------

Platform[1].Name = Intel(R) OpenCL

Platform[1].Vendor = Intel(R) Corporation

Platform[1].Version = OpenCL 1.2

Platform[1].Profile = FULL_PROFILE

----------------------------------------------

Platform[0] has 1 device(s)

----------------------------------------------

Device[0].Type = CL_DEVICE_TYPE_GPU

Device[0].Name = GeForce GT 625

Device[0].Vendor = NVIDIA Corporation

Device[0].Version = OpenCL 1.1 CUDA

Device[0].DriverVersion = 320.49

Device[0].Profile = FULL_PROFILE

Device[0].OpenCL_C = OpenCL C 1.1

Device[0].MaxComputeUnits = 1

Device[0].MaxWiDimensions = 3

Device[0].MaxWiSize = (1024,1024,64)

Device[0].MaxWgSize = 1024

Device[0].MaxClkFrequency = 1747 MHz

Device[0].AddrSpaceSize = 32 bits

Platform[1] has 1 device(s)

----------------------------------------------

Device[0].Type = CL_DEVICE_TYPE_CPU

Device[0].Name = Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz

Device[0].Vendor = Intel(R) Corporation

Device[0].Version = OpenCL 1.2 (Build 80752)

Device[0].DriverVersion = 3.0.1.15216

Device[0].Profile = FULL_PROFILE

Device[0].OpenCL_C = OpenCL C 1.2

Device[0].MaxComputeUnits = 4

Device[0].MaxWiDimensions = 3

Device[0].MaxWiSize = (1024,1024,1024)

Device[0].MaxWgSize = 1024

Device[0].MaxClkFrequency = 3100 MHz

Device[0].AddrSpaceSize = 32 bits

OpenCL APIs


HSA Platform Topology Discovery

HSA platform resources: Agent, Memory, Compute Properties, Caches, and I/O

Figure labels: HSA Platform with Node 0 to Node 4; Socket Interconnect; PCIe and PCIe Bridge; HSA APU nodes with CPU Cores, GPU H-CUs, HSA MMU, and System Memory ((cacheable) coherent, (non-cacheable) non-coherent); SBIOS / UEFI; Add-In Boards (optional) with an HSA discrete GPU, H-CUs, and Device Local Memory (coherent, non-coherent); VBIOS / UEFI GOP.


Images

• A graphics feature that can sometimes be useful in data-parallel computing
• Used to store one-, two-, or three-dimensional images
• Predefined image formats
• Image memory is a special kind of memory access
• Dedicated hardware to speed up image operations

Reference: The OpenCL™ Specification Version 2.0, Section 5.3 Image Objects
http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf


Summary

Programming model issues

• HSA Intermediate Language (HSAIL) + HSA Runtime
• Architected Queuing Language (AQL) + Signaling
• Debug infrastructure

Communication overhead issues

• Cache coherent shared virtual memory (CC-SVM)
• Architected Queuing Language (AQL) for user mode queuing
• Hardware-assisted signaling and atomic operations for synchronization

Figure labels: CPUs, GPU, DSP, ...; HSAIL; HSA Runtime; AQL; Unified Coherent Memory. (© 2014 Jay Wang)