HSA ENABLEMENT OF APARAPI EASING THE DEVELOPER PATH TO APU/GPU ACCELERATED JAVA APPLICATIONS
VIGNESH RAVI – SOFTWARE DEVELOPER HSA TEAM AMD GARY FROST – SOFTWARE FELLOW AMD
2 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
HSA ENABLEMENT OF APARAPI : AGENDA
! Java GPU enablement via Aparapi ‒ Why Java? ‒ Aparapi
‒ What is it and how is it used?
! Introduction to HSA ! How HSA simplifies Java GPU programming with Aparapi
‒ Simpler programming model using lambda expressions ‒ Removal of previous constraints thanks to SVM (Shared Virtual Memory)
! The nuts and bolts of our current HSA enablement ‒ HSAIL generation ‒ Dispatch via HSA Runtime APIs
! Summary ! Q&A
3 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
WHY JAVA?
! Java by the numbers ‒ 9 Million Developers ‒ 1 Billion Java downloads per year ‒ 97% Enterprise desktops run Java ‒ 100% of blue ray players ship with Java hVp://oracle.com.edgesuite.net/[meline/java/
! Java 7 language & libraries already include concurrency features ‒ primi[ves (threads, locks, monitors, atomic ops) ‒ libraries (fork/join, thread pools, executors, futures)
! Upcoming Java 8 include stream processing enhancements ‒ support for ‘lambda’ expressions ‒ Lambda centric concurrent stream processing libs/apis (java.u[l.stream.*)
4 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
INITIAL APARAPI PROJECT OVERVIEW (2011)
! Open Source framework ! Allows Java developers access to GPU compute ! Aparapi Java API for expressing data parallel workloads
Kernel kernel = new Kernel(){ @Override public void run(){ int i=getGlobalID(); square[i]=in[i]*in[i]; } }; kernel.execute(size);
! Aparapi runtime capable of converting bytecode to OpenCL™ ‒ Execution on OpenCL™ 1.1+ capable devices (GPUs and APUs) Or… ‒ Execute via a thread pool if OpenCL™ is unavailable.
GPU ISA
JVM
Java Applica[on
OpenCL™
OpenCL™ compiler & Runtime
Overload Aparapi Kernel Base Class’s run() method
CPU ISA
Overload Aparapi Kernel Class’s run() method
Aparapi converts bytecode to OpenCL™
CPU GPU
5 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
MEET HSA AND HSAIL
! Heterogeneous System Architecture standardizes CPU/GPU func[onality ‒ Be ISA-‐agnos[c for both CPUs and accelerators ‒ Support high-‐level programming languages ‒ Provide the ability to access pageable system memory from the GPU ‒ Maintain cache coherency for system memory between CPU and GPU
! Specifica[ons and simulator from HSA Founda[on ‒ HSAIL portable ISA is “finalized” to par[cular hardware ISA at run[me ‒ Run[me specifica[on for job launch and control ‒ HSAIL™ simulator for development and tes[ng before hardware availability
6 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
APARAPI HSA ENABLEMENT (2013-‐2014)
! Open Source project sponsored ! Enhanced to support HSA and Java 8 lambda expression
Device.hsa().forEach(size, i -> square[i]=in[i]*in[i] );
! Allow developers to efficiently represent data parallel algorithms using new Java 8 Lambda expressions
! API’s have same look & feel as proposed Java 8 stream API features
! No modifica[ons to the JVM. ‒ We provide external JNI/Java libraries.
GPU ISA
JVM
Java Applica[on
GPU CPU
HSAIL
HSA Finalizer & Runtime
Aparapi Lambda based API
Aparapi converts bytecode to HSAIL
CPU ISA
7 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
HSA AND LAMBDA ENABLED APARAPI EXECUTION EXAMPLE
Device.hsa().forEach(size, i -> square[i]=int[i]*int[i] );
Is this the first execuAon of this
lambda instance?
Execute HSAIL Kernel on GPU/APU
Does PlaLorm Supports HSA?
Y Can bytecode be converted to
HSAIL?
Y
Convert bytecode to
HSAIL
Y
Y Do we have HSAIL for this lambda ?
N
Execute Kernel using Java thread Pool
N
N
N
8 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
SUMATRA PROJECT : NATIVE SUPPORT FOR GPU OFFLOAD ADDED TO JAVA
! AMD/Oracle sponsored Open Source (OpenJDK) project
! Targeted at OpenJDK Java 9 (2015) ! Allow developers to efficiently represent data parallel algorithms in Java using
Stream API + Lambda expressions
! Sumatra is not pushing new ‘programming model’
! Instead we ‘repurpose’ Stream API + Lambda to enable both CPU or GPU compu[ng
! A Sumatra enabled Java Virtual Machine™ will dispatch ‘selected’ constructs to HSA enabled devices at run[me.
! Developers already refactoring JDK to use stream & lambda API’s ‒ So anyone using exis[ng JDK should see GPU accelera[on without any code changes.
! Links: ‒ hVp://openjdk.java.net/projects/sumatra ‒ hVps://wikis.oracle.com/display/HotSpotInternals/Sumatra ‒ hVp://mail.openjdk.java.net/pipermail/sumatra-‐dev
GPU ISA
JVM
Java Applica[on
GPU CPU
HSAIL
HSA Finalizer & Runtime
CPU ISA
Java JDK Stream + Lambda API
Java GRAAL JIT backend
9 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
HSA ENABLEMENT OF JAVA
9
CPU ISA GPU ISA
JVM
Java Applica[on
GPU CPU
OpenCL™
OpenCL™ Compiler and Run[me
APARAPI API
Java 7 – OpenCL enabled Aparapi • AMD ini[ated Open Source project • APIs for data parallel algorithms
GPU accelerate Java applica[ons
No need to learn OpenCL
• Ac[ve community captured mindshare ~20 contributors >7000 downloads ~150 visits per day
CPU ISA GPU ISA
JVM
Java Applica[on
GPU CPU
HSAIL™
HSA Finalizer & Run[me
APARAPI + Lambda API
Java 8 – HSA enabled Aparapi • Java 8 brings Stream + Lambda API.
More natural way of expressing data parallel algorithms Ini[ally targeted at mul[-‐core.
• APARAPI will :-‐ Support Java 8 Lambdas Dispatch code to HSA enabled devices at run[me via HSAIL
Java 9 – HSA enabled Java (Sumatra) • Adds na[ve GPU compute support to Java Virtual Machine
(JVM)
• Developer uses JDK provided Lambda + Stream API
• JVM uses GRAAL compiler to generate HSAIL
• JVM decides at run[me to execute on either CPU or GPU depending on workload characteris[cs.
GPU ISA
JVM
Java Applica[on
GPU CPU
HSAIL™
HSA™ Finalizer & Run[me
Java JDK Stream + Lambda API
Java GRAAL JIT backend
CPU ISA
We plan to provide HSA Enabled Aparapi (Java 8)
as a bridge technology between OpenCL based Aparapi (Java 7)
and HSA Enabled Sumatra (Java 9)
10 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013| 10
A CASE STUDY CENTERED ON NBODY
! A Java developer implemen[ng a sequen[al version of NBody would probably… ‒ Create a class to represent each body
! Loop through each Body (in array of bodies[]) to update and display
class Body{ float x,y,z,m,vx,vy,vz; // Include method to update position and display void updateAndShow(Screen screen, Body[] bodies){ for (Body other:bodies){
// accumulate forces between other and this } // update vx,vy,vz,x,y and z from accumulated data screen.paint(x,y,z);
} }
for (Body b: bodies) b.updateAndShow(screen, bodies);
11 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013| 11
WITHOUT HSA WE CAN’T (EFFICIENTLY) USE OBJECTS
! In Java; allocated Objects are scaVered on the heap. ‒ There is no way to allocate an array of objects in con[guous memory (as with C++) ‒ We force the developer to resort to using parallel arrays of primi[ves (which are con[guous)
‒ And to infer that x[n], y[n] and z[n] holds the state for bodies[n].
‒ Then the kernel can be used to execute the code on the GPU
float x[], y[], z[], m[], vx,[], vy[], vz[];
Kernel kernel = new Kernel(){ public void run(){
int i = getGlobalId(0); for (int j=0; j<bodies.length; j++){
// accum forces between (x,y,z)[j] and (x,y,z)[i] } // update vx[j],vy[j],vz[j],x[j],y[j] and z[j]
} };
Kernel.execute(bodies.length);
12 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013| 12
HSA ENABLED APARAPI (AND SUMATRA) ALLOWS USE OF OBJECTS
! So we code our Body class exactly as we would if execu[ng in Java.
! Then use new Aparapi lambda enabled API to coordinate dispatch to theGPU
class Body{ float x,y,z,m,vx,vy,vz; // Include method to update position and display void updateAndShow(Screen screen, Body[] bodies){ for (Body other:bodies){
// accumulate forces between other and this } // update vx,vy,vz,x,y and z from accumulated data screen.paint(x,y,z);
} }
Device.hsa().forEach(bodies, b -> { b.updateAndShow(screen, bodies); });
13 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
OVERVIEW OF HSA ENABLED APARAPI
! HSA enabled Aparapi, at run[me: ‒ Step 0: Generate HSAIL from Bytecode ‒ Step 1: Generate host HSA Run[me calls
‒ Step 1.1: Ini[alize HSA run[me, device, queue …
‒ Step 1.2: Finalize HSAIL to generate GPU ISA ‒ Step 1.3: Bind Java args to HSA args ‒ Step 1.4: Dispatch the kernel ‒ Step 1.5: Wait for comple[on ‒ Repeat steps 1.3 -‐ 1.5 for next itera[on of same kernel
‒ Repeat step 0 – 1 for each new kernel
CPU ISA
JVM
Application
Aparapi
GPU CPU
Run
time
MyLambda.java
javac (compiler)
MyLambda.class
Dev
elop
men
t tim
e
Generate HSAIL
Generate HSA RT
calls
Initialize
Bind Args
Finalize
Dispatch GPU ISA
Input
Contains
14 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
HIGH LEVEL HSA FEATURES
! Features currently being defined in the HSA Working Groups** ‒ Unified addressing across all processors ‒ Opera[on into pageable system memory ‒ Full memory coherency ‒ Pla|orm atomics ‒ User mode dispatch
‒ Enables fast dispatch with no driver involvement ‒ Architected queuing language
‒ Flexible compute dispatch, easier GPU self-‐enqueue ‒ High level language support for GPU compute processors ‒ Preemp[on and context switching
** All features subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
@ Copyright 2012 HSA Founda[on. All Rights Reserved.
15 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
HSA INTERMEDIATE LANGUAGE (HSAIL)**
! HSAIL is a virtual ISA for parallel programs ‒ Finalized to vendor-‐specific ISA by a JIT compiler or “Finalizer” ‒ ISA independent by design for CPU & GPU
! Explicitly parallel ‒ Designed for data parallel programming
! Support for excep[ons, virtual func[ons, and other high level language features ! Lower level than OpenCL™ SPIR
‒ Fits naturally in the OpenCL™ compila[on stack
! Suitable to support addi[onal high level languages and programming models: ‒ Java, C++, OpenMP, etc
** Subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups @ Copyright 2012 HSA Founda[on. All Rights Reserved.
16 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
HSAIL OVERVIEW**
! Similar to assembly language for a RISC CPU ‒ Load-‐store architecture
ld_global_u64 $d0, [$d6 + 120]; $d0= load($d6+120)
add_u64 $d1, $d2, 24; $d1= $d2+24
! 136 opcodes (Java™ bytecode has 200) ‒ Floa[ng point (single, double, half (f16)) ‒ Integer (32-‐bit, 64-‐bit) ‒ Some packed opera[ons ‒ Branches ‒ Func[on calls ‒ Pla$orm Atomic Opera[ons: and, or, xor, exch, add, sub, inc, dec, max, min, cas ‒ Synchronize host CPU and HSA Component!
! Text and Binary formats (“BRIG”)
** Subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
! Four classes of registers ‒ C: 1-‐bit, Control Registers ‒ S: 32-‐bit, Single-‐precision FP or Int ‒ D: 64-‐bit, Double-‐precision FP or Long Int ‒ Q: 128-‐bit, Packed data.
! Fixed number of registers: ‒ 8 C ‒ S, D, Q share a single pool of resources
S + 2*D + 4*Q <= 128
Up to 128 S or 64 D or 32 Q (or a blend)
! Register alloca[on done in high-‐level compiler ‒ Finalizer doesn’t have to perform expensive register alloca[on
INSTRUCTION SET REGISTERS
@ Copyright 2012 HSA Founda[on. All Rights Reserved.
17 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
SEGMENTS AND MEMORY **
! 7 segments of memory ‒ global, readonly, group, spill, private, arg, kernarg, ‒ Memory instruc[ons can (op[onally) specify a segment
! Global Segment ‒ Visible to all HSA agents (including host CPU)
! Group Segment ‒ Provides high-‐performance memory shared in the work-‐group.
‒ Group memory can be read and wriVen by any work-‐item in the work-‐group
‒ HSAIL provides sync opera[ons to control visibility of group memory
! Spill, Private, Arg Segments ‒ Represent different regions of a per-‐work-‐item stack ‒ Typically generated by compiler, not specified by programmer
** Subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
! Kernarg Segment ‒ Programmer writes kernarg segment to pass arguments to a kernel
! Read-‐Only Segment ‒ Remains constant during execu[on of kernel
! Flat Addressing ‒ Each segment mapped into virtual address space
‒ Flat addresses can map to segments based on virtual address
‒ Instruc[ons with no explicit segment use flat addressing
‒ Very useful for high-‐level language support (ie classes, libraries)
‒ Aligns well with OpenCL 2.0 “generic” addressing feature
ld_global_u64 $d0, [$d6] ld_group_u64 $d0,[$d6+24] st_spill_f32 $s1,[$d6+4] ld_kernarg_u64 $d6, [%_arg0] ld_u64 $d0,[$d6+24] ; flat
@ Copyright 2012 HSA Founda[on. All Rights Reserved.
18 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
EXAMPLE – BYTECODE TO HSAIL GENERATION
int in[], out[]; Device.hsa().forEach(len, i-> out[i] = in[i] * in[i] );
javac –g squares.java
0: aload_0 //out[] 1: iload_2 //i 2: aload_1 //in[] 3: iload_2 4: iaload 5: aload_1 6: iload_2 7: iaload 8: imul 9: iastore 10: return
version 0:95: $full : $large; kernel &run( kernarg_u64 %_arg0, //out[] kernarg_u64 %_arg1, //in[] kernarg_s32 %_arg2 ){ ld_kernarg_u64 $d0, [%_arg0]; ld_kernarg_u64 $d1, [%_arg1]; ld_kernarg_s32 $s2, [%_arg2]; workitemabsid_u32 $s2, 0; //i mov_b64 $d3, $d0; mov_b32 $s4, $s2; mov_b64 $d5, $d1; mov_b32 $s6, $s2; cvt_u64_s32 $d6, $s6; mad_u64 $d6, $d6, 4, $d5; ld_global_s32 $s5, [$d6+24]; mov_b64 $d6, $d1; mov_b32 $s7, $s2; cvt_u64_s32 $d7, $s7; mad_u64 $d7, $d7, 4, $d6; ld_global_s32 $s6, [$d7+24]; mul_s32 $s5, $s5, $s6; cvt_u64_s32 $d4, $s4; mad_u64 $d4, $d4, 4, $d3; st_global_s32 $s5, [$d4+24]; ret; };
Generated HSAIL
19 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
APARAPI JNI CALL -‐> HSA RUNTIME API
Device Discovery & Queue Crea[on APIs**
! Discover HSA Device ‒ Both count and device_list are out params ‒ User can iterate over HSA devices in the list
! User-‐Mode Queue Crea[on ‒ User can provide pre-‐allocated buffer ‒ If not, API will allocate a buffer ‒ queue is the user-‐mode queue
HsaStatus HsaGetDevices(unsigned int *count, const HsaDevice **device_list);
HsaStatus HsaCreateUserModeQueue(const HsaDevice *device, void *buffer, size_t buffer_size,
HsaQueuePriority queue_priority, HsaQueueFrac[on queue_frac[on, HsaQueue **queue);
** All APIs subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
20 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
APARAPI JNI -‐> HSA RUNTIME API
Finalize HSAIL to GPU ISA**
! Transla[ng HSAIL text to Binary (BRIG) ‒ BRIG is a binary container for several sec[ons
‒ Code ‒ String ‒ Direc[ve ‒ …
‒ libHsail is an assembler/disassembler ‒ This is a standalone compiler library ‒ Not part of Run[me
! Finalize Brig to IHV specific GPU ISA ‒ Input: Brig ‒ Output: HsaKernelCode which contains ISA
** All APIs subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
Status Assemble (const char* hsail_text, HsaBrig *brig);
HsaStatus HsaFinalizeBrig(const HsaDevice *device, HsaBrig *brig, const char *kernel_name, const char *op[ons, HsaKernelCode **kernel);
21 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
APARAPI JNI -‐> POPULATION OF AQL DISPATCH PACKET
! AQL Dispatch Packet** ‒ Header enables:
‒ Different packet types ‒ Specify if this packet should wait for all previous to complete
‒ Control visibility of data and memory fences before and a�er dispatch
‒ Body enables: ‒ Specify the problem fan out using launch config related fields
‒ How much workgroup memory? ‒ Loca[on of IHV specific GPU ISA ‒ Loca[on of where kernelargs can be found ‒ A signal mechanism to wait on kernel comple[on
! Only popula[ng Kernel info and signal are opaque, so require run[me APIs ‒ Other fields are open, so simple assignments
** Subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
typedef struct HsaAqlDispatchPacket { uint32_t format : 8; uint32_t barrier : 1; uint32_t acquire_fence_scope : 2; uint32_t release_fence_scope : 2; uint32_t invalidate_instruction_cache : 1; uint32_t invalidate_roi_image_cache : 1; uint32_t dimensions : 2; uint32_t reserved : 15; uint16_t workgroup_size[3]; uint16_t reserved2; uint32_t grid_size[3]; uint32_t private_segment_size_bytes; uint32_t group_segment_size_bytes; uint64_t kernel_object_address; uint64_t kernel_arg_address; uint64_t reserved3; uint64_t completion_signal; } HsaAqlDispatchPacket;
Header Fields
Launch Config
Kernel Info
Kernel SynchronizaAon
22 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
POPULATING KERNEL INFO AND SIGNAL USING HSA RT API**
HsaStatus HsaFinalizeBrig(const HsaDevice *device, HsaBrig *brig, const char *kernel_name, const char *op[ons, HsaKernelCode **kernel);
typedef struct HsaKernelCode { … uint32_t workitem_private_segment_byte_size; uint32_t workgroup_group_segment_byte_size; uint64_t kernarg_segment_byte_size; … } HsaKernelCode;
** Subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
typedef struct HsaAqlDispatchPacket { … uint32_t private_segment_size_bytes; uint32_t group_segment_size_bytes; uint64_t kernel_object_address; uint64_t kernel_arg_address; … uint64_t completion_signal; }
Pack Java Args into a vector in JNI
HsaStatus HsaRegisterSystemMemory(void *address, size_t size);
Register vector data address
HsaStatus HsaCreateSignal(HsaSignal *signal);
23 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
DISPATCH AND WAIT ON KERNEL COMPLETION
! Dispatch ‒ Submit AQL Packet into the HsaQueue ‒ Thread safe API
! Wait on Kernel Comple[on
** Subject to change, pending comple[on and ra[fica[on of specifica[ons in the HSA Working Groups
HsaStatus HsaSubmitAql(HsaQueue *queue,HsaAqlDispatchPacket *aql_packet);
bool is_done = false; while (!is_done) { status = HsaQuerySignal(signal, &is_done); assert(status == kHsaStatusSuccess); }
! A�er comple[on, disposing HSA resources ‒ Release queue ‒ Release signal ‒ Release Kernel object ‒ Deregister kernel args related memory
HsaStatus HsaDestroyUserModeQueue(HsaQueue *queue); HsaStatus HsaDestroySignal(HsaSignal signal); HsaStatus HsaFreeKernelCode(HsaKernelCode *kernel); HsaStatus HsaDeregisterSystemMemory(void *address);
24 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
DEMO
25 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
SUMMARY
! Aparapi is already an establish framework for simplifying execu[on of Java on GPU devices
! HSA enabled Aparapi further simplifies GPU accelera[on of Java applica[ons ‒ Aligns with Java 8 features to support ‘lambda’ expression for compactness ‒ Enables ‘large unified’ system memory for GPU accelera[on ‒ Eases programming by enabling direct access to Java objects on heap ‒ Enables fast offload of Java kernels through User-‐mode queue and AQL
! HSA enabled Aparapi lends to more interes[ng future possibili[es ‒ Simplified communica[on and workload balancing across both CPU and GPU ‒ Exploit new computa[on paVerns and recursions through kernel self-‐enqueue
26 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
QUESTIONS & ANSWERS?
27 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013|
DISCLAIMER & ATTRIBUTION
The informa[on presented in this document is for informa[onal purposes only and may contain technical inaccuracies, omissions and typographical errors.
The informa[on contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, so�ware changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obliga[on to update or otherwise correct or revise this informa[on. However, AMD reserves the right to revise this informa[on and to make changes from [me to [me to the content hereof without obliga[on of AMD to no[fy any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combina[ons thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdic[ons. OpenCL is a trademark of Apple Inc. HSA is a trademark of the Heterogeneous System Architecture Founda[on. Other names are for informa[onal purposes only and may be trademarks of their respec[ve owners.