Slide 1
OpenCL Peter Holvenstot
Slide 2
OpenCL
- Designed as an API and language specification
- Standards maintained by the Khronos Group
- Current versions: 1.0, 1.1, and 1.2
- Manufacturers release their own SDKs and drivers
- Major backers: Apple, AMD/ATI, Intel
Slide 3
OpenCL
- Alternative to CUDA
- Not limited to ATI GPUs
- Designed for heterogeneous computing
- Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs
Slide 4
OpenCL
- Structure of host programs and kernels is similar to CUDA
- A set of compute devices is called a 'context'
- Kernels are executed by 'processing elements'
- Kernels can be compiled at run time or at build time
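The host/kernel split above can be sketched in plain C without an OpenCL runtime. This is an emulation only: `vec_add_kernel` and `launch_vec_add` are hypothetical names, and the serial loop stands in for a device running one kernel instance per work item on its processing elements. In real OpenCL C the kernel would be a `__kernel` function that reads its index from `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel body: the work done by ONE work item.
   In OpenCL C, gid would come from get_global_id(0). */
static void vec_add_kernel(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Host side: stands in for enqueueing the kernel over n work items.
   A real device would run these iterations in parallel; this
   emulation simply runs them one after another. */
static void launch_vec_add(size_t n, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < n; ++gid)
        vec_add_kernel(gid, a, b, c);
}
```

Calling `launch_vec_add(4, a, b, c)` on two four-element arrays fills `c` with their element-wise sums. The point of the sketch is that the kernel never loops over the data itself; the launch decides how many instances run.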
Slide 5
OpenCL
- Task parallelism: many kernels running at once
- In OpenCL 1.2, a device can be partitioned down to a single Compute Unit
- Built-in kernels for device-specific functionality
Slide 6
Advantages
- Same code can be run on different devices
- Can also be run on NVIDIA GPUs!
- AMD/ATI attempting to integrate compute elements into other platforms (Accelerated Processing Units)
- Limited library of portable math routines
- Most common BLAS and FFT routines
Slide 7
Performance
Slide 8
Slide 9
Slide 10
Disadvantages
- No official implementation
- Vendors may meet specs or add restrictions
- Apple adds restrictions on group size
- Devices need appropriate settings to perform well
- Different capabilities, different performance
- Solution: tuning/load-balancing framework
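One small piece of the tuning described above is choosing launch sizes per device. The sketch below is an assumption about how a tuning layer might do it, not a description of any particular framework. The helper names are hypothetical. In a real program the device limit would come from `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`. The rounding step reflects the OpenCL 1.x rule that the global work size must be a multiple of the local work size.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp a preferred work-group size to the limit the device reports. */
static size_t pick_local_size(size_t preferred, size_t device_max)
{
    assert(preferred > 0 && device_max > 0);
    return preferred < device_max ? preferred : device_max;
}

/* Round the global size up to the next multiple of the local size.
   The kernel must then skip the padding items, e.g. with a
   check such as: if (gid < n) { ... } */
static size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    size_t rem = n % local;
    return rem == 0 ? n : n + (local - rem);
}
```

For example, 1000 elements with a local size of 64 gives a global size of 1024, so the last 24 work items do nothing. A device that only allows groups of 128 would turn a preferred size of 256 into 128.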
Slide 11
Non-Optimized Performance
Slide 12
Slide 13
Restrictions
- No recursion, variadics, or function pointers
- Cannot dynamically allocate memory from the device
- No native variable-length arrays or double precision
- Some can be worked around by extensions
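Code can often be restructured to live within these restrictions. The sketch below is plain C written in a kernel-friendly style, with a hypothetical function name and example problem. It replaces a naturally recursive tree walk with a loop, and replaces dynamic allocation or a variable-length array with a fixed-capacity stack. The `assert` calls are host-side checks for this sketch and would not appear in a real kernel.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAP 64  /* fixed capacity: no malloc, no variable-length array */

/* Sum the values in the subtree rooted at `root` of a binary tree stored
   in heap layout (children of node i are 2i+1 and 2i+2).
   The recursive form would be sum(i) = tree[i] + sum(2i+1) + sum(2i+2);
   here an explicit stack of pending nodes replaces the call stack. */
static long subtree_sum(const int *tree, size_t n, size_t root)
{
    size_t stack[STACK_CAP];
    size_t top = 0;
    long sum = 0;

    if (root < n)
        stack[top++] = root;

    while (top > 0) {
        size_t i = stack[--top];
        size_t left = 2 * i + 1;
        size_t right = 2 * i + 2;

        sum += tree[i];
        if (right < n) { assert(top < STACK_CAP); stack[top++] = right; }
        if (left < n)  { assert(top < STACK_CAP); stack[top++] = left; }
    }
    return sum;
}
```

The stack grows by at most one entry per tree level, so a capacity of 64 is enough for any tree that fits in memory. The cost of the workaround is that the capacity has to be chosen at compile time.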
Slide 14
Terminology
CUDA                        OpenCL
Scalar Core                 Stream Core
Streaming Multiprocessor    Compute Unit
Warp                        Wavefront
PTX                         Intermediate Language
Slide 15
Terminology
CUDA                        OpenCL
Host Memory                 Host Memory
Global/Device Memory        Global Memory
Constant Memory             Constant Memory
Shared Memory               Local Memory
Local Memory, Registers     Private Memory
Slide 16
Terminology
CUDA            OpenCL
Grid            NDRange
Block           Work group
Thread          Work item
Thread ID       Global ID
Block Index     Group ID
Thread Index    Local ID
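The relationship between the three IDs in this mapping can be checked with a little arithmetic. The sketch below is plain C for a one-dimensional NDRange, assuming a zero global offset. The struct and function names are hypothetical. In OpenCL C the same values come from the built-ins `get_global_id(0)`, `get_group_id(0)`, `get_local_id(0)`, and `get_local_size(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one work item in a 1-D NDRange. */
typedef struct {
    size_t group_id;   /* CUDA: blockIdx.x  */
    size_t local_id;   /* CUDA: threadIdx.x */
    size_t global_id;  /* CUDA: blockIdx.x * blockDim.x + threadIdx.x */
} work_item_ids;

/* Split a global ID into its work-group ID and local ID. */
static work_item_ids ids_for(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    work_item_ids ids;
    ids.global_id = global_id;
    ids.group_id  = global_id / local_size;
    ids.local_id  = global_id % local_size;
    return ids;
}

/* Rebuild the global ID the way a CUDA kernel computes its thread ID. */
static size_t global_from(size_t group_id, size_t local_id, size_t local_size)
{
    return group_id * local_size + local_id;
}
```

With a work-group size of 32, the work item with global ID 70 sits in group 2 at local ID 6, and `global_from(2, 6, 32)` gives back 70. OpenCL provides the global ID directly, whereas CUDA code usually computes it from the block and thread indices.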