OpenCL

Peter Holvenstot

OpenCL

Designed as an API and language specification

Standards maintained by the Khronos Group; currently at versions 1.0, 1.1, and 1.2

Manufacturers release their own SDKs and drivers

Major backers: Apple, AMD/ATI, Intel

OpenCL

Alternative to CUDA

Not limited to ATI GPUs

Designed for “heterogeneous computing”

Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs

OpenCL

Similar structure of host programs and kernels

Set of compute devices is called a 'context'

Kernels executed by 'processing elements'

Kernels can be compiled at run-time or build-time
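As a rough illustration of the run-time option: a host program commonly keeps the kernel as a source string and hands it to the OpenCL runtime (clCreateProgramWithSource followed by clBuildProgram) at startup. The sketch below shows only such a string and a plain-C loop that mimics what each work item would compute. The names vec_add_src and vec_add_reference are hypothetical, and no OpenCL runtime is actually called here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel source, kept as a string so the host could pass it
   to clCreateProgramWithSource/clBuildProgram at run time. */
static const char *vec_add_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = a[i] + b[i];\n"
    "}\n";

/* Plain-C stand-in for the kernel: each loop iteration plays the role
   of one work item. */
static void vec_add_reference(const float *a, const float *b,
                              float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The build-time alternative would compile the same source ahead of time and ship a device-specific binary instead of the string.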

OpenCL

Task Parallelism – many kernels running at once

OpenCL 1.2 – a device can be partitioned down to a single Compute Unit

Built-in kernels for device-specific functionality

Advantages

Same code can be run on different devices, including NVIDIA GPUs!

AMD/ATI attempting to integrate compute elements into other platforms (Accelerated Processing Units)

Limited library of portable math routines: the most common BLAS and FFT routines

Performance

Disadvantages

No “official” implementation

Vendors may meet specs or add restrictions (e.g., Apple adds restrictions on group size)

Devices need appropriate settings to perform well

Different capabilities → different performance

Solution: tuning/load balancing framework
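One simple form such load balancing could take is a static split of the work in proportion to each device's measured throughput. The sketch below is only an assumption about how a framework might divide work. The function split_work and the throughput figures are hypothetical and not taken from any particular tuning library.

```c
#include <assert.h>
#include <stddef.h>

/* Split `total` work items across `ndev` devices in proportion to each
   device's measured throughput (hypothetical items-per-second figures).
   Shares are rounded down; the last device absorbs the remainder. */
static void split_work(size_t total, const double *throughput,
                       size_t ndev, size_t *share)
{
    assert(ndev > 0);

    double sum = 0.0;
    for (size_t d = 0; d < ndev; ++d) {
        assert(throughput[d] > 0.0);
        sum += throughput[d];
    }

    size_t assigned = 0;
    for (size_t d = 0; d + 1 < ndev; ++d) {
        share[d] = (size_t)((double)total * (throughput[d] / sum));
        assigned += share[d];
    }
    share[ndev - 1] = total - assigned;
}
```

For example, a CPU measured at one quarter of the combined throughput would receive one quarter of the items, with the GPU taking the rest.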

Non-Optimized Performance

Restrictions

No recursion, variadics, or function pointers

Cannot dynamically allocate memory from device code

No native variable-length arrays or double-precision floating point

Some can be worked around by extensions
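Other restrictions are usually handled by restructuring the code. As one hedged illustration, a recursive tree traversal can be rewritten with an explicit, fixed-size stack, which avoids both recursion and dynamic allocation. The sketch is plain C so it can be run anywhere; tree_sum and MAX_STACK are hypothetical names, and the assert calls would have to be dropped inside an actual kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK 64  /* fixed capacity: no dynamic allocation needed */

/* Sum the values of a binary tree stored as an implicit heap
   (children of node i are 2*i+1 and 2*i+2), using an explicit
   stack instead of recursion. */
static int tree_sum(const int *values, size_t count)
{
    size_t stack[MAX_STACK];
    size_t top = 0;
    int sum = 0;

    if (count > 0) {
        stack[top++] = 0;  /* start at the root */
    }
    while (top > 0) {
        size_t node = stack[--top];
        sum += values[node];

        size_t left = 2 * node + 1;
        size_t right = 2 * node + 2;
        /* Push children that exist, guarding the fixed-size stack. */
        if (right < count) {
            assert(top < MAX_STACK);
            stack[top++] = right;
        }
        if (left < count) {
            assert(top < MAX_STACK);
            stack[top++] = left;
        }
    }
    return sum;
}
```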

Terminology

CUDA → OpenCL

Scalar Core → Stream Core

Streaming Multiprocessor → Compute Unit

Warp → Wavefront

PTX → Intermediate Language

Terminology

CUDA → OpenCL

Host Memory → Host Memory

Global/Device Memory → Global Memory

Local Memory → Global Memory

Constant Memory → Constant Memory

Shared Memory → Local Memory

Registers → Private Memory

Terminology

CUDA → OpenCL

Grid → NDRange

Block → Work group

Thread → Work item

Thread ID → Global ID

Block Index → Work-group ID

Thread Index → Local ID
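The three IDs are related arithmetically. Along one dimension, and assuming a zero global offset, a work item's global ID is its work-group ID times the work-group size plus its local ID, just as a CUDA thread computes blockIdx.x * blockDim.x + threadIdx.x. The plain-C sketch below spells this out; global_id and num_groups are hypothetical helper names, not OpenCL functions.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID of a work item along one dimension, assuming a zero global
   offset: work-group ID times work-group size, plus the local ID. */
static size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Number of work groups needed to cover `n` items, rounding up. */
static size_t num_groups(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size;
}
```

Inside a kernel the same values come from the built-ins get_global_id, get_group_id, get_local_size, and get_local_id.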

References

http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/CUDAvsOpenCL.pdf

https://wiki.aalto.fi/download/attachments/40025977/Cuda+and+OpenCL+API+comparison_presented.pdf

http://www.hpcwire.com/hpcwire/2012-02-28/opencl_gains_ground_on_cuda.html

http://www.netlib.org/utk/people/JackDongarra/PAPERS/parcocudaopencl.pdf

http://www.netlib.org/lapack/lawnspdf/lawn228.pdf