CUDA (body)


  • 8/3/2019 CUDA (body)

    1/12

    Compute Unified Device Architecture

    Department of Computer Science Page 3

    ACKNOWLEDGEMENT

    The satisfaction that accompanies the successful completion of any task would be incomplete without mentioning the people who made it possible and whose constant encouragement and guidance have been a source of inspiration throughout the course.

    I express my gratitude to Dr. H. C. Nagaraj, Principal for his continuous efforts in creating a

    competitive environment in college and encouragement through this course.

    I express my gratitude to Dr. Nalini N, H.O.D., Department of Computer Science and Engineering, for her help and presence.

    I would like to express my deep sense of gratitude to our guide Ms. Prathibha Ballal, Assistant Professor, CSE, for providing constant support and motivation in the class as well as outside it throughout, without which the seminar would not have been a reality.

    I am thankful to the entire Department of Computer Science and Engineering for its co-operation and suggestions.

    I also thank all my friends who have helped me and have proved to be a constant source of support.

    Kamal Datta


    Table of Contents

    S No.  Description                                                          Page No.
    1.     Introduction                                                         5
    2.     CUDA (Compute Unified Device Architecture)                           6
    3.     CPU (Central Processing Unit) and GPU (Graphical Processing Unit)    8
    4.     Comparison (CPU vs. GPU)                                             10
    5.     DirectX support and latest GPUs                                      12
    6.     Conclusion                                                           13
    7.     References                                                           14


    CHAPTER 1

    INTRODUCTION

    If you work with a lot of complex programs on your PC, you need a lot of processing capacity. For the longest time, you were forced to spend money on more powerful CPUs to get better performance. However, the GPU (Graphics Processing Unit), which can be found on the graphics card, can now offload that work. Games demand complex processes that need to be carried out in real time, which means that a GPU often has to process more than the CPU. Thus, graphics cards are clocked at up to 1000 MHz, have super-fast memory, and carry up to 2 GB of dedicated RAM. You can hardly ask for a better co-processor in the system.

    Clever programmers, supported by GPU manufacturers, came up with the idea to use the processing capacity of graphics cards in a different way: for video processing, flow simulations or market price predictions. Three years ago, NVidia developed CUDA (Compute Unified Device Architecture), a programming environment through which some program processes can be run on the graphics chip. Only NVidia chips from the GeForce 8000 series onwards supported this. Their competitor AMD supports the general standard OpenCL, pioneered by an industry consortium called the Khronos Group (which NVidia now also supports), using which you can share your program's workloads among OpenCL-compatible processors (CPU and GPU). Even Microsoft approved of this development, equipping the new DirectX 11 instruction set with a new interface (Direct Compute), using which you can run program processes on the GPU.


    CHAPTER 2

    CUDA (Compute Unified Device Architecture)

    CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics

    processing unit (GPU). With millions of CUDA-enabled GPUs sold to date, software developers,

    scientists and researchers are finding broad-ranging uses for GPU computing with CUDA. Here are a

    few examples:

    1. Identify hidden plaque in arteries: Heart attacks are the leading cause of death worldwide. Harvard Engineering, Harvard Medical School and Brigham & Women's Hospital have teamed up to use GPUs to simulate blood flow and identify hidden arterial plaque without invasive imaging techniques or exploratory surgery.

    2. Analyze air traffic flow: The National Airspace System manages the nationwide coordination of air traffic flow. Computer models help identify new ways to alleviate congestion and keep airplane traffic moving efficiently. Using the computational power of GPUs, a team at NASA obtained a large performance gain, reducing analysis time from ten minutes to three seconds.

    3. Visualize molecules: A molecular simulation called NAMD (Nanoscale Molecular Dynamics) gets a large performance boost with GPUs. The speed-up is a result of the parallel architecture of GPUs, which enables NAMD developers to port compute-intensive portions of the application to the GPU using the CUDA Toolkit.

    You're faced with imperatives: Improve performance. Solve a problem more quickly. Parallel processing would be faster, but the learning curve is steep, isn't it? Not anymore. With CUDA, you can send C, C++ and FORTRAN code straight to the GPU, no assembly language required. Developers at companies such as Adobe, ANSYS, Autodesk, MathWorks and Wolfram Research are waking that sleeping giant, the GPU, to do general-purpose scientific and engineering computing across a range of platforms. Using high-level languages, GPU-accelerated applications run the sequential part of their workload on the CPU, which is optimized for single-threaded performance, while accelerating parallel processing on the GPU. This is called GPU computing.
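This CPU/GPU division of labour can be sketched in a minimal CUDA program: the host code runs sequentially, prepares the data, and then launches a kernel in which many GPU threads each process one array element in parallel. The kernel name `vecAdd` and the sizes below are illustrative choices, not taken from the report.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: runs on the GPU, one thread per array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                  // one million elements
    const size_t bytes = n * sizeof(float);

    // Sequential part: the CPU prepares the input data.
    float *hA = new float[n], *hB = new float[n], *hC = new float[n];
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Copy the data into the graphics card's dedicated RAM.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Parallel part: launch enough 256-thread blocks to cover n elements.
    vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hC[0]);         // 3.0 when a CUDA device is present

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```

The sequential setup and the copies run on the CPU; only the uniform, data-parallel addition is handed to the GPU.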


    GPU computing is possible because today's GPU does much more than render graphics: It sizzles

    with a teraflop of floating point performance and crunches application tasks designed for anything from

    finance to medicine. CUDA is widely deployed through thousands of applications and published

    research papers and supported by an installed base of over 300 million CUDA-enabled GPUs in

    notebooks, workstations, compute clusters and supercomputers.


    CHAPTER 3

    CPU (Central processing Unit) and GPU (Graphical Processing Unit)

    CPU:

    GPUs and CPUs function very differently, and handle different types of processes better. Current CPUs have up to four cores, with perhaps double that number in the form of virtual cores via Hyper-Threading (to use otherwise idle processing capacity), for up to eight processing threads per CPU. Six-core/12-thread CPUs will become common soon. This means a large number of processing threads can run in parallel. CPU cores are developed for general tasks, so they are flexible and can cope with varied situations. They are also designed to accommodate completely different threads in every single clock tick on every single core.

    Hyper Threading:

    Hyper-Threading is a form of simultaneous multithreading. It is used to improve the parallelization of computation (doing multiple tasks at once). For each processor core that is physically present, the operating system addresses two virtual processors, and shares the workload between them when possible. Hyper-Threading works by duplicating certain sections of the processor, namely those that store the architectural state (the part of the CPU which holds the state of a process, including the control registers and general-purpose registers), but not the main execution resources. This allows a Hyper-Threading processor to appear as two logical processors to the host operating system. When execution resources would not be used by the current task in a processor without Hyper-Threading, and especially when the processor is stalled, a Hyper-Threading-equipped processor can use those execution resources to execute another scheduled task.
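The "two logical processors per physical core" effect is visible from ordinary host code: the operating system reports the logical processor count, which on a Hyper-Threading machine is typically twice the physical core count. A small host-side sketch (plain C++, compilable with nvcc like the other examples):

```cuda
#include <cstdio>
#include <thread>

int main() {
    // On a Hyper-Threading CPU the OS usually reports twice as many
    // logical processors as there are physical cores.
    unsigned logical = std::thread::hardware_concurrency();
    printf("Logical processors visible to the OS: %u\n", logical);
    return 0;
}
```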


    Graphical Processing Unit:

    Today, graphics chips offer up to 240 cores, 40 times more than modern CPUs, even though they are conceptually different and cannot handle as many different types of tasks. The Radeon 5000 series by ATI has up to 1,600 processing units. These cores (also known as stream processors in GPUs) take on one thread each, but are packed into clusters that can apply only one processing operation at a time to the threads they handle. A GPU is therefore unsuitable for complex tasks which require multiple different types of processing.

    However, many programs demand only one operation, such as counting the number of times a particular word has appeared in a book. It is exactly here that a GPU can show off its awesome total processing power. The CPU has to start on page 1, go through the text word by word, and stop at the last page. A GPU, on the other hand, divides the book into many small parts, distributes them to all its cores, and then simply counts the appearances of the word in a fraction of the time. The real-world processes that best use this capability are found in video editing and scientific work. There are no book pages, but instead repeated additions and multiplications of floating point numbers in big matrices; always the exact same operation carried out for thousands of numbers.
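The book-counting analogy maps directly onto a CUDA kernel: the "book" is split one word per thread, and every thread performs the exact same operation, comparing its word against the target and atomically bumping a shared counter on a match. `countWord` and the fixed-width word layout are simplifying assumptions for illustration.

```cuda
#include <cuda_runtime.h>

// The "book" is stored as nWords fixed-width slots of wordLen bytes each.
// Every GPU thread inspects exactly one slot -- the same operation
// applied to thousands of independent pieces of data.
__global__ void countWord(const char *book, int nWords, int wordLen,
                          const char *target, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nWords) return;
    const char *w = book + (size_t)i * wordLen;
    for (int j = 0; j < wordLen; ++j)
        if (w[j] != target[j]) return;  // not a match, thread is done
    atomicAdd(count, 1);                // concurrent-safe increment
}
```

In practice a per-block reduction would scale better than one shared atomic counter, but the one-thread-per-word structure is exactly the point of the analogy.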

    If a program demands numerous types of processes, a GPU cannot keep up because of slower

    clock rates and restrictions in individual processing steps. The flexible cores of the CPU have a strong

    advantage at this point. However, if the processes and data packets are very similar, the GPU cores,

    which are arranged in parallel masses, get cracking with spectacular results.

    The specialization of GPU core design is, however, not the main limitation when programming software. The greatest difficulty is posed by parallelism. It must be possible to divide a program into at least 240 parts (or threads) to be able to use 240 cores. These must all be completely independent of each other so that one thread can be processed in parallel with another. In the end, you do not know which thread is processed when, so even the sequence needs to be irrelevant. Current CPUs also experience the same problem, but there you have only eight threads to grapple with, not 240 or more.

    Many programs have problems with eight threads as well, and it's true that there is hardly any software today that makes full use of a current CPU. Many programs cannot be parallelized, or it is extremely difficult to do so. The burden falls on compilers, the tools that turn program code into working software, which need to analyze the code and find elements that can be parallelized. However, this automatic mechanism often fails. If the input of a process depends on the output of another process and the order in which the processes take place is not certain, automating a sequence of processes fails.
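The dependency problem is easy to see in code. In the first sketch below each output depends only on its own input, so one GPU thread per element is correct; in the second, each iteration consumes the previous iteration's output, so a naive one-thread-per-element kernel would compute wrong results. Both functions are illustrative, not from the report.

```cuda
// Independent iterations: out[i] depends only on in[i],
// so assigning one GPU thread per element parallelizes trivially.
__global__ void square(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// Dependent iterations: out[i] needs out[i - 1], so the loop must run
// in order -- one thread per element would read unfinished results.
void runningSum(const float *in, float *out, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}
```

(Parallel prefix-sum algorithms do exist, but they restructure the computation; a compiler cannot parallelize this loop as written.)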


    CHAPTER 4

    Comparison (CPU vs. GPU)

    Fig 4.1

    Utilizing the graphics card's power can result in massive speed advantages. If you have an ATI graphics card of the HD 3000 or HD 4000 series, or an NVidia graphics card from the GeForce 8000 series onwards, your PC is ready for a massive performance leap. You can check whether your NVidia graphics card is CUDA-compatible with the Cuda-Z tool (http://sourceforge.net/projects/cuda-z/). You only need suitable software now. Not many free tools that benefit from the power of a graphics card are available at present, but you can achieve a massive improvement in performance by using paid ones. Multimedia tools, especially video and photo editors, in which the same process is carried out on a lot of independent data, are ideal for running on a GPU. A majority of the programs which benefit from the power of graphics cards can be found within the scope of video conversion and processing.
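Alongside Cuda-Z, a CUDA-capable card can also be detected programmatically with the CUDA runtime API's standard device-query calls:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    if (err != cudaSuccess || n == 0) {
        printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    // Print the capabilities of each CUDA device in the system.
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("GPU %d: %s, compute capability %d.%d, %d multiprocessors\n",
               i, p.name, p.major, p.minor, p.multiProcessorCount);
    }
    return 0;
}
```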


    Fig 4.2

    If the requirements are different (i.e. different kinds of tasks are to be performed), any dual-core processor can process two tasks in parallel (run two threads simultaneously), and more with Hyper-Threading technology, whereas a graphical processing unit is incapable of processing two dissimilar threads simultaneously. The GPU cannot work in parallel on complex problems and processes such tasks individually at a much slower speed.

    Fig 4.3

    When the tasks to be done are identical, the GPU (graphical processing unit) has the advantage. The dual-core CPU will still process only two tasks in parallel even in the case of identical packets, while the slower-clocked GPU displays its strength with similar data: in the figure, eight shaders run in parallel in the graphical processing unit.


    CHAPTER 5

    DirectX Support and Latest GPUs

    The power of a graphics card is far from exhausted, though. NVidia's CUDA supports only one graphics card right now, but combining the power of several in SLI configurations is in the offing. DirectX 11 also makes GPUs into secondary processing units. Microsoft has loaded its latest graphics API with the Direct Compute programming interface. Direct Compute can be used to write programs which use the power of the graphics card's unified shaders as independent compute units. These programmable execution units can already be used as pixel, vertex and geometry shaders, but now they will be able to handle functions outside of gaming and graphics. Since DirectX has been the most popular programming interface for games since its inception in 1995, it is widely assumed that Direct Compute will garner the same fan base amongst programmers. The basic advantage here is that it no longer makes a difference whether you have an ATI, NVidia or Intel graphics card.


    CHAPTER 6

    Conclusion

    Last, but not the least, the world is not going to stick with GPUs in their traditional roles only. For a long time, Intel has been planning its Larrabee project, a processor with several GPU-inspired cores, all of which are as flexible as those in a CPU, and which is said to process graphics as well as programs. AMD's Fusion initiative is also headed in a similar direction. And both ATI and NVidia already have plug-in cards, named FireStream and Tesla respectively, which are certainly built with graphics processors, but are not intended for 3D applications at all (they do not even have video outputs). This is the start of an era of co-processing. The CPU remains a vital part of the computer, but specialized processors will take up other tasks, in the form of plug-in cards, additional chips on the motherboard, and eventually, integrated into the CPU.


    CHAPTER 7

    References

    1. CHIP Magazine, April 2010
    2. http://www.nvidia.com
    3. http://en.wikipedia.org/wiki/CUDA
    4. developer.nvidia.com
    5. http://www.cuda.com
