CUDA (body)


  • 8/3/2019 CUDA (body)

    1/12

    Compute Unified Device Architecture

    Department of Computer Science Page 3

    ACKNOWLEDGEMENT

    The satisfaction that accompanies the successful completion of any task would be incomplete without mentioning the people who made it possible and whose constant encouragement and guidance have been a source of inspiration throughout the course.

    I express my gratitude to Dr. H. C. Nagaraj, Principal for his continuous efforts in creating a

    competitive environment in college and encouragement through this course.

    I express my gratitude to Dr. Nalini N, H.O.D., Department of Computer Science and Engineering, for her help and presence.

    I would like to express my deep sense of gratitude to our guide Ms. Prathibha Ballal, Assistant Professor, CSE, for providing constant support and motivation in the class as well as outside it throughout, without which the seminar would not have been a reality.

    I am thankful to the entire Department of Computer Science and Engineering for its co-operation and suggestions.

    I also thank all my friends who have helped me and have proved to be a constant source of support.

    Kamal Datta


    Table of Contents

    S No.  Description                                                          Page No.
    1.     Introduction                                                         5
    2.     CUDA (Compute Unified Device Architecture)                           6
    3.     CPU (Central Processing Unit) and GPU (Graphical Processing Unit)    8
    4.     Comparison (CPU vs. GPU)                                             10
    5.     DirectX support and latest GPUs                                      12
    6.     Conclusion                                                           13
    7.     References                                                           14


    CHAPTER 1

    INTRODUCTION

    If you work with a lot of complex programs on your PC, you need a lot of processing capacity. For the longest time, you were forced to spend money on more powerful CPUs to get better performance. However, the GPU (Graphics Processing Unit), which can be found on the graphics card, can now offload that work. Games demand complex processes that need to be carried out in real time, which means that a GPU often has to process more than the CPU. Thus, graphics cards are clocked at up to 1000 MHz, have super-fast memory, and carry up to 2 GB of dedicated RAM. You can hardly ask for a better co-processor in the system.

    Clever programmers, supported by GPU manufacturers, came up with the idea to use the processing capacity of graphics cards in a different way: for video processing, flow simulations or market price predictions. Three years ago, NVidia developed CUDA (Compute Unified Device Architecture), a programming environment through which some program processes can be run on the graphics chip. Only NVidia chips from the GeForce 8000 series onwards supported this. Their competitor AMD supports the general standard OpenCL, pioneered by an industry consortium called the Khronos Group (which NVidia now also supports), using which you can share your program's workloads among OpenCL-compatible processors (CPU and GPU). Even Microsoft approved of this development, equipping the new DirectX 11 instruction set with a new interface (Direct Compute), using which you can run program processes on the GPU.


    CHAPTER 2

    CUDA (Compute Unified Device Architecture)

    CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics

    processing unit (GPU). With millions of CUDA-enabled GPUs sold to date, software developers,

    scientists and researchers are finding broad-ranging uses for GPU computing with CUDA. Here are a

    few examples:

    1. Identify hidden plaque in arteries: Heart attacks are the leading cause of death worldwide. Harvard Engineering, Harvard Medical School and Brigham & Women's Hospital have teamed up to use GPUs to simulate blood flow and identify hidden arterial plaque without invasive imaging techniques or exploratory surgery.

    2. Analyze air traffic flow: The National Airspace System manages the nationwide coordination of air traffic flow. Computer models help identify new ways to alleviate congestion and keep airplane traffic moving efficiently. Using the computational power of GPUs, a team at NASA obtained a large performance gain, reducing analysis time from ten minutes to three seconds.

    3. Visualize molecules: A molecular simulation called NAMD (Nanoscale Molecular Dynamics) gets a large performance boost with GPUs. The speed-up is a result of the parallel architecture of GPUs, which enables NAMD developers to port compute-intensive portions of the application to the GPU using the CUDA Toolkit.

    You're faced with imperatives: Improve performance. Solve a problem more quickly. Parallel processing would be faster, but the learning curve is steep, isn't it? Not anymore. With CUDA, you can send C, C++ and FORTRAN code straight to the GPU, no assembly language required. Developers at companies such as Adobe, ANSYS, Autodesk, MathWorks and Wolfram Research are waking that sleeping giant, the GPU, to do general-purpose scientific and engineering computing across a range of platforms. Using high-level languages, GPU-accelerated applications run the sequential part of their workload on the CPU, which is optimized for single-threaded performance, while accelerating parallel processing on the GPU. This is called GPU computing.
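This CPU/GPU division of labour can be sketched in a minimal CUDA program: the host code runs sequentially, prepares the data, and then launches a kernel in which many GPU threads each process one array element in parallel. The kernel name `vecAdd` and the sizes below are illustrative choices, not taken from the report.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: runs on the GPU, one thread per array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                  // one million elements
    const size_t bytes = n * sizeof(float);

    // Sequential part: the CPU prepares the input data.
    float *hA = new float[n], *hB = new float[n], *hC = new float[n];
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Copy the data into the graphics card's dedicated RAM.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Parallel part: launch enough 256-thread blocks to cover n elements.
    vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hC[0]);         // 3.0 when a CUDA device is present

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```

The sequential setup and the copies run on the CPU; only the uniform, data-parallel addition is handed to the GPU.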


    GPU computing is possible because today's GPU does much more than render graphics: It sizzles

    with a teraflop of floating point performance and crunches application tasks designed for anything from

    finance to medicine. CUDA is widely deployed through thousands of applications and published

    research papers and supported by an installed base of over 300 million CUDA-enabled GPUs in

    notebooks, workstations, compute clusters and supercomputers.


    CHAPTER 3

    CPU (Central processing Unit) and GPU (Graphical Processing Unit)

    CPU:

    GPUs and CPUs function very differently, and handle different types of processes better. Current CPUs have up to four cores, with perhaps double that number in the form of virtual cores via Hyper-Threading (to use otherwise idle processing capacity), for up to eight processing threads per CPU. Six-core/12-thread CPUs will become common soon. This means a large number of processing threads can run in parallel. CPU cores are developed for general tasks, so they are flexible and can cope with varied situations. They are also designed to accommodate completely different threads in every single clock tick on every single core.

    Hyper Threading:

    Hyper-Threading is a form of simultaneous multithreading. It is used to improve the parallelization of computation (doing multiple tasks at once). For each processor core that is physically present, the operating system addresses two virtual processors, and shares the workload between them when possible. Hyper-Threading works by duplicating certain sections of the processor, namely those that store the architectural state (the part of the CPU which holds the state of a process, including the control registers and general-purpose registers), but not the main execution resources. This allows a Hyper-Threading processor to appear as two logical processors to the host operating system. When execution resources would not be used by the current task in a processor without Hyper-Threading, and especially when the processor is stalled, a Hyper-Threading-equipped processor can use those execution resources to execute another scheduled task.
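The "two logical processors per physical core" effect is visible from ordinary host code: the operating system reports the logical processor count, which on a Hyper-Threading machine is typically twice the physical core count. A small host-side sketch (plain C++, compilable with nvcc like the other examples):

```cuda
#include <cstdio>
#include <thread>

int main() {
    // On a Hyper-Threading CPU the OS usually reports twice as many
    // logical processors as there are physical cores.
    unsigned logical = std::thread::hardware_concurrency();
    printf("Logical processors visible to the OS: %u\n", logical);
    return 0;
}
```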


    Graphical Processing Unit:

    Today, graphics chips offer up to 240 cores, 40 times more than modern CPUs, even though they are conceptually different and cannot handle as many different types of tasks. The Radeon 5000 series by ATI has up to 1,600 processing units. These cores (also known as stream processors in GPUs) take on one thread each, but are packed into clusters that can apply only one processing operation at a time to the threads they handle. A GPU is therefore unsuitable for complex tasks which require multiple different types of processing.

    However, many programs demand only one operation, such as counting the number of times a particular word has appeared in a book. It is exactly here that a GPU can show off its awesome total processing power. The CPU has to start on page 1, go through the text word by word, and stop at the last page. A GPU, on the other hand, divides the book into many small parts, distributes them to all its cores, and then simply counts the appearances of the word in a fraction of the time. The real-world processes that best use this capability are found in video editing and scientific work. There are no book pages, but instead repeated additions and multiplications of floating point numbers in big matrices; always the exact same operation carried out for thousands of numbers.
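The book-counting analogy maps directly onto a CUDA kernel: the "book" is split one word per thread, and every thread performs the exact same operation, comparing its word against the target and atomically bumping a shared counter on a match. `countWord` and the fixed-width word layout are simplifying assumptions for illustration.

```cuda
#include <cuda_runtime.h>

// The "book" is stored as nWords fixed-width slots of wordLen bytes each.
// Every GPU thread inspects exactly one slot -- the same operation
// applied to thousands of independent pieces of data.
__global__ void countWord(const char *book, int nWords, int wordLen,
                          const char *target, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nWords) return;
    const char *w = book + (size_t)i * wordLen;
    for (int j = 0; j < wordLen; ++j)
        if (w[j] != target[j]) return;  // not a match, thread is done
    atomicAdd(count, 1);                // concurrent-safe increment
}
```

In practice a per-block reduction would scale better than one shared atomic counter, but the one-thread-per-word structure is exactly the point of the analogy.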

    If a program demands numerous types of processes, a GPU cannot keep up because of slower

    clock rates and restrictions in individual processing steps. The flexible cores of the CPU have a strong

    advantage at this point. However, if the processes and data packets are very similar, the GPU cores,

    which are arranged in parallel masses, get cracking with spectacular results.

    The specialization of GPU core design is, however, not the main limitation when programming software. The greatest difficulty is posed by parallelism. It must be possible to divide a program into at least 240 parts (or threads) to be able to use 240 cores. These must all be completely independent of each other so that one thread can be processed in parallel with another. In the end, you do not know which thread is processed when, so even the sequence needs to be irrelevant. Current CPUs also experience the same problem, but there you have only eight threads to grapple with, not 240 or more.

    Many programs have problems with eight threads as well, and it's true that there is hardly any software today that makes full use of a current CPU. Many programs cannot be parallelized, or it is extremely difficult to do so. The burden falls on compilers, the tools that turn program code into working software, which need to analyze the code and find elements that can be parallelized. However, this automatic mechanism often fails. If the input of a process depends on the output of another process and the order in which the processes take place is not certain, automating a sequence of processes fails.
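The dependency problem is easy to see in code. In the first sketch below each output depends only on its own input, so one GPU thread per element is correct; in the second, each iteration consumes the previous iteration's output, so a naive one-thread-per-element kernel would compute wrong results. Both functions are illustrative, not from the report.

```cuda
// Independent iterations: out[i] depends only on in[i],
// so assigning one GPU thread per element parallelizes trivially.
__global__ void square(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// Dependent iterations: out[i] needs out[i - 1], so the loop must run
// in order -- one thread per element would read unfinished results.
void runningSum(const float *in, float *out, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}
```

(Parallel prefix-sum algorithms do exist, but they restructure the computation; a compiler cannot parallelize this loop as written.)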


    CHAPTER 4

    Comparison (CPU vs. GPU)

    Fig 4.1

    Utilizing the graphics card's power can result in massive speed advantages. If you have an ATI graphics card of the HD 3000 or HD 4000 series, or an NVidia graphics card from the GeForce 8000 series onwards, your PC is ready for a massive performance leap. You can check whether your NVidia graphics card is CUDA-compatible with the Cuda-Z tool (http://sourceforge.net/projects/cuda-z/). You only need suitable software now. Not many free tools that benefit from the power of a graphics card are available at present, but you can achieve a massive improvement in performance by using paid ones. Multimedia tools, especially video and photo editors, in which the same process is carried out on a lot of independent data, are ideal for running on a GPU. A majority of the programs which benefit from the power of graphics cards can be found within the scope of video conversion and processing.
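Alongside Cuda-Z, a CUDA-capable card can also be detected programmatically with the CUDA runtime API's standard device-query calls:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    if (err != cudaSuccess || n == 0) {
        printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    // Print the capabilities of each CUDA device in the system.
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("GPU %d: %s, compute capability %d.%d, %d multiprocessors\n",
               i, p.name, p.major, p.minor, p.multiProcessorCount);
    }
    return 0;
}
```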


    Fig 4.2

    If the requirements are different (i.e. different kinds of tasks are to be performed), any dual-core processor can process two tasks in parallel (run two threads simultaneously), and more with Hyper-Threading technology, whereas a graphical processing unit is incapable of processing two dissimilar threads simultaneously. The GPU cannot work in parallel on complex problems and processes such tasks individually at a much slower speed.

    Fig 4.3

    When the tasks to be done are identical, the GPU (graphical processing unit) has the advantage. The dual-core CPU will still process only two tasks in parallel even in the case of identical packets, while the slower-clocked GPU displays its strength with similar data: in the figure, eight shaders run in parallel in the graphical processing unit.


    CHAPTER 5

    DirectX Support and Latest GPUs

    The power of a graphics card is far from exhausted, though. NVidia's CUDA supports only one graphics card right now, but combining the power of several in SLI configurations is in the offing. DirectX 11 also makes GPUs into secondary processing units. Microsoft has loaded its latest graphics API with the Direct Compute programming interface. Direct Compute can be used to write programs which use the power of the graphics card's unified shaders as independent compute units. These programmable execution units can already be used as pixel, vertex and geometry shaders, but now they will be able to handle functions outside of gaming and graphics. Since DirectX has been the most popular programming interface for games since its inception in 1995, it is widely assumed that Direct Compute will garner the same fan base amongst programmers. The basic advantage here is that it no longer makes a difference whether you have an ATI, NVidia or Intel graphics card.


    CHAPTER 6

    Conclusion

    Last, but not the least, the world is not going to stick with GPUs in their traditional roles only. For a long time, Intel has been planning its Larrabee project, a processor with several GPU-inspired cores, all of which are as flexible as those in a CPU, and which is said to process graphics as well as programs. AMD's Fusion initiative is also headed in a similar direction. And both ATI and NVidia already have plug-in cards, named FireStream and Tesla respectively, which are certainly built with graphics processors, but are not intended for 3D applications at all (they do not even have video outputs). This is the start of an era of co-processing. The CPU remains a vital part of the computer, but specialized processors will take up other tasks, in the form of plug-in cards, additional chips on the motherboard, and eventually, integrated into the CPU.


    CHAPTER 7

    References

    1. CHIP Magazine, April 2010
    2. http://www.nvidia.com
    3. http://en.wikipedia.org/wiki/CUDA
    4. developer.nvidia.com
    5. http://www.cuda.com
