GPU Programming Overview

Spring 2011 류승택

What is a GPU?

GPU stands for Graphics Processing Unit

Simply put, it is the processor that resides on your graphics card.

GPUs allow us to achieve the unprecedented graphics capabilities now available in games (Demo: NVIDIA GTX 400)

Introduction

■ GPGPU (General-Purpose Computation on GPUs)
  The first commodity, programmable parallel architecture
  GPU evolution driven by the computer game market
  Advantage of data parallelism
    • GPUs are >10x faster than CPUs for appropriate problems
  Advantage of commodity
    • GPUs are inexpensive
    • GPUs are ubiquitous: desktops, laptops, PDAs, cell phones
  Achieving this speedup
    • Requires a large amount of GPU-specific knowledge

Motivation

■ Challenge Statement
  GPGPU signifies the dawn of the desktop parallel computing age

Why Program on the GPU?

Graph from: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf

■ Compute
  Intel Core i7: 4 cores, 100 GFLOP
  NVIDIA GTX280: 240 cores, 1 TFLOP

■ Memory Bandwidth
  System Memory: 60 GB/s
  NVIDIA GT200: 150 GB/s

■ Install Base
  Over 200 million NVIDIA G80s shipped
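The ratios implied by the compute and bandwidth figures above can be checked with a few lines of arithmetic. This is only a sketch using the slide's round peak numbers; the variable names are illustrative, and peak ratios say nothing about what a real program achieves.

```python
# Peak figures quoted on the slide (round numbers, not measurements).
cpu_gflops = 100.0          # Intel Core i7, 4 cores
gpu_gflops = 1000.0         # NVIDIA GTX280, 240 cores (1 TFLOP)
cpu_bandwidth_gbs = 60.0    # system memory
gpu_bandwidth_gbs = 150.0   # NVIDIA GT200

compute_ratio = gpu_gflops / cpu_gflops
bandwidth_ratio = gpu_bandwidth_gbs / cpu_bandwidth_gbs

print(f"peak compute ratio:   {compute_ratio:.1f}x")
print(f"peak bandwidth ratio: {bandwidth_ratio:.1f}x")
```

The compute gap is larger than the bandwidth gap, which is one reason the speedup only materializes for problems that do a lot of arithmetic per byte moved.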

How did this happen?

■ Games demand advanced shading
■ Fast GPUs = better shading
■ Need for speed = continued innovation
■ The gaming industry has overtaken the defense, finance, oil and healthcare industries as the main driving factor for high performance processors.

NVIDIA GPU Evolution

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Real-time Rendering

■ Real-time Rendering
  Graphics hardware enables real-time rendering
  Real-time means a display rate of more than 10 images per second

3D Scene = Collection of 3D primitives (triangles, lines, points)

Image = Array of pixels

Graphics Review

■ Modeling
■ Rendering
■ Animation

Graphics Review: Modeling

■ Modeling
  Polygons vs Triangles
    • How do you store a triangle mesh?
  Implicit Surfaces
  Height maps
  …

Triangles

Image courtesy of A K Peters, Ltd. www.virtualglobebook.com

Triangles

Image courtesy of A K Peters, Ltd. www.virtualglobebook.com. Imagery from NASA Visible Earth: visibleearth.nasa.gov.

Implicit Surfaces

Images from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html

Height Maps

Image courtesy of A K Peters, Ltd. www.virtualglobebook.com

Graphics Review: Rendering

■ Rendering
  Goal: Assign color to pixels

■ Two Parts
  Visible surfaces
    • What is in front of what for a given view
  Shading
    • Simulate the interaction of material and light to produce a pixel color

Rasterization

What about ray tracing?
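The shading part of rendering, simulating the interaction of material and light to produce a pixel color, can be sketched with one of the simplest models, Lambertian diffuse reflection. The slides do not name a shading model, so the choice of Lambert and every name below are assumptions made for illustration.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert_shade(normal, to_light, albedo, light_color):
    """Diffuse color = albedo * light color * max(0, N . L)."""
    n = normalize(normal)
    l = normalize(to_light)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(a * lc * n_dot_l for a, lc in zip(albedo, light_color))

white_light = (1.0, 1.0, 1.0)
material = (1.0, 0.5, 0.25)   # hypothetical surface albedo (RGB)

# Surface facing the light head-on: full brightness.
lit = lambert_shade((0, 0, 1), (0, 0, 5), material, white_light)
# Light behind the surface: the dot product is negative, so it clamps to black.
dark = lambert_shade((0, 0, 1), (0, 0, -1), material, white_light)
print(lit, dark)
```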

Visible Surfaces

Image courtesy of A K Peters, Ltd. www.virtualglobebook.com

Visible Surfaces

Z-Buffer / Depth Buffer

Fragment vs Pixel

Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
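The z-buffer idea, and the difference between a fragment and a pixel, can be sketched in a few lines. This is a minimal model under assumed conventions (smaller depth means closer, fragments arrive as plain tuples); it is not how any particular GPU implements the test.

```python
import math

def zbuffer_resolve(width, height, fragments):
    """Keep, for each pixel, the color of the nearest fragment.

    fragments: iterable of (x, y, depth, color); smaller depth = closer.
    Several fragments may land on the same pixel; only one survives.
    """
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:      # depth test: the new fragment is closer
            depth[y][x] = z
            color[y][x] = c
    return color

# Two fragments compete for pixel (1, 0); the closer one (depth 0.3) wins.
frags = [(1, 0, 0.8, "red"), (1, 0, 0.3, "blue"), (0, 1, 0.5, "green")]
image = zbuffer_resolve(2, 2, frags)
print(image)
```

A fragment is a candidate contribution to a pixel; the pixel is whatever remains in the buffer after all fragments have been tested.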

Shading

Images courtesy of A K Peters, Ltd. www.virtualglobebook.com

Shading

Image from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch14.html

Graphics Pipeline

[Pipeline diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations (Scissor Test, Stencil Test, Depth Test, Blending) → Frame Buffer]

Graphics Pipeline

Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

Graphics Review: Animation

■ Move the camera and/or agents, and re-render the scene
  In less than 16.6 ms (60 fps)
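The per-frame time budget follows directly from the target frame rate. A small sketch of the arithmetic (the function name is illustrative):

```python
def frame_budget_ms(fps):
    """Time available to update and re-render one frame, in milliseconds."""
    return 1000.0 / fps

budget_60 = frame_budget_ms(60)   # 1000 / 60, roughly 16.67 ms
budget_10 = frame_budget_ms(10)   # 100 ms, the slides' lower bound for "real-time"
print(f"{budget_60:.2f} ms at 60 fps, {budget_10:.1f} ms at 10 fps")
```

Everything in the frame, including moving the camera and agents, has to fit inside this budget along with the rendering itself.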

Evolution of the Programmable Graphics Pipeline

■ Pre GPU
■ Fixed function GPU
■ Programmable GPU
■ Unified Shader Processors

Early 90s – Pre GPU

Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

OpenGL Pipeline

GPU Shader

■ Fixed functionalities
■ Programmable functionalities
■ Flexible memory access

Stream Program => GPU

■ A stream is a sequence of data (could be numbers, colors, RGBA vectors,…)
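In the stream model, one small function is applied independently to every element of the stream. The sketch below models that in plain Python; the term "kernel" and the grayscale example are assumptions for illustration, and the sequential loop only stands in for what a GPU would run in parallel.

```python
def run_kernel(kernel, stream):
    """Apply the same kernel to every stream element independently.

    No element depends on another, so a GPU is free to process them in
    parallel; this sequential loop only models the semantics.
    """
    return [kernel(element) for element in stream]

def to_grayscale(rgba):
    # Hypothetical kernel: average the color channels, keep alpha.
    r, g, b, a = rgba
    gray = (r + g + b) / 3.0
    return (gray, gray, gray, a)

# A stream of RGBA vectors.
pixels = [(1.0, 0.5, 0.0, 1.0), (0.0, 0.0, 0.0, 0.5)]
result = run_kernel(to_grayscale, pixels)
print(result)
```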

Vertex Shader

■ Vertex transformation
■ Once per vertex
■ Input attributes
  Normal
  Texture coordinates
  Colors
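A vertex shader can be pictured as a function that runs once per vertex and sees only that vertex's attributes. The sketch below is plain Python rather than a real shading language, and the translation matrix and all names are assumptions chosen to keep the numbers easy to follow.

```python
def mat_vec4(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return tuple(sum(m[row][k] * v[k] for k in range(4)) for row in range(4))

def vertex_shader(position, normal, transform):
    # Runs once per vertex; it cannot see any other vertex.
    out_position = mat_vec4(transform, position + (1.0,))  # homogeneous w = 1
    return out_position, normal   # other attributes pass through unchanged

# Hypothetical transform: translate by (2, 0, -5).
translate = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
up = (0.0, 0.0, 1.0)
transformed = [vertex_shader(p, up, translate)[0] for p in triangle]
print(transformed)
```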

Geometry Shader

■ Geometry composition
■ Once per geometry
■ Input primitives
  Points, lines, triangles
  Lines and triangles with adjacency
■ Output primitives
  Points, line strips or triangle strips
  [0, n] primitives output
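The defining feature of a geometry shader is that it runs once per input primitive and may emit anywhere from 0 to n output primitives. A minimal Python sketch, assuming a point-sprite style expansion of a point into a quad; the function name and the 2D coordinates are illustrative only.

```python
def point_sprite_shader(point, size):
    """Runs once per input primitive (here: one point).

    Returns a list of output primitives; a geometry shader may emit
    anywhere from 0 to n of them.
    """
    x, y = point
    if size <= 0.0:
        return []                       # emit nothing: the point is discarded
    h = size / 2.0
    # One triangle strip of four vertices forms a screen-aligned quad.
    strip = [(x - h, y - h), (x + h, y - h), (x - h, y + h), (x + h, y + h)]
    return [strip]

quads = point_sprite_shader((4.0, 2.0), 2.0)
nothing = point_sprite_shader((4.0, 2.0), 0.0)
print(quads, nothing)
```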

Fragment Shader

■ Per-pixel (or fragment) composition
■ Once per fragment
■ Operations on interpolated values
  Vertex attributes
  User-defined varying variables
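The "interpolated values" a fragment shader receives are per-vertex outputs blended across the triangle. The sketch below assumes barycentric weights for the blend and a made-up brightness operation as the shader body; both are illustrative, not taken from the slides.

```python
def interpolate(bary, values):
    """Blend three per-vertex values with barycentric weights (sum to 1)."""
    w0, w1, w2 = bary
    v0, v1, v2 = values
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2))

def fragment_shader(color, brightness):
    # Runs once per fragment on already-interpolated inputs.
    return tuple(min(1.0, channel * brightness) for channel in color)

vertex_colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# A fragment halfway along the edge between vertex 0 and vertex 1.
varying_color = interpolate((0.5, 0.5, 0.0), vertex_colors)
pixel = fragment_shader(varying_color, 2.0)
print(varying_color, pixel)
```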

GPU Shader

Programming Graphics Hardware

PC Architecture

Bus Interface

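■ ISA (Industry Standard Architecture) bus interface
  In use since the XT and AT era of the 1980s
  Theoretical maximum of 16 MB/s
  A severe bottleneck for peripherals
    • Now used only for devices where throughput matters little, such as sound cards and modems

■ PCI (Peripheral Component Interconnect)
  Parallel connection
  The successor to ISA for connecting peripherals
  Smaller than an ISA slot, and supports IRQ sharing
  The common 32-bit, 33 MHz version reaches 133 MB/s; the 64-bit, 66 MHz version reaches 533 MB/s
  Most peripherals use the PCI interface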

[Figure labels: ISA, PCI, AGP]

Bus Interface

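■ AGP (Accelerated Graphics Port)
  Dedicated point-to-point parallel connection for the graphics card
  Developed by Intel
  Based on PCI, but more than twice as fast
  Runs at a base clock of 66 MHz
  AGP = 2 x PCI (AGP 2x = 2 x AGP)
    • AGP 1x: up to 266 MB/s
    • AGP 2x: up to 533 MB/s
  For 3D graphics cards

■ PCIe (PCI Express)
  Serial connection (cheap, scalable)
  Up to 8.0 GB/s of bandwidth (PCIe = 2 x AGP 8x)
  Intel, ATI, and NVIDIA, which lead the worldwide graphics market, have firmly endorsed this new standard as the next-generation graphics interface
  AGP, which was created because of the limitations of PCI and had been the dominant interface for graphics processing units (GPUs), is being replaced by PCI Express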
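The peak figures for the parallel buses (PCI and AGP) follow from bus width times clock rate times transfers per clock. A minimal sketch, assuming nominal clocks of 33.33 MHz and 66.66 MHz and decimal megabytes; published figures are rounded, so they can differ from these results by a few MB/s.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak of a parallel bus in MB/s (1 MB = 10**6 bytes)."""
    return bus_width_bits / 8 * clock_mhz * transfers_per_clock

pci_32 = peak_bandwidth_mb_s(32, 33.33)       # about 133.3 MB/s
pci_64 = peak_bandwidth_mb_s(64, 66.66)       # about 533.3 MB/s
agp_1x = peak_bandwidth_mb_s(32, 66.66)       # about 266.6 MB/s
agp_2x = peak_bandwidth_mb_s(32, 66.66, 2)    # about 533.3 MB/s

for name, value in [("PCI 32-bit", pci_32), ("PCI 64-bit", pci_64),
                    ("AGP 1x", agp_1x), ("AGP 2x", agp_2x)]:
    print(f"{name}: {value:.1f} MB/s")
```

AGP 1x doubles 32-bit PCI by doubling the clock; AGP 2x doubles it again by transferring twice per clock. PCIe is a serial, lane-based link, so this formula does not apply to it.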

[Figure labels: PCI, PCIe x1, PCIe x16; GeForce 7800 GTX (PCIe x16)]

Generation I: 3dfx Voodoo (1996)

• One of the first true 3D game cards

• Worked by supplementing a standard 2D video card.

• Did not do vertex transformations: these were done in the CPU

• Did do texture mapping, z-buffering.

[Pipeline diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer; CPU and GPU connected over PCI]

Image from “7 years of Graphics”

1995-1998: Texture Mapping and Z-Buffer

- PCI: Peripheral Component Interconnect
- 3dfx’s Voodoo

Texture Mapping

Texture Mapping: Perspective-Correct Interpolation
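Perspective-correct interpolation can be sketched for a single texture coordinate along one edge. The standard approach is to interpolate u/w and 1/w linearly in screen space and divide afterwards; the function names and the sample values below are assumptions for illustration.

```python
def lerp(a, b, t):
    return a + (b - a) * t

def affine_u(u0, u1, t):
    """Naive screen-space interpolation (wrong under perspective)."""
    return lerp(u0, u1, t)

def perspective_correct_u(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w linearly in screen space, then divide."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w

# Edge from a near vertex (w = 1) to a far vertex (w = 4); u runs from 0 to 1.
t = 0.5  # halfway across the edge on screen
naive = affine_u(0.0, 1.0, t)
correct = perspective_correct_u(0.0, 1.0, 1.0, 4.0, t)
print(naive, correct)
```

The screen-space midpoint lies closer to the near vertex in texture space, so the corrected coordinate is well below the naive 0.5. Skipping the division is what makes textures appear to swim on large, steeply angled polygons.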

Aside: Mario Kart 64

■ High fragment load / low vertex load

Image from: http://www.gamespot.com/users/my_shoe/

Aside: Mario Kart Wii

■ High fragment load / low vertex load?

Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/

Generation II: GeForce/Radeon 7500 (1998)

• Main innovation: shifting the transformation and lighting calculations to the GPU

• Allowed multi-texturing: giving bump maps, light maps, and others

• Faster AGP bus instead of PCI

[Pipeline diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer; GPU connected over AGP]

Image from “7 years of Graphics”

1998: Multitexturing

- AGP: Accelerated Graphics Port
- NVIDIA’s TNT, ATI’s Rage

Multitexturing

Light Mapping

1999-2000: Transform and Lighting

- Register Combiner: Offers many more texture/color combinations
- NVIDIA’s GeForce 256 and GeForce2, ATI’s Radeon 7500

Bump Mapping

Environment Mapping

Projective Texture Mapping

Generation III: GeForce3/Radeon 8500 (2001)

• For the first time, allowed a limited amount of programmability in the vertex pipeline

• Also allowed volume texturing and multi-sampling (for antialiasing)

[Pipeline diagram: Vertex Transforms (small vertex shaders) → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer; GPU connected over AGP]

Image from “7 years of Graphics”

2001: Programmable Vertex Shader

- Z-Cull: Predicts which fragments will fail the Z test and discards them
- Texture Shader: Offers more texture addressing and operations
- NVIDIA’s GeForce3 and GeForce4 Ti, ATI’s Radeon 8500

A programmable processor for any per-vertex computation

Volume Texture Mapping

Generation IV: Radeon 9700/GeForce FX (2002)

• This generation is the first generation of fully-programmable graphics cards

• Different versions have different resource limits on fragment/vertex programs

[Pipeline diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer; GPU connected over AGP. Additional labels: Programmable Vertex Shader, Programmable Fragment Processor, Texture Memory]

Image from “7 years of Graphics”

Slide from Suresh Venkatasubramanian and Joe Kider

2002-2003: Programmable Pixel Shader

- MRT: Multiple Render Target
- NVIDIA’s GeForce FX, ATI’s Radeon 9600 to 9800

A programmable processor for any per-pixel computation

Shader: Static vs. Dynamic flow control

■ Static flow control
  Condition varies per batch of triangles

■ Dynamic flow control
  Condition varies per vertex or pixel

■ Full flow control
  Static and dynamic flow control
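The difference between the two kinds of flow control is where the branch condition comes from. A plain-Python sketch under assumed names (the fog flag, the threshold, and the scaling factors are all hypothetical):

```python
def shade_batch_static(pixels, use_fog):
    """Static flow control: the condition is fixed for the whole batch,
    so every pixel in the batch takes the same branch."""
    if use_fog:
        return [p * 0.5 for p in pixels]
    return list(pixels)

def shade_batch_dynamic(pixels, threshold):
    """Dynamic flow control: the condition is evaluated per pixel,
    so neighbouring pixels may take different branches."""
    out = []
    for p in pixels:
        if p > threshold:
            out.append(1.0)        # hypothetical: saturate bright pixels
        else:
            out.append(p * 0.5)    # hypothetical: darken the rest
    return out

batch = [0.2, 0.8, 0.4]
static_result = shade_batch_static(batch, use_fog=True)
dynamic_result = shade_batch_dynamic(batch, threshold=0.5)
print(static_result, dynamic_result)
```

Static branching is cheap for hardware because the decision is made once per batch; dynamic branching requires the hardware to handle neighbouring elements that diverge.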

Generation IV.V: GeForce6/X800 (2004)

■ Simultaneous rendering to multiple buffers
■ True conditionals and loops
■ PCIe bus
■ Vertex texture fetch

[Pipeline diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer; GPU connected over PCIe. Additional labels: Programmable Vertex Shader, Programmable Fragment Processor, Texture Memory (shown twice)]

2004: Shader Model 3.0 and 64 bit Color Support

- PCIe: Peripheral Component Interconnect Express
- NVIDIA’s GeForce 6800

Real-time Tone Mapping

■ The image is entirely computed in 64-bit color and tone-mapped for display
  64-bit color
    • 16-bit floating-point value per channel (R, G, B, A)
  Tone Mapping
    • HDRI (High Dynamic Range Image) mapped to a low dynamic range device

From low to high exposure image of the same scene
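Tone mapping compresses high dynamic range values into the range a display can show. The slides do not say which operator is used; the sketch below assumes the simple Reinhard curve L / (1 + L), and the function names and sample pixel are illustrative.

```python
def reinhard(value, exposure=1.0):
    """Compress an HDR value in [0, inf) into [0, 1)."""
    scaled = value * exposure
    return scaled / (1.0 + scaled)

def tone_map(pixel_rgba, exposure=1.0):
    r, g, b, a = pixel_rgba
    ldr = [reinhard(c, exposure) for c in (r, g, b)]
    # Quantize to 8 bits per channel for a low dynamic range display.
    return tuple(round(c * 255) for c in ldr) + (round(a * 255),)

bits_per_pixel = 4 * 16            # R, G, B, A at 16 bits each = 64-bit color
hdr_pixel = (3.0, 0.0, 4.0, 1.0)   # color values above 1.0 are legal in HDR
ldr_pixel = tone_map(hdr_pixel)
print(bits_per_pixel, ldr_pixel)
```

Raising or lowering `exposure` before the curve is applied mimics the low-to-high exposure series of the same scene.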

Generation V: GeForce8800/HD2900 (2006)

■ Ground-up GPU redesign
■ Support for Direct3D 10 / OpenGL 3
■ Geometry Shaders
■ Stream out / transform-feedback
■ Unified shader processors
■ Support for general GPU programming

[Pipeline diagram labels: Input Assembler, Programmable Vertex Shader, Programmable Geometry Shader, Raster Operations, Programmable Pixel Shader, Output Merger; GPU connected over PCIe]

Geometry Shaders: Point Sprites

Geometry Shaders

Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf

NVIDIA G80 Architecture

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Why Unify Shader Processors?

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Unified Shader Processors

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Terminology

Shader Model   Direct3D   OpenGL   Video card example
2              9          2.x      NVIDIA GeForce 6800, ATI Radeon X800
3              10.x       3.x      NVIDIA GeForce 8800, ATI Radeon HD 2900
4              11.x       4.x      NVIDIA GeForce GTX 480, ATI Radeon HD 5870

Evolution of the Programmable Graphics Pipeline

Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

Fixed-function pipeline

[Diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); CPU-GPU Boundary (AGP/PCIe); GPU Front End; Programmable Vertex Processor; Primitive Assembly; Rasterization and Interpolation; Programmable Fragment Processor; Raster Operations; Frame Buffer. Stream labels: 3D API Commands; GPU Command & Data Stream; Vertex Index Stream; Pre-transformed Vertices; Transformed Vertices; Assembled Primitives; Pixel Location Stream; Pre-transformed Fragments; Transformed Fragments; Pixel Updates]

Programmable pipeline

[Diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); CPU-GPU Boundary (AGP/PCIe); GPU Front End; Programmable Vertex Processor; Primitive Assembly; Rasterization and Interpolation; Programmable Fragment Processor; Raster Operations; Frame Buffer. Stream labels: 3D API Commands; GPU Command & Data Stream; Vertex Index Stream; Pre-transformed Vertices; Transformed Vertices; Assembled Primitives; Pixel Location Stream; Pre-transformed Fragments; Transformed Fragments; Pixel Updates]

The Future

■ Unified general programming model at primitive, vertex and pixel levels

■ Scary amount of:
  Floating point horsepower
  Video memory
  Bandwidth between system and video memory

■ Lower chip costs and power requirements to make 3D graphics hardware ubiquitous
  Automotive (gaming, navigation, head-up displays)
  Home (remotes, media center, automation)
  Mobile (PDAs, cell phones)

Programming the GPU

The Evolution of GPU Programming Language

Programmable Pipeline

GPU Programming

■ GPU Programming
  Low-level language
    • Assembler-like
    • Best performance
    • Platform-dependent
    • Vertex programming, fragment programming
    • Ex) OpenGL extensions, DirectX 9
  High-level shading language
    • Easier programming
    • Easier code reuse
    • Easier debugging
    • Easy to read
    • Ex) Cg, HLSL, GLSL

Assembly vs. High-Level Language

Data Flow through Pipeline

GPU Programming

■ GPU Programming
  Low-level language
    • OpenGL extensions: GL_ARB_vertex_program, GL_ARB_fragment_program
    • DirectX 9: Vertex Shader 2.0, Pixel Shader 2.0
  High-level shading language
    • Cg: “C for Graphics”, by NVIDIA
    • HLSL: “High-Level Shading Language”, part of DirectX 9 (Microsoft)
    • GLSL: “OpenGL 2.0 Shading Language”, proposal by 3Dlabs

HLSL and Cg are much more similar to each other than they are to GLSL

Workflow in Cg

Reference

■ Reference
  David Luebke, General-Purpose Computation on Graphics Hardware
  Daniel Weiskopf, Basics of GPU-Based Programming
  Cyril Zeller, Introduction to the Hardware Graphics Pipeline
  Randy Fernando, Programming the GPU
  Suresh Venkatasubramanian, GPU Programming and Architecture
  GPGPU (http://www.gpgpu.org/)
  GPU Programming: http://euclid.uits.iupui.edu/wiki/index.php/GPU_Programming
  Shader::Tech: http://www.shadertech.com/
  Nvidia Developer: http://developer.nvidia.com/object/gpu_programming_guide.html
  GPGPU Developer Resources
  CIS 665: GPU Programming and Architecture, University of Pennsylvania: http://www.seas.upenn.edu/~cis665/Schedule.htm