Introduction to Modern GPU Hardware
Lan-Da Van (范倫達), Ph.D.
Department of Computer Science
National Chiao Tung University Hsinchu, Taiwan
Fall, 2016
The following content is extracted from the material in the references on
the last page. If any citation is wrong or a reference is missing, please contact
[email protected] . I will correct the error as soon as possible.
This material is for use in this course only; please do NOT redistribute it. Thank you.
Outline
GPU Pipeline
History of GPU Hardware
GPU Hardware Consideration
Modern GPU Hardware Architecture
NVIDIA GeForce
AMD (ATI) Radeon
IMG PowerVR
ARM Mali
GPU Applications
Summary
GPU Fundamentals: Graphics Pipeline
• A simplified graphics pipeline
– Note that pipe widths vary
– Many caches, FIFOs, and so on not shown
[Figure: simplified graphics pipeline. CPU: Application. GPU: Transform & Light → Assemble Primitives → Rasterize → Shade → Video Memory (Textures). Data between stages: Vertices (3D), Xformed, Lit Vertices (2D), Screenspace triangles (2D), Fragments (pre-pixels), Final Pixels (Color, Depth). Other labels: Graphics State, Render-to-texture.]
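The stages in the simplified pipeline can be imitated in software. The following is a minimal sketch, not how any GPU implements them: the function names, the 8×8 framebuffer, the constant shading color, and the flat per-triangle depth are all assumptions made for illustration, and vertices are assumed to have z > 0.

```python
# Toy software version of the stages named on the slide.
# All names and sizes are illustrative, not a real graphics API.

def transform(vertices, width, height):
    """Transform: perspective-project 3D vertices into 2D screen space."""
    out = []
    for x, y, z in vertices:
        # Perspective divide, then map [-1, 1] to pixel coordinates.
        sx = (x / z * 0.5 + 0.5) * width
        sy = (y / z * 0.5 + 0.5) * height
        out.append((sx, sy, z))
    return out

def assemble(vertices, indices):
    """Assemble primitives: group every three indices into one triangle."""
    return [tuple(vertices[i] for i in indices[k:k + 3])
            for k in range(0, len(indices), 3)]

def edge(a, b, px, py):
    """Signed test telling which side of edge a->b the point (px, py) is on."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

def rasterize(tri, width, height):
    """Rasterize: emit one fragment per pixel centre covered by the triangle."""
    a, b, c = tri
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w0, w1, w2 = edge(a, b, px, py), edge(b, c, px, py), edge(c, a, px, py)
            all_pos = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_neg = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_pos or all_neg:
                # Flat depth per triangle, to keep the sketch short.
                yield x, y, (a[2] + b[2] + c[2]) / 3.0

def shade(fragment):
    """Shade: compute a color per fragment (a constant here)."""
    return (255, 128, 0)

def render(vertices, indices, width=8, height=8):
    """Run all stages and return the color and depth buffers."""
    color = [[(0, 0, 0)] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for tri in assemble(transform(vertices, width, height), indices):
        for x, y, z in rasterize(tri, width, height):
            if z < depth[y][x]:  # depth test before writing the final pixel
                depth[y][x] = z
                color[y][x] = shade((x, y, z))
    return color, depth

color, depth = render([(-1.0, -1.0, 1.0), (1.0, -1.0, 1.0), (-1.0, 1.0, 1.0)],
                      [0, 1, 2])
```

Each function corresponds to one box in the figure, and the values passed between them match the labels on the arrows: 3D vertices, transformed 2D vertices, screen-space triangles, fragments, and final pixels with color and depth.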
GPU Fundamentals: Modern Graphics Pipeline
• Programmable vertex processor!
• Programmable pixel processor!
[Figure: modern graphics pipeline. CPU: Application. GPU: Vertex Processor → Rasterize → Fragment Processor → Video Memory (Textures). Data between stages: Vertices (3D), Xformed, Lit Vertices (2D), Screenspace triangles (2D), Fragments (pre-pixels), Final Pixels (Color, Depth). Other labels: Graphics State, Render-to-texture.]
GPU Fundamentals: Modern Graphics Pipeline
• Programmable primitive assembly!
• More flexible memory access!
[Figure: the same pipeline with a Geometry Processor at the Assemble Primitives stage]
History of Graphics Hardware (1/3)
… - mid ’90s
SGI mainframes and workstations
PC: only 2D graphics hardware
mid ’90s
Consumer 3D graphics hardware (PC)
- 3dfx, NVIDIA, Matrox, ATI, …
Triangle rasterization (only)
Cheap: pushed by game industry
1999
PC-card with TnL (Transform and Lighting)
- NVIDIA GeForce: Graphics Processing Unit (GPU)
PC-card more powerful than specialized workstations
3DFX Voodoo graphics 4MB - 1997
History of Graphics Hardware (2/3)
https://www.zhihu.com/question/21980949
History of Graphics Hardware (3/3)
Modern graphics hardware
Graphics pipeline partly programmable
Leaders: AMD(ATI) and NVIDIA
- “AMD Radeon HD 6990” and “NVIDIA GeForce GTX 590”
Game consoles use similar GPU hardware (Xbox)
Computational Power (1/2)
• GPUs are fast…
– 3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):
• Computation: 48 GFLOPS peak
• Memory bandwidth: 21 GB/s peak
• Price: $874 (chip)
– NVIDIA GeForce 8800 GTX:
• Computation: 330 GFLOPS observed
• Memory bandwidth: 55.2 GB/s observed
• Price: $599 (board)
• GPUs are getting faster, faster
– CPUs: 1.4× annual growth
– GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
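The figures above can be turned into ratios with a few lines of arithmetic. This is a rough sketch only: it compares a peak CPU number with an observed GPU number and a chip price with a board price, exactly as the slide does, and the five-year horizon is an arbitrary choice for illustration.

```python
# Numbers copied from the slide; variable names are illustrative.
cpu_gflops, cpu_bw, cpu_price = 48.0, 21.0, 874.0    # Xeon 5160 (peak, chip)
gpu_gflops, gpu_bw, gpu_price = 330.0, 55.2, 599.0   # GeForce 8800 GTX (observed, board)

compute_ratio = gpu_gflops / cpu_gflops      # GPU vs CPU arithmetic throughput
bandwidth_ratio = gpu_bw / cpu_bw            # GPU vs CPU memory bandwidth
cpu_gflops_per_dollar = cpu_gflops / cpu_price
gpu_gflops_per_dollar = gpu_gflops / gpu_price

def compound(annual_growth, years):
    """Total speedup after `years` of constant annual growth."""
    return annual_growth ** years

YEARS = 5  # arbitrary horizon chosen for illustration
cpu_growth = compound(1.4, YEARS)
gpu_pixel_growth = compound(1.7, YEARS)
gpu_vertex_growth = compound(2.3, YEARS)

print(f"compute ratio   : {compute_ratio:.2f}x")
print(f"bandwidth ratio : {bandwidth_ratio:.2f}x")
print(f"{YEARS}-year growth   : CPU {cpu_growth:.1f}x, "
      f"GPU {gpu_pixel_growth:.1f}x to {gpu_vertex_growth:.1f}x")
```

The compounding is what makes the gap widen: a modest difference in annual growth rate becomes an order-of-magnitude difference after a few years.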
Computational Power (2/2)
Courtesy Naga Govindaraju
[Figure: FLOPS comparison of GPU and CPU]
[Figure: memory bandwidth comparison of CPU and GPU]
Motivation
• Why are GPUs getting faster so fast?
– Arithmetic intensity
• the specialized nature of GPUs makes it easier to use additional transistors for computation
– Economics
• multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property
Flexible and Precise
• Modern GPUs are deeply programmable
– Programmable pixel, vertex, and geometry engines
– Solid high-level language support
• Modern GPUs support “real” precision
– 32-bit/64-bit floating point throughout the pipeline
• High enough for many applications
– DX10-class GPUs add 32-bit integers
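What 32-bit floating point can and cannot represent is easy to check on the CPU. The sketch below uses Python's standard `struct` module to round a 64-bit value to 32 bits; the helper name `to_float32` is an assumption for illustration, and the example says nothing about how a particular GPU rounds.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# A 32-bit float has a 24-bit significand, so consecutive integers
# are exactly representable only up to 2**24.
exact_limit = 2 ** 24

lost = to_float32(float(exact_limit + 1))   # does not survive the round trip
kept = float(exact_limit + 1)               # still exact in 64-bit

# Decimal fractions such as 0.1 are approximated in both formats,
# but the 32-bit approximation is coarser.
error_32 = abs(to_float32(0.1) - 0.1)

print(lost, kept, error_32)
```

For colors and most geometry this precision is ample, which is the sense of "high enough for many applications"; long accumulations or large coordinate ranges are where 64-bit support starts to matter.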
Graphics Hardware Consideration (1/2)
• GPU = Graphics Processing Unit
– Vector processor
– Operates on 4-tuples
• Position (x, y, z, w)
• Color (red, green, blue, alpha)
• Texture coordinates (s, t, r, q)
– 4-tuple ops, 1 clock cycle
• SIMD [Single Instruction, Multiple Data]
– ADD, MUL, SUB, DIV, MADD, …
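A 4-tuple operation applies one instruction to all four components at once. The following sketch models that in plain Python; the `Vec4` type and the `add`, `mul`, and `madd` helpers are hypothetical names, and the loop over components stands in for what the hardware does in a single clock cycle.

```python
from typing import NamedTuple

class Vec4(NamedTuple):
    """One 4-tuple: a position, a color, or a texture coordinate."""
    x: float
    y: float
    z: float
    w: float

def add(a, b):
    """ADD: component-wise a + b."""
    return Vec4(*(p + q for p, q in zip(a, b)))

def mul(a, b):
    """MUL: component-wise a * b."""
    return Vec4(*(p * q for p, q in zip(a, b)))

def madd(a, b, c):
    """MADD: component-wise a * b + c, one instruction on a vector unit."""
    return Vec4(*(p * q + r for p, q, r in zip(a, b, c)))

# Example: scale a color by a light intensity and add an ambient term.
color = Vec4(0.5, 0.25, 1.0, 1.0)       # (red, green, blue, alpha)
light = Vec4(0.5, 0.5, 0.5, 1.0)
ambient = Vec4(0.125, 0.125, 0.125, 0.0)
lit = madd(color, light, ambient)
```

The same `madd` works unchanged on positions or texture coordinates, which is why a single 4-wide unit serves all three kinds of data.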
• Pipelining
– Number of stages
• Parallelism
– Number of parallel processes
• Parallelism + pipelining
– Number of parallel pipelines
[Figure: pipelining, parallelism, and parallel pipelines, drawn with stages numbered 1, 2, 3]
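The effect of pipelining and parallelism on throughput can be estimated with a simple cycle count. This is an idealized model under stated assumptions: every stage takes exactly one cycle, there are no stalls, and work divides evenly across pipelines. The function name `cycles` is hypothetical.

```python
import math

def cycles(items, stages, pipelines=1, pipelined=True):
    """Cycles needed to push `items` work items through the hardware.

    stages    -- number of one-cycle stages per pipeline
    pipelines -- number of identical pipelines working side by side
    pipelined -- if False, an item must finish all stages before the next starts
    """
    per_pipe = math.ceil(items / pipelines)   # items handled by the busiest pipeline
    if per_pipe == 0:
        return 0
    if pipelined:
        # Fill the pipeline once, then one item completes every cycle.
        return stages + per_pipe - 1
    return stages * per_pipe

items, stages = 12, 3
print("sequential        :", cycles(items, stages, pipelined=False))
print("pipelined         :", cycles(items, stages))
print("parallel only (x4):", cycles(items, stages, pipelines=4, pipelined=False))
print("parallel pipelines:", cycles(items, stages, pipelines=4))
```

Pipelining helps more as the number of stages grows, parallelism helps more as the number of units grows, and combining them multiplies the two benefits.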
Graphics Hardware Consideration (2/2)
Outline
GPU Pipeline
History of GPU Hardware
GPU Hardware Consideration
Modern GPU Hardware Architecture
NVIDIA GeForce
AMD (ATI) Radeon
IMG PowerVR
ARM Mali
Summary
Growth of NVIDIA GPU
• Performance metrics
– Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.
Growth of NVIDIA GPU
NVIDIA GeForce 7900 GTX
NVIDIA Graphics Card Architecture
• GeForce-8 Series
– 12,288 concurrent threads, hardware managed
– 128 Thread Processor cores at 1.35 GHz == 518 GFLOPS peak
[Figure: GeForce 8 block diagram. Host CPU and Work Distribution feed eight clusters; each cluster has two SP arrays (each with an IU and Shared Memory), a TF unit, and a TEX L1 cache. Six L2 / Memory partitions sit below.]
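The 518 GFLOPS peak quoted for the GeForce 8 series can be reproduced from the core count and clock. The factor of 3 floating-point operations per core per clock (a multiply-add counted as two, plus one extra multiply) is an assumption about how the figure was derived; it is not stated on the slide.

```python
def peak_gflops(cores, clock_ghz, flops_per_clock):
    """Peak throughput = cores x clock x floating-point ops per core per clock."""
    return cores * clock_ghz * flops_per_clock

CORES = 128          # from the slide
CLOCK_GHZ = 1.35     # from the slide
THREADS = 12288      # from the slide

# Assumption: 3 flops per clock per core (multiply-add + multiply).
peak = peak_gflops(CORES, CLOCK_GHZ, 3)

# Counting the multiply-add alone (2 flops per clock) gives a lower bound.
mad_only = peak_gflops(CORES, CLOCK_GHZ, 2)

# Hardware-managed threads available to hide latency, per core.
threads_per_core = THREADS // CORES

print(f"peak {peak:.1f} GFLOPS, MAD-only {mad_only:.1f} GFLOPS, "
      f"{threads_per_core} threads per core")
```

Having many more resident threads than cores is what lets the hardware switch to ready work while other threads wait on memory.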
NVIDIA FERMI
FERMI: Streaming Multiprocessor (SM)
• Each SM contains
– 32 cores
– 16 load/store units
– 32,768 registers
• Newer FP representation
– IEEE 754-2008
• Two units
– Floating point
– Integer
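The per-SM resources listed above translate into per-thread budgets. The sketch below is back-of-the-envelope arithmetic: the warp size of 32 and the limit of 1,536 resident threads per SM are assumptions taken from CUDA's published limits for Fermi-class devices rather than from the slide, and real register allocation granularity is ignored.

```python
import math

# From the slide.
REGISTERS_PER_SM = 32768
CORES_PER_SM = 32
LDST_UNITS_PER_SM = 16

# Assumptions, not on the slide.
WARP_SIZE = 32
MAX_RESIDENT_THREADS = 1536

def registers_per_thread(resident_threads):
    """Registers each thread can use if the register file is split evenly."""
    return REGISTERS_PER_SM // resident_threads

def ldst_cycles_per_warp():
    """Cycles for the load/store units to cover every thread of one warp."""
    return math.ceil(WARP_SIZE / LDST_UNITS_PER_SM)

full_occupancy_budget = registers_per_thread(MAX_RESIDENT_THREADS)
light_occupancy_budget = registers_per_thread(512)

print(full_occupancy_budget, light_occupancy_budget, ldst_cycles_per_warp())
```

The trade-off this exposes is the usual one: kernels that need more registers per thread can keep fewer threads resident on the SM.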
FERMI: Results
FERMI: Comparison
Kepler: Core Architecture
http://www.weistang.com/article-941-1.html
Titan vs Tesla Comparison
09/02/11
Maxwell: Core Architecture
http://www.weistang.com/article-941-1.html
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
Kepler vs Maxwell Comparison
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
https://zh.wikipedia.org/wiki/CUDA
NVIDIA ULP-GeForce (Tegra2)
• Ultra low power (ULP) GeForce GPU with 4 pixel shaders + 4 vertex shaders
• 32-bit single-channel memory controller with either LPDDR2-600 or DDR2-667 memory
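The memory configuration above fixes the peak bandwidth available to the GPU. The sketch below assumes that the number in each memory name (600, 667) is the transfer rate in millions of transfers per second, which is the usual naming convention but is not stated on the slide; the function name is hypothetical.

```python
def peak_bandwidth_gbs(bus_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s = bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

BUS_BITS = 32  # single 32-bit channel, from the slide

lpddr2_600 = peak_bandwidth_gbs(BUS_BITS, 600)
ddr2_667 = peak_bandwidth_gbs(BUS_BITS, 667)

print(f"LPDDR2-600: {lpddr2_600:.2f} GB/s, DDR2-667: {ddr2_667:.2f} GB/s")
```

Both results are more than an order of magnitude below the desktop figures quoted elsewhere in these slides, and on a mobile chip that bandwidth is shared between CPU and GPU.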
NVIDIA ULP-GeForce (Tegra3)
• The GPU in Tegra 3 is an evolution of the Tegra 2 GPU, with twice the number of pixel shader units (8 compared to 4) and higher clock frequency.
• 32-bit single-channel memory controller with either LPDDR2 or DDR3 memory
Tegra Roadmap
Mobile Roadmap
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
ATI Radeon X1900 XTX
• Features of ATI Radeon X1900 XTX
– Core speed 650 MHz
– 48 pixel shader processors
– 8 vertex shader processors
– 51 GB/s memory bandwidth
– 512 MB memory
http://product.pcpop.com/000024721/Index.html
ATI Radeon X1900 XTX
• High Memory Bandwidth
[Figure: system block diagram. Graphics card: GPU (650 MHz), graphics memory (½ GB), output. Processor chip: CPU (3 GHz), cache (½ MB). Also shown: main memory (1 GB), AGP memory (½ GB), AGP bus (2 GB/s). Remaining link labels: high bandwidth 51 GB/s, high bandwidth 77 GB/s, 3 GB/s, parallel processes.]
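The gap between on-card memory bandwidth and the bus to the host is easiest to see as a transfer time. This sketch uses the 51 GB/s and 2 GB/s figures from the slides with decimal units (1 GB = 1000 MB), ignores latency and protocol overhead, and uses a hypothetical function name.

```python
def transfer_ms(megabytes, gb_per_s):
    """Milliseconds to move `megabytes` at `gb_per_s`, ignoring latency.

    With decimal units, MB / (GB/s) is numerically equal to milliseconds.
    """
    return megabytes / gb_per_s

CARD_MEMORY_MB = 512      # graphics memory size, from the slide
GRAPHICS_MEMORY_GBS = 51  # GPU to graphics memory
AGP_BUS_GBS = 2           # host to graphics card

on_card = transfer_ms(CARD_MEMORY_MB, GRAPHICS_MEMORY_GBS)
over_agp = transfer_ms(CARD_MEMORY_MB, AGP_BUS_GBS)

print(f"on card: {on_card:.1f} ms, over AGP: {over_agp:.1f} ms, "
      f"ratio {over_agp / on_card:.1f}x")
```

This is why data such as textures is kept resident in graphics memory whenever possible rather than streamed across the bus every frame.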
ATI Radeon 9700
• Parallelism + pipelining: ATI Radeon 9700
– 4 vertex pipelines
– 8 pixel pipelines
Radeon Comparison
http://www.pcdiy.com.tw/detail/4275
IMG PowerVR Series5XT (SGXMP)
• Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
• Fully programmable GPU using unique USSE architecture
• All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1
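The "tile-based" part of a tile-based deferred renderer can be sketched as a binning pass that records which triangles touch which screen tile. The code below is a generic illustration, not a description of PowerVR hardware: the 32-pixel tile size, the bounding-box test, and the name `bin_triangles` are all assumptions.

```python
TILE = 32  # tile edge in pixels; an assumed value for illustration

def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose
    screen-space bounding box overlaps that tile."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {}
    for idx, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the bounding box to the tile grid.
        x0 = max(int(min(xs)) // tile, 0)
        x1 = min(int(max(xs)) // tile, tiles_x - 1)
        y0 = max(int(min(ys)) // tile, 0)
        y1 = min(int(max(ys)) // tile, tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins

triangles = [
    ((0, 0), (10, 0), (0, 10)),      # fits inside the top-left tile
    ((40, 40), (60, 40), (40, 60)),  # fits inside the bottom-right tile
    ((20, 20), (40, 20), (20, 40)),  # straddles all four tiles
]
bins = bin_triangles(triangles, width=64, height=64)
```

Once the bins are built, each tile can be processed independently in fast on-chip memory; the "deferred" part means visibility is resolved per tile before any fragment is shaded, so hidden surfaces cost no shading work.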
IMG PowerVR Series6 (Rogue)
• Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x and DirectX 10, with certain family members extending their capabilities to full WHQL-compliant DirectX 11.1 functionality
IMG PowerVR 7XT Plus
http://imgtec.eetrend.com/article/7130
Features of ARM Mali
ARM Mali-200
ARM Mali-300
ARM Mali-400MP
ARM Mali-450MP
ARM Mali-T604
• GPGPU (supports OpenCL 1.1)
• Tri-pipe architecture
• The first GPU based on the Midgard architecture
• True IEEE double-precision floating-point math in hardware for Full Profile
• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
• 5x performance improvement over previous Mali graphics processors.
ARM Mali-T624
9/27/2016
ARM Mali-T678
• 50% performance improvement compared to the Mali-T658.
ARM Mali-T760
ARM Mali-T880
ARM Mali Comparison
https://zh.wikipedia.org/wiki/Mali_(GPU)
Applications (1/7)
• GPUs are used in many applications, including
– Ray-tracer
– Image segmentation
– FFT/Linear Algebra
http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg
http://f.fwallpapers.com/images/3d-bunny.jpg
Applications (2/7)
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
http://5pit.tw/tech/computer/tid_12880
Applications (3/7)
Applications (4/7)
http://wechatinchina.com/thread-461154-1-1.html
Applications (5/7)
https://read01.com/Pnd3D.html
http://wechatinchina.com/thread-461154-1-1.html
Applications (6/7)
AR and VR Applications
Applications (7/7)
http://www.naipo.com/Portals/1/web_tw/Knowledge_Center/Industry_Economy/publish-482.htm
Summary
Understand the GPU pipeline in depth
Understand the motivation for GPU hardware
Understand modern GPU hardware architecture and specifications
Understand GPU/GPGPU applications
Reference
GPU Architecture & CG, Mark Colbert, 2006
Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
GPU Tutorial, Yiyunjin, 2007
Evolution of GPU and Graphics Pipelining, Weijun Xiao
Commercial product websites (NVIDIA, ATI, IMG, ARM)
SIGGRAPH 2005 Course Notes, David Luebke
Adapted from: David Luebke (University of Virginia) and NVIDIA
Jan Verschelde, MCS 572 Lecture 27, Introduction to Supercomputing, 17 March 2014
Acknowledgement:
Thanks for the TA’s help in preparing the material.