Inspur AI Computing Platform
• 3 GPU server: NF5280M4 (2 CPU + 3 GPU)
• 4 GPU server: NF5280M5 (2 CPU + 4 GPU); GPU node (2U, 4 GPU only)
• 8 GPU server: NF5288M5 (2 CPU + 8 GPU)
• 16 GPU server: SR GPU Box (16× P40, GPU only)
GPU Server Target Market
• CSP
  Purpose: deep learning, real-time transcoding, VOD server
  H/W requirements: multi-GPU, GPU P2P, and RDMA support; high-capacity local storage; high performance/price ratio
• HPC
  Purpose: heterogeneous computing, HPC clusters
  H/W requirements: multi-GPU/MIC, GPU P2P, and RDMA support; 100G Ethernet, InfiniBand, etc.
NVIDIA P100 (Pascal): NVLink High-Speed Interconnect
• NVLink is a high-speed interconnect that replaces PCI Express to provide up to 12X faster data sharing between GPUs.
• Better support for GPUDirect 2.0 and a unified memory pool.

NVIDIA GPUDirect Peer-to-Peer (P2P) Communication
GPUDirect Advantages
• Accelerated communication with network and storage devices
• Peer-to-Peer Transfers between GPUs
• Peer-to-Peer memory access
• RDMA
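On a live system, the interconnect class between each GPU pair can be read from `nvidia-smi topo -m`. Below is a minimal sketch that classifies GPU pairs from such a matrix; the sample matrix is an illustrative placeholder, not output captured from an actual Inspur system:

```python
# Classify GPU pairs by interconnect from an `nvidia-smi topo -m`-style matrix.
# Link legend: NV# = NVLink, PIX/PXB = via PCIe switch(es), SYS = across CPU sockets.
SAMPLE_TOPO = """\
GPU0 GPU1 GPU2 GPU3
GPU0 X NV1 NV1 PIX
GPU1 NV1 X PIX NV1
GPU2 NV1 PIX X NV1
GPU3 PIX NV1 NV1 X
"""

def parse_topo(text):
    """Parse the matrix into a {(dev, peer): link} dict."""
    lines = [l.split() for l in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    matrix = {}
    for row in rows:
        dev, entries = row[0], row[1:]
        for peer, link in zip(header, entries):
            matrix[(dev, peer)] = link
    return matrix

def p2p_class(link):
    """Map a link label to the P2P transfer class it implies."""
    if link.startswith("NV"):
        return "NVLink P2P"
    if link in ("PIX", "PXB"):
        return "PCIe P2P"
    return "no direct P2P"

topo = parse_topo(SAMPLE_TOPO)
print(p2p_class(topo[("GPU0", "GPU1")]))  # NVLink P2P
print(p2p_class(topo[("GPU0", "GPU3")]))  # PCIe P2P
```

In CUDA code the same question is answered by `cudaDeviceCanAccessPeer` before enabling peer access.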
NF5288M4 vs NF5288M5

CPU
• NF5288M4: 2× Intel® Xeon® E5-2600 v4 series, TDP up to 145W
• NF5288M5: 2× SKL-EP (Skylake) CPUs, TDP up to 165W

Memory
• NF5288M4: 16× DDR4 DIMMs per node, DDR4-2400 support
• NF5288M5: 16× DDR4 DIMMs per node, 4× Apache Pass support

PCIe I/O
• NF5288M4: 8× PCIe 3.0 x16 (x8 link) or 4× PCIe 3.0 x16 (x16 link), plus 1× PCIe 3.0 x24
• NF5288M5: 1× PCIe 3.0 x8 mezzanine RAID; 2× PCIe 3.0 x16 HHHL front slots

Storage
• NF5288M4: 8× 3.5"/2.5" SAS/SATA/SSD
• NF5288M5: 8× 2.5" U.2 (SFF-8639); 2× M.2 PCIe & SATA on board

GPU support
• NF5288M4: up to 4× GPU/MIC accelerator cards
• NF5288M5: up to 8× 300W GPU/SXM2

System fans
• NF5288M4: redundant hot-swap fans, air cooling
• NF5288M5: redundant hot-swap fans, air or hybrid cooling

PSU
• NF5288M4: 2× 1620/2000W, 80 PLUS Platinum
• NF5288M5: 2× 3000W, 80 PLUS Titanium
• 2U GPU server for HPC and machine learning
• 2× SKL-EP processors, TDP up to 165W; supports SKL-F SKUs
• Supports 8× Xeon Phi/GPU in a 2U chassis
• Both PCIe AIC and SXM2 GPUs are supported
• 8× 2.5" U.2 storage bays
• GPU TDP up to 300W
• 3000W 1+1 PSU, 80 PLUS Titanium
Schedule for the SXM2 and PCIe AIC configurations: sample 2017.04, mass production (MP) 2017.08
NF5288M5 GPU Server for Purley
[Chassis views]
• Front I/O: 8× 2.5" U.2; 2× PCIe x16 HHHL slots
• Rear I/O: 2× 3000W PSUs; 4× 10G Ethernet; 2× C20 power connectors; 4× PCIe x16 HHHL slots (SXM2 configuration only)
[Internal layout]
• 8× SXM2 NVIDIA GPUs
• 2× Skylake CPUs, 165W TDP
• 16× DDR4-2400 DIMMs
• 5× redundant dual-rotor fans
• 4× PCIe x16 HHHL slots; optional liquid-cooling connector
• 2× front PCIe x16 expansion slots
• Chassis depth: 899.5mm
SXM2 GPU Configuration: 8× SXM2 GPU Topology on NF5288M5
[Topology diagram] CPU0 and CPU1 (linked by UPI, with the RAID controller on CPU0's domain) connect through 96-lane PCIe switches to the front and rear PCIe x16 slots, the 8× U.2 bays, and the eight SXM2 GPUs; GPU0–GPU3 sit behind one switch and GPU4–GPU7 behind another. The GPUs reach the host over PCIe and each other over NVLink.
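A small sketch of the host-side view of this topology, assuming GPU0–GPU3 share one 96-lane switch and GPU4–GPU7 the other (our reading of the figure; NVLink wiring between GPUs is not detailed in the slides):

```python
# Assumed GPU-to-PCIe-switch mapping on the NF5288M5 SXM2 configuration.
PCIE_SWITCH = {0: "SW1", 1: "SW1", 2: "SW1", 3: "SW1",
               4: "SW2", 5: "SW2", 6: "SW2", 7: "SW2"}

def host_path(gpu_a, gpu_b):
    """Path taken by host-mediated (PCIe) traffic between two GPUs."""
    if PCIE_SWITCH[gpu_a] == PCIE_SWITCH[gpu_b]:
        return "same PCIe switch"
    return "across switches (via CPU/UPI)"

print(host_path(0, 3))  # same PCIe switch
print(host_path(0, 4))  # across switches (via CPU/UPI)
```

Direct GPU-to-GPU transfers would instead take the NVLink mesh and bypass this PCIe path entirely.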
PCIe AIC GPU Configuration
• 8× dual-slot PCIe x16 for Xeon Phi/GPU, in a 4-cards-per-group design
• 2× Skylake CPUs, 165W TDP
• 5× redundant dual-rotor fans
• 16× DDR4-2400 DIMMs
• Coprocessor maintenance handle
• 2× front PCIe x16 expansion slots
• Chassis depth: 899.5mm

Flexible Topology in the 8× PCIe Configuration
[Topology diagrams] Three proposals attach the eight PCIe cards through pairs of PCIe switches (SW) to CPU0/CPU1 (linked by UPI, with a PCIe x8 RAID mezzanine), using Slimline x16 ports and HHHL x16 slots in different combinations.

Proposal A
• All GPUs in the same PCIe domain
• RAID mezzanine
• 2× HHHL PCIe x16 in front
• 8× U.2

Proposal B
• High Xeon Phi/GPU-to-CPU ratio
• RAID mezzanine
• 2× HHHL PCIe x16 in front, or 1× HHHL PCIe x16 + 4× U.2, or 8× U.2

Proposal C
• More expandability; high CPU-to-GPU bandwidth
• RAID mezzanine
• 2× HHHL PCIe x16 in front
• 8× U.2
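To make the trade-off between the proposals concrete, here is a back-of-envelope sketch. The per-proposal uplink counts are assumptions for illustration only (the slides do not give exact figures), and 15.75 GB/s is the approximate usable per-direction bandwidth of one PCIe 3.0 x16 link:

```python
# Assumed CPU-to-switch x16 uplink counts per proposal (illustrative, not official).
proposals = {
    "A": {"gpus": 8, "x16_uplinks": 1, "note": "all GPUs in one PCIe domain (best P2P)"},
    "B": {"gpus": 8, "x16_uplinks": 2, "note": "high GPU-to-CPU ratio"},
    "C": {"gpus": 8, "x16_uplinks": 4, "note": "highest CPU-to-GPU bandwidth"},
}
PCIE3_X16_GBPS = 15.75  # approx. usable GB/s per direction for PCIe 3.0 x16

for name, p in proposals.items():
    bw = p["x16_uplinks"] * PCIE3_X16_GBPS
    ratio = p["gpus"] / p["x16_uplinks"]
    print(f"Proposal {name}: {p['gpus']} GPUs share {bw:.1f} GB/s host bandwidth "
          f"({ratio:.0f} GPUs per x16 uplink); {p['note']}")
```

More uplinks raise aggregate host bandwidth, while a single shared domain keeps GPU-to-GPU P2P off the CPU entirely, which is exactly the tension between Proposals A and C.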
AGX-2 Supports Different GPU Cards
• Supports 8× NVIDIA Tesla P100 GPU cards with built-in NVIDIA NVLink
• Supports 8× NVIDIA Tesla P100, P40, or P4 GPU cards (PCIe interface)
GX4 GPU Box
• 4× GPUs
• 2× 1600W power supplies
• NVMe SSD expansion
• Efficient thermal fans
• PCIe x16 expansion
• PCIe switch chip
• PCIe expansion motherboard
GX4 GPU Resource Decoupling and Pooling
• Scale up to 8× GPUs and scale out to 16× GPUs
• Partitioned design of CPU server and GPU box
• Flexible topology and high scalability; efficient data communication and strong TCO benefit
GX4 Flexible GPU Topology
[Topology diagrams] A CPU server (CPU0 + CPU1, linked by UPI) attaches GX4 boxes of four GPUs (GPU0–GPU3) behind PCIe switches in three modes:
• Balanced: one PCIe switch per CPU socket; suited to public cloud service and small-scale model training
• Common: both PCIe switches under one socket; suited to deep learning model training
• Cascaded: PCIe switches chained together; suited to deep learning model training, with enhanced P2P function
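A small sketch of how the P2P path between GPUs differs across the three attach modes, assuming two GX4 boxes of four GPUs each (GPU0–3 behind one switch, GPU4–7 behind the other); the switch wiring per mode follows our reading of the diagrams:

```python
# Peer-to-peer path for a GPU pair under each GX4 attach mode (illustrative).
def p2p_path(mode, gpu_a, gpu_b):
    """Return the path P2P traffic takes between two GPUs (0-7)."""
    if (gpu_a // 4) == (gpu_b // 4):          # same GX4 box, same switch
        return "1 PCIe switch"
    if mode == "balanced":                     # one switch per CPU socket
        return "switch -> CPU0 -> UPI -> CPU1 -> switch"
    if mode == "common":                       # both switches on one socket
        return "switch -> CPU -> switch"
    if mode == "cascaded":                     # switches chained directly
        return "switch -> switch"
    raise ValueError(f"unknown mode: {mode}")

for mode in ("balanced", "common", "cascaded"):
    print(mode, "GPU0->GPU5:", p2p_path(mode, 0, 5))
```

The cascaded mode keeps cross-box P2P off the CPUs entirely, which is why the slide tags it "P2P function enhanced".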
TCO Comparison: Traditional GPU Cluster
[Diagram] Four 2-CPU + 4-GPU servers interconnected through an IB switch.
• Large-scale I/O redundancy
• 4 sets of CPU + memory + storage; 4× IB cards; 1× IB switch; 16× GPUs
• High purchase cost
High TCO Benefit: GPU Box
• 16 cards in one system
• GPU communication needs no network protocol conversion, a reduction of 50%+
• Lower I/O redundancy; compared with the traditional framework, purchase cost is reduced by $15,000+
• 1 set of CPU + memory + storage; 0 IB cards; 0 IB switches; 16× GPUs
[Diagram] One CPU server (CPU0 + CPU1, linked by UPI) drives four PCIe switches, each hosting four GPUs (GPU0–GPU3), for 16 GPUs in total.
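The purchase-cost claim can be sanity-checked with a toy model. All unit prices below are placeholder assumptions (the slides state only the $15,000+ delta), so only the structure of the comparison is meaningful:

```python
# Toy bill-of-materials comparison: 16 GPUs behind 4 traditional servers + IB
# fabric vs. 16 GPUs in GPU boxes behind 1 head node. GPU cost is identical on
# both sides, so it is priced at 0 and drops out of the delta.
traditional = {"cpu_nodes": 4, "ib_cards": 4, "ib_switches": 1, "gpus": 16}
gpu_box     = {"cpu_nodes": 1, "ib_cards": 0, "ib_switches": 0, "gpus": 16}
price = {"cpu_nodes": 6000, "ib_cards": 700, "ib_switches": 8000, "gpus": 0}  # placeholder USD

def cost(cfg):
    """Total purchase cost of a configuration under the placeholder prices."""
    return sum(cfg[k] * price[k] for k in price)

saving = cost(traditional) - cost(gpu_box)
print(f"saving: ${saving:,}")  # saving: $28,800 with these placeholder prices
```

The saving comes from eliminating three CPU nodes and the entire IB fabric; the exact dollar figure depends entirely on the assumed prices.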
GX4 Supports the Full Range of PCIe Accelerators
• NVIDIA® Tesla® P100: fast data swapping in GPU memory
• NVIDIA® Tesla® P40: large amounts of training data
• NVIDIA® Tesla® P4: higher efficiency for DL inference
• Intel KNL: higher price/performance ratio for HPC
• FPGA: better TCO for inference
• Supports various GPU, FPGA, KNL and other PCIe cards, and reserves an NVMe pooling function
GX4 GPU Box Specifications
• Model number: SF0204P1
• GPU: 4× PCIe P100/P40/P4/KNL/FPGA
• Size: 435mm × 87.5mm × 740mm
• Management chips: AST2500, BCM58522
• U.2: 16× direct-connect U.2 (without GPUs)
• PCIe: 1× standard PCIe x16 slot; 4× mini PCIe x4 cable ports
• I/O: RJ45 management port, serial port
• Power supply: 1600W 1+1 redundant
• Outlet: rear

Head Node Specifications
• Model number: NF5280M5
• CPU: 2× Intel next-generation (Skylake) processors
• Memory: 24× DDR4 DIMM and 12× Apache Pass; supports RDIMM, LRDIMM, NVDIMM at 2400/2666 MT/s
• Storage: up to 12× 3.5" + 4× 2.5" (including 3 front NVMe), or up to 24× 2.5" + 4× 2.5" + 4× 3.5" (including 6 front NVMe)
• PCIe: supports up to 4 GPU boxes