Inspur AI Computing Platform
• 3 GPU server: NF5280M4 (2 CPU + 3 GPU)
• 4 GPU server: NF5280M5 (2 CPU + 4 GPU); GPU node (2U, 4 GPU only)
• 8 GPU server: NF5288M5 (2 CPU + 8 GPU)
• 16 GPU server: SR GPU Box (16× P40, GPU only)
GPU Server Target Market
• CSP
  Purpose: deep learning, real-time transcoding, VOD server
  H/W requirements: multi-GPU, GPU P2P, and RDMA support; high-capacity local storage; high performance/price ratio
• HPC
  Purpose: heterogeneous computing, HPC clusters
  H/W requirements: multi-GPU/MIC, GPU P2P, and RDMA support; 100G Ethernet, InfiniBand, etc.
NVIDIA P100 (Pascal): NVLink High-Speed Interconnect
• NVLink is a high-speed interconnect that replaces PCI Express to provide up to 12X faster data sharing between GPUs.
• Better support for GPUDirect 2.0 and a unified memory pool.

NVIDIA GPUDirect Peer-to-Peer (P2P) Communication
GPUDirect Advantages
• Accelerated communication with network and storage devices
• Peer-to-Peer Transfers between GPUs
• Peer-to-Peer memory access
• RDMA
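On a live system, the interconnect class between each GPU pair can be read from `nvidia-smi topo -m`. Below is a minimal sketch that classifies GPU pairs from such a matrix; the sample matrix is an illustrative placeholder, not output captured from an actual Inspur system:

```python
# Classify GPU pairs by interconnect from an `nvidia-smi topo -m`-style matrix.
# Link legend: NV# = NVLink, PIX/PXB = via PCIe switch(es), SYS = across CPU sockets.
SAMPLE_TOPO = """\
GPU0 GPU1 GPU2 GPU3
GPU0 X NV1 NV1 PIX
GPU1 NV1 X PIX NV1
GPU2 NV1 PIX X NV1
GPU3 PIX NV1 NV1 X
"""

def parse_topo(text):
    """Parse the matrix into a {(dev, peer): link} dict."""
    lines = [l.split() for l in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    matrix = {}
    for row in rows:
        dev, entries = row[0], row[1:]
        for peer, link in zip(header, entries):
            matrix[(dev, peer)] = link
    return matrix

def p2p_class(link):
    """Map a link label to the P2P transfer class it implies."""
    if link.startswith("NV"):
        return "NVLink P2P"
    if link in ("PIX", "PXB"):
        return "PCIe P2P"
    return "no direct P2P"

topo = parse_topo(SAMPLE_TOPO)
print(p2p_class(topo[("GPU0", "GPU1")]))  # NVLink P2P
print(p2p_class(topo[("GPU0", "GPU3")]))  # PCIe P2P
```

In CUDA code the same question is answered by `cudaDeviceCanAccessPeer` before enabling peer access.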
NF5288M4 vs NF5288M5

CPU
• NF5288M4: 2× Intel® Xeon® E5-2600 v4 series, TDP up to 145W
• NF5288M5: 2× SKL-EP (Skylake) CPUs, TDP up to 165W

Memory
• NF5288M4: 16× DDR4 DIMMs per node, DDR4-2400 support
• NF5288M5: 16× DDR4 DIMMs per node, 4× Apache Pass support

PCIe I/O
• NF5288M4: 8× PCIe 3.0 x16 (x8 link) or 4× PCIe 3.0 x16 (x16 link), plus 1× PCIe 3.0 x24
• NF5288M5: 1× PCIe 3.0 x8 mezzanine RAID; 2× PCIe 3.0 x16 HHHL front slots

Storage
• NF5288M4: 8× 3.5"/2.5" SAS/SATA/SSD
• NF5288M5: 8× 2.5" U.2 (SFF-8639); 2× M.2 PCIe & SATA on board

GPU support
• NF5288M4: up to 4× GPU/MIC accelerator cards
• NF5288M5: up to 8× 300W GPU/SXM2

System fans
• NF5288M4: redundant hot-swap fans, air cooling
• NF5288M5: redundant hot-swap fans, air or hybrid cooling

PSU
• NF5288M4: 2× 1620/2000W, 80 PLUS Platinum
• NF5288M5: 2× 3000W, 80 PLUS Titanium
• 2U GPU server for HPC and machine learning
• 2× SKL-EP processors, TDP up to 165W; supports SKL-F SKUs
• Supports 8× Xeon Phi/GPU in a 2U chassis
• Both PCIe AIC and SXM2 GPUs are supported
• 8× 2.5" U.2 storage bays
• GPU TDP up to 300W
• 3000W 1+1 PSU, 80 PLUS Titanium
Schedule for the SXM2 and PCIe AIC configurations: sample 2017.04, mass production (MP) 2017.08
NF5288M5 GPU Server for Purley
[Chassis views]
• Front I/O: 8× 2.5" U.2; 2× PCIe x16 HHHL slots
• Rear I/O: 2× 3000W PSUs; 4× 10G Ethernet; 2× C20 power connectors; 4× PCIe x16 HHHL slots (SXM2 configuration only)
[Internal layout]
• 8× SXM2 NVIDIA GPUs
• 2× Skylake CPUs, 165W TDP
• 16× DDR4-2400 DIMMs
• 5× redundant dual-rotor fans
• 4× PCIe x16 HHHL slots; optional liquid-cooling connector
• 2× front PCIe x16 expansion slots
• Chassis depth: 899.5mm
SXM2 GPU Configuration: 8× SXM2 GPU Topology on NF5288M5
[Topology diagram] CPU0 and CPU1 (linked by UPI, with the RAID controller on CPU0's domain) connect through 96-lane PCIe switches to the front and rear PCIe x16 slots, the 8× U.2 bays, and the eight SXM2 GPUs; GPU0–GPU3 sit behind one switch and GPU4–GPU7 behind another. The GPUs reach the host over PCIe and each other over NVLink.
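A small sketch of the host-side view of this topology, assuming GPU0–GPU3 share one 96-lane switch and GPU4–GPU7 the other (our reading of the figure; NVLink wiring between GPUs is not detailed in the slides):

```python
# Assumed GPU-to-PCIe-switch mapping on the NF5288M5 SXM2 configuration.
PCIE_SWITCH = {0: "SW1", 1: "SW1", 2: "SW1", 3: "SW1",
               4: "SW2", 5: "SW2", 6: "SW2", 7: "SW2"}

def host_path(gpu_a, gpu_b):
    """Path taken by host-mediated (PCIe) traffic between two GPUs."""
    if PCIE_SWITCH[gpu_a] == PCIE_SWITCH[gpu_b]:
        return "same PCIe switch"
    return "across switches (via CPU/UPI)"

print(host_path(0, 3))  # same PCIe switch
print(host_path(0, 4))  # across switches (via CPU/UPI)
```

Direct GPU-to-GPU transfers would instead take the NVLink mesh and bypass this PCIe path entirely.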
PCIe AIC GPU Configuration
• 8× dual-slot PCIe x16 for Xeon Phi/GPU, in a 4-cards-per-group design
• 2× Skylake CPUs, 165W TDP
• 5× redundant dual-rotor fans
• 16× DDR4-2400 DIMMs
• Coprocessor maintenance handle
• 2× front PCIe x16 expansion slots
• Chassis depth: 899.5mm

Flexible Topology in the 8× PCIe Configuration
[Topology diagrams] Three proposals attach the eight PCIe cards through pairs of PCIe switches (SW) to CPU0/CPU1 (linked by UPI, with a PCIe x8 RAID mezzanine), using Slimline x16 ports and HHHL x16 slots in different combinations.

Proposal A
• All GPUs in the same PCIe domain
• RAID mezzanine
• 2× HHHL PCIe x16 in front
• 8× U.2

Proposal B
• High Xeon Phi/GPU-to-CPU ratio
• RAID mezzanine
• 2× HHHL PCIe x16 in front, or 1× HHHL PCIe x16 + 4× U.2, or 8× U.2

Proposal C
• More expandability; high CPU-to-GPU bandwidth
• RAID mezzanine
• 2× HHHL PCIe x16 in front
• 8× U.2
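To make the trade-off between the proposals concrete, here is a back-of-envelope sketch. The per-proposal uplink counts are assumptions for illustration only (the slides do not give exact figures), and 15.75 GB/s is the approximate usable per-direction bandwidth of one PCIe 3.0 x16 link:

```python
# Assumed CPU-to-switch x16 uplink counts per proposal (illustrative, not official).
proposals = {
    "A": {"gpus": 8, "x16_uplinks": 1, "note": "all GPUs in one PCIe domain (best P2P)"},
    "B": {"gpus": 8, "x16_uplinks": 2, "note": "high GPU-to-CPU ratio"},
    "C": {"gpus": 8, "x16_uplinks": 4, "note": "highest CPU-to-GPU bandwidth"},
}
PCIE3_X16_GBPS = 15.75  # approx. usable GB/s per direction for PCIe 3.0 x16

for name, p in proposals.items():
    bw = p["x16_uplinks"] * PCIE3_X16_GBPS
    ratio = p["gpus"] / p["x16_uplinks"]
    print(f"Proposal {name}: {p['gpus']} GPUs share {bw:.1f} GB/s host bandwidth "
          f"({ratio:.0f} GPUs per x16 uplink); {p['note']}")
```

More uplinks raise aggregate host bandwidth, while a single shared domain keeps GPU-to-GPU P2P off the CPU entirely, which is exactly the tension between Proposals A and C.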
AGX-2 Supports Different GPU Cards
• Supports 8× NVIDIA Tesla P100 GPU cards with built-in NVIDIA NVLink
• Supports 8× NVIDIA Tesla P100, P40, or P4 GPU cards (PCIe interface)
GX4 GPU Box
• 4× GPUs
• 2× 1600W power supplies
• NVMe SSD expansion
• Efficient thermal fans
• PCIe x16 expansion
• PCIe switch chip
• PCIe expansion motherboard
GX4 GPU Resource Decoupling and Pooling
• Scale up to 8× GPUs and scale out to 16× GPUs
• Partitioned design of CPU server and GPU box
• Flexible topology and high scalability; efficient data communication and strong TCO benefit
GX4 Flexible GPU Topology
[Topology diagrams] A CPU server (CPU0 + CPU1, linked by UPI) attaches GX4 boxes of four GPUs (GPU0–GPU3) behind PCIe switches in three modes:
• Balanced: one PCIe switch per CPU socket; suited to public cloud service and small-scale model training
• Common: both PCIe switches under one socket; suited to deep learning model training
• Cascaded: PCIe switches chained together; suited to deep learning model training, with enhanced P2P function
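A small sketch of how the P2P path between GPUs differs across the three attach modes, assuming two GX4 boxes of four GPUs each (GPU0–3 behind one switch, GPU4–7 behind the other); the switch wiring per mode follows our reading of the diagrams:

```python
# Peer-to-peer path for a GPU pair under each GX4 attach mode (illustrative).
def p2p_path(mode, gpu_a, gpu_b):
    """Return the path P2P traffic takes between two GPUs (0-7)."""
    if (gpu_a // 4) == (gpu_b // 4):          # same GX4 box, same switch
        return "1 PCIe switch"
    if mode == "balanced":                     # one switch per CPU socket
        return "switch -> CPU0 -> UPI -> CPU1 -> switch"
    if mode == "common":                       # both switches on one socket
        return "switch -> CPU -> switch"
    if mode == "cascaded":                     # switches chained directly
        return "switch -> switch"
    raise ValueError(f"unknown mode: {mode}")

for mode in ("balanced", "common", "cascaded"):
    print(mode, "GPU0->GPU5:", p2p_path(mode, 0, 5))
```

The cascaded mode keeps cross-box P2P off the CPUs entirely, which is why the slide tags it "P2P function enhanced".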
TCO Comparison: Traditional GPU Cluster
[Diagram] Four 2-CPU + 4-GPU servers interconnected through an IB switch.
• Large-scale I/O redundancy
• 4 sets of CPU + memory + storage; 4× IB cards; 1× IB switch; 16× GPUs
• High purchase cost
High TCO Benefit: GPU Box
• 16 cards in one system
• GPU communication needs no network protocol conversion, a reduction of 50%+
• Lower I/O redundancy; compared with the traditional framework, purchase cost is reduced by $15,000+
• 1 set of CPU + memory + storage; 0 IB cards; 0 IB switches; 16× GPUs
[Diagram] One CPU server (CPU0 + CPU1, linked by UPI) drives four PCIe switches, each hosting four GPUs (GPU0–GPU3), for 16 GPUs in total.
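The purchase-cost claim can be sanity-checked with a toy model. All unit prices below are placeholder assumptions (the slides state only the $15,000+ delta), so only the structure of the comparison is meaningful:

```python
# Toy bill-of-materials comparison: 16 GPUs behind 4 traditional servers + IB
# fabric vs. 16 GPUs in GPU boxes behind 1 head node. GPU cost is identical on
# both sides, so it is priced at 0 and drops out of the delta.
traditional = {"cpu_nodes": 4, "ib_cards": 4, "ib_switches": 1, "gpus": 16}
gpu_box     = {"cpu_nodes": 1, "ib_cards": 0, "ib_switches": 0, "gpus": 16}
price = {"cpu_nodes": 6000, "ib_cards": 700, "ib_switches": 8000, "gpus": 0}  # placeholder USD

def cost(cfg):
    """Total purchase cost of a configuration under the placeholder prices."""
    return sum(cfg[k] * price[k] for k in price)

saving = cost(traditional) - cost(gpu_box)
print(f"saving: ${saving:,}")  # saving: $28,800 with these placeholder prices
```

The saving comes from eliminating three CPU nodes and the entire IB fabric; the exact dollar figure depends entirely on the assumed prices.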
GX4 Supports the Full Range of PCIe Accelerators
• NVIDIA® Tesla® P100: fast data swapping in GPU memory
• NVIDIA® Tesla® P40: large amounts of training data
• NVIDIA® Tesla® P4: higher efficiency for DL inference
• Intel KNL: higher price/performance ratio for HPC
• FPGA: better TCO for inference
• Supports various GPU, FPGA, KNL and other PCIe cards, and reserves an NVMe pooling function
GX4 GPU Box Specifications
• Model number: SF0204P1
• GPU: 4× PCIe P100/P40/P4/KNL/FPGA
• Size: 435mm × 87.5mm × 740mm
• Management chips: AST2500, BCM58522
• U.2: 16× direct-connect U.2 (without GPUs)
• PCIe: 1× standard PCIe x16 slot; 4× mini PCIe x4 cable ports
• I/O: RJ45 management port, serial port
• Power supply: 1600W 1+1 redundant
• Outlet: rear

Head Node Specifications
• Model number: NF5280M5
• CPU: 2× Intel next-generation (Skylake) processors
• Memory: 24× DDR4 DIMM and 12× Apache Pass; supports RDIMM, LRDIMM, NVDIMM at 2400/2666 MT/s
• Storage: up to 12× 3.5" + 4× 2.5" (including 3 front NVMe), or up to 24× 2.5" + 4× 2.5" + 4× 3.5" (including 6 front NVMe)
• PCIe: supports up to 4 GPU boxes