Chung, Suk won / AI & HPC Category Manager · 2018-10-31 · HPE enables AI from “intelligent edge to core data center” Intelligent Edge (inference) Cost optimized Storage Performance


PART 1. Before We Begin:

Review: What is Deep Learning?

What is Machine Learning?

*Total: 42 Pages / 3 Parts

More complex function (function of functions)

[Diagram: the inputs s, be, ba, f pass through the chained functions f1, f2, f3 to produce a price]

F: (s,be,ba,f) -> price

price = F(s,be,ba,f) = f3(f2(f1(s,be,ba,f)))
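A minimal Python sketch of the "function of functions" idea, assuming each stage is a small fully connected layer with an optional ReLU. The helper names (`dense`, `compose`) and every weight value are hypothetical placeholders chosen only so the example runs; in deep learning these weights are learned from data rather than written by hand.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]

def dense(weights: List[List[float]], bias: List[float], relu: bool = True) -> Layer:
    """Hypothetical helper: a layer computing W·x + b, optionally followed by ReLU."""
    def layer(x: Vector) -> Vector:
        out = []
        for row, b in zip(weights, bias):
            z = sum(w * xi for w, xi in zip(row, x)) + b
            out.append(max(0.0, z) if relu else z)
        return out
    return layer

def compose(*fs: Layer) -> Layer:
    """compose(f1, f2, f3)(x) == f3(f2(f1(x))): apply the stages left to right."""
    def F(x: Vector) -> Vector:
        for f in fs:
            x = f(x)
        return x
    return F

# Placeholder weights: (s, be, ba, f) -> 2 hidden units -> 2 hidden units -> price
f1 = dense([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0]], [0.0, 0.0])
f2 = dense([[1.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
f3 = dense([[1000.0, 500.0]], [0.0], relu=False)  # linear output layer

F = compose(f1, f2, f3)
price = F([100.0, 2.0, 2.0, 9.0])[0]
```

Training a deep model means searching for the weights inside f1, f2, f3 jointly, instead of fixing them by hand as done here.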

Rule-based AI, traditional ML and DL

Artificial Intelligence

Rule-based AI vs. Machine Learning

Rule-based AI:

if (s == 100 and be == 2 and ba == 2 and f == 9)

then price = $1000000;

else if (…) then … else if (…) then …
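Translated into Python, the rule-based approach is a chain of hand-written conditions. The sketch below mirrors the single rule on the slide; returning `None` when no rule matches is an assumption made for illustration, and it highlights the main weakness: any house no human anticipated gets no answer.

```python
from typing import Optional

def rule_based_price(s: float, be: int, ba: int, f: int) -> Optional[float]:
    """Hand-written rules: every case must be anticipated by a human."""
    if s == 100 and be == 2 and ba == 2 and f == 9:
        return 1_000_000.0
    # elif (...): further hand-written rules would go here
    return None  # no rule matched, so the system has no answer
```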

Machine Learning (traditional ML and deep learning)

Traditional ML:

List of features:

- ‘s’ : surface
- ‘be’ : # bedrooms
- ‘ba’ : # bathrooms
- ‘f’ : # floors

Define a ‘Model’:

F: (s,be,ba,f) -> price

price = F(s,be,ba,f) = w1*s + w2*be + w3*ba + w4*f

Train the model: find the best values of w1, w2, w3, w4.

Rule-based AI suits tasks that are complex for humans but easily formalized through rules.
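A minimal sketch of "training" the linear model F, assuming a handful of made-up houses whose prices were generated from known weights. It fits w1..w4 by ordinary least squares, solving the normal equations (X^T X) w = X^T y directly. This closed-form method is swapped in for illustration; deep learning frameworks minimize the same kind of squared error iteratively with gradient descent. The function names are hypothetical.

```python
from typing import List, Sequence

def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """price = w1*s + w2*be + w3*ba + w4*f (no intercept, as on the slide)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares weights via the normal equations and Gaussian elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)] + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (A[i][n] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

# Made-up training data (s, be, ba, f); prices generated from known weights
true_w = [3000.0, 20000.0, 15000.0, 10000.0]
X = [(100, 2, 1, 1), (150, 3, 2, 2), (80, 2, 1, 1), (120, 3, 1, 1), (130, 3, 2, 1)]
y = [predict(true_w, x) for x in X]

w = fit_linear(X, y)  # recovers true_w up to floating-point error
```

Once w is learned, `predict(w, (100, 2, 2, 9))` prices a house that no hand-written rule had to anticipate.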

Traditional machine learning: requires feature engineering

Training: Training data → Feature engineering → Machine learning algorithm → Learned model (prediction function)

Prediction: Data → Feature extraction → Learned model → Prediction

[Diagram: nested sets, Deep learning ⊂ Artificial neural networks ⊂ Machine learning ⊂ Artificial intelligence]

Deep learning: efficient data representations, no more feature engineering

Training: Training data → Deep learning algorithm → Learned model (transformation and prediction function)

Prediction (inference): Data → Learned model → Prediction

[Diagram: nested sets, Deep learning ⊂ Artificial neural networks ⊂ Machine learning ⊂ Artificial intelligence]

Deep Learning

Training: learning a new capability from existing data

Untrained neural network model + training dataset (images labeled "dog", "cat", …) → trained model

Inference: applying this capability to new data

Trained model (new capability, optimized for performance) + new data ("?") → app or service featuring the capability → "cat"

Summary: AI algorithms

Artificial Intelligence

1. Top-down (deductive): laws/rules, handcrafted

2. Bottom-up (inductive): machine learning and statistics

- Supervised (labeled data): classification (predict category), regression (predict point)
- Unsupervised (unlabeled data): clustering
- Reinforcement
- Input ranking

Algorithm families:

o Neural Network: Deep Learning, etc.

o Decision Tree: Random Forest, etc.

o Bayesian: Naïve Bayes, etc.

o K-methods: K-means, etc.

o Regression: Linear, etc.

o etc.
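As an example of the unsupervised "clustering" branch, here is a minimal sketch of K-means (Lloyd's algorithm) on 2D points. Using the first k points as initial centroids is a simplification for determinism; practical implementations typically use random or k-means++ initialization. The sample points are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and update steps until centroids stop moving."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new: List[Point] = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid in place
        if new == centroids:
            break
        centroids = new
    return centroids, labels

# Unlabeled data: two made-up groups of points
points = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (0.1, 0.4), (10.2, 9.8), (9.9, 10.3)]
centroids, labels = kmeans(points, k=2)
```

No labels are given; the algorithm discovers the two groups from the data alone, which is what separates it from the supervised branch.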

PART 2. AI Business Trends

AI Business Trend: What is your tool? Where can we utilize it? What can we expect in the future?


Where would the road take us?

Advances in artificial intelligence will transform modern life by reshaping transportation, health, science, finance, and the military.

"High-level machine intelligence" (HLMI) is achieved when unaided machines can accomplish every task better and more cheaply than human workers.

Grace et al., When Will AI Exceed Human Performance? Evidence from AI Experts

- Writing a bestseller: 2049
- Driving a truck: 2027
- Math research: 2060
- Surgeon: 2053
- Retail: 2031
- Full automation of labor: 2140

AI Business Growth Forecast

HPE’s View: What is the Market Size for AI?

IDC forecasts spending on AI-focused hardware, software, and services to reach $58bn by 2021.

HPE’s View: AI Global IT Spend by Industry: Who’s got the money?

Sample AI Use Cases Across Different Industry Verticals

- Banking & Securities: 19%
- Government: 17%
- Manufacturing & Natural Resources: 17%
- Communications, Media & Services: 16%
- Retail: 7%
- Insurance: 7%
- Utilities: 5%
- Healthcare Providers: 4%
- Transportation: 4%
- Education: 2%
- Wholesale Trade: 2%

AI-based image/video analytics

[Chart: security cameras installed worldwide, 2016 to 2020, axis 0M to 1,000M]

1 billion security cameras installed worldwide (2020)

30 billion frames to analyze per day

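Conventional video analytics is unreliable in real-world conditions.

[Chart: image classification accuracy, 2010 to 2016, human vs. hand-coded computer vision vs. deep learning]

AI-powered intelligent video analytics has surpassed human recognition ability in image classification accuracy.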

GPU-accelerated video analytics performance

NVIDIA P4/T4

Max. cameras per system (720p / 15 fps / H.264):

System | Detection | Detection + Attribute | Intrusion | Line Crossing
HPE Edgeline EL1000 | 9 | 6 | 9 | 9
HPE Edgeline EL4000 | 36 | 24 | 36 | 36

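Quality inspection based on automated image analysis: assembly quality inspection criteria

- Cable not connected
- Battery missing
- Cable connected in the wrong position

Pipe deformation inspection using augmented reality

Analysis of ultra-fine areas that cannot be distinguished by the naked eye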

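2016 StackGAN (Generative Adversarial Network)

Two AI models compete with each other: one generates new data, and the other judges whether the generated data is real or fake. Through this competition, the generator learns to produce synthetic results that look real.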
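The competition described above is commonly formalized, following the original GAN formulation of Goodfellow et al. (2014), as a minimax game: the discriminator D tries to maximize the value function V and the generator G tries to minimize it. Here D(x) is D's estimated probability that x is real, and G(z) is a sample generated from random noise z. This is the standard two-player objective only; StackGAN's own stage-specific losses build on it and are not shown.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term rewards D for scoring real data high; the second rewards D for scoring generated data low, and G lowers V by making D(G(z)) approach 1.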

PART 3. Systems and Solutions

Infrastructure: How can we make it?


Deep learning frameworks: what are they?

[Software stack, top to bottom]

- UI, development tools: NVIDIA DIGITS (Caffe, Torch, TensorFlow)
- High-level APIs: Brew (Caffe2), TF Layers (TensorFlow), Gluon (MXNet), CNTK
- Deep learning and machine learning frameworks: TensorFlow, CNTK, Theano, MXNet
- Hardware-specific libraries for basic deep neural network operations (BLAS + FFT, convolutions, etc.): MIOpen
- Optimized linear algebra libraries, many supporting the BLAS interface, hardware-specific: cuBLAS, MKL, OpenBLAS, rocBLAS, MIOpenGEMM
- Accelerator-specific drivers and software: NVIDIA drivers, CUDA, ROCm
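To show why the linear algebra layer of this stack matters: the forward pass of a fully connected layer over a whole batch is one matrix multiply, Y = XW + b, i.e. the GEMM routine that cuBLAS, MKL, OpenBLAS, or rocBLAS accelerate. The naive pure-Python `gemm` below is a hypothetical stand-in for such a library call, written out only to make the arithmetic visible.

```python
from typing import List

Matrix = List[List[float]]

def gemm(A: Matrix, B: Matrix) -> Matrix:
    """Naive C = A @ B; BLAS libraries ship heavily optimized versions of this routine."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = A[i][p]
            for j in range(m):
                C[i][j] += a * B[p][j]
    return C

def fully_connected(X: Matrix, W: Matrix, b: List[float]) -> Matrix:
    """Dense layer for a whole batch: Y = X W + b (bias added to every row)."""
    return [[y + bj for y, bj in zip(row, b)] for row in gemm(X, W)]

X = [[1.0, 2.0], [3.0, 4.0]]              # batch of 2 inputs, 2 features each
W = [[0.5, -1.0, 2.0], [1.0, 0.0, 1.0]]   # 2 inputs -> 3 outputs
b = [0.0, 1.0, -1.0]
Y = fully_connected(X, W, b)
```

Because a framework hands this multiply to a vendor library, the same model code can run on x86, NVIDIA, or AMD hardware by swapping the library underneath.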

Most popular frameworks

Source: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software

[Software and affiliated-company columns were shown as logos]

Supported HW | Written in | Interface | Good for
x86, NVIDIA GPUs, AMD GPUs | C++, Python | Python, C++, Java, Go, Swift | All use cases
x86, NVIDIA GPUs | C++ | Python | Natural language processing
x86, NVIDIA GPUs | C++, Python | Python, C++, Scala, Julia, Perl, R | All use cases
x86, NVIDIA GPUs, AMD GPUs | C++ | Python, bash | Image processing
x86, NVIDIA GPUs | C++ | bash | Speech recognition

Functional & Applications View: An End-to-End Data Pipeline

- "IoT": edge processing of data in motion
- "Fast Data": core processing of data in motion
- "Big Data": analysis of data at rest
- "AI": deep learning/machine learning

Components: local data mgmt., container management, analytic services, model serving, models, edge infrastructure management, data acquisition, distributed data flow mgmt., parallel data flow management, "Data Lake", parallel analytic framework, NoSQL, HPC storage, deep learning

Data science toolchains: data flow design, data science workbench, model management, application deployment

Business systems

Services and solutions

HPE enables AI from “intelligent edge to core data center”

Intelligent edge (inference): IoT devices, edge data

Core data center (deep learning): training data

Storage: cost-optimized storage and performance-optimized storage

Products and services shown: HPE Apollo 6500 Gen10 with NVIDIA® Tesla® GPU accelerators, HPE System Management Software, HPE Aruba, HPE Edgeline, HPE Networking, InfiniBand, HPE Apollo, HPE ProLiant, HPE Synergy, HPE OneView, HPE DMF, HPE Pointnext, WekaIO, Qumulo, Scality, Ceph

Confidential

HPE Organizational View: An End-to-End Data Pipeline

The same pipeline ("IoT" edge processing of data in motion, "Fast Data" core processing of data in motion, "Big Data" analysis of data at rest, "AI" deep learning/machine learning), together with its components, data science toolchains, business systems, and services and solutions, mapped to HPE organizations:

- Aruba
- HPE Storage
- Enterprise Solutions and Performance
- HPC & AI BU
- Pointnext

Examples of artificial neural networks

[Diagrams: input → hidden layer 1 → hidden layer 2 → output]

- Convolutional: images
- Fully connected: speech, text, sensor data
- Recurrent: speech, text, sensor data
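To make "convolutional: images" concrete, here is a minimal sketch of a single 2D convolution with stride 1 and no padding. Like most deep learning frameworks, it actually computes cross-correlation (the kernel is not flipped). The tiny image and the vertical-edge kernel are illustrative values, not anything from the slide.

```python
from typing import List

Matrix = List[List[float]]

def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Slide the kernel over the image and take a weighted sum of the window
            out[i][j] = sum(image[i + u][j + v] * kernel[u][v]
                            for u in range(kh) for v in range(kw))
    return out

# 4x4 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]  # responds to dark-to-bright transitions from left to right
edges = conv2d(image, kernel)
```

The same small kernel is reused at every position, which is why convolutional layers have few parameters relative to their compute, matching the high GFLOPs of CNNs in the model table.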

Commonly used deep learning models

Name | Type | Model size (# params) | Model size (MB) | GFLOPs (forward pass)
AlexNet | CNN | 60,965,224 | 233 | 0.7
GoogleNet | CNN | 6,998,552 | 27 | 1.6
VGG-16 | CNN | 138,357,544 | 528 | 15.5
VGG-19 | CNN | 143,667,240 | 548 | 19.6
ResNet50 | CNN | 25,610,269 | 98 | 3.9
ResNet101 | CNN | 44,654,608 | 170 | 7.6
ResNet152 | CNN | 60,344,387 | 230 | 11.3
Eng Acoustic Model | RNN | 34,678,784 | 132 | 0.035
TextCNN | CNN | 151,690 | 0.6 | 0.009
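The "Model size (MB)" column is consistent with 32-bit (4-byte) weights and MB taken as 2^20 bytes. A quick check under that assumption, with a hypothetical helper name:

```python
def model_size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory for the weights alone, in MiB (2**20 bytes).
    bytes_per_param=4 assumes 32-bit floating-point weights."""
    return num_params * bytes_per_param / 2**20

# Parameter counts copied from the model table
params = {
    "AlexNet": 60_965_224,
    "VGG-16": 138_357_544,
    "ResNet50": 25_610_269,
    "TextCNN": 151_690,
}
for name, n in params.items():
    print(f"{name}: {model_size_mib(n):.1f} MB (fp32), {model_size_mib(n, 2):.1f} MB (fp16)")
```

This counts weights only; activations, gradients, and optimizer state add substantially more memory during training.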

Recommendations by application

- Verticals: manufacturing, oil & gas, autonomous driving, speech, social media
- Data: small, moderate, large
- Data type: speech, images, sensor data, video
- Typical layers: CNN, fully connected, RNN
- Frameworks: TensorFlow, Caffe 2, CNTK, Torch, …
- Infrastructure: x86, GPUs, FPGAs, TPU?, …

Neural network sits here

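HPE Deep Learning Cookbook

Provides benchmark test data

Application guide for deep learning workloads

- Information on 8 HPE hardware configurations for 11 workloads built on 8 deep learning frameworks

Open-sourcing of benchmark and architecture tools

- Deep learning benchmark tool to be released on GitHub
- Deep learning performance analysis tool
- Standard architecture information published on hpe.com
- Performance estimates for workloads, providing a basis for optimal system sizing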

Benchmarking Suite Architecture

- Benchmarks specification and default parameters: which benchmarks to run
- Benchmarking suite: configures benchmarks, runs one at a time
- Launchers (TensorFlow, Caffe, Caffe2, TensorRT, MXNet, PyTorch): mediators between experimenter and frameworks
- Benchmarks (TF CNN Benchmark, NVCNN, NVCNN Horovod, Tensor2Tensor, Caffe, Caffe2, TensorRT, MXNet, and PyTorch benchmarks): run inference and training
- Frameworks (TensorFlow, Caffe2, TensorRT, MXNet, PyTorch): standard or custom frameworks

ONNX logo was taken from onnx.ai
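The layering above (specification, suite, per-framework launchers) can be sketched as follows. Every name here (`BenchmarkSpec`, `run_suite`, `DEFAULT_PARAMS`, the fake launcher) is hypothetical and is not the Deep Learning Cookbook's actual API; the sketch only illustrates the pattern of merging defaults with a spec and dispatching each benchmark, one at a time, through the launcher for its framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical defaults, standing in for the suite's "default parameters"
DEFAULT_PARAMS: Dict[str, int] = {"batch_size": 32, "num_batches": 100}

@dataclass
class BenchmarkSpec:
    """One entry of a hypothetical benchmarks specification."""
    framework: str
    model: str
    phase: str  # "training" or "inference"
    params: Dict[str, int] = field(default_factory=dict)

# A launcher takes a fully configured spec and returns a throughput figure
Launcher = Callable[[BenchmarkSpec], float]

def run_suite(specs: List[BenchmarkSpec], launchers: Dict[str, Launcher]) -> List[float]:
    """Configure each benchmark and run it, one at a time, via its framework's launcher."""
    results = []
    for spec in specs:
        params = {**DEFAULT_PARAMS, **spec.params}  # spec values override defaults
        configured = BenchmarkSpec(spec.framework, spec.model, spec.phase, params)
        results.append(launchers[spec.framework](configured))
    return results

def fake_tensorflow_launcher(spec: BenchmarkSpec) -> float:
    """Stand-in launcher: a real one would start the framework's benchmark and
    parse its output. Here it just returns the batch size as a fake throughput."""
    return float(spec.params["batch_size"])

specs = [
    BenchmarkSpec("tensorflow", "resnet50", "training", {"batch_size": 64}),
    BenchmarkSpec("tensorflow", "alexnet", "inference"),
]
results = run_suite(specs, {"tensorflow": fake_tensorflow_launcher})
```

Keeping framework-specific logic inside launchers means adding a new framework only requires a new launcher, not changes to the suite.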

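HPE Apollo 6500 XL270d server: 8-GPU server dedicated to deep learning training

HPE ProLiant XL270d Gen10 - SXM2 NVLink type

HPE ProLiant XL270d Gen10 - PCIe type

Red Hat / CentOS / SUSE / Ubuntu / Windows

Framework/library installation support

PCIe 4:1 or 8:1 topology configurable in firmware

GPU interconnect architecture

NVLink-based connection | PCIe-based connection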

Choice of accelerator topologies to suit your specific workloads

[Diagrams: three topologies (NVLink 2.0, PCIe 4:1, PCIe 8:1) connecting CPU 1 and CPU 2 through PCIe switches (SW) to GPU 1 through GPU 8]

- NVLink 2.0: enhanced performance with a hybrid cube-mesh accelerator topology using NVLink 2.0, for deep learning / AI and HPC applications
- PCIe 4:1: traditional PCIe topology for most HPC applications, as they do not rely heavily on GPU-to-GPU communication
- PCIe 8:1: suits select HPC and deep learning training workloads, for the easiest and most efficient GPUDirect-enabled code

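Deep learning computational requirements (image example)

- Convolution layers: compute performance, addressed by a large number of GPU cores
- Fully connected layers: memory bandwidth, addressed by 3D-stacked memory (TSV high-bandwidth memory)
- Weight update of the DNN weights (prediction error correction): GPU-to-GPU data exchange performance, addressed by NVLink
- Across servers: improved inter-node communication performance through InfiniBand

Large neural network model: task parallelism

Small neural network model: data parallelism

[Diagram: Node A and Node B each process batches, produce predictions (e.g. "flower", "house"), and compute errors]

Bandwidth improved with four 100 Gbps InfiniBand connections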
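The data-parallel case (small model, one full copy per node) can be sketched as synchronous gradient averaging. This is a toy under stated assumptions: a one-parameter model y = w·x, made-up data, and a hypothetical `allreduce_mean` that stands in for the real network exchange (for example over InfiniBand). Each node computes a gradient on its own batch, the gradients are averaged, and every replica applies the same update.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target t)

def grad_mse(w: float, batch: Sequence[Sample]) -> float:
    """d/dw of mean((w*x - t)^2) over one node's batch."""
    return sum(2 * (w * x - t) * x for x, t in batch) / len(batch)

def allreduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every node ends up with the average gradient."""
    return sum(values) / len(values)

def data_parallel_step(w: float, shards: List[Sequence[Sample]], lr: float) -> float:
    local_grads = [grad_mse(w, shard) for shard in shards]  # computed on Node A, Node B, ...
    return w - lr * allreduce_mean(local_grads)             # identical update on every replica

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # generated by t = 3x
shards = [data[:2], data[2:]]                               # Node A, Node B
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, shards, lr=0.05)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the nodes train exactly as one machine would; the price is one gradient exchange per step, which is why inter-node bandwidth matters.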

Deep Learning (Training) Workloads Require a High-Speed Fabric: RDMA is key, bandwidth is key

50% better performance and linear scaling with RDMA

6.5X faster training with 100GbE than with 10GbE

[Charts: higher is better]

Data source: courtesy of Mellanox benchmark test

HPE Confidential – Customer NDA required
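A rough, idealized illustration of why link bandwidth matters, assuming fp32 gradients the same size as ResNet50's weights (parameter count from the model table) and ignoring latency, protocol overhead, and the extra traffic of real all-reduce algorithms. The helper name is hypothetical; the 400 Gb/s case corresponds to four 100 Gbps InfiniBand links.

```python
def transfer_time_ms(num_bytes: float, link_gbps: float) -> float:
    """Idealized time to move num_bytes over a link of link_gbps (10**9 bits/s).
    Ignores latency, protocol overhead, and all-reduce algorithm factors."""
    return num_bytes * 8 / (link_gbps * 1e9) * 1e3

resnet50_bytes = 25_610_269 * 4  # fp32 gradients, same size as the weights
for gbps in (10, 100, 400):
    print(f"{gbps:>3} Gb/s: {transfer_time_ms(resnet50_bytes, gbps):.1f} ms")
```

Under these assumptions one gradient exchange takes roughly 82 ms at 10 Gb/s versus about 8 ms at 100 Gb/s. The measured 6.5X end-to-end speedup is smaller than this 10X raw ratio, which is expected when computation time does not shrink with link speed.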

HPE InfiniBand EDR product family

HPE EDR IB/EN 100Gb adapters

- Mellanox CX-4 ASIC
- 1-port or 2-port
- EDR IB and 100GbE support

Mellanox IB EDR 36-port switches

- 36 QSFP28 ports
- Managed and unmanaged models

Mellanox EDR modular switches

- 216-port (12U + 1U shelf)

HPE Apollo 6000 EDR 100Gb

- Built into the A6000 chassis (24 downlink, 12 uplink ports)

HPE ICE-XA: single, dual, and performance dual-port EDR

HPE SGI 8600 Premium EDR IB Switch (18 downlink, 3 crosslink, 36 uplink ports)

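“Make AI Work”: AI for IT, AI for the business, AI for the future

- HPE InfoSight: 24x7 resource monitoring; predicts failures and responds proactively
- HPE Aruba IntroSpect: detects attacks at the intelligent edge
- HPE Pointnext: consulting workshops, development support, consumption-based IT services
- AI-dedicated infrastructure: GPU compute servers, high-performance storage and networking
- Validated, integrated AI solutions: HPE Deep Learning Cookbook
- AI infrastructure for the future: advanced computing technologies from Hewlett Packard Labs (Dot Product Engine, Optical Computing, etc.)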
