
© 2016 International Business Machines Corporation

Big Data and Machine Learning

STG Technical Sales

유부선, Managing Director


IBM Watson on Power in the cognitive computing era

“What Is Cognitive Computing?” as Seen Through IBM Watson


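Chess (or Go…)

• A finite, mathematically well-defined space

• A finite number of moves and states

• Based on clear, unambiguous mathematical rules

Human language (English…)

• Ambiguous, contextual, implicit

• Grounded in human cognition

• Infinitely many expressions for the same meaning

Which is harder: chess or English?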


Clue: On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first.

• Evidence: Mt. Everest; “He was first”

• Answer: Edmund Hillary

Category: Common Bonds

Clue: TV remote controls, shirts, telephones

• Answer: Buttons

Watson can answer these… can you?


DeepQA: The Technology Behind Watson

[Figure: DeepQA pipeline. Question → Question & Topic Analysis → Question Decomposition → Primary Search (over Answer Sources) → Candidate Answer Generation → Hypothesis Generation → Hypothesis and Evidence Scoring (Evidence Retrieval and Deep Evidence Scoring over Evidence Sources) → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Learned Models help combine and weigh the Evidence.]

Natural Language Processing, Machine Learning and Reasoning Algorithms.


What does Watson do at Memorial Sloan Kettering?

Source: https://www.anthem.com/medicalpolicies/policies/mp_pw_a053323.htm


How does NETFLIX work to increase revenue?

75%


The “NETFLIX PRIZE” Contest

Goal: Build a recommendation system from the training data that, when applied to the actual qualifying data, reduces the error (RMSE) of Netflix's existing recommendation system by at least 10%

Prize: US$1 million to the first team to achieve an improvement of 10% or more

Participation

• 51,051 contestants on 41,305 teams from 186 countries

• 44,014 submissions from 5,169 teams

Data provided by Netflix

• Training set: 99,072,112 <user, movie> pairs with ratings

• Probe set: 1,408,395 <user, movie> pairs with ratings

• Qualifying set: 2,817,131 <user, movie> pairs with no ratings

• 480,189 users (real data, but with identities replaced by meaningless serial numbers)

• 17,770 movies (real data, but with identities replaced by meaningless serial numbers)

• Ratings on an integer scale of 1 to 5
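The contest goal above is stated in terms of prediction error. A minimal sketch of how such an error could be scored, assuming the metric is root-mean-square error (RMSE, the measure quoted later in the deck) and using made-up ratings; the function names are hypothetical, not part of any Netflix tooling:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))

def meets_prize_target(candidate_rmse, baseline_rmse, required_gain=0.10):
    """True if candidate_rmse is at least required_gain below baseline_rmse."""
    return candidate_rmse <= baseline_rmse * (1.0 - required_gain)

# Hypothetical ratings on the 1-to-5 scale, for illustration only.
actual = [4, 3, 5, 1, 2]
predicted = [3.5, 3.0, 4.0, 2.0, 2.5]
score = rmse(predicted, actual)
print(f"RMSE = {score:.4f}")
```

A lower RMSE is better, so "improving by 10%" means driving the candidate's RMSE down to 90% of the baseline's or less.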


Challenges of the Netflix Prize

Same rating as user 2 on the same movie: strong weight

Similar rating to user 2 on the same movie: medium weight

“Nearest Neighbors in Action”
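The weighting idea in the nearest-neighbors figure can be sketched as follows. This is only an illustration under stated assumptions: the weight values (1.0 for an identical rating, 0.5 for a rating one point apart, 0 otherwise), the averaging rule, and all user data are hypothetical and not taken from the slide.

```python
def rating_weight(r_a, r_b):
    """Strong weight for identical ratings, medium for ratings one point apart."""
    diff = abs(r_a - r_b)
    if diff == 0:
        return 1.0
    if diff == 1:
        return 0.5
    return 0.0

def similarity(ratings_a, ratings_b):
    """Average weight over the movies both users have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    return sum(rating_weight(ratings_a[m], ratings_b[m]) for m in shared) / len(shared)

def predict(target, neighbors, movie):
    """Similarity-weighted average of the neighbors' ratings for one movie."""
    numerator = denominator = 0.0
    for ratings in neighbors:
        if movie in ratings:
            weight = similarity(target, ratings)
            numerator += weight * ratings[movie]
            denominator += weight
    return numerator / denominator if denominator else None

# Hypothetical data: predict user 2's rating for movie "C".
user2 = {"A": 5, "B": 3}
neighbors = [
    {"A": 5, "B": 3, "C": 4},  # identical ratings: similarity 1.0
    {"A": 4, "B": 3, "C": 2},  # one similar, one identical: similarity 0.75
    {"A": 1, "B": 5, "C": 1},  # dissimilar: similarity 0.0
]
prediction = predict(user2, neighbors, "C")
```

Users who rated the same movies the same way as user 2 dominate the prediction, while dissimilar users contribute nothing.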

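A huge amount of data: nearly 100 million ratings

Very sparse data: only about 1.2% of the cells in the full user-movie table contain a rating

Ratings from the general public vary widely with mood and circumstances

The training set and the qualifying set have different statistical characteristics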
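The sparsity figure can be checked from the dataset sizes listed for the contest. A small sketch, assuming the training and probe sets together make up all the known ratings:

```python
TRAINING_RATINGS = 99_072_112
PROBE_RATINGS = 1_408_395
USERS = 480_189
MOVIES = 17_770

known_ratings = TRAINING_RATINGS + PROBE_RATINGS  # ratings actually present
total_cells = USERS * MOVIES                      # every possible <user, movie> pair
density = known_ratings / total_cells

print(f"{known_ratings:,} ratings in {total_cells:,} cells: {density:.2%} filled")
```

Roughly one cell in 85 holds a rating, which is why methods that cope with missing data matter so much here.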


Accurate analysis, more opportunities


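The fateful year 2009: who won?

The largest improvement came from team “The Ensemble”

• Improved on the existing Cinematch algorithm by 10.10% (Quiz RMSE 0.8553)

But the final winner was team “BellKor’s Pragmatic Chaos”

• Improved on the existing Cinematch algorithm by 10.09% (Quiz RMSE 0.8554)

• Its measured accuracy was narrowly behind The Ensemble's

• The deciding factor: BellKor submitted its result 20 minutes before The Ensemble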
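The two improvement percentages follow directly from the quoted Quiz RMSE values once a baseline is fixed. The sketch below assumes a Cinematch Quiz RMSE of 0.9514; that baseline is the publicly reported figure and does not appear on the slide, so treat it as an assumption.

```python
CINEMATCH_QUIZ_RMSE = 0.9514  # assumption: baseline not stated on the slide

def improvement(rmse, baseline=CINEMATCH_QUIZ_RMSE):
    """Fractional reduction in RMSE relative to the baseline."""
    return 1.0 - rmse / baseline

ensemble = improvement(0.8553)  # "The Ensemble"
bellkor = improvement(0.8554)   # "BellKor's Pragmatic Chaos"

print(f"The Ensemble: {ensemble:.2%}, BellKor: {bellkor:.2%}")
```

A difference of 0.0001 in RMSE separates the two teams, which is why submission time ended up deciding the contest.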

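Training time is a major burden for AI. Facebook's research into shortening that training time is worth the investment... In an enterprise context, even after the system has been built, you will keep hearing "it's still training" for months.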

Mike Schroepfer, Facebook CTO


Recommendation Engine - Matrix Factorization

Factorize a word co-occurrence matrix in the same way as a rating matrix

Obtain word features that capture meaning

man – woman ≈ king – queen
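A minimal sketch of matrix factorization by stochastic gradient descent, assuming a tiny made-up rating matrix; the hyperparameters (`k`, `epochs`, `lr`, `reg`) and function names are illustrative choices, not values from the deck. The same procedure can be applied to word co-occurrence counts, in which case the rows of `Q` play the role of word feature vectors.

```python
import math
import random

def predict(P, Q, u, i):
    """Predicted value for row u, column i: dot product of the two factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def train_rmse(P, Q, ratings):
    """RMSE of the factorization over the observed entries."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn n_users x k and n_items x k factors from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Hypothetical observed entries: (user, item, rating).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 1), (2, 2, 5), (3, 0, 1), (3, 2, 4)]

P0, Q0 = factorize(ratings, 4, 3, epochs=0)  # untrained factors
P, Q = factorize(ratings, 4, 3)              # trained factors
rmse_before = train_rmse(P0, Q0, ratings)
rmse_after = train_rmse(P, Q, ratings)
```

Only observed entries drive the updates, so the method works on sparse matrices, and unobserved entries can then be predicted with `predict`.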


The Power of the GPU

cuMF on 1 server w/ 4 GPUs (Nvidia K80) vs. 32 ~ 50 nodes of CPU cores

• 6-10x as fast

• 33-100x as cost-efficient (cuMF costs $2.5 per hour on Softlayer)


ImageNet Large Scale Visual Recognition Challenge

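“The debut contest of the machine learning world”

Jointly hosted every year by University of North Carolina, Stanford University, and University of Michigan

A competition in object detection and image classification

Provides 1.2 million images covering about 1,000 categories

The goal is to produce more accurate, lower-error results over three months

More and more GPUs are being used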


CPU vs. GPU

Intel Xeon E5-2690 v3:

Clock speed: 2.6 GHz

12 cores

Up to 0.6 TFLOPS double precision

Memory size: 768 GB

Bandwidth: 68 GB/sec

NVIDIA Tesla K80:

Clock speed: 560MHz

4992 CUDA cores (2496 per GPU)

8.73 TFLOPS single precision

2.91 TFLOPS double precision

Memory size: 24 GB (12GB per GPU)

Bandwidth: 480 GB/sec
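Peak throughput figures like these are products of core count, clock speed, and floating-point operations per core per cycle. A rough sketch under labeled assumptions: 16 double-precision FLOPs per cycle per Xeon core (AVX2 with two FMA units), 2 single-precision FLOPs per cycle per CUDA core (one fused multiply-add), and an 875 MHz boost clock for the K80. None of these three values appears on the slide.

```python
def peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOPS: cores x clock x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle / 1000.0

# Assumption: 16 DP FLOPs/cycle/core for a Haswell Xeon at its 2.6 GHz base clock.
xeon_dp = peak_tflops(cores=12, clock_ghz=2.6, flops_per_cycle=16)

# Assumption: 2 SP FLOPs/cycle per CUDA core; 0.875 GHz is an assumed boost clock.
k80_sp_boost = peak_tflops(cores=4992, clock_ghz=0.875, flops_per_cycle=2)
k80_sp_base = peak_tflops(cores=4992, clock_ghz=0.560, flops_per_cycle=2)

print(f"Xeon DP: {xeon_dp:.2f} TFLOPS")
print(f"K80 SP: {k80_sp_base:.2f} TFLOPS at base clock, {k80_sp_boost:.2f} at boost")
```

Under these assumptions the Xeon lands at roughly 0.5 TFLOPS double precision, and the K80 single-precision figure of about 8.7 TFLOPS is reached only at the boost clock, not at the 560 MHz base clock listed above.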



CPU and GPU Working Together

GPU: parallelizes the compute-intensive functions

+

CPU: runs the rest of the sequential code


The Key to GPU Performance Is Massive Parallelization

[Figure: CUDA execution model. The host launches Kernel 1 and Kernel 2 on the GPU, and each kernel runs as a grid (Grid 1, Grid 2). Grid 1 is a 3 × 2 array of blocks, Block (0, 0) through Block (2, 1). Each block, shown for Block (1, 1), is a 5 × 3 array of threads, Thread (0, 0) through Thread (4, 2).]

/* Main function, executed on host (CPU) */
int main(void) {
    /* 1. Allocate memory on GPU */
    /* 2. Copy data from Host to GPU */
    /* 3. Execute GPU kernel */
    /* 4. Copy data from GPU back to Host */
    /* 5. Free GPU memory */
    return 0;
}

/* 3. Execute GPU kernel */
/* Calculate number of blocks and threads */
int threadsPerBlock = 256;
int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;

/* Launch the Vector Add CUDA Kernel */
vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);

/* Wait for all the threads to complete */
cudaDeviceSynchronize();
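The block-count expression in the launch code is a ceiling division: it rounds up so that every element gets a thread even when `numElements` is not a multiple of `threadsPerBlock`. The sketch below emulates that on the CPU in Python. The per-thread index formula and the bounds check mirror the usual CUDA vector-add pattern, but the kernel body is not shown in the deck, so that part is an assumption.

```python
def blocks_per_grid(num_elements, threads_per_block=256):
    """Ceiling division: the smallest block count that covers every element."""
    return (num_elements + threads_per_block - 1) // threads_per_block

def emulate_vector_add(a, b, threads_per_block=256):
    """Sequential emulation of a grid of blocks of threads adding two vectors."""
    n = len(a)
    c = [0.0] * n
    for block_idx in range(blocks_per_grid(n, threads_per_block)):
        for thread_idx in range(threads_per_block):
            # Assumed kernel indexing: global index from block and thread indices.
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # threads past the end of the data do nothing
                c[i] = a[i] + b[i]
    return c

result = emulate_vector_add([1, 2, 3], [10, 20, 30], threads_per_block=2)
print(blocks_per_grid(50000), result)
```

For 50,000 elements and 256 threads per block this launches 196 blocks (50,176 threads), so the last 176 threads fall outside the data and must be masked off by the bounds check. On a GPU the two loops run in parallel rather than sequentially.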


IBM Watson + NVIDIA GPU

“The combination of the IBM POWER architecture and the NVIDIA Tesla platform will make the expansion of Watson's deep learning capabilities even more powerful. Hardware acceleration is essential for Watson to carry out in-depth analysis of video and natural-language recognition in a short time.”

November 2015

Rob High, IBM Fellow and IBM Watson CTO

Source: http://ibmresearchnews.blogspot.kr/2015/11/accelerating-watsons-performance-with.html?cm_mc_uid=13727683789814586059128&cm_mc_sid_50200000=1458890594


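The Evolution of the NVLink Interconnect: Solving the GPU's Homework

The biggest obstacle to using GPUs today is the data-communication bottleneck on the PCIe link, both between GPUs and between the GPU and the server CPU

[Figure: Conventional GPU attachment: CPU (System Memory) and GPU (Graphics Memory) connected over PCIe at 16+16 GB/s. CPU-GPU attachment using NVLink: CPU (System Memory) and GPUs (Graphics Memory) connected over NVLink at 40+40 GB/s.]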

/* Main function, executed on host (CPU) */
int main(void) {
    /* 1. Allocate memory on GPU */
    /* 2. Copy data from Host to GPU */
    /* 3. Execute GPU kernel */
    /* 4. Copy data from GPU back to Host */
    /* 5. Free GPU memory */
    return 0;
}
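Steps 2 and 4 of this skeleton are where the interconnect matters, because all data crosses the CPU-GPU link in both directions. A back-of-the-envelope sketch, assuming the per-direction peak rates from the figure (16 GB/s for PCIe, 40 GB/s for NVLink) are fully achieved and using a hypothetical 24 GB payload (the K80's memory size); real transfers would be slower:

```python
def transfer_seconds(gigabytes, link_gb_per_s):
    """Idealized one-way transfer time at the link's peak rate."""
    return gigabytes / link_gb_per_s

PCIE_GB_PER_S = 16.0    # per direction, from the figure
NVLINK_GB_PER_S = 40.0  # per direction, from the figure
payload_gb = 24.0       # hypothetical: enough data to fill a K80's memory

pcie_time = transfer_seconds(payload_gb, PCIE_GB_PER_S)
nvlink_time = transfer_seconds(payload_gb, NVLINK_GB_PER_S)
speedup = pcie_time / nvlink_time

print(f"PCIe: {pcie_time:.2f} s, NVLink: {nvlink_time:.2f} s, {speedup:.1f}x faster")
```

The ratio is independent of the payload size: at peak rates the copy steps shrink by a factor of 2.5, while the kernel execution time in step 3 is unchanged.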


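Now Is the Era of Collaboration: OpenPOWER Foundation

Gordon, Chairman of the OpenPOWER Foundation and a principal engineer at Google, unveils a POWER8-based system architecture

Purpose of the OpenPOWER Foundation

• Drive broad innovation across the IT industry

• Offer a better alternative that solves the problems of today's data center technology

• Energize the ecosystem around POWER technology

Status of the OpenPOWER Foundation

• Started in 2013 with five companies: IBM, Google, Mellanox, NVIDIA, and TYAN

• Expanded and strengthened to more than 200 members as of March 2016

• From Korea, two companies, Samsung Electronics and SK Hynix, participate in the memory field

A new POWER8 designed and produced in collaboration with OpenPOWER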


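Achievements of the OpenPOWER Foundation

Google's announcement

• “Zaius,” an Open Compute server that Google designed jointly with Rackspace using POWER9

50+ new platforms and solutions developed

• SuperMicro

• Xilinx / Samsung / Nallatech / IBM: card combines CAPI, FPGA, and NVMe flash to enable “IBM Engine for NoSQL”

• Edico Genome solution

IBM Power S824L with NVIDIA GPUs announced

• Released in the second half of 2015

• 8x faster than x86 at pattern extraction

US Department of Energy signs a $325M supercomputer contract with IBM, Mellanox, and NVIDIA

• The upcoming Sierra and Summit systems will deliver more than 100 PF by combining POWER and GPUs

• Performance is 4x-8x higher than the existing supercomputers Titan and Sequoia

Generating business results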


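The Netflix Prize Is Discontinued

In March 2010, Netflix announced that it was canceling the follow-up Prize competition

The cause: the data's potential to violate privacy

• The dataset had been released with all identities converted to meaningless numbers, so no personal information was leaked directly

However, two data researchers at the University of Texas demonstrated that individual users could be identified by comparing that anonymized dataset with movie ratings published on the “Internet Movie Database” (IMDb)

Four Netflix users filed a lawsuit claiming that releasing the converted data for the Netflix Prize violated US fair-trade law and the Video Privacy Protection Act

• The case ended with a settlement with the plaintiffs and the suit was withdrawn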
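The linkage idea described above can be illustrated with a toy example: a person's few public ratings are compared against every anonymized record, and the record that agrees on the most movies is the likely match. This is a deliberately simplified sketch with invented data and a hypothetical scoring rule; it is not the researchers' actual method.

```python
def overlap_score(anon_ratings, public_ratings, tolerance=1):
    """Count the shared movies whose ratings agree within the tolerance."""
    shared = set(anon_ratings) & set(public_ratings)
    return sum(1 for movie in shared
               if abs(anon_ratings[movie] - public_ratings[movie]) <= tolerance)

def best_match(public_ratings, anonymized):
    """Anonymized user ID whose ratings agree best with the public profile."""
    return max(anonymized,
               key=lambda user_id: overlap_score(anonymized[user_id], public_ratings))

# Invented data: serial numbers stand in for users, "m1".."m6" for movies.
anonymized = {
    1001: {"m1": 5, "m2": 1, "m3": 4, "m4": 2},
    1002: {"m1": 2, "m2": 5, "m5": 3},
    1003: {"m3": 1, "m4": 5, "m6": 4},
}
public_profile = {"m1": 5, "m3": 4, "m4": 1}  # ratings someone posted publicly

candidate = best_match(public_profile, anonymized)
```

Because the rating table is so sparse, even a handful of ratings can single out one record, which is why replacing names with serial numbers did not guarantee anonymity.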


Peter Norvig, Director of Research, Google:

“Google does not have better algorithms, only more data.”


IBM Power Systems S812LC, Optimized for Hadoop and Spark

1-socket, 2U

Up to 10 cores (2.9-3.3 GHz)

1 TB memory (32 DIMMs)

115 GB/sec memory bandwidth

14 × LFF (HDD/SSD), 74 TB storage (6 TB × 12 + 1 TB × 2)

4 PCIe slots, 1 CAPI enabled

Default 3-year 9x5 warranty, 100% CRU

Big data analytics: performance goes up ↑, price comes down ↓

• 2.3x better price-performance!

• 94% more Spark workload in the same rack space as Intel Xeon E5-2690 v3 systems

• 1.94x higher performance per system (10-core S812LC vs. 24-core DL380)

Target Customers

• Managed Service Providers (MSPs/CSPs)

• Workloads that require big data analytics such as Hadoop / Spark

• Linux workloads that need better performance and throughput than existing x86 servers

• Customers who want to manage an open source stack at low cost


Power S812LC vs. x86 Performance: SPECint_rate2006 (CPU only)

Source: http://www.spec.org/cpu2006/results/rint2006.html

Vendor / Model                                              Threads  Cores  Chips  SPECint_rate2006 Peak  Peak/Core
Cisco UCS C220 M4 (Intel Xeon E5-2630 v3 @ 2.40GHz)         32       16     2      693                    43.3
Dell PowerEdge R730 (Intel Xeon E5-2630 v3, 2.40 GHz)       32       16     2      686                    42.9
HP ProLiant DL380 Gen9 (2.40 GHz, Intel Xeon E5-2630 v3)    32       16     2      692                    43.3
IBM Power S812LC (2.92 GHz, 10 core, Red Hat)               40       10     1      642                    64.2
IBM Power S824 (3.5 GHz, 24 core, RHEL)                     192      24     4      1720                   71.7
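The Peak/Core column is simply the peak score divided by the core count. A short sketch that recomputes it from the figures in the table and derives the per-core advantage of the S812LC over the best x86 entry; the derived ratio is a calculation from the table, not a number stated on the slide:

```python
# (system, cores, SPECint_rate2006 peak) as listed in the table
results = [
    ("Cisco UCS C220 M4", 16, 693),
    ("Dell PowerEdge R730", 16, 686),
    ("HP ProLiant DL380 Gen9", 16, 692),
    ("IBM Power S812LC", 10, 642),
    ("IBM Power S824", 24, 1720),
]

per_core = {name: peak / cores for name, cores, peak in results}

best_x86 = max(per_core[name] for name in per_core if not name.startswith("IBM"))
s812lc_advantage = per_core["IBM Power S812LC"] / best_x86

for name, value in per_core.items():
    print(f"{name}: {value:.2f} per core")
print(f"S812LC vs. best x86, per core: {s812lc_advantage:.2f}x")
```

On these numbers a single POWER8 core delivers a little under 1.5 times the throughput of a Xeon E5-2630 v3 core, even though the two-socket x86 systems post a higher total score.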



Power S812LC vs. x86 Performance: SparkBench (CPU + memory)

• Across 10 SparkBench benchmarks, the S812LC demonstrated about 2x the performance of x86

• Handles the same Spark workload at 1/2 the cost of Intel Haswell (E5-2690 v3, 24 cores)

• 2.3x better price-performance

• Handles 94% more Spark workload in the same rack space than Intel Haswell (E5-2690 v3, 24 cores)

• 1.94x better performance per system

[Figure: Relative performance, POWER8 vs. x86. IBM S812LC (10c/80t): 1.94; HP DL380, E5-2690 v3 (24c/48t): 1]

[Figure: Relative performance per $, POWER8 vs. x86. IBM S812LC (10c/80t): 2.3; HP DL380, E5-2690 v3 (24c/48t): 1]

• All results are based on IBM Internal Testing of 10 SparkBench benchmarks consisting of SQL RDD Relation, Twitter, Pageview Streaming, PageRank, Logistic Regression, SVD++, TriangleCount, SVM, MF, SQL Hive

• IBM Power System S812LC; 10 cores / 80 threads, POWER8; 2.9GHz, 256 GB memory, Ubuntu 15.04, Spark 1.4, OpenJDK 1.8

• Intel Xeon HP DL380; 24 cores / 48 threads, E5-2690 v3; 2.3GHz, 256 GB memory, Ubuntu 15.04, Spark 1.4, OpenJDK 1.8

• Pricing is based on list prices of HP DL380 and estimated prices of IBM Power S812LC
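The headline ratios on this slide can be related to one another with simple arithmetic. The sketch below takes the two quoted ratios (1.94x performance per system, 2.3x performance per dollar) and the core counts as given, and derives the rest; the derived values are implications of those numbers, not figures stated in the deck.

```python
perf_ratio = 1.94        # S812LC performance relative to the x86 system
price_perf_ratio = 2.3   # S812LC performance per dollar relative to x86
s812lc_cores, x86_cores = 10, 24

# price-performance = performance / price, so price = performance / price-performance
implied_price_ratio = perf_ratio / price_perf_ratio

# Cost to process one unit of Spark work, relative to x86
cost_per_unit_work = 1.0 / price_perf_ratio

# Per-core throughput ratio, given the different core counts
per_core_ratio = perf_ratio * x86_cores / s812lc_cores

print(f"Implied system price vs. x86: {implied_price_ratio:.2f}x")
print(f"Cost per unit of work vs. x86: {cost_per_unit_work:.2f}x")
print(f"Per-core performance vs. x86: {per_core_ratio:.2f}x")
```

A cost per unit of work of about 0.43 is consistent with the "1/2 the cost" claim, and it comes mostly from the higher throughput rather than from a much lower system price.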


OpenPOWER Foundation + Open Data Platform(ODPi)

ODPi is a big data collaboration body that drives the advancement of, and innovation in, Apache Hadoop-based big data technology for the enterprise

Centered on the Power architecture, IBM leads the first open community driving open innovation in hardware; about 130 member companies from 20 countries worldwide have joined


IBM Open Platform with Apache Hadoop

• IBM Open Platform with Hadoop & Spark is 100% open source, 100% compliant with the ODPi standard, and 100% free (technical support is a paid option)

• Packages Ambari, Hadoop, YARN, Spark, Hive, HBase, Knox, Avro, Flume, Pig, Sqoop, ZooKeeper, Nagios, and more

• Delivers the benefits of open source with improved quality and support


IBM Open Platform with Apache Hadoop + IBM BigInsights

IBM BigInsights for Apache Hadoop v4.1: choose from 3 value-add services

• BigInsights Analyst: Big SQL, BigSheets

• BigInsights Data Scientist: Text Analytics, Machine Learning with Big R, Big R, Big SQL, BigSheets

• BigInsights Enterprise Mgmt: POSIX Distributed File System; Multi-workload, Multi-tenant scheduling

IBM Open Platform with Apache Hadoop: free, 100% open source, based on ODPi core

• Apache Open Source Components: HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig, Sqoop, HCatalog, Solr/Lucene

More info: https://developer.ibm.com/hadoop/docs/getting-started/faqs/


IBM Data Engine for Hadoop and Spark

An integrated cluster offering for high-performance, high-density storage that combines an OpenPOWER product with IBM Open Platform with Apache Hadoop

• Single-vendor support

• Up to 2x better price-performance than Xeon for Spark workloads*

• Delivered as an integrated “ready to run” cluster

• IBM S812LC server, a product of OpenPOWER

Configuration optimized for Hadoop or Spark workloads

Based on S812LC servers with 14 SATA drives per server

Preloading IBM BigInsights and IBM Open Platform is optional

Easy installation and management

Can be changed and expanded as analytics workloads change

* All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM


Summary

• New opportunity: Machine Learning. Creating new opportunities

• New resource: Data! Collect first, analyze later

• New collaboration: GPU & OpenPOWER. A collaborative ecosystem for systems optimized for machine learning

Q&A
