Deep Learning
Preferred Networks
Kenta Oono
2014/12/9
Preferred Networks
(@delta2323_)
2012.3: joined PFI
2014.10: moved to PFN
http://delta2323.github.io
Preferred Infrastructure (PFI): founded 2006
Preferred Networks (PFN): founded 2014, focused on IoT
NTT and Toyota seek deep-learning expertise (WSJ, 2014/10/1)
PFI collaboration with CiRA (iPS cell research)
http://blogs.wsj.com/japanrealtime/2014/10/01/ntt-toyota-seek-deep-learning-expertise/

Deep Learning (1/2)
Deep Learning
Deep Learning (2/2)
Intel ITPRO EXPO AWARD 2014
UI
Deep Learning
PFN
Sense → Organize → Analyze → Action

AWS service launch timeline:
- Storage: S3 (06/3), EBS (08/8), Glacier (12/8)
- Database: SimpleDB (07/12), RDS (09/10), DynamoDB (09/10), Aurora (14/11)
- Analytics: EMR (09/4), Redshift (12/11), Kinesis (13/11)
Data Citation Index (Thomson Reuters)
GIGADB
DOI
Bioinformatics, PLoS One, etc.

* Dimensionality Reduction by Learning an Invariant Mapping. Raia Hadsell, Sumit Chopra, Yann LeCun, CVPR 2006.
Feature vectors: (0, 1, 2.5, -1, …), (1, 0.5, -2, 3, …), (0, 1, 1.5, 2, …)
Classification: SVM / LogReg / PA / CW / AROW / Naive Bayes / CNB / DT / RF / ANN
Clustering & topic models: K-means / Spectral Clustering / MMC / LSI / LDA / GM
Structured prediction: HMM / MRF / CRF
- Algorithms: AROW, NHERD
- Tokenizers: space, n-gram, mecab, mecab-n-gram
Jubatus
Jubatus Casual Talks #3 **

* http://www.briscola.co.jp/media/press/pdf/briscola_press_20140212.pdf
** http://blog.jubat.us/2014/07/jubatus-casual-talks-3.html
*** http://itpro.nikkeibp.co.jp/article/NEWS/20140212/536349/
Deep Learning
PFN
[Figure: three-layer neural network — inputs x1…xN, hidden units h1…hH, pre-activations k1…kM, outputs y1…yM compared against targets t1…tM; parameters W1/b1 (layer 1) and W2/b2 (layer 2); arrows show the Forward and Backward passes]

Forward pass:
  h = f1(x) = Sigmoid(W1 x + b1)
  k = f2(h) = Sigmoid(W2 h + b2)
  y = f3(k) = SoftMax(k), where f3,i(k) = exp(k_i) / Σ_j exp(k_j)
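The forward pass above can be sketched in NumPy. This is a minimal sketch, not the speaker's implementation: layer sizes and weights are illustrative, and the softmax subtracts the max logit for numerical stability (which leaves the result unchanged).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtracting the max does not change the result but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)   # first layer:  f1
    k = sigmoid(W2 @ h + b2)   # second layer: f2
    y = softmax(k)             # output layer: f3
    return y

rng = np.random.default_rng(0)
N, H, M = 4, 5, 3                                  # input / hidden / output sizes
W1, b1 = rng.normal(size=(H, N)), np.zeros(H)
W2, b2 = rng.normal(size=(M, H)), np.zeros(M)
y = forward(rng.normal(size=N), W1, b1, W2, b2)
print(y)  # a probability vector over the M classes
```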
Same pipeline, now with the hand-engineered feature-extraction step shown explicitly:
Feature extraction: n-gram / BoW (text); SIFT / SURF / HOG / PHOW / BoVW (images)
Feature vectors: (0, 1, 2.5, -1, …), (1, 0.5, -2, 3, …), (0, 1, 1.5, 2, …)
Learning algorithms: SVM / LogReg / PA / CW / AROW / Naive Bayes / CNB / DT / RF / ANN; K-means / Spectral Clustering / MMC / LSI / LDA / GM; HMM / MRF / CRF
2012: Deep Learning breaks through
- ILSVRC2012: SuperVision [Krizhevsky+ 12] wins by a wide margin, cutting the error rate from 26% to 16%
- Error rates since: 16% ('12), 11% ('13), 6.6% ('14, GoogLeNet)
- Heavy investment: Google / Facebook / Microsoft / Baidu
- Deployed in products: Siri / Google voice search
Google Brain [Le+ 13]
Why does Deep Learning work now?
- Improved Neural Network architectures (ReLU / CNN / NiN)
- Better regularization (DropOut / DAE)
- Massive compute (GPGPU, distributed training)
Deep Learning advances (1/3): Neural Network architectures
- ReLU [Nair+ 10]
- CNN [LeCun+ 89]
- Network in Network [Lin+ 13]
- MaxOut
- Disentangling [Goodfellow+ 09][Bengio 14]
- Replacing Sigmoid with ReLU [Bengio 14]
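Why the Sigmoid-to-ReLU swap matters can be seen from the gradients alone. A small illustrative sketch (not from the slides): the sigmoid's gradient vanishes for large |z|, while ReLU's gradient stays 1 for all positive inputs, which helps gradients propagate through deep stacks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # at most 0.25, and ~0 for large |z|

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)  # exactly 1 wherever the unit is active

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(z))  # gradient vanishes at the extremes
print(relu_grad(z))     # gradient stays 1 for z > 0
```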
Deep Learning advances (2/3): training techniques
- Layerwise Pretraining [Bengio+ 07] with Auto-Encoders
- Stochastic Gradient Descent (SGD) and its refinements:
  AdaGrad [Duchi+ 11], Nesterov's Method, RMSProp
- DropOut [Hinton+ 12]
- Denoising Auto-Encoder
- [IBIS13] [Srivastava+ 14]
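AdaGrad [Duchi+ 11], one of the SGD refinements listed above, scales each parameter's step by its accumulated squared gradients. A minimal sketch on a toy quadratic objective (the objective and hyperparameters are illustrative, not from the slides):

```python
import numpy as np

def adagrad(grad_fn, w0, lr=1.0, eps=1e-8, steps=200):
    """Minimize via AdaGrad: per-coordinate learning-rate decay."""
    w = np.asarray(w0, dtype=float)
    g2 = np.zeros_like(w)                  # accumulated squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        g2 += g * g
        w -= lr * g / (np.sqrt(g2) + eps)  # large past gradients -> small steps
    return w

# toy objective: f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = adagrad(lambda w: w, [3.0, -2.0])
print(w)  # close to the minimum at the origin
```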
Deep Learning advances (3/3): large-scale training
- GPGPU and distributed training (DistBelief [Jeffrey+ 12])
- Large datasets: ImageNet [Deng+ 09] (~14M images), Sports-1M [Karpathy+ 14] (~1M videos)
Deep Learning applications (1/3): life sciences
- Multi-task learning, e.g. GWAS [Puniyani+ 10], PheWAS

[Figure: multi-task neural network — a shared input layer x1…xN and hidden layer h1…hH feed three task-specific output heads (k1…kM → y1…yM) for tasks 1, 2, 3]

- Drug activity prediction (QSAR): a multi-task Deep NN [Dahl+ 14] won the Merck Kaggle competition* (figure: [Dahl+ 14] Fig. 2)

* http://blog.kaggle.com/2012/10/31/merck-competition-results-deep-nn-and-gpus-come-out-to-play/
Deep Learning applications (2/3): multimodal learning
- Combining modalities, e.g. image + text [Jeffrey+ 14]

[Figure: networks over separate input modalities x1…xN sharing higher-level representations h1…hH, k1…kM, y1…yM]

Deep Learning applications (3/3)
- CNN image embeddings improve multi-modal semantics [Kiela+ 14]
Deep Learning in practice
- GoogLeNet (ILSVRC14 winner): its network definition (Caffe prototxt) runs to roughly 2000 lines
Deep Learning: open questions (1/2)
- Why do DNNs work? What do pretraining and depth actually buy?
- SGD in high-dimensional non-convex optimization: saddle points dominate [Dauphin+ 14]
- "Do Deep Nets Really Need to be Deep?" [Ba+ 13]
- ILSVRC14: GoogLeNet
Deep Learning: open questions (2/2) — engineering NNs
- Many hyperparameters: layers / nodes per layer / learning rate / iterations
- Numerical pitfalls (e.g. activations blowing up to Inf with ReLU)
- Describing NNs needs a DSL: GoogLeNet's Caffe prototxt is ~2000 lines
  (https://github.com/BVLC/caffe/pull/1367/files)
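Hand-writing a ~2000-line prototxt is error-prone, which is one reason such specs are often generated programmatically. A hedged sketch in the spirit of (but not identical to) Caffe's prototxt format — the helper name and the exact fields emitted are illustrative:

```python
# Hypothetical helper that emits a repetitive convolution-layer spec,
# modeled loosely on Caffe's prototxt syntax.
def conv_layer(name, bottom, num_output, kernel_size):
    return (
        f'layer {{ name: "{name}" type: "Convolution" bottom: "{bottom}" '
        f'top: "{name}" convolution_param {{ num_output: {num_output} '
        f'kernel_size: {kernel_size} }} }}'
    )

# chain three conv layers: data -> conv1 -> conv2 -> conv3
spec = "\n".join(
    conv_layer(f"conv{i}", f"conv{i-1}" if i > 1 else "data", 64, 3)
    for i in range(1, 4)
)
print(spec)
```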
Deep Learning research directions
- Sequence modeling: Recurrent NN, LSTM — [Sutskever+ 14], [Karpathy+ 14], [Vinyals+ to appear]
- DNN → Shallow NN: Model Compression [Bucilua+ 06], Distilled Network / Dark Knowledge [Hinton+ 14]
- Theory of Deep Learning: analysis of Layerwise Pretraining [Arora+ 13]
- Deep (Directed) Generative Models: Generative Stochastic Network [Bengio+ 13], Variational Auto-Encoder [Kingma+ 13]
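The "Dark Knowledge" idea behind distillation [Hinton+ 14] is that a temperature-raised softmax exposes the teacher network's small probabilities on non-target classes, which the student then learns from. A minimal sketch of the temperature softmax (the logits are illustrative):

```python
import numpy as np

def softmax_T(logits, T=1.0):
    """Softmax with temperature T: higher T softens the distribution,
    revealing relative similarities between non-target classes."""
    z = np.asarray(logits) / T
    e = np.exp(z - z.max())   # max-subtraction for numerical stability
    return e / e.sum()

logits = np.array([10.0, 5.0, 1.0])   # a confident teacher's logits
hard = softmax_T(logits, T=1.0)
soft = softmax_T(logits, T=5.0)
print(hard)  # nearly one-hot
print(soft)  # softened: the runner-up classes become visible
```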
Deep Learning
33
Ustream /Slideshare / Research Blog
http://www.slideshare.net/pfi/deep-learning-22350063 http://www.slideshare.net/beam2d/deep-learning20140130 http://www.slideshare.net/beam2d/deep-learning-22544096
Deep Learning
PFN
Sense → Organize → Analyze → Action
(1/3)

(2/3)
PFI seminar (Ustream):
http://www.slideshare.net/shoheihido/120913-pfi-dist
http://www.slideshare.net/shoheihido/130328-slideshare
http://www.slideshare.net/shoheihido/ss-25510340
(3/3)
XXX
XXX
[+12]
Unify & Generalize: Sensing, Organize, Analyze, Action
Challenges: Security / Privacy / Heterogeneity / Distributed Intelligence

Market estimates from Cisco and GE:
- Cisco: Internet of Everything (IoE) — $14.4 trillion of value at stake over the next decade
- GE: Industrial Internet — could add $10–15 trillion to global GDP over 20 years

Sources:
- Cisco White Paper, "Embracing the Internet of Everything To Capture Your Share of $14.4 Trillion"
- GE, "Industrial Internet: Pushing the Boundaries of Minds and Machines"
- GE, "The Industrial Internet@Work"
Deep Learning
PFN
II
Organize (1/2): GGRNA
- Google-like full-text search engine for genes (DBCLS)
- Covers NCBI RefSeq
- Powered by Sedue
- Nucl. Acids Res. 2012 [Naito+ 12]
Organize (2/2): GGGenome
- Covers DDBJ Release 92.0, human genome (hg19), mouse genome (mm10), and more
- RESTful API
- Approximate search by edit distance, e.g.:
  ACGTGATC
  ACTA ATC
  d(ACGTGATC, ACTAATC) = 3
Index sizes (server: 2U, 6-core 3.46GHz CPU ×2, 192GB RAM); left column is source data size, right column index size:

  GGGenome  RefSeq 61   8.6GB     52.4GB
  GGGenome  DDBJ 92.0   150.8GB   932.2GB
  GGGenome  hg19        3.1GB     19.0GB
  GGRNA     RefSeq 61   32.4GB    210.3GB
  GGRNA     DDBJ 92.0   559.2GB   3192.8GB
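Approximate search as in GGGenome is built on the edit (Levenshtein) distance. A standard dynamic-programming sketch — the test sequences below are illustrative, not the slide's example:

```python
def edit_distance(s, t):
    """Levenshtein distance via row-by-row dynamic programming."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))          # distances for the empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution or match
        prev = cur
    return prev[n]

print(edit_distance("ACGT", "AGT"))  # 1: a single deletion suffices
```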
SQL
R&D

[Figure: multi-layer neural network — inputs x1…xN, hidden h1…hH, outputs y1…yM]

DB + machine learning
Sense, Organize, Analyze, Action
Deep Learning
(1/5)
[Arora+13] Arora, Sanjeev, et al. "Provable bounds for learning some deep representations." arXiv preprint arXiv:1310.6343 (2013).
[Ba+13] Ba, Lei Jimmy, and Rich Caruana. "Do Deep Nets Really Need to be Deep?" arXiv preprint arXiv:1312.6184 (2013).
[Bengio+07] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[Bengio+13] Bengio, Yoshua, and Eric Thibodeau-Laufer. "Deep generative stochastic networks trainable by backprop." arXiv preprint arXiv:1306.1091 (2013).
[Bengio14] Bengio, Yoshua. "How auto-encoders could provide credit assignment in deep networks via target propagation." arXiv preprint arXiv:1407.7906 (2014).
[Bucilua+06] Buciluǎ, Cristian, Rich Caruana, and Alexandru Niculescu-Mizil. "Model compression." Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
(2/5)
[Dahl+14] Dahl, George E., Navdeep Jaitly, and Ruslan Salakhutdinov. "Multi-task Neural Networks for QSAR Predictions." arXiv preprint arXiv:1406.1231 (2014).
[Dauphin+14] Dauphin, Yann N., et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization." Advances in Neural Information Processing Systems. 2014.
[Duchi+11] Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." The Journal of Machine Learning Research 12 (2011): 2121-2159.
[Deng+09] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2009.
[Goodfellow+09] Goodfellow, Ian, et al. "Measuring invariances in deep networks." Advances in Neural Information Processing Systems. 2009.
[Hinton+12] Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
(3/5)
[Hinton+14] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. "Distilling the Knowledge in a Neural Network." Deep Learning and Representation Learning Workshop, NIPS 2014.
[Jeffrey+12] Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in Neural Information Processing Systems. 2012.
[Jeffrey+14] "Large Scale Deep Learning." CIKM keynote, 2014. http://static.googleusercontent.com/media/research.google.com/ja//people/jeff/CIKM-keynote-Nov2014.pdf
[Karpathy+14] Karpathy, Andrej, et al. "Large-scale video classification with convolutional neural networks." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.
[Kingma+13] Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
[Kiela+14] Kiela, Douwe, and Léon Bottou. "Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics." EMNLP 2014.
(4/5)
[Krizhevsky+12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[Le+13] Le, Quoc V. "Building high-level features using large scale unsupervised learning." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2013.
[LeCun+89] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Advances in Neural Information Processing Systems 2, NIPS 1989, 396-404.
[Lin+13] Lin, Min, Qiang Chen, and Shuicheng Yan. "Network In Network." arXiv preprint arXiv:1312.4400 (2013).
[Nair+10] Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted boltzmann machines." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.
[Naito+12] Yuki Naito and Hidemasa Bono. "GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts." Nucl. Acids Res. (2012) 40(W1): W592-W596.
(5/5)
[Puniyani+10] K. Puniyani, S. Kim, and E. P. Xing. "Multi-population GWA mapping via multi-task regularized regression." Bioinformatics, vol. 26, no. 12, pp. i208-i216, Jun. 2010.
[Srivastava+14] Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
[Sutskever+14] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems. 2014.
[IBIS13] Presentation at the 16th Workshop on Information-Based Induction Sciences (IBIS2013).
[+12] "Edge-Heavy Data: CPS." GICTF 2012. http://www.gictf.jp/doc/20120709GICTF.pdf
Copyright 2014- Preferred Networks. All Rights Reserved.
Deep Learning (1/2)
Deep Learning (2/2)
Deep Learning glossary
DL: Deep Learning
NN: Neural Network
DNN: Deep Neural Network
CNN: Convolutional Neural Network
RNN: Recurrent Neural Network
LSTM: Long Short-Term Memory
ReLU: Rectified Linear Unit
NiN: Network in Network
AE: Auto-Encoder
DAE: Denoising Auto-Encoder
GWAS: Genome-Wide Association Study
PheWAS: Phenome-Wide Association Study
QSAR: Quantitative Structure–Activity Relationship
BoVW: Bag of Visual Words
SIFT: Scale-Invariant Feature Transform
SURF: Speeded Up Robust Features
HOG: Histogram of Oriented Gradients
PHOW: Pyramid Histogram Of visual Words
Terminology (1/3)
- Net: the Neural Net (NN) as a whole
- Node (= Neuron, Unit): a single computational element
- Layer: a group of Nodes

[Figure: Nodes arranged in Layers — inputs x1…xN, hidden h1…hH, k1…kM, outputs y1…yM, targets t1…tM; Forward and Backward passes]
Terminology (2/3): Layer
Each Layer maps its input X to output Y = f(WX): multiply by the weight matrix W, then apply the activation f elementwise.

[Figure: X → WX → Y]
Terminology (3/3): Epoch and minibatch
- The training data is split into minibatches 1…M, each of size B
- An Epoch (Iteration) is one full pass of the Net over the training data; training runs Epochs 1…N
- The Solver updates the Net once per minibatch

[Figure: Epochs 1…N, each consisting of minibatches 1…M]
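The Epoch/minibatch scheme above can be sketched as a minimal training loop. The model (linear regression) and data are toy stand-ins chosen so the sketch stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))              # 120 examples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                             # noiseless toy regression targets

w = np.zeros(3)
B, lr, n_epochs = 20, 0.1, 50              # minibatch size B, learning rate, Epochs N
for epoch in range(n_epochs):              # Epoch = one full pass over the data
    idx = rng.permutation(len(X))          # reshuffle each Epoch
    for start in range(0, len(X), B):      # minibatches 1..M
        b = idx[start:start + B]
        err = X[b] @ w - y[b]
        grad = X[b].T @ err / len(b)       # gradient on this minibatch only
        w -= lr * grad                     # Solver update, once per minibatch
print(w)  # approaches true_w
```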
History of Deep Learning (1/3)
- 1960s: Perceptron [Rosenblatt58] — first NN boom
- Perceptrons [Minsky+ 69] exposes the Perceptron's limits — interest in NNs (and AI) cools
- 1970s: Back Propagation [Bryson+ 69]; applied to NN training [Werbos74]
- 1980s: Neocognitron [Fukushima80] / Hopfield Network [Hopfield82] / PDP model [Rumelhart+ 86] / Gradient Descent backprop [Rumelhart+ 86] / [McClelland+ 86] — second NN boom
History of Deep Learning (2/3)
- 1990s: Bayesian Network [Pearl85] and Support Vector Machine [Cortes, Vapnik95] — SVM and Boosting overtake NNs; the Vanishing Gradient Problem is identified [Hochreiter91]
- Meanwhile: RBM [Smolensky86] / CNN [LeCun89] / RNN [Jordan86, Elman90] / LSTM [Hochreiter+ 97]
- 2000s: Greedy Layerwise Pretraining [Bengio+ 07], Deep Belief Network [Hinton+ 06], Deep Boltzmann Machine [Salakhutdinov+ 09] — NN revival
History of Deep Learning (3/3)
- 2010s: ILSVRC win by SuperVision [Krizhevsky+ 2012]; Google Brain [Le, Ng, Jeffrey+ 2012] — the Deep Learning (third NN) boom
- 2014: GoogLeNet [Szegedy+ 2014]
(1/2)
Sense
Organize

(2/2)
Organize
Analyze
(1/4)
[Bengio+07] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[Bryson+69] Bryson, Arthur E., and Yu-Chi Ho. "Applied optimal control." (1969).
[Cortes+95] Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20.3 (1995): 273-297.
[Elman+90] Elman, Jeffrey L. "Finding structure in time." Cognitive Science 14.2 (1990): 179-211.
[Fukushima80] Fukushima, Kunihiko. "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position." Biological Cybernetics 36.4 (1980): 193-202.
[Hinton+06] Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18.7 (2006): 1527-1554.
[Hochreiter91] Hochreiter, Sepp. "Untersuchungen zu dynamischen neuronalen Netzen." Master's thesis, Institut für Informatik, Technische Universität München (1991).
(2/4)
[Hochreiter+97] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long Short-Term Memory." Neural Computation 9.8 (1997): 1735-1780.
[Hopfield82] Hopfield, John J. "Neural networks and physical systems with emergent collective computational abilities." Proceedings of the National Academy of Sciences 79.8 (1982): 2554-2558.
[Jordan86] Jordan, Michael I. "Serial Order: A Parallel Distributed Processing Approach." No. ICS-8604. Institute for Cognitive Science, University of California, San Diego, 1986.
[LeCun+89] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Advances in Neural Information Processing Systems 2, NIPS 1989, 396-404.
[Krizhevsky+12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
(3/4)
[Le+13] Le, Quoc V. "Building high-level features using large scale unsupervised learning." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2013.
[McClelland+86] McClelland, James L., David E. Rumelhart, and PDP Research Group. "Parallel distributed processing." Explorations in the Microstructure of Cognition 2 (1986).
[Minsky+69] Minsky, Marvin, and Seymour Papert. "Perceptrons: An Introduction to Computational Geometry." The MIT Press, Cambridge, expanded edition 19 (1969): 88.
[Pearl85] Pearl, Judea. "Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning." (1985).
[Rosenblatt58] Rosenblatt, Frank. "The perceptron: a probabilistic model for information storage and organization in the brain." Psychological Review 65.6 (1958): 386.
[Rumelhart+86] Rumelhart, David E., James L. McClelland, and PDP Research Group. "Parallel distributed processing, volume 1: Foundations." MIT Press, Cambridge, MA 19 (1986): 67-70.
(4/4)
[Salakhutdinov+09] Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Deep Boltzmann machines." International Conference on Artificial Intelligence and Statistics. 2009.
[Smolensky86] Smolensky, Paul. "Information processing in dynamical systems: Foundations of harmony theory." (1986): 194.
[Szegedy+14] Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
[Werbos74] Werbos, Paul. "Beyond regression: New tools for prediction and analysis in the behavioral sciences." (1974).