Recent Trends in Deep Learning Technology and Preferred Networks' Initiatives

2014/12/9  Preferred Networks, Inc.  Kenta Oono <[email protected]>

The 21st Lecture Meeting on Advanced Database and Web Technology Trends / ACM SIGMOD Japan Chapter, 58th meeting @ Institute of Industrial Science, the University of Tokyo



  • Kenta Oono (@delta2323_)

    Joined Preferred Infrastructure (PFI) in March 2012

    Transferred to Preferred Networks (PFN) in October 2014

    http://delta2323.github.io

    2

  • Preferred Networks and Preferred Infrastructure

    Preferred Infrastructure (PFI): founded 2006

    Preferred Networks (PFN): founded 2014, a spin-off of PFI focusing on machine learning for IoT

    3

  • Deep learning in the news: "NTT, Toyota Seek Deep-Learning Expertise" (mentioning PFI); iPS cell research (CiRA)

    WSJ Japan Real Time, 2014/10/1

    http://blogs.wsj.com/japanrealtime/2014/10/01/ntt-toyota-seek-deep-learning-expertise/

  • Deep Learning(1/2)

    5

    +

    (8) () ()

    Deep Learning

  • Deep Learning(2/2)

    6

    Intel ITpro EXPO AWARD 2014

    UI

  • Deep Learning

    PFN

    7

  • (Figure) Sense → Organize → Analyze → Action, illustrated by the evolution of AWS services

    Storage: S3 06/3, EBS 08/8, Glacier 12/8

    Database: SimpleDB 07/12, RDS 09/10, DynamoDB, Aurora 14/11

    Analytics: EMR 09/4, Redshift 12/11, Kinesis 13/11

    8

  • Data Citation Index (Thomson Reuters)

    GigaDB

    DOIs assigned to datasets

    Bioinformatics, PLoS One, etc.

    9

  • * Dimensionality Reduction by Learning an Invariant Mapping Raia Hadsell, Sumit Chopra, Yann LeCun, CVPR, 2006

    10

  • 11

    Feature vectors: (0, 1, 2.5, -1, ...), (1, 0.5, -2, 3, ...), (0, 1, 1.5, 2, ...)

    Classification / regression: SVM / LogReg / PA / CW / AROW / Naive Bayes / CNB / DT / RF / ANN

    Clustering: K-means / Spectral Clustering / MMC / LSI / LDA / GM

    Structured prediction: HMM / MRF / CRF

  • 12

  • -

    -

    - AROW, NHERD

    - space, n-gram, mecab, mecab-n-gram

    - F

    -

    13

  • *

    Jubatus

    ***

    Jubatus Jubatus Casual Talks #3 **

    14

    * http://www.briscola.co.jp/media/press/pdf/briscola_press_20140212.pdf ** http://blog.jubat.us/2014/07/jubatus-casual-talks-3.html *** http://itpro.nikkeibp.co.jp/article/NEWS/20140212/536349/

  • Deep Learning

    PFN

    15

  • Neural Network

    (Figure: inputs x1 ... xN → hidden units h1 ... hH → k1 ... kM → outputs y1 ... yM, compared against targets t1 ... tM; transformations f1, f2, f3 with parameters W1/b1 and W2/b2; Forward and Backward passes)

    W1: layer-1 weights, b1: layer-1 bias; W2: layer-2 weights, b2: layer-2 bias

    16

    Forward computation:
    h = f1(x) = Sigmoid(W1 x + b1)
    k = f2(h) = Sigmoid(W2 h + b2)
    y = f3(k) = SoftMax(k), where SoftMax_i(k) = exp(k_i) / Σ_j exp(k_j)
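    The forward pass above can be written directly in a few lines of NumPy. This is a minimal illustrative sketch of the two-layer network on this slide; the layer sizes and random weights are assumptions made for the example, not values from the talk.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(k):
        e = np.exp(k - np.max(k))      # shift for numerical stability
        return e / e.sum()

    def forward(x, W1, b1, W2, b2):
        h = sigmoid(W1 @ x + b1)       # hidden layer: f1
        k = sigmoid(W2 @ h + b2)       # second layer: f2
        return softmax(k)              # output probabilities: f3

    # Toy sizes: N=4 inputs, H=5 hidden units, M=3 outputs (illustrative only).
    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
    W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
    y = forward(x, W1, b1, W2, b2)     # y sums to 1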

  • 17

    Feature vectors: (0, 1, 2.5, -1, ...), (1, 0.5, -2, 3, ...), (0, 1, 1.5, 2, ...)

    Classification / regression: SVM / LogReg / PA / CW / AROW / Naive Bayes / CNB / DT / RF / ANN

    Clustering: K-means / Spectral Clustering / MMC / LSI / LDA / GM

    Structured prediction: HMM / MRF / CRF

  • Hand-crafted features — text: n-gram / BoW; images: SIFT / SURF / HOG / PHOW / BoVW

    18

  • 19

    2012: Deep Learning in the spotlight

    ILSVRC 2012 (large-scale image recognition competition)

    SuperVision [Krizhevsky+ 12] won, far ahead of the 2nd-place entry (error rate 16% vs 26%)

    A deep convolutional NN — the result that triggered the current Deep Learning boom

  • Neural Nets keep improving

    Image-recognition error rates with DL: 16% ('12) → 11% ('13) → 6.6% ('14)

    Major companies invest in DL: Google / Facebook / Microsoft / Baidu

    DL is already used in products, e.g. speech interfaces (Siri / Google voice search)

    20

    (Figures: GoogLeNet; Google Brain [Le+ 13])

  • Why Deep Learning works now (overview)

    Neural Network architecture improvements (ReLU / CNN / NiN)

    Training methods that suppress overfitting (DropOut / DAE)

    Computational resources (GPGPU) for training large NNs

    21

  • Deep Learning techniques (1/3): Neural Network architectures

    ReLU [Nair+ 10]

    CNN [LeCun+ 89]

    Network in Network [Lin+ 13]

    MaxOut

    Disentangling representations [Goodfellow+09] [Bengio 14]

    Replacing Sigmoid with ReLU [Bengio 14]

    22
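    For reference, the two activation functions named above are simple to state; the following NumPy sketch is illustrative (the MaxOut group size k is an assumed example value):

    import numpy as np

    def relu(z):
        # ReLU [Nair+ 10]: elementwise max(0, z)
        return np.maximum(0.0, z)

    def maxout(z, k=2):
        # MaxOut: maximum over groups of k linear unit outputs
        return z.reshape(-1, k).max(axis=1)

    z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0, -1.0])
    relu(z)          # -> [0., 0., 0., 1., 3., 0.]
    maxout(z, k=2)   # -> [-0.5, 1., 3.]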

  • Deep Learning techniques (2/3): training a Neural Network

    Layerwise Pretraining [Bengio+07] with Auto-Encoders

    Stochastic Gradient Descent (SGD) and variants: AdaGrad [Duchi+ 11], Nesterov's Method, RMSProp

    Regularization: DropOut [Hinton+ 12], Denoising Auto-Encoder [IBIS13] [Srivastava+14]

    23

    (A small sketch of AdaGrad and DropOut follows below.)
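    As a concrete illustration of two of these training techniques, here is a minimal sketch of one AdaGrad parameter update and of (inverted) DropOut; the hyperparameter values are illustrative assumptions, not settings from the talk.

    import numpy as np

    def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
        # AdaGrad [Duchi+ 11]: accumulate squared gradients and scale the step per parameter
        cache += grad ** 2
        w -= lr * grad / (np.sqrt(cache) + eps)
        return w, cache

    def dropout(h, p=0.5, rng=None):
        # DropOut [Hinton+ 12] at training time: zero units with probability p,
        # rescaling the survivors so the expected activation is unchanged
        rng = rng or np.random.default_rng(0)
        mask = (rng.random(h.shape) >= p) / (1.0 - p)
        return h * mask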

  • Deep Learning techniques (3/3): computational resources

    GPGPU training; distributed training (DistBelief [Jeffrey+12])

    Large labeled datasets: ImageNet [Deng+ 09] (~14 million images), Sports-1M [Karpathy+ 14] (~1 million videos)

    24

  • Deep Learning (1/3): multi-task Neural Networks

    One NN with a shared input and hidden layer, and a separate output head per task — cf. GWAS [Puniyani+10], PheWAS

    (Figure: shared x1 ... xN and h1 ... hH feeding output heads k1 ... kM / y1 ... yM for tasks 1, 2, 3)

    25
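    A minimal sketch of the multi-task setup in the figure: one shared hidden layer feeding a separate output head per task. The function and argument names are illustrative assumptions, not from any specific library.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def multitask_forward(x, W_shared, b_shared, heads):
        # `heads` is a list of (W_t, b_t) pairs, one per task
        h = sigmoid(W_shared @ x + b_shared)                   # representation shared by all tasks
        return [sigmoid(W_t @ h + b_t) for W_t, b_t in heads]  # one output vector per task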

  • Application: compound activity prediction (QSAR)

    Multi-task deep NNs applied to 19 assay datasets [Dahl+14]; a deep-NN team won the Merck Molecular Activity Challenge on Kaggle*

    * http://blog.kaggle.com/2012/10/31/merck-competition-results-deep-nn-and-gpus-come-out-to-play/ (see Fig. 2)

    26

  • Deep Learning (2/3): combining multiple networks / input modalities [Jeffrey+14]

    27

    (Figure: several input blocks x1 ... xN; one feeds a hidden layer h1 ... hH and output y1 ... yM)

  • Deep Learning (3/3): multimodal semantics [Kiela+14]

    Image embeddings from a CNN combined with text representations

    28

  • Deep Learning in practice: NNs are becoming large and complex

    29

    GoogLeNet (ILSVRC'14 winner): its Caffe prototxt definition runs to roughly 2000 lines

    (Figure: GoogLeNet architecture)

  • 30

    Deep Learning issues (1/2)

    Why DNNs work so well is not yet well understood theoretically

    e.g. the effect of pretraining; why SGD succeeds despite saddle points in high-dimensional non-convex optimization [Dauphin+14]

    "Do Deep Nets Really Need to be Deep?" [Ba+13] — even though ILSVRC'14 was won by the very deep GoogLeNet

  • Deep Learning issues (2/2): designing and training an NN involves many decisions

    Architecture: number of Layers and Nodes, activation functions, ...

    Training: learning rate, number of Iterations, ...

    Numerical pitfalls (e.g. ReLU activations blowing up to Inf)

    NNs are described in dedicated DSLs / configuration formats; GoogLeNet's Caffe prototxt runs to about 2000 lines: https://github.com/BVLC/caffe/pull/1367/files

    31

    (An illustrative configuration sketch follows below.)
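    To give a feel for how many knobs are involved, here is a hypothetical configuration for a small network. Every field name below is made up for illustration and does not correspond to Caffe's actual prototxt schema or any real DSL.

    # Hypothetical experiment configuration; all keys are illustrative.
    config = {
        "layers": [
            {"type": "conv", "num_output": 64, "kernel": 3, "activation": "relu"},
            {"type": "pool", "kernel": 2},
            {"type": "full", "num_output": 1000, "activation": "softmax"},
        ],
        "solver": {
            "algorithm": "sgd",
            "learning_rate": 0.01,
            "momentum": 0.9,
            "batch_size": 128,
            "max_iterations": 100000,
            "dropout": 0.5,
        },
    }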

  • Recent topics in Deep Learning

    Image caption generation [Vinyals+ to appear]

    Sequence learning with Recurrent NNs / LSTMs [Sutskever+14], [Karpathy+ 14]

    Training shallow NNs to mimic DNNs: Model Compression [Bucilua+06], Distilled Networks / "Dark Knowledge" [Hinton+14]

    Theory of Deep Learning, e.g. provable bounds for Layerwise Pretraining [Arora+13]

    Deep (Directed) Generative Models: Generative Stochastic Networks [Bengio+13], Variational Auto-Encoders [Kingma+13]

    32

  • More Deep Learning material from PFI/PFN

    33

    Ustream / Slideshare / Research Blog

    http://www.slideshare.net/pfi/deep-learning-22350063 http://www.slideshare.net/beam2d/deep-learning20140130 http://www.slideshare.net/beam2d/deep-learning-22544096

  • Deep Learning

    PFN

    34

  • 35

    (Figure, shown again) Sense → Organize → Analyze → Action, mapped to the AWS service timeline (Storage: S3 06/3, EBS 08/8, Glacier 12/8; Database: SimpleDB 07/12, RDS 09/10, DynamoDB, Aurora 14/11; Analytics: EMR 09/4, Redshift 12/11, Kinesis 13/11)

  • Sense → Organize → Analyze → Action

    36

  • (1/3)

    37

  • (2/3)

    l

    l ///

    l /

    l //

    PFI (Ustream)

    38

    http://www.slideshare.net/shoheihido/120913-pfi-dist http://www.slideshare.net/shoheihido/130328-slideshare http://www.slideshare.net/shoheihido/ss-25510340

  • (3/3)

    XXX

    XXX

    39

  • [+12]

    40

  • 41

    Unify & Generalize: Sense, Organize, Analyze, Action

    Security / Privacy / Heterogeneity / Distributed Intelligence

  • Market forecasts by Cisco and GE

    Cisco: Internet of Everything (IoE) — $14.4 trillion of value at stake over the next 10 years

    GE: Industrial Internet — could add $10-15 trillion to global GDP over the next 20 years (examples include CT/MRI equipment)

    42

    Sources: Cisco white paper "Embracing the Internet of Everything To Capture Your Share of $14.4 Trillion"; GE, "Industrial Internet: Pushing the Boundaries of Minds and Machines"; "The Industrial Internet@Work"

  • Deep Learning

    PFN

    43

  • II

    44

  • Organize (1/2): GGRNA

    Google-like full-text search engine for genes and transcripts

    Developed at DBCLS (Database Center for Life Science); covers NCBI RefSeq

    Built on PFI's search engine Sedue

    Published in Nucl. Acids Res. 2012 [Naito+12]

    45

  • Organize (2/2): GGGenome

    Ultrafast DNA sequence search

    Covers DDBJ Release 92.0 and reference genomes such as human (hg19) and mouse (mm10)

    RESTful API

    Approximate matching with mismatches and gaps, e.g. ACGTGATC vs ACTAATC, d(ACGTGATC, ACTAATC) = 3 (see the edit-distance sketch below)
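    For this kind of approximate matching, a standard dynamic-programming edit distance looks like the sketch below. This is plain Levenshtein distance (unit-cost insert/delete/substitute); GGGenome's actual algorithm and distance definition may differ, so the value it reports for the slide's example need not match.

    def edit_distance(s, t):
        # Levenshtein distance via the classic DP over prefixes
        prev = list(range(len(t) + 1))
        for i, a in enumerate(s, 1):
            cur = [i]
            for j, b in enumerate(t, 1):
                cur.append(min(prev[j] + 1,              # delete a
                               cur[j - 1] + 1,           # insert b
                               prev[j - 1] + (a != b)))  # substitute (free if equal)
            prev = cur
        return prev[-1]

    edit_distance("ACGTGATC", "ACTAATC")  # -> 2 under plain Levenshtein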

  • GGRNA / GGGenome: search-server scale

    [Machines] 2U servers, 6-core CPU x 2 (3.46 GHz), 192 GB RAM each

    [Data]
    Service    Dataset             Data size   Index size
    GGGenome   RefSeq release 61   8.6 GB      52.4 GB
    GGGenome   DDBJ 92.0           150.8 GB    932.2 GB
    GGGenome   hg19                3.1 GB      19.0 GB
    GGRNA      RefSeq release 61   32.4 GB     210.3 GB
    GGRNA      DDBJ 92.0           559.2 GB    3192.8 GB

    47

  • 48

    SQL

    R&D

    (Figure: neural network with a shared hidden layer and multiple output heads)

  • 49

    SQL

    R&D

    ()

    DB+ ()

  • 50

    1

    1

  • Sense, Organize, Analyze, Action — and Deep Learning

    51

  • References (1/5)

    [Arora+13] Arora, Sanjeev, et al. "Provable bounds for learning some deep representations." arXiv preprint arXiv:1310.6343 (2013).

    [Ba+13] Ba, Lei Jimmy, and Rich Caruana. "Do Deep Nets Really Need to be Deep?" arXiv preprint arXiv:1312.6184 (2013).

    [Bengio+07] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.

    [Bengio+13] Bengio, Yoshua, and Eric Thibodeau-Laufer. "Deep generative stochastic networks trainable by backprop." arXiv preprint arXiv:1306.1091 (2013).

    [Bengio14] Bengio, Yoshua. "How auto-encoders could provide credit assignment in deep networks via target propagation." arXiv preprint arXiv:1407.7906 (2014).

    [Bucilua+06] Buciluǎ, Cristian, Rich Caruana, and Alexandru Niculescu-Mizil. "Model compression." Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.

    52

  • References (2/5)

    [Dahl+14] Dahl, George E., Navdeep Jaitly, and Ruslan Salakhutdinov. "Multi-task Neural Networks for QSAR Predictions." arXiv preprint arXiv:1406.1231 (2014).

    [Dauphin+14] Dauphin, Yann N., et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization." Advances in Neural Information Processing Systems. 2014.

    [Duchi+11] Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." The Journal of Machine Learning Research 12 (2011): 2121-2159.

    [Deng+09] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

    [Goodfellow+09] Goodfellow, Ian, et al. "Measuring invariances in deep networks." Advances in Neural Information Processing Systems. 2009.

    [Hinton+12] Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).

    53

  • References (3/5)

    [Hinton+14] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. "Distilling the Knowledge in a Neural Network." Deep Learning and Representation Learning Workshop, NIPS 2014.

    [Jeffrey+12] Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in Neural Information Processing Systems. 2012.

    [Jeffrey+14] "Large Scale Deep Learning." CIKM keynote, 2014. http://static.googleusercontent.com/media/research.google.com/ja//people/jeff/CIKM-keynote-Nov2014.pdf

    [Karpathy+14] Karpathy, Andrej, et al. "Large-scale video classification with convolutional neural networks." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.

    [Kingma+13] Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).

    [Kiela+14] Kiela, Douwe, and Léon Bottou. "Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics." EMNLP 2014.

    54

  • References (4/5)

    [Krizhevsky+12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.

    [Le+13] Le, Quoc V. "Building high-level features using large scale unsupervised learning." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.

    [LeCun+89] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Advances in Neural Information Processing Systems 2, NIPS 1989, 396-404.

    [Lin+13] Lin, Min, Qiang Chen, and Shuicheng Yan. "Network In Network." arXiv preprint arXiv:1312.4400 (2013).

    [Nair+10] Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted boltzmann machines." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.

    [Naito+12] Yuki Naito and Hidemasa Bono. "GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts." Nucl. Acids Res. (2012) 40(W1): W592-W596.

    55

  • References (5/5)

    [Puniyani+10] K. Puniyani, S. Kim, and E. P. Xing. "Multi-population GWA mapping via multi-task regularized regression." Bioinformatics, vol. 26, no. 12, pp. i208-i216, Jun. 2010.

    [Srivastava+14] Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.

    [Sutskever+14] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems. 2014.

    [IBIS13] 16th Workshop on Information-Based Induction Sciences (IBIS 2013).

    [+12] "Edge-Heavy Data: CPS." GICTF, 2012. http://www.gictf.jp/doc/20120709GICTF.pdf

    56

  • Copyright 2014- Preferred Networks. All Rights Reserved.

  • Deep Learning(1/2)

    59

  • Deep Learning(2/2)

    60

  • 61

  • Deep Learning abbreviations

    DL : Deep Learning
    NN : Neural Network
    DNN : Deep Neural Network
    CNN : Convolutional Neural Network
    RNN : Recurrent Neural Network
    LSTM : Long Short-Term Memory
    ReLU : Rectified Linear Unit
    NiN : Network in Network
    AE : Auto-Encoder
    DAE : Denoising Auto-Encoder
    GWAS : Genome-Wide Association Study
    PheWAS : Phenome-Wide Association Study
    QSAR : Quantitative Structure-Activity Relationship
    BoVW : Bag of Visual Words
    SIFT : Scale-Invariant Feature Transform
    SURF : Speeded Up Robust Features
    HOG : Histogram of Oriented Gradients
    PHOW : Pyramid Histogram Of visual Words

  • 63

    (1/3)

    Net: the whole Neural Net (NN)

    Node (= Neuron, Unit): a single unit; Nodes are connected to each other

    Layer: a group of Nodes

    (Figure: Net with inputs x1 ... xN, hidden units h1 ... hH, k1 ... kM, outputs y1 ... yM, targets t1 ... tM; Forward and Backward directions; Net, Node, and Layer labeled)

  • (2/3) Layer

    A Layer transforms the outputs of one set of Nodes into the inputs of the next: Y = f(WX)

    W: the Layer's parameter (weight) matrix; X: input; f: elementwise activation

    (Figure: X → WX → Y, parameterized by W)

    64
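    A minimal sketch of the Layer abstraction described above, i.e. Y = f(WX) with a weight matrix W and an elementwise nonlinearity f. This is an illustrative class, not Caffe's actual Layer interface.

    import numpy as np

    class Layer:
        def __init__(self, W, f):
            self.W = W          # parameters of the layer
            self.f = f          # elementwise activation function

        def forward(self, X):
            return self.f(self.W @ X)   # Y = f(WX)

    relu = lambda z: np.maximum(0.0, z)
    layer = Layer(np.array([[1.0, -1.0], [0.5, 2.0]]), relu)
    layer.forward(np.array([1.0, 2.0]))   # -> [0. , 4.5]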

  • (3/3) Training loop

    The training data are divided into minibatches (minibatch 1, minibatch 2, ..., minibatch M), each containing B examples

    One Epoch (Iteration) = one pass of the whole training set through the Net; training runs over Epoch 1 ... Epoch N

    The Solver updates the Net's parameters once per minibatch (see the loop sketch below)

    65
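    The epoch/minibatch structure above corresponds to a loop like the following sketch. `solver_update` stands in for one parameter update (e.g. an SGD or AdaGrad step); it is an assumed callback for illustration, not a real Caffe Solver API.

    import numpy as np

    def train(params, data, solver_update, epochs=10, batch_size=32, seed=0):
        # One Epoch = one pass over all training data;
        # the Solver updates the Net once per minibatch.
        rng = np.random.default_rng(seed)
        n = len(data)
        for epoch in range(epochs):
            order = rng.permutation(n)                        # reshuffle every epoch
            for start in range(0, n, batch_size):
                minibatch = [data[i] for i in order[start:start + batch_size]]
                params = solver_update(params, minibatch)     # one update per minibatch
        return params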

  • History of Deep Learning (1/3)

    1960s: Perceptron [Rosenblatt58] — the first NN boom; Perceptrons [Minsky+ 69] exposes its limits, followed by an AI winter

    1970s: Back Propagation [Bryson+ 69]; BP applied to NNs [Werbos74]

    1980s: Neocognitron [Fukushima80] / Hopfield Network [Hopfield82] / PDP model [Rumelhart+ 86] / Gradient Descent [Rumelhart+ 86] / [McClelland+86] — the second NN boom

    66

  • History of Deep Learning (2/3)

    1990s: Bayesian Networks [Pearl85] and Support Vector Machines [Cortes+95] take over; SVMs and Boosting outperform NNs, which also suffer from the Vanishing Gradient Problem [Hochreiter91]; still, key building blocks appear: RBM [Smolensky86] / CNN [LeCun89] / RNN [Jordan86, Elman90] / LSTM [Hochreiter+ 97]

    2000s: Greedy Layerwise Pretraining [Bengio+07], Deep Belief Networks [Hinton+06], and Deep Boltzmann Machines [Salakhutdinov+09] revive interest in NNs

    67

  • History of Deep Learning (3/3)

    2010s: SuperVision wins ILSVRC [Krizhevsky+ 2012]; Google Brain [Le, Ng, Jeffrey+ 2012]; Deep Learning takes off

    The third NN boom continues as of 2014, with models such as GoogLeNet [Szegedy+ 2014]

    68

  • (1/2)

    Sense

    Organize

    69

  • (2/2)

    Organize

    //

    ///

    1000

    Analyze /

    70

  • References (1/4)

    [Bengio+07] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.

    [Bryson+69] Bryson, Arthur E., and Yu-Chi Ho. "Applied Optimal Control." (1969).

    [Cortes+95] Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20.3 (1995): 273-297.

    [Elman+90] Elman, Jeffrey L. "Finding structure in time." Cognitive Science 14.2 (1990): 179-211.

    [Fukushima80] Fukushima, Kunihiko. "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position." Biological Cybernetics 36.4 (1980): 193-202.

    [Hinton+06] Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18.7 (2006): 1527-1554.

    [Hochreiter91] Hochreiter, Sepp. "Untersuchungen zu dynamischen neuronalen Netzen." Master's thesis, Institut für Informatik, Technische Universität München (1991).

    71

  • References (2/4)

    [Hochreiter+97] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.

    [Hopfield82] Hopfield, John J. "Neural networks and physical systems with emergent collective computational abilities." Proceedings of the National Academy of Sciences 79.8 (1982): 2554-2558.

    [Jordan86] Jordan, Michael I. "Serial Order: A Parallel Distributed Processing Approach." No. ICS-8604. University of California, San Diego, Institute for Cognitive Science, 1986.

    [LeCun+89] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Advances in Neural Information Processing Systems 2, NIPS 1989, 396-404.

    [Krizhevsky+12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.

    72

  • References (3/4)

    [Le+13] Le, Quoc V. "Building high-level features using large scale unsupervised learning." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.

    [McClelland+86] McClelland, James L., David E. Rumelhart, and the PDP Research Group. "Parallel distributed processing." Explorations in the Microstructure of Cognition 2 (1986).

    [Minsky+69] Minsky, Marvin, and Seymour Papert. "Perceptrons: An Introduction to Computational Geometry." The MIT Press, Cambridge, expanded edition 19 (1969): 88.

    [Pearl85] Pearl, Judea. "Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning." (1985).

    [Rosenblatt58] Rosenblatt, Frank. "The perceptron: a probabilistic model for information storage and organization in the brain." Psychological Review 65.6 (1958): 386.

    [Rumelhart+86] Rumelhart, David E., James L. McClelland, and the PDP Research Group. "Parallel distributed processing, volume 1: Foundations." MIT Press, Cambridge, MA 19 (1986): 67-70.

    73

  • References (4/4)

    [Salakhutdinov+09] Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Deep Boltzmann machines." International Conference on Artificial Intelligence and Statistics. 2009.

    [Smolensky86] Smolensky, Paul. "Information processing in dynamical systems: Foundations of harmony theory." (1986): 194.

    [Szegedy+14] Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).

    [Werbos74] Werbos, Paul. "Beyond regression: New tools for prediction and analysis in the behavioral sciences." (1974).

    74

  • Copyright 2014- Preferred Networks. All Rights Reserved.