Deep Learning
Preferred Networks
Kenta Oono
2014/12/9
Preferred Networks
(@delta2323_)
2012.3: joined PFI
2014.10: moved to PFN
http://delta2323.github.io
Preferred Infrastructure (PFI): founded 2006
Preferred Networks (PFN): founded 2014, focused on IoT
NTT and Toyota seek deep-learning expertise (WSJ, 2014/10/1)
PFI collaboration with CiRA (iPS cell research)
http://blogs.wsj.com/japanrealtime/2014/10/01/ntt-toyota-seek-deep-learning-expertise/

Deep Learning (1/2)
Deep Learning
Deep Learning (2/2)
Intel ITPRO EXPO AWARD 2014
UI
Deep Learning
PFN
Sense → Organize → Analyze → Action

AWS service launch timeline:
- Storage: S3 (06/3), EBS (08/8), Glacier (12/8)
- Database: SimpleDB (07/12), RDS (09/10), DynamoDB (09/10), Aurora (14/11)
- Analytics: EMR (09/4), Redshift (12/11), Kinesis (13/11)
Data Citation Index (Thomson Reuters)
GIGADB
DOI
Bioinformatics, PLoS One, etc.

* Dimensionality Reduction by Learning an Invariant Mapping. Raia Hadsell, Sumit Chopra, Yann LeCun, CVPR 2006.
Feature vectors: (0, 1, 2.5, -1, …), (1, 0.5, -2, 3, …), (0, 1, 1.5, 2, …)
Classification: SVM / LogReg / PA / CW / AROW / Naive Bayes / CNB / DT / RF / ANN
Clustering & topic models: K-means / Spectral Clustering / MMC / LSI / LDA / GM
Structured prediction: HMM / MRF / CRF
- Algorithms: AROW, NHERD
- Tokenizers: space, n-gram, mecab, mecab-n-gram
Jubatus
Jubatus Casual Talks #3 **

* http://www.briscola.co.jp/media/press/pdf/briscola_press_20140212.pdf
** http://blog.jubat.us/2014/07/jubatus-casual-talks-3.html
*** http://itpro.nikkeibp.co.jp/article/NEWS/20140212/536349/
Deep Learning
PFN
[Figure: three-layer neural network — inputs x1…xN, hidden units h1…hH, pre-activations k1…kM, outputs y1…yM compared against targets t1…tM; parameters W1/b1 (layer 1) and W2/b2 (layer 2); arrows show the Forward and Backward passes]

Forward pass:
  h = f1(x) = Sigmoid(W1 x + b1)
  k = f2(h) = Sigmoid(W2 h + b2)
  y = f3(k) = SoftMax(k), where f3,i(k) = exp(k_i) / Σ_j exp(k_j)
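The forward pass above can be sketched in NumPy. This is a minimal sketch, not the speaker's implementation: layer sizes and weights are illustrative, and the softmax subtracts the max logit for numerical stability (which leaves the result unchanged).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtracting the max does not change the result but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)   # first layer:  f1
    k = sigmoid(W2 @ h + b2)   # second layer: f2
    y = softmax(k)             # output layer: f3
    return y

rng = np.random.default_rng(0)
N, H, M = 4, 5, 3                                  # input / hidden / output sizes
W1, b1 = rng.normal(size=(H, N)), np.zeros(H)
W2, b2 = rng.normal(size=(M, H)), np.zeros(M)
y = forward(rng.normal(size=N), W1, b1, W2, b2)
print(y)  # a probability vector over the M classes
```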
Same pipeline, now with the hand-engineered feature-extraction step shown explicitly:
Feature extraction: n-gram / BoW (text); SIFT / SURF / HOG / PHOW / BoVW (images)
Feature vectors: (0, 1, 2.5, -1, …), (1, 0.5, -2, 3, …), (0, 1, 1.5, 2, …)
Learning algorithms: SVM / LogReg / PA / CW / AROW / Naive Bayes / CNB / DT / RF / ANN; K-means / Spectral Clustering / MMC / LSI / LDA / GM; HMM / MRF / CRF
2012: Deep Learning breaks through
- ILSVRC2012: SuperVision [Krizhevsky+ 12] wins by a wide margin, cutting the error rate from 26% to 16%
- Error rates since: 16% ('12), 11% ('13), 6.6% ('14, GoogLeNet)
- Heavy investment: Google / Facebook / Microsoft / Baidu
- Deployed in products: Siri / Google voice search
Google Brain [Le+ 13]
Why does Deep Learning work now?
- Improved Neural Network architectures (ReLU / CNN / NiN)
- Better regularization (DropOut / DAE)
- Massive compute (GPGPU, distributed training)
Deep Learning advances (1/3): Neural Network architectures
- ReLU [Nair+ 10]
- CNN [LeCun+ 89]
- Network in Network [Lin+ 13]
- MaxOut
- Disentangling [Goodfellow+ 09][Bengio 14]
- Replacing Sigmoid with ReLU [Bengio 14]
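Why the Sigmoid-to-ReLU swap matters can be seen from the gradients alone. A small illustrative sketch (not from the slides): the sigmoid's gradient vanishes for large |z|, while ReLU's gradient stays 1 for all positive inputs, which helps gradients propagate through deep stacks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # at most 0.25, and ~0 for large |z|

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)  # exactly 1 wherever the unit is active

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(z))  # gradient vanishes at the extremes
print(relu_grad(z))     # gradient stays 1 for z > 0
```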
Deep Learning advances (2/3): training techniques
- Layerwise Pretraining [Bengio+ 07] with Auto-Encoders
- Stochastic Gradient Descent (SGD) and its refinements:
  AdaGrad [Duchi+ 11], Nesterov's Method, RMSProp
- DropOut [Hinton+ 12]
- Denoising Auto-Encoder
- [IBIS13] [Srivastava+ 14]
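AdaGrad [Duchi+ 11], one of the SGD refinements listed above, scales each parameter's step by its accumulated squared gradients. A minimal sketch on a toy quadratic objective (the objective and hyperparameters are illustrative, not from the slides):

```python
import numpy as np

def adagrad(grad_fn, w0, lr=1.0, eps=1e-8, steps=200):
    """Minimize via AdaGrad: per-coordinate learning-rate decay."""
    w = np.asarray(w0, dtype=float)
    g2 = np.zeros_like(w)                  # accumulated squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        g2 += g * g
        w -= lr * g / (np.sqrt(g2) + eps)  # large past gradients -> small steps
    return w

# toy objective: f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = adagrad(lambda w: w, [3.0, -2.0])
print(w)  # close to the minimum at the origin
```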
Deep Learning advances (3/3): large-scale training
- GPGPU and distributed training (DistBelief [Jeffrey+ 12])
- Large datasets: ImageNet [Deng+ 09] (~14M images), Sports-1M [Karpathy+ 14] (~1M videos)
Deep Learning applications (1/3): life sciences
- Multi-task learning, e.g. GWAS [Puniyani+ 10], PheWAS

[Figure: multi-task neural network — a shared input layer x1…xN and hidden layer h1…hH feed three task-specific output heads (k1…kM → y1…yM) for tasks 1, 2, 3]

- Drug activity prediction (QSAR): a multi-task Deep NN [Dahl+ 14] won the Merck Kaggle competition* (figure: [Dahl+ 14] Fig. 2)

* http://blog.kaggle.com/2012/10/31/merck-competition-results-deep-nn-and-gpus-come-out-to-play/
Deep Learning applications (2/3): multimodal learning
- Combining modalities, e.g. image + text [Jeffrey+ 14]

[Figure: networks over separate input modalities x1…xN sharing higher-level representations h1…hH, k1…kM, y1…yM]

Deep Learning applications (3/3)
- CNN image embeddings improve multi-modal semantics [Kiela+ 14]
Deep Learning in practice
- GoogLeNet (ILSVRC14 winner): its network definition (Caffe prototxt) runs to roughly 2000 lines
Deep Learning: open questions (1/2)
- Why do DNNs work? What do pretraining and depth actually buy?
- SGD in high-dimensional non-convex optimization: saddle points dominate [Dauphin+ 14]
- "Do Deep Nets Really Need to be Deep?" [Ba+ 13]
- ILSVRC14: GoogLeNet
Deep Learning: open questions (2/2) — engineering NNs
- Many hyperparameters: layers / nodes per layer / learning rate / iterations
- Numerical pitfalls (e.g. activations blowing up to Inf with ReLU)
- Describing NNs needs a DSL: GoogLeNet's Caffe prototxt is ~2000 lines
  (https://github.com/BVLC/caffe/pull/1367/files)
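Hand-writing a ~2000-line prototxt is error-prone, which is one reason such specs are often generated programmatically. A hedged sketch in the spirit of (but not identical to) Caffe's prototxt format — the helper name and the exact fields emitted are illustrative:

```python
# Hypothetical helper that emits a repetitive convolution-layer spec,
# modeled loosely on Caffe's prototxt syntax.
def conv_layer(name, bottom, num_output, kernel_size):
    return (
        f'layer {{ name: "{name}" type: "Convolution" bottom: "{bottom}" '
        f'top: "{name}" convolution_param {{ num_output: {num_output} '
        f'kernel_size: {kernel_size} }} }}'
    )

# chain three conv layers: data -> conv1 -> conv2 -> conv3
spec = "\n".join(
    conv_layer(f"conv{i}", f"conv{i-1}" if i > 1 else "data", 64, 3)
    for i in range(1, 4)
)
print(spec)
```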
Deep Learning research directions
- Sequence modeling: Recurrent NN, LSTM — [Sutskever+ 14], [Karpathy+ 14], [Vinyals+ to appear]
- DNN → Shallow NN: Model Compression [Bucilua+ 06], Distilled Network / Dark Knowledge [Hinton+ 14]
- Theory of Deep Learning: analysis of Layerwise Pretraining [Arora+ 13]
- Deep (Directed) Generative Models: Generative Stochastic Network [Bengio+ 13], Variational Auto-Encoder [Kingma+ 13]
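The "Dark Knowledge" idea behind distillation [Hinton+ 14] is that a temperature-raised softmax exposes the teacher network's small probabilities on non-target classes, which the student then learns from. A minimal sketch of the temperature softmax (the logits are illustrative):

```python
import numpy as np

def softmax_T(logits, T=1.0):
    """Softmax with temperature T: higher T softens the distribution,
    revealing relative similarities between non-target classes."""
    z = np.asarray(logits) / T
    e = np.exp(z - z.max())   # max-subtraction for numerical stability
    return e / e.sum()

logits = np.array([10.0, 5.0, 1.0])   # a confident teacher's logits
hard = softmax_T(logits, T=1.0)
soft = softmax_T(logits, T=5.0)
print(hard)  # nearly one-hot
print(soft)  # softened: the runner-up classes become visible
```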
Deep Learning
33
Ustream /Slideshare / Research Blog
http://www.slideshare.net/pfi/deep-learning-22350063 http://www.slideshare.net/beam2d/deep-learning20140130 http://www.slideshare.net/beam2d/deep-learning-22544096
Deep Learning
PFN
Sense → Organize → Analyze → Action
(1/3)

(2/3)
PFI seminar (Ustream):
http://www.slideshare.net/shoheihido/120913-pfi-dist
http://www.slideshare.net/shoheihido/130328-slideshare
http://www.slideshare.net/shoheihido/ss-25510340
(3/3)
XXX
XXX
[+12]
Unify & Generalize: Sensing, Organize, Analyze, Action
Challenges: Security / Privacy / Heterogeneity / Distributed Intelligence

Market estimates from Cisco and GE:
- Cisco: Internet of Everything (IoE) — $14.4 trillion of value at stake over the next decade
- GE: Industrial Internet — could add $10–15 trillion to global GDP over 20 years

Sources:
- Cisco White Paper, "Embracing the Internet of Everything To Capture Your Share of $14.4 Trillion"
- GE, "Industrial Internet: Pushing the Boundaries of Minds and Machines"
- GE, "The Industrial Internet@Work"
Deep Learning
PFN
II
Organize (1/2): GGRNA
- Google-like full-text search engine for genes (DBCLS)
- Covers NCBI RefSeq
- Powered by Sedue
- Nucl. Acids Res. 2012 [Naito+ 12]
Organize (2/2): GGGenome
- Covers DDBJ Release 92.0, human genome (hg19), mouse genome (mm10), and more
- RESTful API
- Approximate search by edit distance, e.g.:
  ACGTGATC
  ACTA ATC
  d(ACGTGATC, ACTAATC) = 3
Index sizes (server: 2U, 6-core 3.46GHz CPU ×2, 192GB RAM); left column is source data size, right column index size:

  GGGenome  RefSeq 61   8.6GB     52.4GB
  GGGenome  DDBJ 92.0   150.8GB   932.2GB
  GGGenome  hg19        3.1GB     19.0GB
  GGRNA     RefSeq 61   32.4GB    210.3GB
  GGRNA     DDBJ 92.0   559.2GB   3192.8GB
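Approximate search as in GGGenome is built on the edit (Levenshtein) distance. A standard dynamic-programming sketch — the test sequences below are illustrative, not the slide's example:

```python
def edit_distance(s, t):
    """Levenshtein distance via row-by-row dynamic programming."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))          # distances for the empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution or match
        prev = cur
    return prev[n]

print(edit_distance("ACGT", "AGT"))  # 1: a single deletion suffices
```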
SQL
R&D

[Figure: multi-layer neural network — inputs x1…xN, hidden h1…hH, outputs y1…yM]

DB + machine learning
Sense, Organize, Analyze, Action
Deep Learning
(1/5)
[Arora+13] Arora, Sanjeev, et al. "Provable bounds for learning some deep representations." arXiv preprint arXiv:1310.6343 (2013).
[Ba+13] Ba, Lei Jimmy, and Rich Caruana. "Do Deep Nets Really Need to be Deep?" arXiv preprint arXiv:1312.6184 (2013).
[Bengio+07] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[Bengio+13] Bengio, Yoshua, and Eric Thibodeau-Laufer. "Deep generative stochastic networks trainable by backprop." arXiv preprint arXiv:1306.1091 (2013).
[Bengio14] Bengio, Yoshua. "How auto-encoders could provide credit assignment in deep networks via target propagation." arXiv preprint arXiv:1407.7906 (2014).
[Bucilua+06] Buciluǎ, Cristian, Rich Caruana, and Alexandru Niculescu-Mizil. "Model compression." Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
(2/5)
[Dahl+14] Dahl, George E., Navdeep Jaitly, and Ruslan Salakhutdinov. "Multi-task Neural Networks for QSAR Predictions." arXiv preprint arXiv:1406.1231 (2014).
[Dauphin+14] Dauphin, Yann N., et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization." Advances in Neural Information Processing Systems. 2014.
[Duchi+11] Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." The Journal of Machine Learning Research 12 (2011): 2121-2159.
[Deng+09] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2009.
[Goodfellow+09] Goodfellow, Ian, et al. "Measuring invariances in deep networks." Advances in Neural Information Processing Systems. 2009.
[Hinton+12] Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
(3/5)
[Hinton+14] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. "Distilling the Knowledge in a Neural Network." Deep Learning and Representation Learning Workshop, NIPS 2014.
[Jeffrey+12] Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in Neural Information Processing Systems. 2012.
[Jeffrey+14] "Large Scale Deep Learning." CIKM keynote, 2014. http://static.googleusercontent.com/media/research.google.com/ja//people/jeff/CIKM-keynote-Nov2014.pdf
[Karpathy+14] Karpathy, Andrej, et al. "Large-scale video classification with convolutional neural networks." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.
[Kingma+13] Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
[Kiela+14] Kiela, Douwe, and Léon Bottou. "Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics." EMNLP 2014.
(4/5)
[Krizhevsky+12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[Le+13] Le, Quoc V. "Building high-level features using large scale unsupervised learning." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2013.
[LeCun+89] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Advances in Neural Information Processing Systems 2, NIPS 1989, 396-404.
[Lin+13] Lin, Min, Qiang Chen, and Shuicheng Yan. "Network In Network." arXiv preprint arXiv:1312.4400 (2013).
[Nair+10] Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted boltzmann machines." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.
[Naito+12] Yuki Naito and Hidemasa Bono. "GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts." Nucl. Acids Res. (2012) 40(W1): W592-W596.
(5/5)
[Puniyani+10] K. Puniyani, S. Kim, and E. P. Xing. "Multi-population GWA mapping via multi-task regularized regression." Bioinformatics, vol. 26, no. 12, pp. i208-i216, Jun. 2010.
[Srivastava+14] Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
[Sutskever+14] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems. 2014.
[IBIS13] Presentation at the 16th Workshop on Information-Based Induction Sciences (IBIS2013).
[+12] "Edge-Heavy Data: CPS." GICTF 2012. http://www.gictf.jp/doc/20120709GICTF.pdf
Copyright 2014- Preferred Networks. All Rights Reserved.
Deep Learning (1/2)
Deep Learning (2/2)
Deep Learning glossary
DL: Deep Learning
NN: Neural Network
DNN: Deep Neural Network
CNN: Convolutional Neural Network
RNN: Recurrent Neural Network
LSTM: Long Short-Term Memory
ReLU: Rectified Linear Unit
NiN: Network in Network
AE: Auto-Encoder
DAE: Denoising Auto-Encoder
GWAS: Genome-Wide Association Study
PheWAS: Phenome-Wide Association Study
QSAR: Quantitative Structure–Activity Relationship
BoVW: Bag of Visual Words
SIFT: Scale-Invariant Feature Transform
SURF: Speeded Up Robust Features
HOG: Histogram of Oriented Gradients
PHOW: Pyramid Histogram Of visual Words
Terminology (1/3)
- Net: the Neural Net (NN) as a whole
- Node (= Neuron, Unit): a single computational element
- Layer: a group of Nodes

[Figure: Nodes arranged in Layers — inputs x1…xN, hidden h1…hH, k1…kM, outputs y1…yM, targets t1…tM; Forward and Backward passes]
Terminology (2/3): Layer
Each Layer maps its input X to output Y = f(WX): multiply by the weight matrix W, then apply the activation f elementwise.

[Figure: X → WX → Y]
Terminology (3/3): Epoch and minibatch
- The training data is split into minibatches 1…M, each of size B
- An Epoch (Iteration) is one full pass of the Net over the training data; training runs Epochs 1…N
- The Solver updates the Net once per minibatch

[Figure: Epochs 1…N, each consisting of minibatches 1…M]
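The Epoch/minibatch scheme above can be sketched as a minimal training loop. The model (linear regression) and data are toy stand-ins chosen so the sketch stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))              # 120 examples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                             # noiseless toy regression targets

w = np.zeros(3)
B, lr, n_epochs = 20, 0.1, 50              # minibatch size B, learning rate, Epochs N
for epoch in range(n_epochs):              # Epoch = one full pass over the data
    idx = rng.permutation(len(X))          # reshuffle each Epoch
    for start in range(0, len(X), B):      # minibatches 1..M
        b = idx[start:start + B]
        err = X[b] @ w - y[b]
        grad = X[b].T @ err / len(b)       # gradient on this minibatch only
        w -= lr * grad                     # Solver update, once per minibatch
print(w)  # approaches true_w
```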
History of Deep Learning (1/3)
- 1960s: Perceptron [Rosenblatt58] — first NN boom
- Perceptrons [Minsky+ 69] exposes the Perceptron's limits — interest in NNs (and AI) cools
- 1970s: Back Propagation [Bryson+ 69]; applied to NN training [Werbos74]
- 1980s: Neocognitron [Fukushima80] / Hopfield Network [Hopfield82] / PDP model [Rumelhart+ 86] / Gradient Descent backprop [Rumelhart+ 86] / [McClelland+ 86] — second NN boom
History of Deep Learning (2/3)
- 1990s: Bayesian Network [Pearl85] and Support Vector Machine [Cortes, Vapnik95] — SVM and Boosting overtake NNs; the Vanishing Gradient Problem is identified [Hochreiter91]
- Meanwhile: RBM [Smolensky86] / CNN [LeCun89] / RNN [Jordan86, Elman90] / LSTM [Hochreiter+ 97]
- 2000s: Greedy Layerwise Pretraining [Bengio+ 07], Deep Belief Network [Hinton+ 06], Deep Boltzmann Machine [Salakhutdinov+ 09] — NN revival
History of Deep Learning (3/3)
- 2010s: ILSVRC win by SuperVision [Krizhevsky+ 2012]; Google Brain [Le, Ng, Jeffrey+ 2012] — the Deep Learning (third NN) boom
- 2014: GoogLeNet [Szegedy+ 2014]
(1/2)
Sense
Organize

(2/2)
Organize
Analyze
(1/4)
[Bengio+07] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[Bryson+69] Bryson, Arthur E., and Yu-Chi Ho. "Applied optimal control." (1969).
[Cortes+95] Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20.3 (1995): 273-297.
[Elman+90] Elman, Jeffrey L. "Finding structure in time." Cognitive Science 14.2 (1990): 179-211.
[Fukushima80] Fukushima, Kunihiko. "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position." Biological Cybernetics 36.4 (1980): 193-202.
[Hinton+06] Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18.7 (2006): 1527-1554.
[Hochreiter91] Hochreiter, Sepp. "Untersuchungen zu dynamischen neuronalen Netzen." Master's thesis, Institut für Informatik, Technische Universität München (1991).
(2/4)
[Hochreiter+97] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long Short-Term Memory." Neural Computation 9.8 (1997): 1735-1780.
[Hopfield82] Hopfield, John J. "Neural networks and physical systems with emergent collective computational abilities." Proceedings of the National Academy of Sciences 79.8 (1982): 2554-2558.
[Jordan86] Jordan, Michael I. "Serial Order: A Parallel Distributed Processing Approach." No. ICS-8604. Institute for Cognitive Science, University of California, San Diego, 1986.
[LeCun+89] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Advances in Neural Information Processing Systems 2, NIPS 1989, 396-404.
[Krizhevsky+12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
(3/4)
[Le+13] Le, Quoc V. "Building high-level features using large scale unsupervised learning." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2013.
[McClelland+86] McClelland, James L., David E. Rumelhart, and PDP Research Group. "Parallel distributed processing." Explorations in the Microstructure of Cognition 2 (1986).
[Minsky+69] Minsky, Marvin, and Seymour Papert. "Perceptrons: An Introduction to Computational Geometry." The MIT Press, Cambridge, expanded edition 19 (1969): 88.
[Pearl85] Pearl, Judea. "Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning." (1985).
[Rosenblatt58] Rosenblatt, Frank. "The perceptron: a probabilistic model for information storage and organization in the brain." Psychological Review 65.6 (1958): 386.
[Rumelhart+86] Rumelhart, David E., James L. McClelland, and PDP Research Group. "Parallel distributed processing, volume 1: Foundations." MIT Press, Cambridge, MA 19 (1986): 67-70.
(4/4)
[Salakhutdinov+09] Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Deep Boltzmann machines." International Conference on Artificial Intelligence and Statistics. 2009.
[Smolensky86] Smolensky, Paul. "Information processing in dynamical systems: Foundations of harmony theory." (1986): 194.
[Szegedy+14] Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
[Werbos74] Werbos, Paul. "Beyond regression: New tools for prediction and analysis in the behavioral sciences." (1974).