Deep Learning: Basic Theory and Other Applications
Yonghoon Kwon
Table of Contents
1. Representation Learning
2. Background
3. Concepts and Principles
4. Applications
Trends in Pattern Recognition
• Upheaval in pattern recognition due to deep learning
1. Representation Learning
Representation Learning
• The computer understands information by itself (e.g., recognizing a car).
Representation Learning
• How can the computer understand information by itself?
Representation Learning
• V: visible variable, observable in the real world
• H: hidden variable, not observable in the real world
Representation Learning
• All: everything that can be expressed by hidden variables
Representation Learning
(Diagrams: visible variables V and hidden variables H inside the space of everything that can be expressed, with more and more V and H nodes added step by step.)
Representation Learning
• Connect a number of visible variables (V) and hidden variables (H).
Representation Learning
• Connecting these structural relationships builds something.
Representation Learning
• V -> H -> X (something): expression, summary, encoding, abstraction
Representation Learning
• One hidden variable is connected to all of the visible variables.
Representation Learning
• Single layer: one layer of hidden variables h over the visible variables v
Representation Learning
• Multi layer: multiple layers of hidden variables stacked above the visible variables
Representation Learning
Intuitive Interpretation of Multi Layer
• Each hidden layer is an abstraction of the layer below: abstraction stacked on abstraction.
2. Background
Neural Networks History
• Deep learning is all about deep neural networks.
• 1949: Hebbian learning (Donald Hebb, the father of neural networks)
• 1958: (single-layer) Perceptron (Frank Rosenblatt); limitations pointed out by Marvin Minsky, 1969
• 1986: Multilayer Perceptron with backpropagation (David Rumelhart, Geoffrey Hinton, and Ronald Williams)
• 2006: Deep Neural Networks (Geoffrey Hinton and Ruslan Salakhutdinov)
Why neural networks?
• Weaknesses of kernel machines (SVM, ...):
  • They do not scale well with sample size.
  • They are based on matching local templates: the training data is referenced for each test example.
• Local representation vs. distributed representation
• Historical shift: NN (neural network) -> kernel machine -> deep NN
Artificial Neural Networks (ANNs or NNs)
(Figure: neuron and synapse in the brain vs. an ANN)
• ANNs are computational models inspired by the brain:
  • Processing units (nodes vs. neurons)
  • Connections (weights vs. synapses)
Artificial Neural Network (ANN)
(Figure: a single neuron with inputs $x_1, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias $b$, and output $y$)
The neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:
$$z = \sum_{i=1}^{n} w_i x_i + b, \qquad y = H(z)$$
where $H$ is the activation function.
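As a minimal sketch of this computation (NumPy and a logistic activation are assumptions here; the slide only writes a generic activation H):

```python
import numpy as np

def sigmoid(z):
    # One possible choice for the activation function H(z).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Weighted sum of the inputs plus the bias, then the activation.
    z = np.dot(w, x) + b
    return sigmoid(z)

# Hypothetical example: 3 inputs with arbitrary weights and bias.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron(x, w, b=0.2))
```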
Deep Neural Network
(Figure: a network with an input layer, several hidden layers, and an output layer)
Training Deep Neural Network
• Iteratively update the weights W along the error gradient: gradient descent.
• Given a training set {(x, t)}, find the weights W that minimize the error between the network output y and the target t.
(Figure: network with input x, output y, target t, and weights $w_{ij}^{(k)}$ in each layer)
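A minimal sketch of gradient descent on the training error (the loss here is a mean squared error on a toy linear model, an assumption for illustration; the slide does not fix the exact loss):

```python
import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    # Repeatedly step against the gradient of the error with respect to W.
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

# Toy data: fit y = X @ w to targets t with a squared-error loss (made-up values).
X = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.5]])
t = np.array([3.0, 2.5, 1.5])
grad = lambda w: 2 * X.T @ (X @ w - t) / len(t)
print(gradient_descent(grad, w0=np.zeros(2)))
```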
Gradient descent
[http://darkpgmr.tistory.com/133]
• Gradient ascent vs. gradient descent
• Finds a local optimum (not necessarily the global optimum)
Backpropagation
(Figure: network with input x, output y, target t, and weights $w_{ij}^{(k)}$)
• Using the chain rule, propagate error derivatives backwards to compute each node's contribution to the error:
$$\delta_i^{(k)} = \Big(\sum_j w_{ij}^{(k)}\, \delta_j^{(k+1)}\Big)\, f'\big(z_i^{(k)}\big)$$
• Compute the error derivative of each weight from the error signal of the node it feeds into and the output of the node it comes from.
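A rough sketch of one backpropagation step for a tiny two-layer network (sigmoid hidden units, a linear output, and a squared-error loss are assumptions; the slide does not fix these choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, lr=0.1):
    # Forward pass: hidden activations h, linear output y.
    h = sigmoid(W1 @ x)
    y = W2 @ h
    # Error signal at the output for a squared-error loss.
    delta2 = y - t
    # Chain rule: propagate the error signal back through W2 and the
    # sigmoid derivative h * (1 - h).
    delta1 = (W2.T @ delta2) * h * (1 - h)
    # Error derivative of each weight, then the gradient-descent update.
    W2 -= lr * np.outer(delta2, h)
    W1 -= lr * np.outer(delta1, x)
    return W1, W2

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
W1, W2 = backprop_step(np.array([0.5, -1.0, 2.0]), np.array([1.0, 0.0]), W1, W2)
```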
3. Concepts and Principles
Paradigm Shift in Pattern Recognition: Shallow Learning vs. Deep Learning
• Shallow learning: feature extraction by domain experts (SIFT, SURF, ORB, ...); separate modules (feature extractor + trainable classifier)
• Deep learning: automatic feature extraction from data; unified model, end-to-end learning (trainable features + trainable classifier)
Inferior temporal (IT) cortex
[DiCarlo 12]
• Visual pathway: the ventral stream
Representations in deep networks and brain
• Core visual object recognition: kernel analysis of neural and DNN representations [Cadieu 14]
Why Deep?
• The human brain has at least 5 to 10 layers for visual processing.
• A "hierarchical model" is necessary for human-level intelligence.
What good comes from "deep"? "Deep" means more layers.
• The representation gets more hierarchical and abstract.
• It increases the model complexity, which leads to higher accuracy.
(Figure: a shallow network mapping inputs $x_1, x_2$ directly to outputs $y_1, y_2$ through weights $w_1, w_2$)
(Figure: a deep network mapping inputs $x_1, x_2$ to outputs $y_1, y_2$ through hidden layers $h^{(1)}, h^{(2)}, h^{(3)}$ with weights $w^{(1)}, \ldots, w^{(4)}$)
Pre-training
• Backpropagation may not work well with a deep network:
  • Vanishing gradient problem: the backward error information vanishes.
  • Lower layers may not learn much about the task.
• Good initialization is crucial: pre-training.
Deep Learning: So Why Now?
• Neural networks have been around since the 60s, but deep NNs were difficult to train, due to:
  • Lack of datasets large enough to train them
  • Lack of computing power
  • Lack of efficient training algorithms and techniques
• Now we have all of the above:
  • Readily available large-scale datasets
  • GPUs, multicore/cluster systems
  • DBN [Hinton 06], ReLU (rectified linear unit), dropout, ...
• Still, more thorough theoretical analysis is needed to understand why it works well (or not).
Deep Belief Networks (DBNs)
• Probabilistic generative model
• Supervised fine-tuning
  • Generative fine-tuning: up-down algorithm
  • Discriminative fine-tuning: backpropagation
Updating Weights
• How much to update?
  • Learning rate ($\epsilon$): $\Delta w = -\epsilon\, \partial E / \partial w$
  • Fixed or adaptive
  • Common recipe: reduce the learning rate when the validation error stops decreasing
(Figure: training error vs. epoch; the error drops again when the learning rate is reduced)
• Momentum ($v$)
  • Forces gradient descent to keep moving in the previous direction
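A minimal sketch of the update rule with a fixed learning rate and momentum (the slide does not show its exact formulas, so the classical form is assumed):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    # The velocity v keeps the previous direction; the gradient nudges it.
    v = momentum * v - lr * grad
    w = w + v
    return w, v

# Toy usage on the 1-D error E(w) = (w - 3)^2 (hypothetical).
w, v = 0.0, 0.0
for _ in range(100):
    grad = 2 * (w - 3.0)
    w, v = sgd_momentum_step(w, v, grad)
print(w)  # approaches 3
```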
Updating Weights
• How often to update?
  • After every training sample (online learning)
  • After iterating over the entire training set (full-batch)
  • After some training samples (mini-batch)
• Stochastic gradient descent
  • Faster convergence than full-batch
  • Efficient computation on GPUs
Regularization
• Ways to avoid overfitting:
  • Weight decay
  • Weight sharing (CNN)
  • Early stopping
  • Model averaging (various models)
  • Dropout (more on this later)
  • Pre-training (good initialization)
  • Adding noise to the training data
Dropout
• Consider a neural net with one hidden layer.
• Each time we present a training example, randomly omit each hidden unit with probability 0.5.
• This randomly samples from different architectures; all architectures share weights.
• An efficient way to average many large neural nets.
(Figure: units kept when the random value > 0.5, dropped when the random value < 0.5)
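A small sketch of the dropout mask applied to a hidden layer during training (the drop probability of 0.5 follows the slide; the inverted-dropout scaling is one common convention, assumed here):

```python
import numpy as np

def dropout(h, p_drop=0.5, training=True, rng=np.random.default_rng()):
    # During training, omit each hidden unit independently with probability p_drop.
    if not training:
        return h
    mask = rng.random(h.shape) > p_drop
    # "Inverted dropout" scaling keeps the expected activation unchanged.
    return h * mask / (1.0 - p_drop)

h = np.array([0.2, 1.3, -0.4, 0.9, 2.1])
print(dropout(h))
```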
Other Training Details
• Choice of nonlinear function:
  • Logistic function and tanh: both suffer from the saturation problem (slow convergence due to near-zero gradients)
  • ReLU (rectified linear unit): $f(x) = \max(0, x)$, non-saturating, faster convergence [Nair 10]
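Written out as a small sketch, the three activation functions mentioned above are:

```python
import numpy as np

def logistic(x):
    # Saturates toward 0 and 1, so the gradient is near 0 for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Saturates toward -1 and 1.
    return np.tanh(x)

def relu(x):
    # Non-saturating for positive inputs: f(x) = max(0, x).
    return np.maximum(0.0, x)

x = np.linspace(-5, 5, 5)
print(logistic(x), tanh(x), relu(x), sep="\n")
```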
Other Training Details
• Softmax and cross-entropy [Ref.]
  • Normally used instead of the squared-error loss
  • Appropriate for representing a probability distribution
• Input preprocessing
  • Zero-mean, unit-variance input data yields a better-shaped error surface
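A small sketch of the softmax output and the cross-entropy loss (standard textbook definitions; the slide's exact notation is not shown):

```python
import numpy as np

def softmax(z):
    # Shift by the maximum for numerical stability, then normalize.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p, target_index):
    # Negative log-probability assigned to the correct class.
    return -np.log(p[target_index])

logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)
print(p, cross_entropy(p, target_index=0))
```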
Recurrent Neural Network (RNN)
(Figure: an RNN cell with input $x_t$ and hidden state $h_t$, unrolled over time steps $x_0, x_1, x_2, \ldots, x_t$)
[http://karpathy.github.io/2015/05/21/rnn-effectiveness]
Bidirectional Recurrent Neural Network (BRNN)
• A bidirectional network uses both past and future context for every point in the sequence.
• Two hidden layers (forward and backward) share the same output layer.
(Figure: the amount of input information available for prediction in different network structures) [Schuster 97]
Long Short-Term Memory
• Long short-term memory (LSTM) works successfully with sequential data: handwriting, speech, etc.
• LSTM can model very long-term sequential patterns.
• Longer memory has a stabilizing effect.
• A node itself is a deep network.
Problem of RNN (RNN vs. LSTM)
• An RNN forgets previous inputs (vanishing gradient).
• An LSTM remembers previous data and can recall it when it wants.
Step-by-Step LSTM Walk
• Forget gate: from the previous result $h_{t-1}$ and the current data $x_t$, decide what to discard from the cell state $C_{t-1}$:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
[http://colah.github.io/posts/2015-08-Understanding-LSTMs]
• Input gate and candidate cell state: decide what new information to store:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
• Update the cell state: forget part of the old state and add the new candidate:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
• Output gate: decide what to output from the cell state:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t * \tanh(C_t)$$
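Putting the four steps together, one LSTM time step can be sketched as follows (matrix sizes and initial values are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, Wf, bf, Wi, bi, Wc, bc, Wo, bo):
    # Concatenate the previous result h_{t-1} with the current data x_t.
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ z + bf)             # forget gate
    i_t = sigmoid(Wi @ z + bi)             # input gate
    C_tilde = np.tanh(Wc @ z + bc)         # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde     # new cell state
    o_t = sigmoid(Wo @ z + bo)             # output gate
    h_t = o_t * np.tanh(C_t)               # new hidden state
    return h_t, C_t

# Hypothetical sizes: 3-dimensional input, 2-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
W = lambda: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
b = np.zeros(n_hid)
h, C = np.zeros(n_hid), np.zeros(n_hid)
h, C = lstm_step(rng.random(n_in), h, C, W(), b, W(), b, W(), b, W(), b)
print(h, C)
```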
LSTM Regularization with Dropout
• Apply the dropout operator only to non-recurrent connections [Zaremba 14].
(Figure: dashed arrows indicate connections with dropout applied; solid lines have no dropout. $h_t^l$ denotes the hidden state in layer $l$ at timestep $t$.)
(Figure: frame-level speech recognition accuracy)
Auto Encoder
• Regress from the observation to itself (input X1 -> output X1), e.g., data compression (JPEG, etc.) [Lemme 10]
(Figure: an encode/decode network with input, hidden, and output layers)
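A minimal one-hidden-layer autoencoder sketch (the layer sizes, tied decoder weights, and squared-error objective are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_loss(x, W, b_enc, b_dec):
    # Encode the input into a smaller hidden code, then decode it back.
    h = sigmoid(W @ x + b_enc)        # encode
    x_hat = W.T @ h + b_dec           # decode (tied weights W^T)
    return np.sum((x_hat - x) ** 2)   # reconstruction error

# Hypothetical sizes: an 8-dimensional input compressed to a 3-dimensional code.
rng = np.random.default_rng(0)
x = rng.random(8)
W = rng.normal(scale=0.1, size=(3, 8))
print(autoencoder_loss(x, W, b_enc=np.zeros(3), b_dec=np.zeros(8)))
```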
Dark Knowledge [Hinton 14]
• Softened outputs reveal the dark knowledge in the ensemble.
(Figure: for a training example whose true class is "dog" among cow / dog / cat / bus, the original one-hot target (0, 1, 0, 0) is compared with the hard output of the ensemble and with the softened output used as the training target.)
Dark Knowledge [Hinton 14]
• The distribution of the top layer has more information.
• Model size in a DNN can grow up to tens of GB.
(Figure: training a DNN on input/target pairs vs. training a shallow network on input/output pairs taken from the DNN)
Word Embedding
• Language understanding (semantics)
• A word embedding is a function mapping words to high-dimensional vectors.
• One-hot vector representation:
  dog: 0 1 0 0 0 0 0 0 0 0
  cat: 0 0 1 0 0 0 0 0 0 0
• Embedded representation:
  dog: 0.3 0.2 0.1 0.5 0.7
  cat: 0.2 0.8 0.3 0.1 0.9
(Table: nearest neighbors of a few words) [Vinyals 14]
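A tiny sketch of the two representations (the vocabulary, dimensions, and values are made up for illustration):

```python
import numpy as np

vocab = {"dog": 0, "cat": 1}

def one_hot(word, vocab_size=10):
    # Sparse representation: a single 1 at the word's index.
    v = np.zeros(vocab_size)
    v[vocab[word]] = 1.0
    return v

# Dense embedding: each row of a learned matrix is the vector for one word.
embedding_matrix = np.array([[0.3, 0.2, 0.1, 0.5, 0.7],   # dog
                             [0.2, 0.8, 0.3, 0.1, 0.9]])  # cat

def embed(word):
    return embedding_matrix[vocab[word]]

print(one_hot("dog"), embed("dog"))
```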
Continuous-Time RNN (CTRNN)
• A dynamical-system model of a biological neural network (walking, biking, etc.)
• Ordinary differential equations model the effect of incoming activity on a neuron; training uses a genetic algorithm.
(Figure: input nodes, hidden nodes, and output nodes, where the output nodes are a subset of the hidden nodes)
• Update equation:
$$\tau_i \frac{dy_i}{dt} = -y_i + \sum_j W_{ji}\, \sigma\big(g_j (y_j - b_j)\big) + I_i$$
where $\tau_i$ is the time constant, $g_j$ the gain, $b_j$ the bias, $W_{ji}$ the weight between neurons $j$ and $i$, $I_i$ the external input for neuron $i$, $\sigma(\cdot)$ the non-linear function, and $dy_i/dt$ the rate of change of activation of the post-synaptic neuron.
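A small sketch of integrating the CTRNN update equation with a forward-Euler step (the step size and parameter values are arbitrary illustrations; the slide only gives the differential equation):

```python
import numpy as np

def ctrnn_step(y, W, tau, gain, bias, I, dt=0.01):
    # tau_i * dy_i/dt = -y_i + sum_j W_ji * sigma(g_j * (y_j - b_j)) + I_i
    sigma = 1.0 / (1.0 + np.exp(-gain * (y - bias)))
    dydt = (-y + W.T @ sigma + I) / tau
    return y + dt * dydt

# Hypothetical 3-neuron network driven by a constant external input.
rng = np.random.default_rng(0)
y = np.zeros(3)
W = rng.normal(size=(3, 3))
for _ in range(1000):
    y = ctrnn_step(y, W, tau=np.ones(3), gain=np.ones(3),
                   bias=np.zeros(3), I=np.array([0.5, 0.0, -0.2]))
print(y)
```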
4. Applications
Convolutional Neural Network(CNN)
• Handwritten digit recognition [LeCun 98]
• Architecture: N stages of (convolution + subsampling) followed by M fully connected layers
• A neural network that makes use of prior knowledge about images
(Figure: a feature extraction stage followed by a classification stage)
Convolutional Neural Network(CNN)
• Incorporate prior knowledge about images:
  • Locality: each pixel is only related to a small neighborhood of pixels -> local connectivity
  • Stationarity: image statistics are invariant over all image locations -> shared weights
Convolutional Neural Network(CNN)
• Convolution kernels with learned parameters
• Learn multiple kernels (filters)
• Still far fewer parameters than a fully connected model
Convolutional Neural Network(CNN)
• Subsampling (pooling)
  • NxN -> 1
  • Max pooling, average pooling
  • Invariance to small translations
  • Larger receptive fields in upper layers
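A small sketch of a 2-D convolution (valid mode) and 2x2 max pooling, written naively for clarity (real implementations are vectorized; the kernel values are hypothetical):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take dot products ("valid" convolution).
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    # Replace each size x size block with its maximum (NxN -> 1).
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    return feature_map[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])
print(max_pool(conv2d(img, edge_kernel)))
```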
Convolutional Neural Network(CNN)
• Backpropagation
  • Convolution layer: dE/dW is the error summed and propagated from all nodes in which the current weight W occurs.
  • Pooling layer:
    • Max pooling: error is propagated back to the max node only.
    • Average pooling: error is uniformly propagated back to all pooled nodes.
Application : Image Classification
• ImageNet Large-Scale Visual Recognition Challenge (ILSVRC, 2010~)
  • Image classification / localization
  • 1,200,000 (1.2M) labeled images, 1000 classes
• 2012: a CNN won the contest by a large margin, and CNNs have dominated it since:
  • 2012: 15.3% top-5 error (2nd place: 26.2%)
  • 2013: 11.2%
  • 2014: 6.7%
ImageNet Challenge
[Krizhevsky 12]
Super Vision Team
Geoffrey Hinton (right) Alex Krizhevsky, and Ilya Sutskever (left)
ImageNet Challenge 2012
[Krizhevsky 12]
• Deep: 5 convolutional layers + 3 fully connected layers
• Trained using 2 GPUs
• Top-5 error: 15.3% vs. 26.2% (2nd place, non-CNN)
ImageNet Challenge 2012
[Krizhevsky 12]
• ReLU
• Overfitting prevention:
  • Data augmentation: random translations, horizontal flips, color perturbation
  • Dropout: randomly sets node activations to 0, has the effect of simultaneously learning multiple architectures, and reduces co-adaptation between neurons [Hinton 12]
ImageNet Challenge 2013 (awesome performance!)
[Zeiler 13]: winning submission by Clarifai
• (Training details not revealed; related publication)
• Applied modifications to [Krizhevsky 12] by visualizing features from each conv. layer
(Figure: feature visualizations from each layer, with an annotation pointing out dead filters)
ImageNet Challenge 2013
[Howard 13]
• Utilizes the entire input image instead of cropping out the edges (as opposed to [Krizhevsky 12])
[Sermanet 13]
• Multi-scale training
• Efficient computation of dense localization
ImageNet Challenge 2014
[Lin 14]: "Network-in-Network" (NIN)
• Replace convolution with a multilayer perceptron
• Nonlinear: better abstraction
• Can replace the full connection with simple averaging
(Figure: comparison of a standard CNN layer and an NIN layer)
• Equivalent to a 1x1 convolution
ImageNet Challenge 2014
[Szegedy 14]: "GoogLeNet"
(Figure: the GoogLeNet architecture; legend: convolution, pooling, softmax, other)
ImageNet Challenge 2014
[Szegedy 14]: "GoogLeNet"
• 22-layer network trained on 16k CPU cores [Dean 12]
• 9 "Inception" modules (multi-scale convolution)
• Average pooling
• Auxiliary classifiers
• 12x fewer parameters than [Krizhevsky 12] (which has 60,000,000)
ImageNet Challenge 2014
[Szegedy 14]: "GoogLeNet"
• "Inception" modules (multi-scale convolution)
  • Heterogeneous concatenation of multi-scale convolutions
  • [Arora 14]: "cluster correlated neurons together"
  • 1x1 convolutions used for dimension reduction
ImageNet Challenge 2014
[Simonyan 14] (Oxford Univ.)
• Very deep CNN: deeper nets initialized with shallower nets
• 3x3 convolutions
• 2x more parameters than [Krizhevsky 12]
• Multi-GPU
• Multi-scale training
ImageNet Challenge 2014
[Wu 15] (Baidu)
• Beats GoogLeNet: 6.67% -> 5.98%
• Custom-built supercomputer: 4 GPUs x 36 nodes (Nvidia Tesla K40m)
• Aggressive data augmentation
• Multi-scale training with high-resolution images
(Figure: data augmentation examples)
ImageNet Challenge 2015
• ImageNet Challenge 2015 will be open; submission deadline: November 13, 2015
• Top-5 error by year:
  • 2012 non-CNN: 26.2%
  • 2012 AlexNet: 15.3%
  • 2013 Clarifai: 11.2%
  • 2014 GoogLeNet: 6.7%
  • pre-2015 (Google): 4.9%
• Beyond human-level performance
[ImageNet Challenge]
ImageNet Challenge
• Common recipes:
  • Deep (many conv layers), ReLU, dropout
  • Random-crop training (translation, horizontal flip)
  • Multi-scale or random-scale training
  • Color perturbation
  • Multi-crop testing
  • Multi-model averaging
• Focus is gradually moving away from classification toward classification + localization
Auto Caption
Auto Caption (Google)
NeuralTalk (Stanford Univ.): http://cs.stanford.edu/people/karpathy/deepimagesent/
Auto Caption: Show and Tell, A Neural Image Caption Generator [Vinyals 14]
• Text-image multimodal learning
• Learn a mapping between the image space and the word space
• Generate a sentence describing an image, and find the image matching a given sentence
• CNN (convolutional neural net) + RNN (recurrent neural net)
(Figure: generated captions compared with the true describing sentences)
Auto Caption: Deep Visual-Semantic Alignments for Generating Image Descriptions (Stanford Univ.) [Karpathy 14]
• Generate dense, free-form descriptions of images
• Infer region-word alignments using R-CNN [Girshick 13] + BRNN + MRF
• Image segmentation (graph cut + disjoint union)
Auto Caption: Deep Visual-Semantic Alignments [Karpathy 14] (continued)
• Infer region-word alignments using R-CNN + BRNN + MRF.
• Image-sentence alignment score between image $k$ and sentence $l$:
$$S_{kl} = \sum_{t \in g_l} \sum_{i \in g_k} \max(0, v_i^T s_t)$$
where $g_k$ is the set of image regions, $g_l$ the set of sentence words, $v_i$ the embedded region (via an h x 4096 matrix, with h around 1000~1600), and $s_t$ the embedded word (from a t-dimensional word dictionary), plus their additional multiple-instance learning.
(Figure: alignment results with a BRNN vs. an RNN)
Auto Caption: Deep Visual-Semantic Alignments [Karpathy 14] (continued)
• Smoothing with an MRF:
  • Aligning each word to its best region independently scatters the alignments; similar (neighboring) words should be arranged to nearby regions.
  • The argmin can be found with dynamic programming.
(Figure: resulting (word, region) alignments)
Auto Caption
• Generation methods for auto captioning:
  1) Compose descriptions directly from recognized content
  2) Retrieve relevant existing text given recognized content
Related Works
• Compose descriptions given recognized content: Yao et al. (2010), Yang et al. (2011), Li et al. (2011), Kulkarni et al. (2011)
• Generation as retrieval: Farhadi et al. (2010), Ordonez et al. (2011), Gupta et al. (2012), Kuznetsova et al. (2012)
• Generation using pre-associated relevant text: Leong et al. (2010), Aker and Gaizauskas (2010), Feng and Lapata (2010a)
• Other (image annotation, video description, etc.): Barnard et al. (2003), Pastra et al. (2003), Gupta et al. (2008), Gupta et al. (2009), Feng and Lapata (2010b), del Pero et al. (2011), Krishnamoorthy et al. (2012), Barbu et al. (2012), Das et al. (2013)
Other Vision Applications
• Face recognition [Taigman 14]: DeepFace
  • 97.25% (state of the art, nearing human performance)
  • 4.4M faces of 4K people
  • 3D face alignment + locally connected neural network
Other Vision Applications
• Sequence-to-sequence learning [Sutskever 14]
  • Sequence representation: 1000-D, projected to 2-D with PCA
  • Sensitive to word order
  • Invariant to active vs. passive voice
Other Vision Applications
• Data regularities are captured in a multimodal vector space.
• Vector arithmetic is possible in a multimodal representation (in a Euclidean space).
[Kiros 14]
vec(QS rank) + vec(gist) = vec(world ranking 2)
Hierarchical RNN for Skeleton-Based Action Recognition [Yong 15]
• The human body is divided into five parts (two arms, two legs, trunk).
• The movements of these individual parts are modeled by a network composed of 9 layers (BRNNs, fusion layers, a fully connected layer).
Leading experts in deep learning
Summary
• Deep architectures perform better than existing shallow ones because they learn hierarchical representations of data.
• It is now possible to train deep neural networks thanks to the availability of:
  • Large-scale training data
  • High-performance computing devices
  • Newly developed training algorithms and techniques
• Common rules of thumb for improving the performance of a DNN:
  • Make it deeper and larger (ensuring that it does not overfit)
  • Use ReLU for faster convergence and dropout as regularization
  • Apply various data augmentation schemes to increase the effective amount of training data
  • Average predictions from multiple models and input crops
Resources
http://deeplearning.net/
• "Learning Deep Architectures for AI" by Y. Bengio, 2009
• "Deep Learning in Neural Networks: An Overview" by J. Schmidhuber, 2014
• "Machine Learning to Deep Learning" by Dongmin Kwak
• DBN (Science paper's code): Hinton (Matlab)
  • http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
• Convolutional neural networks: LeCun
• Alex Krizhevsky / Hinton (Python, C++)
  • https://code.google.com/p/cuda-convnet/
• Caffe: UC Berkeley (C++)
  • http://caffe.berkeleyvision.org/
• pylearn2: Bengio (Python)
  • https://github.com/lisa-lab/pylearn2
• CURRENNT: Weninger et al. (Munich) (C++)
  • http://sourceforge.net/projects/currennt/
• Libraries: Torch (http://torch.ch/), Theano (http://deeplearning.net/software/theano/)
THANK YOU