Visual Recognition And Search, Columbia University, Spring 2013
EECS 6890 – Topics in Information Processing
Spring 2013, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch
Class 3: Feature Coding and Pooling
Liangliang Cao, Feb 7, 2013
Problem
婴儿 · bebé · 아가 · बच्चा ("baby" in Chinese, Spanish, Korean and Hindi)
People may have difficulty understanding texts in different languages,
but NOT images.
Can you understand the following?
Photo courtesy of luster
The Myth of Human Vision
Can a computer vision system recognize objects or scenes as humans do?
Problem
Why Is This Problem Important?
Search engines, traditional companies, mobile apps
Problem
http://www.vision.caltech.edu
Examples of Object Recognition Datasets
http://groups.csail.mit.edu/vision/SUN/
Problem
Overview of Classification Model
• Histogram of SIFT: coding by vector quantization; pooling by histogram aggregation
• Uncertainty-based quantization: coding by soft quantization; pooling by histogram aggregation
• Sparse coding: coding by soft quantization; pooling by max pooling
• Fisher vector/supervector: coding by GMM probability estimation; pooling by GMM adaptation
Coding: map each local feature to a compact representation (a code).
Pooling: aggregate the codes of all local features into a single image-level representation.
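To make the two steps concrete, here is a minimal NumPy sketch (not from the original lecture) of hard-assignment coding followed by two pooling choices. The function names `hard_assign`, `sum_pool` and `max_pool` and the toy codebook are hypothetical; local descriptors are assumed to be the rows of a matrix.

```python
import numpy as np

def hard_assign(features, codebook):
    """Coding: map each local feature (row) to a one-hot code over K codewords."""
    # Squared Euclidean distance from every feature to every codeword: shape (N, K)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros_like(d2)
    codes[np.arange(len(features)), d2.argmin(axis=1)] = 1.0
    return codes

def sum_pool(codes):
    """Pooling: histogram aggregation (number of features per codeword)."""
    return codes.sum(axis=0)

def max_pool(codes):
    """Pooling: keep the strongest response per codeword."""
    return codes.max(axis=0)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
features = np.array([[0.1, 0.2], [0.9, 1.1], [1.2, 0.8], [4.8, 5.1]])
codes = hard_assign(features, codebook)
print(sum_pool(codes))  # [1. 2. 1.]
print(max_pool(codes))  # [1. 1. 1.]
```

Whatever the number N of local features, the pooled vector has one entry per codeword, which is what makes images with different numbers of regions comparable.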
Outline
• Histogram of local features
• Bag of words model
• Soft quantization and sparse coding
• Fisher vector and supervector
Histogram of Local Features
Bag of Words Models
• Powerful local features
– DoG
– Hessian, Harris
– Dense-sampling
Recap of Last Class
Non-fixed number of local regions per image!
Bag of Words Models
• Histograms can provide a fixed-size representation of images
• Spatial pyramid/gridding can enhance the histogram representation with spatial information
Recap of Last Class (2)
Bag of Words Models
Histogram of Local Features
[Figure: histogram of codeword frequencies (x-axis: codewords, y-axis: frequency)]
dim = #codewords
Bag of Words Models
Histogram of Local Features (2)
dim = #codewords × #grids
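A minimal NumPy sketch of the gridded histogram, assuming each local feature already has a codeword index and an (x, y) position; `grid_histogram` and its arguments are hypothetical names. A full spatial pyramid would concatenate such histograms over several grid resolutions (for example 1×1, 2×2, 4×4), usually with per-level weights.

```python
import numpy as np

def grid_histogram(word_ids, xy, image_size, K, grid=(2, 2)):
    """Concatenate one K-bin codeword histogram per grid cell (length K * rows * cols)."""
    width, height = image_size
    rows, cols = grid
    # Grid cell of every feature; clip so points on the right/bottom border stay inside
    col = np.minimum((xy[:, 0] * cols / width).astype(int), cols - 1)
    row = np.minimum((xy[:, 1] * rows / height).astype(int), rows - 1)
    cell = row * cols + col
    hist = np.zeros((rows * cols, K))
    np.add.at(hist, (cell, word_ids), 1.0)  # unbuffered add handles repeated indices
    return hist.ravel()

# Four features quantized to codewords 0..2, one per quadrant of a 100 x 100 image
word_ids = np.array([0, 2, 2, 1])
xy = np.array([[10, 10], [90, 10], [90, 90], [10, 90]])
h = grid_histogram(word_ids, xy, image_size=(100, 100), K=3)
print(h.shape)  # (12,) = 3 codewords x 4 grid cells
```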
Local Feature Quantization
Bag of Words Models
Slide courtesy of Fei-Fei Li
Local Feature Quantization
Bag of Words Models
Local Feature Quantization
Bag of Words Models
- Vector quantization
- Dictionary learning
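The codeword dictionary is commonly learned with k-means, i.e. vector quantization of a sample of training descriptors. The sketch below is plain Lloyd's k-means in NumPy with a farthest-point initialization chosen here for determinism (a simplification of k-means++); the function name and toy data are hypothetical, and real SIFT codebooks are usually much larger and built with an optimized library.

```python
import numpy as np

def kmeans_codebook(features, K, n_iter=20):
    """Learn K codewords with Lloyd's k-means (vector quantization)."""
    # Farthest-point initialization: a deterministic stand-in for k-means++
    centers = [features[0]]
    for _ in range(1, K):
        d2 = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: nearest codeword for every feature
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each codeword to the mean of its members
        for k in range(K):
            members = features[labels == k]
            if len(members) > 0:  # keep the old codeword if its cluster is empty
                centers[k] = members.mean(axis=0)
    return centers, labels

rng = np.random.default_rng(0)
descriptors = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(10.0, 0.1, (50, 2))])
codebook, labels = kmeans_codebook(descriptors, K=2)
```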
Dictionary for Codewords
Histogram of Local Features
Picture courtesy of Fei-Fei Li
Bag of Words Models
Most slides in this section are courtesy of Fei-Fei Li
Object → Bag of ‘words’
Bag of Words Models
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex, eye, cell, optical,
nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce,
exports, imports, US, yuan, bank, domestic,
foreign, increase, trade, value
Underlying Assumptions - Text
Bag of Words Models
Underlying Assumptions - Image
[Figure: bag-of-words pipeline]
feature detection & representation → K-means → image representation
learning: category models (and/or) classifiers
recognition: category decision
Bag of Words Models
Borrowing Techniques from Text Classification
• pLSA
• Naïve Bayesian model
Notations
• w_n: the n-th patch in an image, encoded as a one-hot codeword vector
– w_n = [0,0,…,1,…,0,0]^T
• w: the collection of all N patches in an image
– w = [w_1, w_2, …, w_N]
• d_j: the j-th image in an image collection
• c: category of the image
• z: theme or topic of the patch
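Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001
[Figure: plate diagram d → z → w, with N patches per image and D images]
Latent Dirichlet Allocation (LDA): Blei et al., 2001
[Figure: plate diagram over c, π, z and w, with N patches per image and D images]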
Bag of Words Models
Bag of Words Models
Probabilistic Latent Semantic Analysis (pLSA)
“face”
Sivic et al. ICCV 2005
$P(w_i \mid d_j) = \sum_{k=1}^{K} P(w_i \mid z_k)\, P(z_k \mid d_j)$
(observed codeword distributions = codeword distributions per theme (topic) × theme distributions per image)
Slide credit: Josef Sivic
Bag of Words Models
Parameters are estimated by EM or Gibbs sampling.
$z^* = \arg\max_{z} P(z \mid d)$
Slide credit: Josef Sivic
Bag of Words Models
Recognition using pLSA
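A compact sketch of pLSA fitted by EM on an image-by-codeword count matrix, followed by the recognition rule z* = argmax_z P(z|d). It is a NumPy illustration under simplifying assumptions (dense counts, fixed iteration count, random initialization), not the implementation used by Sivic et al.; the function name and toy counts are hypothetical. For a new test image, the usual "fold-in" step would keep P(w|z) fixed and run the same EM updates on P(z|d) only.

```python
import numpy as np

def plsa(counts, K, n_iter=100, seed=0):
    """Fit pLSA by EM. counts[d, w] = n(d, w), occurrences of codeword w in image d."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_w_z = rng.random((K, W))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)  # P(w | z)
    p_z_d = rng.random((D, K))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)  # P(z | d)
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(w | z) P(z | d); shape (D, K, W)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-300)  # avoid 0/0
        weighted = counts[:, None, :] * post  # n(d, w) P(z | d, w)
        # M-step: renormalize the expected counts
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_w_z, p_z_d

# Toy data: images 0-1 use codewords 0-1, images 2-3 use codewords 2-3
counts = np.array([[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]], dtype=float)
p_w_z, p_z_d = plsa(counts, K=2)
topics = p_z_d.argmax(axis=1)  # recognition: z* = argmax_z P(z | d)
```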
Latent Dirichlet Allocation (LDA)
Fei-Fei and Perona, CVPR 2005
“beach”
Bag of Words Models
Scene Recognition using LDA
Bag of Words Models
Spatially Coherent Latent Topic Model
Cao and Fei-Fei, ICCV 2007
Bag of Words Models
Simultaneous Segmentation and Recognition
But these models suffer from:
- Loss of information in the quantization of “visual words”
- Loss of spatial information
Bag of Words Models
Pros and Cons
Images differ from texts! We need:
Better coding
Better pooling
Bag of Words models are good at:
- Modeling prior knowledge
- Providing intuitive interpretation
Soft Quantization and Sparse Coding
Soft Quantization
Hard Quantization
Soft Quantization
Model the uncertainty across multiple codewords
Uncertainty-Based Quantization
van Gemert et al., Visual Word Ambiguity, PAMI 2009
Soft Quantization
Intuition behind UNC (codeword uncertainty)
Hard quantization
Soft quantization based on uncertainty
van Gemert et al., Visual Word Ambiguity, PAMI 2009
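A minimal NumPy sketch of uncertainty-based soft assignment in the spirit of van Gemert et al.: each descriptor spreads one unit of mass over all codewords in proportion to a Gaussian kernel of its distance, and the codes are averaged over the image. The Gaussian kernel with bandwidth `sigma` is assumed here; the function name and toy numbers are hypothetical. As `sigma` shrinks, the result approaches hard quantization.

```python
import numpy as np

def unc_histogram(features, codebook, sigma):
    """Soft codeword histogram: normalized Gaussian-kernel weights, averaged over features."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    logits = -d2 / (2.0 * sigma ** 2)
    # Shift each row by its maximum before exp so that a small sigma does not underflow
    kern = np.exp(logits - logits.max(axis=1, keepdims=True))
    codes = kern / kern.sum(axis=1, keepdims=True)  # each feature distributes unit mass
    return codes.mean(axis=0)                       # average pooling over features

codebook = np.array([[0.0, 0.0], [2.0, 0.0]])
print(unc_histogram(np.array([[1.0, 0.0]]), codebook, sigma=1.0))  # [0.5 0.5]: equidistant feature
# Small sigma: nearly all mass goes to the nearest codeword (close to hard assignment)
print(unc_histogram(np.array([[0.2, 0.0]]), codebook, sigma=0.05))
```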
Soft Quantization
Improvement with UNC
Soft Quantization
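Hard quantization can be viewed as an “extremely sparse representation”:
$\min_{D, C} \sum_{i} \| x_i - D c_i \|_2^2 \quad \text{s.t.} \quad \|c_i\|_0 = 1,\ \|c_i\|_1 = 1,\ c_i \ge 0$
where $D = [d_1, \dots, d_K]$ holds the K codewords as columns and $c_i$ is the code of descriptor $x_i$.
A more general representation allows several nonzero coefficients per descriptor, but the resulting ℓ0-constrained problem is hard to solve.
In practice we consider its ℓ1 relaxation, i.e. sparse coding:
$\min_{D, C} \sum_{i} \left( \| x_i - D c_i \|_2^2 + \lambda \|c_i\|_1 \right) \quad \text{s.t.} \quad \|d_k\|_2 \le 1,\ k = 1, \dots, K$
Sparse Coding-Based Quantization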
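For a fixed dictionary D, the ℓ1 problem can be solved per descriptor with many algorithms. The sketch below uses ISTA (iterative shrinkage-thresholding), a simple proximal-gradient method picked here for brevity; it is not necessarily the solver used by Yang et al., and dictionary learning (alternating with an update of D) is omitted. The function name, the toy dictionary and λ are hypothetical.

```python
import numpy as np

def sparse_code(x, D, lam, n_iter=500):
    """ISTA for min_c ||x - D c||^2 + lam * ||c||_1, with D of shape (d, K)."""
    # Step size 1/L, where L = 2 * sigma_max(D)^2 is the Lipschitz constant of the gradient
    L = 2.0 * np.linalg.norm(D, ord=2) ** 2
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ c - x)  # gradient of the reconstruction error
        z = c - grad / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return c

# Overcomplete dictionary with unit-norm columns (codewords)
D = np.array([[1.0, 0.0, np.sqrt(0.5)],
              [0.0, 1.0, np.sqrt(0.5)]])
x = np.array([3.0, 0.0])
c = sparse_code(x, D, lam=1.0)
print(np.round(c, 3))  # only the first codeword is active
```

Unlike hard quantization, several coefficients may be nonzero in general; λ controls how sparse the code is.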
Soft Quantization
Yang et al. obtain good recognition accuracy by combining sparse coding with a spatial pyramid and dictionary training.
Yang et al., Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification, CVPR 2009
More details will be given in the group presentation.
Fisher Vector and Supervector
One of the most powerful image/video classification techniques
Thanks to Zhen Li and Qiang Chen for constructive suggestions on this section
Fisher Vector and Supervector
Winning Systems
Classification task: 2009, 2010, 2011, 2012
Large Scale Visual Recognition Challenge: 2010, 2011
Fisher Vector and Supervector
Literature
Supervector:
[1] Yan et al., Regression from patch-kernel, CVPR 2008
[2] Zhou et al., SIFT-Bag kernel for video event analysis, ACM Multimedia 2008
[3] Zhou et al., Hierarchical Gaussianization for image classification, ICCV 2009: 1971-1977
[4] Zhou et al., Image classification using super-vector coding of local image descriptors, ECCV 2010
Fisher vector:
[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification, ECCV 2010
[6] Jégou et al., Aggregating local image descriptors into compact codes, PAMI 2011
These papers are not easy to read. Let me take a simplified perspective via the coding & pooling framework.
Fisher Vector and Supervector
• Coding with hard assignment
• Coding with soft assignment
• How to keep all the information?
Coding without Information Loss
Fisher Vector and Supervector
An Intuitive Illustration
Coding
Fisher Vector and Supervector
An Intuitive Illustration
Coding: Component 1, Component 2, Component 3
Fisher Vector and Supervector
An Intuitive Illustration
[Figure: codes summed (+) within Component 1, Component 2 and Component 3]
Pooling
Fisher Vector and Supervector
Implementation of Supervector
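In speech (speaker identification), a supervector refers to the stacked means of an adapted GMM:
Supervector = $[\hat{\mu}_1^\top, \hat{\mu}_2^\top, \dots, \hat{\mu}_K^\top]^\top$, where $\hat{\mu}_k$ is the adapted mean of the k-th Gaussian component.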
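A minimal sketch of building such a supervector, assuming a diagonal-covariance GMM (the universal background model, UBM) has already been trained on many descriptors, and assuming the mean-only MAP adaptation with a relevance factor `r` used in GMM-UBM speaker verification. The function names, the `r=16.0` default and the toy model are hypothetical choices for illustration.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """gamma[t, k] = P(k | x_t) for a diagonal-covariance GMM, computed in the log domain."""
    maha = (((X[:, None, :] - means[None]) ** 2) / variances[None]).sum(axis=2)  # (T, K)
    log_p = np.log(weights) - 0.5 * (np.log(2 * np.pi * variances).sum(axis=1) + maha)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)

def supervector(X, weights, means, variances, r=16.0):
    """Stack the MAP-adapted component means of one image into a single K*d vector."""
    gamma = gmm_posteriors(X, weights, means, variances)
    n = gamma.sum(axis=0)                               # soft count n_k per component
    ex = (gamma.T @ X) / np.maximum(n, 1e-12)[:, None]  # E_k[x]; guard empty components
    alpha = (n / (n + r))[:, None]                      # data-dependent adaptation weight
    adapted = alpha * ex + (1.0 - alpha) * means        # components without data keep UBM means
    return adapted.ravel()

# Toy UBM with two 2-D components; all 20 descriptors of the "image" sit at (1, 1)
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
variances = np.ones((2, 2))
sv = supervector(np.full((20, 2), 1.0), weights, means, variances)
```

Only the component that actually sees data moves toward it, and the relevance factor controls how far: with 20 descriptors and r = 16, the first mean moves 20/36 of the way to (1, 1).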
Fisher Vector and Supervector
Interpretation with Supervector
[Figure: original distribution vs. new (adapted) distribution]
Picture from Reynolds, Quatieri, and Dunn, Digital Signal Processing, 2000
In practice, normalizing with the covariance matrix often improves performance.
Moreover, we can subtract the original mean vector to ease normalization.
Fisher Vector and Supervector
Normalization of Supervector
The resulting representation is also called Hierarchical Gaussianization (HG).
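One common way to realize this normalization, assumed here rather than taken from the slides, is the weighting used with GMM supervector kernels in speaker verification: subtract the original (UBM) mean, whiten by the diagonal covariance, and scale by the square root of the mixture weight, i.e. √w_k Σ_k^{-1/2}(μ̂_k − μ_k). The function name and numbers are hypothetical.

```python
import numpy as np

def normalize_supervector(adapted_means, weights, means, variances):
    """sqrt(w_k) * Sigma_k^{-1/2} (mu_hat_k - mu_k) for every component k, stacked."""
    whitened = (adapted_means - means) / np.sqrt(variances)  # diagonal covariances assumed
    return (np.sqrt(weights)[:, None] * whitened).ravel()

weights = np.array([0.25, 0.75])
means = np.zeros((2, 2))
variances = np.array([[4.0, 4.0], [1.0, 1.0]])
adapted = np.array([[2.0, 0.0], [1.0, 1.0]])
v = normalize_supervector(adapted, weights, means, variances)
```

Subtracting the original mean centers every dimension at zero for an image that does not move the model, so the normalization only has to rescale deviations.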
Fisher Vector and Supervector
Fisher Vector
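Let $p(X \mid \lambda)$ be the probability density function with parameters $\lambda$.
[Jaakkola and Haussler, NIPS 98] suggested that X can be described by the derivative of its log-likelihood with respect to $\lambda$:
$G_\lambda^X = \nabla_\lambda \log p(X \mid \lambda)$
Now we can define the Fisher kernel
$K(X, Y) = (G_\lambda^X)^\top F_\lambda^{-1} G_\lambda^Y$
where $F_\lambda = E_{X \sim p(X \mid \lambda)}\left[ G_\lambda^X (G_\lambda^X)^\top \right]$ is called the Fisher information matrix.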
Fisher Vector and Supervector
Fisher Vector with GMM
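Let $X = \{x_1, \dots, x_T\}$ be the set of local descriptors of an image.
Consider the Gaussian Mixture Model $p(x \mid \lambda) = \sum_{k=1}^{K} w_k \mathcal{N}(x; \mu_k, \Sigma_k)$ with $\lambda = \{w_k, \mu_k, \Sigma_k\}$.
We consider diagonal covariances $\Sigma_k = \mathrm{diag}(\sigma_k^2)$ and the posterior (soft assignment)
$\gamma_t(k) = \frac{w_k \mathcal{N}(x_t; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j \mathcal{N}(x_t; \mu_j, \Sigma_j)}$
With the GMM, the Fisher vector can be obtained in closed form (operations are element-wise):
$\mathcal{G}_{\mu,k}^X = \frac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \frac{x_t - \mu_k}{\sigma_k}$
$\mathcal{G}_{\sigma,k}^X = \frac{1}{T\sqrt{2 w_k}} \sum_{t=1}^{T} \gamma_t(k) \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$
The Fisher vector is the concatenation of $\mathcal{G}_{\mu,k}^X$ and $\mathcal{G}_{\sigma,k}^X$ over all components k.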
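The closed-form gradients translate almost line by line into NumPy. This sketch assumes a diagonal-covariance GMM trained beforehand; the signed-square-root and L2 normalization at the end follow the improved Fisher vector of Perronnin et al. [5] and can be switched off to obtain the raw gradients. The function name and toy inputs are hypothetical.

```python
import numpy as np

def fisher_vector(X, weights, means, variances, improved=True):
    """Fisher vector of descriptors X (T x d) w.r.t. the means and std devs of a diagonal GMM."""
    T = X.shape[0]
    # Posteriors gamma_t(k), computed in the log domain for numerical stability
    maha = (((X[:, None, :] - means[None]) ** 2) / variances[None]).sum(axis=2)
    log_p = np.log(weights) - 0.5 * (np.log(2 * np.pi * variances).sum(axis=1) + maha)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    u = (X[:, None, :] - means[None]) / np.sqrt(variances)[None]  # (T, K, d)
    g_mu = (gamma[:, :, None] * u).sum(axis=0) / (T * np.sqrt(weights))[:, None]
    g_sigma = (gamma[:, :, None] * (u ** 2 - 1.0)).sum(axis=0) / (T * np.sqrt(2.0 * weights))[:, None]
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])  # length 2*K*d
    if improved:
        fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power (signed square root) normalization
        fv /= max(np.linalg.norm(fv), 1e-12)    # L2 normalization
    return fv

rng = np.random.default_rng(0)
weights = np.array([0.3, 0.7])
means = rng.normal(size=(2, 3))
variances = np.ones((2, 3))
fv = fisher_vector(rng.normal(size=(50, 3)), weights, means, variances)  # 2*K*d = 12 entries
```

Descriptors that the GMM already explains exactly (for example, two points placed at ±1 around a single unit-variance component centered at 0) give a zero gradient: the Fisher vector encodes how an image's descriptors deviate from the generic model.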
Fisher Vector and Supervector
• Supervector
• Fisher vector
Comparison
Fisher Vector and Supervector
Comparison
Both use a diagonal covariance matrix, and both rely on the same derivation of the posterior estimate of each descriptor's component membership.
The two representations are almost the same despite their different motivations.
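Under two assumptions not stated on the slides (the supervector uses mean-only MAP adaptation with a relevance factor r, and it is normalized as √w_k Σ_k^{-1/2}(μ̂_k − μ_k)), the "almost the same" claim can be made precise for the mean part. With $n_k = \sum_t \gamma_t(k)$ and $\alpha_k = n_k/(n_k + r)$:

```latex
\hat{\mu}_k - \mu_k
  = \alpha_k \Big( \tfrac{1}{n_k} \sum_{t=1}^{T} \gamma_t(k)\, x_t - \mu_k \Big)
  = \frac{1}{n_k + r} \sum_{t=1}^{T} \gamma_t(k)\,(x_t - \mu_k)

\mathcal{G}^X_{\mu,k}
  = \frac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k}
  = \frac{n_k + r}{T\, w_k} \cdot \sqrt{w_k}\, \Sigma_k^{-1/2} (\hat{\mu}_k - \mu_k)
```

So, per component, the Fisher-vector mean gradient is the normalized supervector times a scalar. When the descriptors roughly follow the GMM, n_k ≈ T w_k and the scalar is close to 1 + r/(T w_k), i.e. near 1 for images with many descriptors.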
Fisher Vector and Supervector
• Learn from existing code
http://lear.inrialpes.fr/src/inria_fisher/ (Linux or Mac)
• Learn from public GMM code
• Be careful with pitfalls
– Probabilities can be comparable to the machine’s rounding error:
compute log P instead of P
– Try different normalization strategies
– Try to make the code efficient
How to Code Your Own
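The first pitfall is easy to reproduce: with high-dimensional descriptors, per-component log-likelihoods can be hundreds or thousands below zero, so exponentiating them underflows to 0 in double precision. The sketch below contrasts a naive posterior computation with the log-sum-exp trick (shift by the row maximum before exponentiating); the function names and numbers are hypothetical. SciPy users can also use `scipy.special.logsumexp`.

```python
import numpy as np

def posteriors_naive(log_lik):
    p = np.exp(log_lik)                      # underflows to exactly 0.0 below about -745
    return p / p.sum(axis=1, keepdims=True)  # 0 / 0 -> nan

def posteriors_stable(log_lik):
    # log-sum-exp trick: subtracting the row maximum leaves the ratios unchanged
    shifted = log_lik - log_lik.max(axis=1, keepdims=True)
    p = np.exp(shifted)
    return p / p.sum(axis=1, keepdims=True)

log_lik = np.array([[-1000.0, -1001.0, -1010.0]])  # log(w_k) + log N(x; mu_k, Sigma_k)
with np.errstate(invalid="ignore"):
    print(posteriors_naive(log_lik))  # [[nan nan nan]]
print(posteriors_stable(log_lik))     # finite, sums to 1
```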
Summary
• Histogram of SIFT: coding by vector quantization; pooling by histogram aggregation
• Uncertainty-based quantization: coding by soft quantization; pooling by histogram aggregation
• Sparse coding: coding by soft quantization; pooling by max pooling
• Fisher vector/supervector: coding by GMM probability estimation; pooling by GMM adaptation
• Read the deformable part model paper
• Enjoy Rogerio’s talk next week ☺
• Project proposal deadline: Feb 19
• Project presentation: Feb 21
To Do Before Next Class
• Simplest model: histogram of visual words
• Models with intuitive interpretation: pLSA, LDA, S-LTM, …
• Soft quantization: uncertainty-based quantization and sparse coding
• Very good performance: Fisher vector or supervector
Summary