Coding and Pooling llcao - IBM...exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value Underlining Assumptions -Text Visual Recognition And Search 21 Columbia

Visual Recognition And Search, Columbia University, Spring 2013

EECS 6890 – Topics in Information Processing

Spring 2013, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch

Class 3 Feature Coding and Pooling

Liangliang Cao, Feb 7, 2013


Problem

婴儿   bebé   아가   बच्चा   (each means "baby")

People may have difficulty understanding texts in different languages,

but NOT images.

Can you understand the following?

Photo courtesy of luster

The Myth of Human Vision


Can a computer vision system recognize objects or scenes as humans do?

Problem


Why Is This Problem Important?

Search engines    Traditional companies    Mobile apps


Problem

http://www.vision.caltech.edu

Examples of Object Recognition Dataset


http://groups.csail.mit.edu/vision/SUN/

Problem


Overview of Classification Model

Method                           Coding                       Pooling
Histogram of SIFT                Vector quantization          Histogram aggregation
Uncertainty-Based Quantization   Soft quantization            Histogram aggregation
Sparse Coding                    Soft quantization            Max pooling
Fisher vector / Supervector      GMM probability estimation   GMM adaptation

Coding: to map local features into a compact representation

Pooling: to aggregate these compact representations into one image-level representation


Outline

• Histogram of local features

• Bag of words model

• Soft quantization and sparse coding

• Fisher vector and supervector



Histogram of Local Features


Bag of Words Models

• Powerful local features

– DoG

– Hessian, Harris

– Dense-sampling

Recap of Last Class

Non-fixed number of local regions per image!


Bag of Words Models

• Histograms can provide a fixed-size representation of images

• Spatial pyramid/gridding can enhance the histogram representation with spatial information

Recap of Last Class (2)


Bag of Words Models


[Figure: histogram of codeword frequencies; x-axis: codewords, y-axis: frequency; dim = #codewords]


Bag of Words Models

Histogram of Local Features (2)

dim = #codewords x #grids
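The dimension count above can be checked with a small sketch. It assumes each local feature has already been assigned a codeword index and has a known pixel position; the function name `spatial_histogram` and the 2 x 2 grid are illustrative choices, not taken from the slides.

```python
def spatial_histogram(codeword_ids, positions, image_size, n_codewords, grid=(2, 2)):
    """Concatenate one codeword histogram per grid cell.

    codeword_ids: codeword index of each local feature.
    positions:    (x, y) pixel position of each local feature.
    image_size:   (width, height).
    Returns a list of length n_codewords * cols * rows.
    """
    width, height = image_size
    cols, rows = grid
    hist = [0] * (n_codewords * cols * rows)
    for k, (x, y) in zip(codeword_ids, positions):
        # Clamp so that features on the right/bottom border stay in the last cell.
        cx = min(int(x / width * cols), cols - 1)
        cy = min(int(y / height * rows), rows - 1)
        cell = cy * cols + cx
        hist[cell * n_codewords + k] += 1
    return hist


ids = [0, 2, 2, 1]
pos = [(10, 10), (90, 10), (95, 15), (50, 90)]
h = spatial_histogram(ids, pos, image_size=(100, 100), n_codewords=3)
print(len(h))  # 3 codewords x 4 cells
```

With a single cell (`grid=(1, 1)`) the result reduces to the plain histogram with dim = #codewords.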


Local Feature Quantization

Bag of Words Models

Slide courtesy of Fei-Fei Li



Bag of Words Models

…



Bag of Words Models

- Vector quantization
- Dictionary learning
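The two steps can be sketched as follows: learn a codeword dictionary with k-means, then quantize each local feature to its nearest codeword and count. This is a minimal plain-Python sketch on toy 2-D "descriptors"; the evenly spaced initialization is a simplification chosen to keep the example deterministic (random or k-means++ initialization is more usual), and all function names are illustrative.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, codebook):
    """Vector quantization: index of the closest codeword."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def kmeans(features, n_codewords, n_iter=20):
    """Dictionary learning with plain Lloyd iterations."""
    # Assumption: deterministic, evenly spaced initial codewords.
    codebook = [list(features[i * len(features) // n_codewords])
                for i in range(n_codewords)]
    for _ in range(n_iter):
        groups = [[] for _ in range(n_codewords)]
        for f in features:
            groups[nearest(f, codebook)].append(f)
        for k, members in enumerate(groups):
            if members:  # keep the old codeword if its cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bow_histogram(features, codebook):
    """Histogram of codeword occurrences for one image."""
    hist = [0] * len(codebook)
    for f in features:
        hist[nearest(f, codebook)] += 1
    return hist


# Toy descriptors: two well-separated clusters.
train = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
codebook = kmeans(train, n_codewords=2)
hist = bow_histogram([(0.05, 0.05), (4.9, 5.0), (5.2, 5.1)], codebook)
print(hist)
```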


Dictionary for Codewords


Picture courtesy of Fei-Fei Li


Bag of Words Models

Most slides in this section are courtesy of Fei-Fei Li


Object    Bag of ‘words’


Bag of Words Models

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce,

exports, imports, US, yuan, bank, domestic,

foreign, increase, trade, value

Underlying Assumptions - Text
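The text assumption, that a document can be summarized by word counts with word order ignored, can be sketched in a few lines. The vocabulary below is a hand-picked illustration, not a list from the slides.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count vocabulary words in `text`, ignoring word order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in vocabulary)
    return [counts[w] for w in vocabulary]


vocab = ["china", "trade", "yuan", "brain", "visual"]
doc = ("China increased the value of the yuan and permitted it to trade; "
       "the US wants the yuan to trade freely.")
hist = bag_of_words(doc, vocab)
print(dict(zip(vocab, hist)))
```

Shuffling the words of `doc` leaves `hist` unchanged, which is exactly the information the model discards.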


Bag of Words Models

Underlying Assumptions - Image


[Figure: bag-of-words pipeline with a learning path and a recognition path. Stages shown: feature detection & representation, K-means, image representation, category models (and/or) classifiers, category decision]


Bag of Words Models

Borrowing Techniques from Text Classification

• PLSA

• Naïve Bayesian Model

Notations

• w_n: each patch in an image

– w_n = [0, 0, …, 1, …, 0, 0]^T

• w: a collection of all N patches in an image

– w = [w_1, w_2, …, w_N]

• d_j: the j-th image in an image collection

• c: category of the image

• z: theme or topic of the patch


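Probabilistic Latent Semantic Analysis (pLSA), Hofmann, 2001

[Figure: graphical model with nodes d, z, w and plates N, D]

Latent Dirichlet Allocation (LDA), Blei et al., 2001

[Figure: graphical model with nodes c, π, z, w and plates N, D]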


Probabilistic Latent Semantic Analysis (pLSA)

“face”

Sivic et al. ICCV 2005


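[Figure: pLSA graphical model with nodes d, z, w and plates N, D]

$$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$$

• $p(w_i \mid d_j)$: observed codeword distributions

• $p(w_i \mid z_k)$: codeword distributions per theme (topic)

• $p(z_k \mid d_j)$: theme distributions per image

Slide credit: Josef Sivic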

Bag of Words Models

Parameters are estimated by EM or Gibbs sampling
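A minimal EM sketch for pLSA on a toy image-by-codeword count matrix may make the estimation concrete. It is written in plain Python for readability; the random initialization, the small smoothing constant `1e-12`, and the function names are assumptions of this sketch, not details from the slides.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM.

    counts[j][i] is how often codeword i occurs in image j.
    Returns (p_w_z, p_z_d) with p_w_z[k][i] = p(w_i | z_k)
    and p_z_d[j][k] = p(z_k | d_j).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        return [v / total for v in row]

    p_w_z = [normalized([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for j in range(n_docs):
            for i in range(n_words):
                if counts[j][i] == 0:
                    continue
                # E-step: p(z_k | d_j, w_i) is proportional to p(w_i|z_k) p(z_k|d_j).
                post = normalized([p_w_z[k][i] * p_z_d[j][k] for k in range(n_topics)])
                # M-step accumulators: expected counts per topic.
                for k in range(n_topics):
                    new_w_z[k][i] += counts[j][i] * post[k]
                    new_z_d[j][k] += counts[j][i] * post[k]
        p_w_z = [normalized(row) for row in new_w_z]
        p_z_d = [normalized(row) for row in new_z_d]
    return p_w_z, p_z_d


def codeword_distribution(p_w_z, p_z_d, j):
    """p(w_i | d_j) = sum_k p(w_i | z_k) p(z_k | d_j)."""
    n_words = len(p_w_z[0])
    return [sum(p_w_z[k][i] * p_z_d[j][k] for k in range(len(p_w_z)))
            for i in range(n_words)]


counts = [[8, 7, 0, 0], [9, 6, 1, 0], [0, 0, 7, 9], [0, 1, 8, 6]]
p_w_z, p_z_d = plsa(counts, n_topics=2)
# Most likely topic per image: argmax over z of p(z | d).
best_topic = [max(range(2), key=lambda k: row[k]) for row in p_z_d]
print(best_topic)
```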


$$z^{*} = \arg\max_{z}\, p(z \mid d)$$

Slide credit: Josef Sivic

Bag of Words Models

Recognition using pLSA



Latent Dirichlet Allocation (LDA)

Fei-Fei et al. ICCV 2005

“beach”

Bag of Words Models

Scene Recognition using LDA


Bag of Words Models

Spatial-Coherent Latent Topic Model

Cao and Fei-Fei, ICCV 2007


Bag of Words Models

Simultaneous Segmentation and Recognition


Pros and Cons

Bag of Words Models are good at

- Modeling prior knowledge

- Providing intuitive interpretation

But these models suffer from

- Loss of information in quantization of “visual words”

- Loss of spatial information

Images differ from texts! We need better coding and better pooling.


Soft Quantization and Sparse Coding


Soft Quantization

Hard Quantization


Soft Quantization

Model the uncertainty across multiple codewords

Uncertainty-Based Quantization

van Gemert et al., Visual Word Ambiguity, PAMI 2009
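As a rough sketch of codeword uncertainty (UNC): each local feature spreads a total weight of 1 over all codewords, in proportion to a Gaussian kernel of its distance to each codeword, and the image histogram is the average of these soft assignments. The kernel width `sigma` and the function name are illustrative assumptions.

```python
import math


def unc_histogram(features, codebook, sigma=1.0):
    """Soft-assignment histogram with a Gaussian kernel over codeword distances."""
    hist = [0.0] * len(codebook)
    for f in features:
        d2 = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        # Subtract the smallest distance before exponentiating so that at
        # least one kernel value equals 1 and the sum cannot underflow to 0.
        d_min = min(d2)
        kernel = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
        total = sum(kernel)
        for k, value in enumerate(kernel):
            hist[k] += value / total
    return [h / len(features) for h in hist]


codebook = [(0.0, 0.0), (2.0, 0.0)]
ambiguous = unc_histogram([(1.0, 0.0)], codebook)      # halfway between codewords
hist = unc_histogram([(1.0, 0.0), (0.0, 0.0)], codebook)
print(ambiguous, hist)
```

Hard quantization would have to pick one codeword for the feature at (1, 0); the soft version records that both are equally plausible.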


Soft Quantization

Intuition of UNC

Hard quantization

Soft quantization based on uncertainty

van Gemert et al., Visual Word Ambiguity, PAMI 2009


Soft Quantization

Improvement of UNC


Soft Quantization

Sparse Coding-Based Quantization

Hard quantization can be viewed as an “extremely sparse representation”

A more general representation exists, but it is hard to solve

In practice we consider sparse coding
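The equations of this slide are not present in the text. As an assumption about what was shown, the three formulations are commonly written as below (following the notation style of Yang et al., CVPR 2009), where $x_n$ is a local feature, $B = [b_1, \ldots, b_K]$ is the codebook, $c_n$ is the code of $x_n$, and $s$ and $\lambda$ are sparsity parameters introduced here for illustration.

```latex
% Hard (vector) quantization: exactly one active codeword with weight 1
\min_{c_n} \; \| x_n - B c_n \|_2^2
\quad \text{s.t.} \quad \| c_n \|_0 = 1,\; \| c_n \|_1 = 1,\; c_n \ge 0

% More general, but combinatorial: at most s active codewords
\min_{c_n} \; \| x_n - B c_n \|_2^2
\quad \text{s.t.} \quad \| c_n \|_0 \le s

% Sparse coding: convex relaxation with an L1 penalty
\min_{c_n} \; \| x_n - B c_n \|_2^2 + \lambda \, \| c_n \|_1
```

The $\ell_1$ penalty keeps most coefficients at zero while making the problem convex in $c_n$ for a fixed codebook.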




Soft Quantization

Yang et al. obtain good recognition accuracy by combining sparse coding with spatial pyramid and dictionary training.

Yang et al., Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009

More details will be given in the group presentation.
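A minimal sketch of sparse coding followed by max pooling is given below. It uses ISTA (iterative soft thresholding) as the solver, which is a choice made for this sketch and not necessarily the solver used by Yang et al.; the toy codebook, `lam`, and the function names are likewise illustrative. The objective carries a factor 0.5 on the reconstruction term, which only rescales `lam`.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 norm)."""
    return max(v - t, 0.0) - max(-v - t, 0.0)


def sparse_code(x, codebook, lam=0.1, n_iter=200):
    """Approximately minimize 0.5 * ||x - sum_k c_k b_k||^2 + lam * ||c||_1."""
    n_codewords, dim = len(codebook), len(x)
    # Step size 1/L, where L is the squared Frobenius norm of the codebook,
    # an upper bound on the Lipschitz constant of the gradient.
    step = 1.0 / sum(v ** 2 for b in codebook for v in b)
    c = [0.0] * n_codewords
    for _ in range(n_iter):
        recon = [sum(c[k] * codebook[k][i] for k in range(n_codewords))
                 for i in range(dim)]
        residual = [recon[i] - x[i] for i in range(dim)]
        grad = [sum(residual[i] * codebook[k][i] for i in range(dim))
                for k in range(n_codewords)]
        c = [soft_threshold(c[k] - step * grad[k], step * lam)
             for k in range(n_codewords)]
    return c


def max_pool(codes):
    """Per-codeword maximum of absolute coefficients over all features."""
    return [max(abs(code[k]) for code in codes) for k in range(len(codes[0]))]


codebook = [(1.0, 0.0), (0.0, 1.0), (0.7071, 0.7071)]
features = [(1.0, 0.05), (0.0, 0.9), (0.6, 0.6)]
codes = [sparse_code(x, codebook) for x in features]
pooled = max_pool(codes)
print([round(p, 3) for p in pooled])
```

In a spatial pyramid, `max_pool` would be applied separately to the codes falling in each cell and the results concatenated.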


Fisher Vector and Supervector

One of the most powerful image/video classification techniques

Thanks to Zhen Li and Qiang Chen for constructive suggestions on this section



Winning Systems

Classification task: 2009, 2010, 2011, 2012

Large Scale Visual Recognition Challenge: 2010, 2011



Literature

Supervector

[1] Yan et al., Regression from patch-kernel. CVPR 2008
[2] Zhou et al., SIFT-Bag kernel for video event analysis. ACM Multimedia 2008
[3] Zhou et al., Hierarchical Gaussianization for image classification. ICCV 2009
[4] Zhou et al., Image classification using super-vector coding of local image descriptors. ECCV 2010

Fisher Vector

[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification. ECCV 2010
[6] Jégou et al., Aggregating local image descriptors into compact codes. PAMI 2011

These papers are not very easy to read.

Let me take a simplified perspective via the coding & pooling framework.



• Coding with hard assignment

• Coding with soft assignment

• How to keep all the information?

Coding without Information Loss






An Intuitive Illustration

[Figure: Coding: each local feature is coded against Component 1, Component 2 and Component 3. Pooling: the codes are summed (+) within each component.]



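Implementation of Supervector

In speech (speaker identification), a supervector refers to the stacked means of an adapted GMM.

Supervector = $[\mu_1^\top, \mu_2^\top, \ldots, \mu_K^\top]^\top$, where $\mu_k$ is the adapted mean of the $k$-th Gaussian component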


Interpretation with Supervector

[Figure: original distribution vs. new (adapted) distribution]

Picture from Reynolds, Quatieri, and Dunn, DSP, 2001


Normalization of Supervector

In practice, normalizing with the covariance matrix often improves performance.

Moreover, we can subtract the original mean vector for ease of normalization.

The resulting representation is also called Hierarchical Gaussianization (HG).
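The following sketch puts the pieces together for a diagonal-covariance GMM: soft assignment of each feature to the components (coding), accumulation of per-component statistics (pooling), mean adaptation, and normalization. The relevance-factor MAP update and the scaling by $\sqrt{w_k}\,\sigma_k^{-1}$ are assumptions of this sketch, borrowed from common practice in speaker verification; the exact normalization on the slide is not recoverable from the text.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances):
    """p(component k | x), computed in the log domain for stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    probs = [math.exp(value - top) for value in logs]
    total = sum(probs)
    return [p / total for p in probs]


def supervector(features, weights, means, variances, relevance=1.0):
    """Stacked, mean-subtracted and variance-normalized adapted means."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in features:
        # Coding: soft assignment. Pooling: accumulate sufficient statistics.
        for k, g in enumerate(posteriors(x, weights, means, variances)):
            counts[k] += g
            for i in range(dim):
                sums[k][i] += g * x[i]
    vec = []
    for k in range(n_comp):
        alpha = counts[k] / (counts[k] + relevance)  # assumed MAP relevance factor
        for i in range(dim):
            data_mean = sums[k][i] / counts[k] if counts[k] > 0 else means[k][i]
            adapted = alpha * data_mean + (1 - alpha) * means[k][i]
            # Subtract the original mean, then whiten with the standard deviation.
            vec.append(math.sqrt(weights[k]) * (adapted - means[k][i])
                       / math.sqrt(variances[k][i]))
    return vec


weights = [0.5, 0.5]
means = [(0.0, 0.0), (4.0, 4.0)]
variances = [(1.0, 1.0), (1.0, 1.0)]
vec = supervector([(1.0, 0.0), (1.0, 0.0), (4.0, 4.0)], weights, means, variances)
print([round(v, 3) for v in vec])
```

Only the first component moves away from its original mean here, so only its block of the supervector is clearly nonzero.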



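Fisher Vector

Let $p(X \mid \lambda)$ be the probability density function with parameters $\lambda$.

[Jaakkola and Haussler, NIPS 98] suggested that $X$ can be described by the derivative with respect to the parameters: $G_\lambda^X = \nabla_\lambda \log p(X \mid \lambda)$

Now we can define the Fisher kernel: $K(X, Y) = (G_\lambda^X)^\top F_\lambda^{-1}\, G_\lambda^Y$

where $F_\lambda = E_X\!\left[ G_\lambda^X (G_\lambda^X)^\top \right]$ is called the Fisher information matrix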

Fisher Vector with GMM

Let

Consider the Gaussian Mixture Model

We consider

With GMM, Fisher vectors can be obtained:

The Fisher vector
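As a hedged sketch of the result, the code below follows the formulation in Perronnin et al. [5] for a GMM with weights $w_k$, means $\mu_k$ and diagonal standard deviations $\sigma_k$, over $T$ local features $x_t$ with posteriors $\gamma_t(k)$: the mean part is $\frac{1}{T\sqrt{w_k}} \sum_t \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k}$ and the deviation part is $\frac{1}{T\sqrt{2 w_k}} \sum_t \gamma_t(k) \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$. Whether the slide used exactly this form is an assumption; function names are illustrative.

```python
import math


def gmm_posteriors(x, weights, means, sigmas):
    """gamma(k) = p(component k | x) for a diagonal-covariance GMM."""
    logs = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * math.log(2 * math.pi * s * s) - 0.5 * ((xi - m) / s) ** 2
        logs.append(ll)
    top = max(logs)
    probs = [math.exp(value - top) for value in logs]
    total = sum(probs)
    return [p / total for p in probs]


def fisher_vector(features, weights, means, sigmas):
    """Concatenated gradients w.r.t. the means, then the standard deviations."""
    n_comp, dim, n_feat = len(means), len(means[0]), len(features)
    g_mu = [[0.0] * dim for _ in range(n_comp)]
    g_sigma = [[0.0] * dim for _ in range(n_comp)]
    for x in features:
        gamma = gmm_posteriors(x, weights, means, sigmas)  # coding
        for k in range(n_comp):
            for i in range(dim):
                z = (x[i] - means[k][i]) / sigmas[k][i]
                g_mu[k][i] += gamma[k] * z                  # pooling by summation
                g_sigma[k][i] += gamma[k] * (z * z - 1.0)
    fv = []
    for k in range(n_comp):
        fv += [g / (n_feat * math.sqrt(weights[k])) for g in g_mu[k]]
    for k in range(n_comp):
        fv += [g / (n_feat * math.sqrt(2 * weights[k])) for g in g_sigma[k]]
    return fv


weights = [0.5, 0.5]
means = [(0.0, 0.0), (4.0, 4.0)]
sigmas = [(1.0, 1.0), (1.0, 1.0)]
fv = fisher_vector([(0.0, 0.0), (4.0, 4.0)], weights, means, sigmas)
print(len(fv))  # 2 * n_components * dim
```

In the toy input each feature sits exactly on a component mean, so the mean gradients are close to zero while the deviation gradients are negative: the data are tighter than the model expects.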



• Supervector

• Fisher vector

Comparison



Comparison

Diagonal covariance matrix

Diagonal covariance with same derivation

Posterior estimation of

The two representations are almost the same even with different motivations.



• Learn from existing code

http://lear.inrialpes.fr/src/inria_fisher/ (Linux or Mac)

• Learn from public GMM code

• Be careful with pitfalls

– Probabilities can be as small as the machine’s rounding error: compute log P instead of P

– Try different normalization strategies

– Try to make the code efficient

How to Code Your Own
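The first pitfall can be illustrated with the log-sum-exp trick, a standard way to normalize GMM posteriors without leaving the log domain. The numbers below are made up to force underflow.

```python
import math


def log_sum_exp(log_values):
    """log(sum(exp(v))) computed without underflow."""
    top = max(log_values)
    return top + math.log(sum(math.exp(v - top) for v in log_values))


# log(w_k * N(x | k)) for three components of a high-dimensional GMM.
log_joint = [-1050.0, -1052.0, -1060.0]

naive_total = sum(math.exp(v) for v in log_joint)  # every term underflows
log_p = log_sum_exp(log_joint)                     # log P(x), still finite
posterior = [math.exp(v - log_p) for v in log_joint]
print(naive_total, log_p, posterior)
```

Dividing by `naive_total` would raise a division-by-zero error, while the log-domain version yields a valid posterior.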


Summary

Method                           Coding                       Pooling
Histogram of SIFT                Vector quantization          Histogram aggregation
Uncertainty-Based Quantization   Soft quantization            Histogram aggregation
Sparse Coding                    Soft quantization            Max pooling
Fisher vector / Supervector      GMM probability estimation   GMM adaptation


• Read the deformable part model

• Enjoy Rogerio’s talk next week ☺

• Project proposal deadline Feb 19

• Project presentation Feb 21

Todo Before Next Class


• Simplest model: histogram of visual words

• Models with intuitive interpretation: PLSA, LDA, S-LTM, …

• Soft quantization: uncertainty-based quantization and sparse coding

• Very good performance: Fisher vector or supervector

Summary
