Multimedia Analysis - Cornell Universitychenlab.ece.cornell.edu/Publication/Tsuhan/20061103PCM(distr).pdf · 03/11/2006 · Special issues: networked multimedia 2001, multimedia

Multimedia Analysis: Marriage of Signal Processing and Machine Learning

Tsuhan Chen (陈祖翰), Professor, Carnegie Mellon University

[email protected]

Chen2006

A 10-Year Journey…

IEEE Multimedia Signal Processing (MMSP) Technical Committee, 1996~

MMSP Workshops: Princeton 1997, Los Angeles 1998, Copenhagen 1999, Cannes 2001, St. Thomas 2002, Siena 2004, Shanghai 2005, Victoria 2006

International Conference on Multimedia and Expo (ICME): New York 2000, Tokyo 2001, Lausanne 2002, Baltimore 2003, Taipei 2004, Amsterdam 2005, Toronto 2006, Beijing 2007

IEEE Transactions on Multimedia, March 1999~

Special issues: networked multimedia 2001, multimedia database 2002, multimodal interface 2003, streaming media 2004, MPEG-21 2005

SPS Distinguished Lecturer 2007

Multimedia Analysis: Bits vs. Content

Number of all possible 16×12 images $= 2^{16 \times 12 \times 8}$

$\gg 30 \times 60 \times 60 \times 24 \times 365 \times \text{human history} \times \text{world population}$

$\gg$ number of all possible face images

[Baker/Kanade]

Reconstruct

It’s Bayesian in machine learning, a.k.a., prior…
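The size of the gap in the inequality above can be checked with a few lines of arithmetic. This is only a sketch: reading the factor 30 as frames per second is an assumption, and the values for human history and world population are deliberately generous placeholders that do not come from the slide.

```python
# Number of distinct 16x12 images at 8 bits per pixel: 2^(16*12*8).
num_images = 2 ** (16 * 12 * 8)

# 30 x 60 x 60 x 24 x 365, read here as frames seen per person per year.
frames_per_year = 30 * 60 * 60 * 24 * 365

# ASSUMPTIONS (not from the slide): generous placeholder values.
HUMAN_HISTORY_YEARS = 1_000_000
WORLD_POPULATION = 10_000_000_000

frames_ever_seen = frames_per_year * HUMAN_HISTORY_YEARS * WORLD_POPULATION

# Compare orders of magnitude through the number of decimal digits.
digits_images = len(str(num_images))
digits_seen = len(str(frames_ever_seen))
print(digits_images, digits_seen)
```

Even with these generous placeholders, the count of possible images has hundreds more decimal digits than the count of frames anyone could ever have seen, which is the point of the slide: the set of meaningful images is a vanishing fraction of the set of possible bit patterns.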

Thoughts

“The most compelling shapes are those near to our hearts: people’s faces, a gracefully moving body, a natural scene with rustling leaves and flowing water. Evolution has tuned us to these sights…” [Lengyel, 1998]

Multimedia analysis… more than processing bits… it’s all about the content… it’s signal processing + machine learning [Chen, 2006]

Content-Based Information Retrieval

Many Interesting Applications…

Logos [Leung&Chen ICME’02]

[Figure: hand-drawn query and the retrieved trademarks]

Hand-Drawn Sketches

User sketches a query

[Figure: query sketch, similar sketch, and the page stored in the database]

[Leung&Chen ICME’03]

3D Objects [Zhang&Chen ACM MM’01]

3D Protein Structures [Chen&Chen ICIP’02]

Content-Based Information Retrieval (CBIR)

[Figure: CBIR block diagram. Blocks: Multimedia Database, Feature Extraction, Low-Level Feature Space (axes f1, f2), Indexing, Indexed Feature Database, Query, Feature Extraction, Similarity Measure, Retrieval Results]
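The CBIR blocks named on this slide (feature extraction, indexing, similarity measure, retrieval) can be sketched end to end as follows. The feature extractor is a hypothetical stand-in (mean and standard deviation of the raw values), and Euclidean distance is an assumed similarity measure; the slide does not specify either.

```python
import numpy as np

def extract_features(obj):
    """Hypothetical extractor: map an object to the 2-D feature (mean, std)."""
    arr = np.asarray(obj, dtype=float)
    return np.array([arr.mean(), arr.std()])

def build_index(database):
    """Indexing step: one feature vector per database object, stacked in rows."""
    return np.stack([extract_features(obj) for obj in database])

def retrieve(query, index, top_k=3):
    """Rank database objects by Euclidean distance to the query features."""
    q = extract_features(query)
    dists = np.linalg.norm(index - q, axis=1)
    return np.argsort(dists)[:top_k]

# Toy database of four "objects" (assumed data, for illustration only).
database = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 10, 0, 10], [5, 5, 5, 6]]
index = build_index(database)
ranking = retrieve([1, 1, 1, 2], index, top_k=2)
print(ranking)
```

The same feature extractor is applied to the database objects and to the query, so both live in the same low-level feature space before the similarity measure is applied.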

High-Level Semantics

e.g. hierarchical attribute tree

[Figure: hierarchical attribute tree. Node labels: Architecture, Anatomy, Buildings, Building_Materials/Misc, Landmarks, Body-parts, Children, Female, Male, Monsters/Androids/Aliens. Node counts as extracted: (133), (73), (32), (87), (14), (47), (1), (6), (8), (12)]

BIG CHALLENGE: How to bridge the gap between low-level features and high-level semantics?

Possible Solutions

[Figure: the CBIR block diagram with two added blocks. Blocks: Multimedia Database, Feature Extraction, Low-Level Feature Space (axes f1, f2), Indexing, Indexed Feature Database, Query, Feature Extraction, Similarity Measure, Retrieval Results, Hidden Annotation, Relevance Feedback]

Semantic Information

Hidden annotation

Object #n has Attribute #k → explicit

Relevance feedback

Objects #m and #n are (not) similar → implicit

Q: How to represent and propagate semantic information?

Q: How to use explicit/implicit semantic information to improve retrieval?

Semantic Information as Probabilities

            Attribute 1   Attribute 2   Attribute 3   …   Attribute K
Object 1    p11           p12           p13           …   p1K
Object 2    p21           p22           p23           …   p2K
…           …             …             …             …   …
Object N    pN1           pN2           pN3           …   pNK

pnk: Attribute Probabilities

Annotate one object…

When an object is annotated, pnk is set to 0/1

            Attribute 1   Attribute 2   Attribute 3   …   Attribute K
Object 1    p11           p12           p13           …   p1K
Object 2    1             0             1             …   0
…           …             …             …             …   …
Object N    pN1           pN2           pN3           …   pNK

Q: How to propagate?

A: Based on low-level features

Semantic Propagation

[Figure: attribute probability pik over the feature space (f1, f2), shown against the prior pprior]

“Biased Kernel Regression”

Semantic Propagation (cont.)

[Figure: attribute probability pik over the feature space (f1, f2), shown against the prior pprior]
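One way to read “biased kernel regression” is as ordinary kernel regression over the annotated objects with an extra pull toward the prior. The sketch below follows that reading; the Gaussian kernel, the `bandwidth` and `prior_weight` parameters, and the function name are all assumptions for illustration and are not taken from the slides.

```python
import numpy as np

def propagate(features, annotated_idx, labels, p_prior,
              bandwidth=1.0, prior_weight=1.0):
    """Estimate one attribute probability for every object.

    Kernel regression over the annotated objects, biased toward p_prior:
    far from every annotation the estimate falls back to the prior.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    anchors = features[annotated_idx]
    # Squared distance from every object to every annotated object.
    d2 = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))  # assumed Gaussian kernel
    p = (prior_weight * p_prior + k @ labels) / (prior_weight + k.sum(axis=1))
    # Annotated objects keep their hard 0/1 values.
    p[annotated_idx] = labels
    return p

# Toy data (assumed): object 0 annotated as 1, object 2 annotated as 0.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [100.0, 100.0]]
p = propagate(features, annotated_idx=[0, 2], labels=[1, 0], p_prior=0.3)
print(p)
```

Object 1 sits next to the positively annotated object, so its probability rises above the prior, while the distant object 3 stays at the prior.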

Semantic Propagation (cont.)

            Attribute 1   Attribute 2   Attribute 3   …   Attribute K
Object 1    p11           p12           p13           …   p1K
Object 2    1             0             1             …   0
…           …             …             …             …   …
Object N    pN1           pN2           pN3           …   pNK

Q: Which to annotate next?

Active Learning

Choose the most uncertain object to annotate

Uncertainty determined by the entropy of attribute probabilities

“Selective sampling”

May want to consider density in feature space too

[Figure: passive learning versus active learning between a teacher (the annotator) and a student (the retrieval system)]
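The selection rule on this slide (annotate the object whose attribute probabilities are most uncertain, measured by entropy) can be sketched as below. Summing binary entropies over the K attributes is an assumed way of combining them, and the density weighting mentioned on the slide is left out.

```python
import numpy as np

def attribute_entropy(p, eps=1e-12):
    """Sum of binary entropies (in bits) over each object's K attributes."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    h = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return h.sum(axis=1)

def next_to_annotate(p, annotated):
    """Index of the un-annotated object with the highest entropy."""
    h = attribute_entropy(p)
    h[list(annotated)] = -np.inf  # never pick an already annotated object
    return int(np.argmax(h))

# Rows are objects, columns are attribute probabilities (assumed toy values).
p = np.array([[1.0, 0.0, 1.0],
              [0.5, 0.5, 0.4],
              [0.9, 0.1, 0.8],
              [0.5, 0.5, 0.5]])
choice = next_to_annotate(p, annotated={0})
print(choice)
```

Object 3 has every probability at 0.5, so it carries the most uncertainty and is chosen first.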

Recap…

Maintain attribute probabilities of each object

Set an attribute probability to 1/0 when annotated

Propagate probabilities to non-annotated objects

Choose the most uncertain object in the database to annotate next

Use probabilities to estimate uncertainty

Use probabilities to measure semantic distance…

Result

[Figure: average matching error (Err), from 0 to 0.5, versus number of annotated models, from 0 to 1800, comparing random sampling with our algorithm]

3D Objects (1750 total)

[Zhang&Chen T-MM’02]

Relevance Feedback

Relevance feedback

Ask for user’s feedback during the retrieval

“Object #i is (not) similar to the query”

“Objects #m and #n are (not) similar”

Implicit semantic information

Use feedback to improve retrieval

Way 1: Move the query point

Way 2: Weigh the features

Way 3: “Warp” the feature space

An Example

[Figure: query in the feature space (f1, f2); retrieved results lie inside a circle around the query]

An Example (cont.)

[Figure: query in the feature space (f1, f2); retrieved results lie inside a circle around the query, with positive feedback and negative feedback marked]

Move the Query Point

[Figure: new query in the feature space (f1, f2), shifted relative to the positive and negative feedback; new retrieved results lie inside a circle around the new query]
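Moving the query point can be sketched with a Rocchio-style update, a standard relevance feedback technique that is swapped in here for illustration; the slides do not say which update rule was used. The step sizes `pull` and `push` are hypothetical parameters.

```python
import numpy as np

def move_query(query, positives, negatives, pull=0.5, push=0.25):
    """Shift the query toward the positive centroid and away from the negative one."""
    q = np.asarray(query, dtype=float)
    q_new = q.copy()
    if len(positives) > 0:
        q_new += pull * (np.mean(positives, axis=0) - q)
    if len(negatives) > 0:
        q_new -= push * (np.mean(negatives, axis=0) - q)
    return q_new

# Toy 2-D example (assumed data): positives to the right, one negative to the left.
q_new = move_query([0.0, 0.0],
                   positives=[[2.0, 0.0], [4.0, 0.0]],
                   negatives=[[-2.0, 0.0]])
print(q_new)
```

The retrieval circle keeps its shape and only its center moves, which matches the picture on the slide.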

Feature Weighting


[Figure: query in the feature space (f1, f2) with positive feedback and negative feedback marked; retrieved results lie inside an ellipse around the query]
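Feature weighting turns the retrieval circle into an ellipse by scaling each feature axis. A common choice, assumed here, is to weight each feature by the inverse of its variance across the positive feedback, so that features on which the relevant objects agree count more.

```python
import numpy as np

def inverse_variance_weights(positives, eps=1e-6):
    """Weight each feature by 1 / variance over the positive feedback, normalized to sum to 1."""
    var = np.var(np.asarray(positives, dtype=float), axis=0)
    w = 1.0 / (var + eps)  # eps avoids division by zero
    return w / w.sum()

def weighted_distance(a, b, w):
    """Weighted Euclidean distance; its level sets are axis-aligned ellipses."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(w * diff ** 2)))

# Assumed toy feedback: relevant objects agree on f1 and scatter widely on f2.
positives = [[1.0, 0.0], [1.1, 5.0], [0.9, -5.0]]
w = inverse_variance_weights(positives)
print(w)
```

With these weights a unit step along f1 costs far more than a unit step along f2, so the retrieval region stretches along f2.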

Feature Space Warping


[Figure: before warping and after warping. Before: query in the feature space (f1, f2) with positive and negative feedback, retrieved results inside a circle. After: the same objects in the warped feature space (f1', f2'), retrieved results inside a circle]

Feature Space Warping

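[Figure: feature space with the query, positive feedback, and negative feedback]

[Equation garbled in extraction. Recoverable symbols: feature vectors v with subscripts i, j, and q; weights u; constants γ and c; a sum over j = 1, …, M; an exponential term]

This is also semantic propagation!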
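The sketch below shows one plausible warping rule built from the same ingredients that survive on this slide (a sum over M feedback objects, signed weights, an exponential decay with constant c, and a step size γ). It is an assumed form for illustration and should not be read as the formula from the slide: each object moves toward the query under nearby positive feedback and away from it under nearby negative feedback.

```python
import numpy as np

def warp(features, query, feedback_idx, feedback_sign, gamma=0.3, c=1.0):
    """Move every object in feature space according to the feedback.

    ASSUMED rule: object i moves along (query - v_i) by
    gamma * sum_j u_j * exp(-c * |v_i - v_j|), with u_j = +1 or -1.
    """
    features = np.asarray(features, dtype=float)
    query = np.asarray(query, dtype=float)
    u = np.asarray(feedback_sign, dtype=float)
    fb = features[feedback_idx]
    # dist[i, j]: distance from object i to feedback object j.
    dist = np.linalg.norm(features[:, None, :] - fb[None, :, :], axis=2)
    # Signed, distance-decayed influence of all feedback on each object.
    strength = gamma * (np.exp(-c * dist) @ u)
    return features + strength[:, None] * (query - features)

# Assumed toy data: object 0 is positive feedback, object 2 is negative.
features = [[1.0, 0.0], [1.1, 0.0], [-1.0, 0.0], [10.0, 10.0]]
warped = warp(features, query=[0.0, 0.0],
              feedback_idx=[0, 2], feedback_sign=[+1, -1])
print(warped)
```

Object 1 was never rated, yet it moves toward the query because it sits next to a positive example. That spill-over from rated to unrated objects is what makes warping a form of semantic propagation.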

Experiment Result

[Figure: “Database Performance Increase with Gamma = 0.3, c = 6*pi”. % performance increase (0 to 100) versus number of feedback iterations (0 to 4), with curves for T = 3, T = 7, and T = 12]

Performance Improvement

3D Objects (1750 total)

[Bang&Chen ICIP’02]

Semantic Propagation is the Key

Without semantic propagation, hidden annotation and relevance feedback are not very useful

With enough relevance feedback, can we accomplish information retrieval without low-level features at all?

Pushing Content to Extreme

Content-Free Information Retrieval…

Content-Free Information Retrieval (CFIR)

With enough relevance feedback, retrieval is based more and more on feedback, less and less on features

In the extreme case, retrieval based on feedback only

Retrieval based on user history

e.g., Amazon.com

Example -- How CFIR Works

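[Figure: user history matrix with one unknown entry]

    1 0 0 1
    ? 1 1 0
    0 1 1 0

Example -- How CFIR Works (cont.)

[Figure: user history matrix with the unknown entry filled in]

    1 0 0 1
    0 1 1 0
    0 1 1 0

[image] and [image] are more similar than [image] and [image]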

CBIR vs. CFIR

Will user U like image X ?

Two different approaches:

Look at what U likes

→ Characterize images → Content-based IR

Look at which users like X

→ Characterize users → Content-free IR

Bayesian Framework

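Retrieval based on

$$P(x_i = 1 \mid x_{j_1} = 1, \ldots, x_{j_F} = 1) \propto \frac{\prod_{k=1}^{F} P(x_i = 1 \mid x_{j_k} = 1)}{P(x_i = 1)^{F-1}}$$

Pair-wise conditional probability matrix (from user history):

$$\tilde{P}(x_i = 1 \mid x_j = 1), \quad \forall\, i, j$$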
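The product rule on this slide can be sketched directly on a binary user history, with rows as users and columns as items. The additive smoothing, the function names, and the toy history are assumptions for illustration; the scores are computed in log space to avoid underflow when F is large.

```python
import numpy as np

def pairwise_conditionals(history, smoothing=1.0):
    """cond[i, j] estimates P(x_i = 1 | x_j = 1) from a users-by-items 0/1 matrix."""
    h = np.asarray(history, dtype=float)
    co = h.T @ h           # co[i, j]: users who liked both item i and item j
    count = h.sum(axis=0)  # count[j]: users who liked item j
    # Additive smoothing (an assumption) keeps every estimate inside (0, 1).
    return (co + smoothing) / (count[None, :] + 2.0 * smoothing)

def product_rule_scores(history, liked, smoothing=1.0):
    """Log of prod_k P(x_i=1 | x_jk=1) / P(x_i=1)^(F-1) for every item i."""
    h = np.asarray(history, dtype=float)
    cond = pairwise_conditionals(h, smoothing)
    prior = (h.sum(axis=0) + smoothing) / (h.shape[0] + 2.0 * smoothing)
    F = len(liked)
    log_score = np.log(cond[:, liked]).sum(axis=1) - (F - 1) * np.log(prior)
    log_score[liked] = -np.inf  # skip items already in the query set
    return log_score

# Assumed toy history: items 0 and 3 are liked together, as are items 1 and 2.
history = [[1, 0, 0, 1],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [1, 0, 0, 1]]
scores = product_rule_scores(history, liked=[1])
print(int(np.argmax(scores)))
```

No image features appear anywhere in this computation: the ranking comes entirely from which items users liked together, which is what makes the retrieval content-free.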

Experiment Results

[Figure: two plots of precision (%) versus recall (%), recall from 20 to 100, on the sample images. First plot (precision axis 0 to 3), legend: inverse variance, one-class SVM, Bayesian product rule, random. Second plot (precision axis 10 to 80), legend: product rule, max entropy, sum rule. Panel titles: CBIR Methods, CFIR Methods]

[Liu&Chen ICASSP’05]

Content without User Feedback

Extracting content from nothing…

Unsupervised Image Categorization

[Caltech face + background dataset]

[UIUC car dataset]

“Bag of Words” Representation

DoG interest point detector + SIFT descriptor [Lowe]

Vector Quantization (with a codebook)

Count Words
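The last two steps of this pipeline, vector quantization against a codebook and word counting, can be sketched as follows. The descriptors are assumed to be already extracted (2-D toy vectors stand in for 128-dimensional SIFT descriptors), and the codebook is assumed to be given, for example from clustering a training set.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Map each descriptor to its nearest codeword, then count the words."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # d2[n, k]: squared distance from descriptor n to codeword k.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # vector quantization
    return np.bincount(words, minlength=len(codebook))  # word histogram

# Assumed toy data: three codewords and four descriptors from one image.
codebook = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[0.1, 0.2], [9.0, 9.0], [1.0, 0.0], [0.0, 9.5]]
counts = bag_of_words(descriptors, codebook)
print(counts)
```

The histogram discards where each descriptor was found in the image, which is the property the later slides identify as the weakness of the representation.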

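Graphical Model

[Figure: two graphical models, one with topic node zface and one with topic node zcar, each generating word nodes w; the words are observed as counts, and the topic-to-word link is labeled Topic Appearance. Legend: z: topic, w: word]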

Need to handle background…

[Figure: word counts]

Solution: Add a hidden layer → PLSA

Probabilistic Latent Semantic Analysis (PLSA)

[Figure: graphical model d → z → w, drawn for doc 1 and doc 2; the d-to-z link is labeled Document Characteristic and the z-to-w link Topic Appearance. Legend: d: image, z: topic, w: word]

• Hofmann 01, Monay and Gatica-Perez 04, Sivic et al. 05, Quelhas et al. 05

• Can model complex scenes

• Learning: Maximum likelihood estimation using EM algorithm

• Inference:

– Categorization

– Segmentation
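The EM learning step for PLSA can be sketched in a few lines on a document-by-word count matrix. This is a minimal version under stated assumptions: random initialization, a fixed number of iterations, and no tempering or convergence test. The toy counts are made up for illustration.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit PLSA by EM. Returns P(z|d) as (docs x topics) and P(w|z) as (topics x words)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | d, w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Assumed toy data: images 0 and 1 use words 0-1, images 2 and 3 use words 2-3.
counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_z_d, p_w_z = plsa(counts, n_topics=2)
topics = p_z_d.argmax(axis=1)  # categorization: dominant topic per image
print(topics)
```

Categorization then reads off the dominant topic of each image from P(z|d), and the same posterior P(z|d, w) computed in the E-step is what a segmentation would use to label individual words.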

Problem with PLSA

“Bag of Words” is the problem…

Picasso, 1943

Dali, 1936

• As long as the parts are present, the exact position does not matter too much

Not so for general objects!

Enforcing Clustering

• A number of S = 10 fixed spatial distributions

[Figure: graphical model with nodes d, z, s, w, x, parameters µ and Σ, and plates Nd, D, S; distributions s1 to s9 and s10 shown with their appearance and position]

[Liu&Chen ICIP’06]

Enforcing Clustering: Semantic-Shift

[Figure: graphical model with nodes d, z, w, x, parameters µ and Σ, and plates Nd, D; links labeled Document Characteristic, Topic Appearance, and Location Semantics. Legend: d: image, z: topic, w: word appearance, x: word position]

[Liu&Chen CVPR’06]

Representing Location Semantics

[Figure: graphical model with nodes d, z, w, x, parameters µ and Σ, and plates Nd, D]

• Assume single foreground object

• Location semantics

– Foreground:

– Background:

Complement of foreground distribution

Learning in Semantic-Shift

[Equations not recoverable from the extracted text. Surviving labels: bag of word-position pairs; Document Characteristic; Topic Appearance; Location Semantics; posterior]

Learning all 3 terms simultaneously…

Completely unsupervised…

This is why “semantic-shift”

Results

[Figure: PLSA versus Semantic Shift, two comparisons: 0.7 → 0.85 and 0.9 → 0.98]

[Liu&Chen CVPR’06]

Future Work

• Intra-object modeling

• Video: spatial-temporal modeling

• Training the codewords in the loop

• Multiple objects

Conclusions

Machine learning can bridge the gap between low-level features (bits) and high-level semantics (content)

Hidden annotation and relevance feedback can help; semantic propagation is the key

Active learning

“Content-free” information retrieval is possible

Bayesian framework

Content extraction without user feedback is possible

Unsupervised learning; graphical models

Afterthoughts… Feng-Shui (风水)

Ancient Chinese room arrangement technique

Way 1 (low-level): Write down all the rules

Too many and do not generalize

Way 2 (high-level): Imagine how a dragon would move through the room to arrange it in a livable manner

Intuitive and creative

Done by some Feng-Shui masters

Advanced Multimedia Processing Lab

Please visit us at:

http://amp.ece.cmu.edu
