Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Multimedia Analysis:Marriage of Signal Processing and Machine Learning
Tsuhan Chen 陈祖翰陈祖翰陈祖翰陈祖翰Professor, Carnegie Mellon University
Chen2006
A 10-Year Journey…
IEEE Multimedia Signal Processing (MMSP) Technical Committee, 1996~
MMSP WorkshopsPrinceton 1997, Los Angeles 1998, Copenhagen 1999, Cannes 2001, St. Thomas 2002, Siena 2004, Shanghai 2005, Victoria 2006
International Conf on Multimedia (ICME)New York 2000, Tokyo 2001, Lausanne 2002, Baltimore 2003, Taipei 2004, Amsterdam 2005, Toronto 2006, Beijing 2007
IEEE Transactions on Multimedia, March 1999~Special issues: networked multimedia 2001, multimedia database 2002, multimodal interface 2003, streaming media 2004, MPEG-21 2005
SPS Distinguished Lecturer 2007
Chen2006
Multimedia Analysis: Bits vs. Content
images face possible all of number>>
population worldhistory human36524606030 ××××××>>
Number of all possible 16×12 images812162 ××=
[Baker/Kanade]
Reconstruct
It’s Bayesian in machine learning, a.k.a., prior…
Chen2006
Thoughts
“The most compelling shapes are those near to our
hearts: people’s faces, a gracefully moving body, a
natural scene with rustling leaves and flowing water.
Evolution has tuned us to these sights…”[Lengyel, 1998]
Multimedia analysis… more than processing bits…
it’s all about the content …
it’s signal processing + machine learning[Chen, 2006]
Chen2006
Content-Based Information Retrieval
Many Interesting Applications…
Chen2006
Hand-Drawn
Query
Retrieved Trademarks
Logos [Leung&Chen ICME’02]
Chen2006
Hand-Drawn Sketches
User sketches a query
Query
Sketch
Similar
Sketch
Page Stored in
Database
[Leung&Chen ICME’03]
Chen2006
3D Objects [Zhang&Chen ACM MM’01]
Chen2006
Chen2006
3D Protein Structures [Chen&Chen ICIP’02]
Chen2006
Content-Based Information Retrieval (CBIR)
Multimedia
Database f1
f2
Low-Level
Feature Space
Feature
Extraction
Query
Feature
Extraction
2
1
f
f
Indexing
Indexed
Feature
Database
Retrieval
Results
Simila
rity
Measure
Chen2006
High-Level Semantics
e.g. hierarchical attribute tree
Architecture Anatomy
Buildings
(133) (73)
Building_Materials/
Misc
Landmarks
Body-parts
Children
Female MaleMonsters/
Androids/
Aliens
(32)
(87)
(14)
(47)
(1)
(6) (8)
(12)
BIG CHALLENGE:
How to bridge the gap
between low-level
features and high-
level semantics?
Chen2006
Possible Solutions
Multimedia
Database f1
f2
Low-level
Feature space
Feature
Extraction
Query
Feature
Extraction
2
1
f
f
Indexing
Indexed
Feature
Database
Retrieval
Results
Simila
rity
Measure
Hidden
Annotation
Relevance
Feedback
Chen2006
Semantic Information
Hidden annotation
Object #n has Attribute #k � explicit
Relevance feedback
Objects #m and #n are (not) similar � implicit
Q: How to represent and propagate semantic information?
Q: How to use explicit/implicit semantic information to improve retrieval?
Chen2006
Semantic Information as Probabilities
pNK…pN3pN2pN1Object
N
p2K…p23p22p21Object
2
p1K…p13p12p11Object
1
Attribute
K…
Attribute
3
Attribute
2
Attribute
1
…… … … …
pnk : Attribute Probabilities
Chen2006
Object
N
0101Object
2
Object
1
Attribute
K…
Attribute
3
Attribute
2
Attribute
1
…… … … …
Q: How to propagate?
When an object is annotated, pnk is set to 0/1
Annotate one object…
A: Based on low- level features
Chen2006
Semantic Propagation
pprior
f1
f2
pi k
pprior
f1
f2
pi k
“Biased Kernel Regression”
Chen2006
pprior
f1 f
2
pi k
Semantic Propagation (cont.)
Chen2006
Q: Which to annotate next?
Semantic Propagation (cont.)
pNK…pN3pN2pN1Object
N
0101Object
2
p1K…p13p12p11Object
1
Attribute
K…
Attribute
3
Attribute
2
Attribute
1
…… … … …
Chen2006
Active Learning
Choose the most uncertain object to annotate
Uncertainty determined by the entropy of attribute probabilities
“Selective sampling”
May want to consider density in feature space too
Teacher Student
Teacher Student
Annotator Retrieval System
Passive Learning
Active Learning
Chen2006
Recap…
Maintain attribute probabilities of each object
Set an attribute probability to 1/0 when annotated
Propagate probabilities to non-annotated objects
Choose the most uncertain object in the database to annotate next
Use probabilities to estimate uncertainty
Use probabilities to measure semantic distance…
Chen2006
Result
0
0.1
0.2
0.3
0.4
0.5
0 300 600 900 1200 1500 1800
Number of Annotated Models
Av
era
ge
Ma
tch
ing
Err
or
(Err
)
Random Sampling
Our algorithm
3D Objects
(1750 total)
[Zhang&Chen T-MM’02]
Chen2006
Relevance Feedback
Relevance feedback
Ask for user’s feedback during the retrieval
“Object #i is (not) similar to the query”
“Objects #m and #n are (not) similar”
Implicit semantic information
Use feedback to improve retrieval
Way 1: Move the query point
Way 2: Weigh the features
Way 3: “Warp” the feature space
Chen2006
An Example
Retrieved Results
(inside circle)
Query
1f
2f
Chen2006
Retrieved Results
(inside circle)
Query
1f
2f
Positive feedback
Negative feedback
An Example
Chen2006
Move the Query Point
New Retrieved Results
(inside circle)
New Query
1f
2f
Positive feedback
Negative feedback
Chen2006
Feature Weighting
New Retrieved Results
(inside ellips)
Query
1f
2f
Positive feedback
Negative feedback
Chen2006
Feature Space Warping
New Retrieved Results
(inside circle)
Query
'1f
'2f
Positive feedback
Negative feedback
Retrieved Results
(inside circle)
Query
1f
2f
Positive feedback
Negative feedback
Before Warping After Warping
Chen2006
Feature Space Warping
Feature Space
Query
Positive
Feedback
Negative
Feedback
( ) iq
M
j
ijipi vvcuv
−= ∑
=1
expγ This is also semantic propagation!!!
Chen2006
Experiment Result
D ata b a s e P e rfo rm an ce I n c re a s e w ith
G am m a= 0 .3 , c= 6*p i
0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
1 0 0
0 1 2 3 4
n u m b e r o f fe e d b a c k i te ra tio n s
% p
erf
orm
an
ce i
ncre
ase
T= 3
T= 7
T= 1 2
Performance Improvement
3D Objects
(1750 total)
[Bang&Chen ICIP’02]
Chen2006
Semantic Propagation is the Key
Without semantic propagation, hidden annotation and relevance feedback are not very useful
With enough relevance feedback, can we can accomplish information retrieval without low-level features at all?
Chen2006
Pushing Content to Extreme
Content-Free Information Retrieval…
Chen2006
Content-Free Information Retrieval (CFIR)
With enough relevance feedback, retrieval is based more and more on feedback, less and less on features
In the extreme case, retrieval based on feedback only
Retrieval based on user history
e.g., Amazon.com
Chen2006
Example -- How CFIR Works
1001
?110
0110
User History
Chen2006
1001
0110
0110
User History
Example -- How CFIR Works
and are more similar than and
Chen2006
CBIR vs. CFIR
Will user U like image X ?
Two different approaches:
Look at what U likes
� Characterize images � Content-based IR
Look at which users like X
� Characterize users � Content-free IR
Chen2006
Bayesian Framework
Retrieval based on
1
1
)1(
)1|1(
)1,...,1|1(1
−
=
=
==Π∝
===
F
i
ji
F
k
jji
xP
xxP
xxxP
k
F
Pair-wise conditional probability matrix
jixxP ji ,)1|1(~
∀==
User History
Chen2006
Experiment Results
20 40 60 80 1000
0.5
1
1.5
2
2.5
3
Recall (%)
Pre
cis
ion
(%
)
Inverse varianceOne-class SVMBayesian product rule
Random
20 40 60 80 10010
20
30
40
50
60
70
80
Recall (%)
Pre
cis
ion
(%
)
Product rule Max entropy Sum rule
CBIR Methods CFIR Methods
Sample Images
[Liu&Chen ICASSP’05]
Chen2006
Content without User Feedback
Extracting content from nothing…
Unsupervised Image CategorizationUnsupervised Image Categorization
[Caltech face + background dataset]
Unsupervised Image CategorizationUnsupervised Image Categorization
[UIUC car dataset]
“Bag of Words” Representation“Bag of Words” Representation
DoG interest point detector + SIFT descriptor [Lowe]
Vector Quantization
Count
Words
Codebook
Graphical ModelGraphical Model
count count
zface
w
zcar
w
words words
Topic Appearance
w
w
w
w
: topic
: word
Need to handle background…Need to handle background…
count
words
Solution: Add a hidden layer � PLSA
Probabilistic Latent Semantic Analysis (PLSA)Probabilistic Latent Semantic Analysis (PLSA)
• Learning: Maximum likelihood estimation using EM algorithm
d z w
z w
z w
d z w
z w
z w
doc 1
doc 2
Document CharacteristicTopic Appearance
• Hofmann 01, Monay and Gatica-Perez 04, Sivic et al. 05, Quelhas et al. 05
• Can model complex scenes
: image
: topic
: word
• Inference:
– Categorization
– Segmentation
Problem with PLSAProblem with PLSA
“Bag of Words” is the problem…“Bag of Words” is the problem…
Picasso, 1943
Dali, 1936
• As long as the parts are present, the exact position does not matter too much
Not so for general objects!
Enforcing ClusteringEnforcing Clustering
• A number of S =10 fixed spatial distributions
d z
s
w
x
Nd
D
µ ΣS
[Liu&Chen ICIP’06]
s1 to s9s10appearance
position
Enforcing Clustering: Semantic-ShiftEnforcing Clustering: Semantic-Shift
d z w
xµΣ
Nd D
Document CharacteristicTopic AppearanceLocation Semantics
: image
: topic
: word appearance
: word position
[Liu&Chen CVPR’06]
Representing Location SemanticsRepresenting Location Semantics
d z w
xµΣ
Nd D
• Assume single foreground object
• Location semantics
– Foreground:
– Background:
Complement of foreground distribution
Learning in Semantic-ShiftLearning in Semantic-Shift
: bag of word- position pairs
Document Characteristic
Topic Appearance
posterior
Learning in Semantic-ShiftLearning in Semantic-Shift
Location semantics Topic appearanceposterior
Learning all 3 terms simultaneously…
Completely unsupervised…
This is why “semantic-shift”
ResultsResults
PLSA PLSA Semantic ShiftSemantic Shift0.7 � 0.85 0.9 � 0.98
[Liu&Chen CVPR’06]
Future WorkFuture Work
• Intra- object modeling
• Video: spatial- temporal modeling
• Training the codewords in the loop
• Multiple ojects
Chen2006
Conclusions
Machine learning can bridge the gap between low-level features (bits) and high-level semantics (content)
Hidden annotation and relevance feedback can help; semantic propagation is the key
Active learning
“Content-free” information retrieval is possible
Bayesian framework
Content extraction without user feedback is possible
Unsupervised learning; graphical models
Chen2006
Afterthoughts…Feng-Shui (风水风水风水风水)
Ancient Chinese room arrangement technique
Way 1 (low-level):Write down all the rules
Too many and do not generalize
Way 2 (high-level):Imagine how a dragon would move through the room to arrange it in a livable manner
Intuitive and creative
Done by some Feng-Shui masters
Chen2006
Advanced Multimedia Processing Lab
Please visit us at:
http://amp.ece.cmu.edu