Ranking with High-Orderand Missing Information
M. Pawan Kumar
Ecole Centrale Paris
Aseem Behl Puneet Dokania Pritish Mohapatra C. V. Jawahar
“Jumping” Classification

PASCAL VOC
Features
Processing
Training
Classifier
Think of a classifier !!!
✗
“Jumping” Ranking
Ranking vs. Classification

[Figure: six images shown at Ranks 1–6 under different rankings — Average Precision = 1 / 0.92 / 0.81; Accuracy = 1 / 0.67]
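The gap between AP and accuracy is easy to reproduce. A minimal sketch (plain Python, labels chosen to mirror the six-image example, not taken from the talk) that computes the AP of a ranked list of binary labels:

```python
def average_precision(ranked_labels):
    """AP of a ranked list; ranked_labels[r] = 1 if the item at rank r+1 is positive."""
    hits, total = 0, 0.0
    for rank, y in enumerate(ranked_labels, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive's rank
    return total / max(hits, 1)

# Three positives, three negatives:
print(average_precision([1, 1, 1, 0, 0, 0]))  # perfect ranking: 1.0
print(average_precision([1, 1, 0, 1, 0, 0]))  # one swap: 11/12 ≈ 0.92
```

Thresholding both lists after rank 3 gives accuracy 1 and 0.67 respectively, so a classifier tuned for accuracy cannot distinguish rankings that differ substantially in AP.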
Ranking vs. Classification
Ranking is not the same as classification
Average precision is not the same as accuracy
Should we use 0-1 loss based classifiers?
Or should we use AP loss based rankers?
Outline

• Optimizing Average Precision (AP-SVM)
• High-Order Information
• Missing Information

Yue, Finley, Radlinski and Joachims, SIGIR 2007
Problem Formulation: Single Input X

Φ(xi) for all i ∈ P
Φ(xk) for all k ∈ N
Problem Formulation: Single Output R
Rik =
+1 if i is better ranked than k
-1 if k is better ranked than i
Problem Formulation: Scoring Function

si(w) = wTΦ(xi) for all i ∈ P
sk(w) = wTΦ(xk) for all k ∈ N

S(X,R;w) = Σi∈P Σk∈N Rik(si(w) - sk(w))
Ranking at Test-Time
R(w) = maxR S(X,R;w)
Sort samples x1, …, x8 according to individual scores si(w)
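Because S(X,R;w) decomposes over positive-negative pairs, the maximizing ranking is obtained simply by sorting the per-sample scores. A sketch with made-up scores for x1…x8:

```python
import numpy as np

# Hypothetical per-sample scores s_i(w) = w^T Phi(x_i) for x1..x8
scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6])

# R(w) = argmax_R S(X,R;w): sort samples in decreasing score order
order = np.argsort(-scores)  # sample indices, best rank first
print([f"x{i+1}" for i in order])  # → ['x1', 'x6', 'x4', 'x8', 'x3', 'x5', 'x7', 'x2']
```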
Learning Formulation: Loss Function

Δ(R*,R(w)) = 1 – AP of rank R(w)

Non-convex; parameters cannot be regularized
Learning Formulation: Upper Bound of Loss Function

Δ(R*,R(w)) ≤ Δ(R*,R(w)) + S(X,R(w);w) - S(X,R(w);w)
Δ(R*,R(w)) ≤ Δ(R*,R(w)) + S(X,R(w);w) - S(X,R*;w)
Δ(R*,R(w)) ≤ maxR { Δ(R*,R) + S(X,R;w) } - S(X,R*;w)

Convex; parameters can be regularized
minw ||w||2 + C ξ
S(X,R;w) + Δ(R*,R) - S(X,R*;w) ≤ ξ, for all R
Optimization for Learning: Cutting Plane Computation
maxR S(X,R;w) + Δ(R*,R)
Sort positive samples according to scores si(w)
Sort negative samples according to scores sk(w)
Find best rank of each negative sample independently
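On a tiny example the most-violated ranking maxR { S(X,R;w) + Δ(R*,R) } can be found by brute force, which is a useful sanity check for the efficient per-negative placement above. All scores and labels below are hypothetical:

```python
from itertools import permutations

POS = {0, 1}                    # samples 0,1 positive; 2,3 negative
scores = [0.6, 0.3, 0.5, 0.2]   # hypothetical s_i(w)

def ap(order):
    # average precision of a ranking (tuple of sample indices)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, 1):
        if i in POS:
            hits += 1
            total += hits / rank
    return total / len(POS)

def S(order):
    # S(X,R;w) = sum_{i in P, k in N} R_ik (s_i - s_k)
    rank = {i: r for r, i in enumerate(order)}
    return sum((1 if rank[i] < rank[k] else -1) * (scores[i] - scores[k])
               for i in POS for k in set(range(4)) - POS)

# Cutting plane: ranking that maximizes score plus AP loss (1 - AP)
best = max(permutations(range(4)), key=lambda o: S(o) + 1 - ap(o))
```

In this toy case the maximizer interleaves a high-scoring negative among the sorted positives, which is exactly the structure the independent per-negative search exploits.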
Optimization for Learning: Cutting Plane Computation

[Chart: training time, 0-1 SVM vs. AP-SVM — standard AP-SVM is 5x slower; the improved AP-SVM is slightly faster]
Mohapatra, Jawahar and Kumar, NIPS 2014
Experiments
PASCAL VOC 2011

Jumping
Phoning
Playing Instrument
Reading
Riding Bike
Riding Horse
Running
Taking Photo
Using Computer
Walking
10 ranking tasks
Cross-validation
Poselets Features
AP-SVM vs. SVM: PASCAL VOC ‘test’ Dataset

Difference in AP
Better in 8 classes, tied in 2 classes
AP-SVM vs. SVM: Folds of PASCAL VOC ‘trainval’ Dataset

Difference in AP
AP-SVM is statistically better in 3 classes
SVM is statistically better in 0 classes
Outline

• Optimizing Average Precision
• High-Order Information (HOAP-SVM)
• Missing Information

Dokania, Behl, Jawahar and Kumar, ECCV 2014
High-Order Information
• People perform similar actions
• People strike similar poses
• Objects are of same/similar sizes
• “Friends” have similar habits
• How can we use them for ranking, not just classification?
Problem Formulation
Input x = {x1,x2,x3}
Output y = {-1,+1}3
Ψ(x,y) = [Ψ1(x,y); Ψ2(x,y)]

Ψ1: Unary Features
Ψ2: Pairwise Features
Learning Formulation
Input x = {x1,x2,x3}
Output y = {-1,+1}3
Δ(y*,y) = Fraction of incorrectly classified persons
Optimization for Learning
Input x = {x1,x2,x3}
Output y = {-1,+1}3
maxy wTΨ(x,y) + Δ(y*,y)
Graph Cuts (if supermodular)
LP Relaxation, or exhaustive search
Classification
Input x = {x1,x2,x3}
Output y = {-1,+1}3
maxy wTΨ(x,y)
Graph Cuts (if supermodular)
LP Relaxation, or exhaustive search
Ranking?
Input x = {x1,x2,x3}
Output y = {-1,+1}3
Use difference of max-marginals
Max-Marginal for Positive Class
Input x = {x1,x2,x3}
Output y = {-1,+1}3
mm+(i;w) = maxy,yi=+1 wTΨ(x,y)
Best possible score when person i is positive
Convex in w
Max-Marginal for Negative Class
Input x = {x1,x2,x3}
Output y = {-1,+1}3
mm-(i;w) = maxy,yi=-1 wTΨ(x,y)
Best possible score when person i is negative
Convex in w
Ranking
Input x = {x1,x2,x3}
Output y = {-1,+1}3
si(w) = mm+(i;w) – mm-(i;w)
Difference-of-Convex in w
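On a toy model the difference-of-max-marginals score can be computed by exhaustive search over joint labelings (all numbers below are made up; real models would use dynamic graph cuts rather than enumeration):

```python
import itertools

# Toy joint model over 3 samples: unary scores plus a reward when
# neighbouring samples take the same label (hypothetical values)
unary = [0.7, -0.2, 0.1]
edges = [(0, 1), (1, 2)]
PAIR = 0.3

def joint(y):
    # w^T Psi(x,y) = unary terms + pairwise agreement terms
    return (sum(u * yi for u, yi in zip(unary, y))
            + sum(PAIR for a, b in edges if y[a] == y[b]))

def mm(i, label):
    # max-marginal: best joint score with y_i clamped to `label`
    return max(joint(y) for y in itertools.product((-1, 1), repeat=3)
               if y[i] == label)

# HOB-SVM ranking score: s_i(w) = mm+(i;w) - mm-(i;w)
s = [mm(i, +1) - mm(i, -1) for i in range(3)]
```

Sample 0, which has the strongest unary evidence for the positive class, also gets the largest difference of max-marginals, so it is ranked first.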
Use difference of max-marginals → HOB-SVM
Ranking
si(w) = mm+(i;w) – mm-(i;w)
Why not optimize AP directly?

High-Order AP-SVM (HOAP-SVM)
Problem Formulation: Single Input X

Φ(xi) for all i ∈ P
Φ(xk) for all k ∈ N
Problem Formulation: Single Output R
Rik =
+1 if i is better ranked than k
-1 if k is better ranked than i
Problem Formulation: Scoring Function

si(w) = mm+(i;w) – mm-(i;w) for all i ∈ P
sk(w) = mm+(k;w) – mm-(k;w) for all k ∈ N

S(X,R;w) = Σi∈P Σk∈N Rik(si(w) - sk(w))
Ranking at Test-Time
R(w) = maxR S(X,R;w)
Sort samples x1, …, x8 according to individual scores si(w)
Learning Formulation: Loss Function
Δ(R*,R(w)) = 1 – AP of rank R(w)
Learning Formulation: Upper Bound of Loss Function
minw ||w||2 + C ξ
S(X,R;w) + Δ(R*,R) - S(X,R*;w) ≤ ξ, for all R
Optimization for Learning
Difference-of-convex program
Kohli and Torr, ECCV 2006
Very efficient CCCP
Linearization step by Dynamic Graph Cuts
Update step equivalent to AP-SVM
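The CCCP recipe above (linearize the concave part, solve the convex rest) can be illustrated on a one-dimensional difference-of-convex toy, f(x) = x⁴ − 2x². This has nothing to do with the actual HOAP-SVM objective; it only shows the schema:

```python
# f(x) = u(x) - v(x) with u(x) = x**4 convex and v(x) = 2*x**2 convex
x = 0.5
for _ in range(50):
    g = 4 * x  # linearization step: gradient of v at the current iterate
    # update step: convex subproblem min_x u(x) - g*x,
    # solved here in closed form as x = (g/4)^(1/3)
    x = (g / 4) ** (1 / 3)
print(round(x, 6))  # prints 1.0, a local minimum of f
```

Each iteration decreases f, mirroring how the linearization step (dynamic graph cuts) and the update step (AP-SVM) alternate in the actual optimization.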
Experiments
HOB-SVM vs. AP-SVM: PASCAL VOC ‘test’ Dataset

Difference in AP
Better in 4, worse in 3 and tied in 3 classes
HOB-SVM vs. AP-SVM: Folds of PASCAL VOC ‘trainval’ Dataset

Difference in AP
HOB-SVM is statistically better in 0 classes
AP-SVM is statistically better in 0 classes
HOAP-SVM vs. AP-SVM: PASCAL VOC ‘test’ Dataset
Better in 7, worse in 2 and tied in 1 class
Difference in AP
HOAP-SVM vs. AP-SVM: Folds of PASCAL VOC ‘trainval’ Dataset
HOAP-SVM is statistically better in 4 classes
AP-SVM is statistically better in 0 classes
Difference in AP
Outline

• Optimizing Average Precision
• High-Order Information
• Missing Information (Latent-AP-SVM)

Behl, Jawahar and Kumar, CVPR 2014
Fully Supervised Learning
Weakly Supervised Learning
Rank images by relevance to ‘jumping’
Two Approaches

• Use Latent Structured SVM with AP loss
  – Unintuitive Prediction
  – Loose Upper Bound on Loss
  – NP-hard Optimization for Cutting Planes

• Carefully design a Latent-AP-SVM
  – Intuitive Prediction
  – Tight Upper Bound on Loss
  – Optimal Efficient Cutting Plane Computation
Results
Questions?
Code + Data Available