Learning to Segment with Diverse Data
M. Pawan Kumar, Stanford University
Slide 2
Semantic Segmentation car road grass tree sky
Slide 3
Segmentation Models: car, road, grass, tree, sky
MODEL with parameters w: image x → labeling y
P(x,y; w) ∝ exp(-E(x,y; w))
Learn accurate parameters w; predict y* = argmax_y P(x,y; w) = argmin_y E(x,y; w)
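The prediction rule on this slide can be sketched on a toy problem. The two-pixel "image", the unary costs, and the Potts smoothness term below are invented for illustration (the talk's actual model is region-based); the point is only that argmax of P ∝ exp(-E) equals argmin of E:

```python
import itertools
import math

LABELS = ["car", "road", "grass", "tree", "sky"]
unary = {  # unary[pixel][label] = cost (made-up numbers)
    0: {"car": 2.0, "road": 0.5, "grass": 1.0, "tree": 3.0, "sky": 4.0},
    1: {"car": 2.5, "road": 0.4, "grass": 1.2, "tree": 3.0, "sky": 4.0},
}

def energy(y):
    """E(x,y): unary terms plus a Potts penalty for disagreeing neighbours."""
    e = sum(unary[p][y[p]] for p in unary)
    e += 0.0 if y[0] == y[1] else 1.0
    return e

# y* = argmin_y E(x,y) is the same labeling as argmax_y P(x,y) with P ∝ exp(-E)
labelings = list(itertools.product(LABELS, repeat=2))
y_star = min(labelings, key=energy)
assert y_star == max(labelings, key=lambda y: math.exp(-energy(y)))
print(y_star)  # ('road', 'road')
```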
Slide 4
Fully Supervised Data
Slide 5
Fully Supervised Data: specific foreground classes, generic background class
PASCAL VOC Segmentation Datasets
Slide 6
Fully Supervised Data: specific background classes, generic foreground class
Stanford Background Datasets
Slide 7
J. Gonfaus et al. Harmony Potentials for Joint Classification and Segmentation. CVPR, 2010.
S. Gould et al. Multi-Class Segmentation with Relative Location Prior. IJCV, 2008.
S. Gould et al. Decomposing a Scene into Geometric and Semantically Consistent Regions. ICCV, 2009.
X. He et al. Multiscale Conditional Random Fields for Image Labeling. CVPR, 2004.
S. Konishi et al. Statistical Cues for Domain Specific Image Segmentation with Performance Analysis. CVPR, 2000.
L. Ladicky et al. Associative Hierarchical CRFs for Object Class Image Segmentation. ICCV, 2009.
F. Li et al. Object Recognition as Ranking Holistic Figure-Ground Hypotheses. CVPR, 2010.
J. Shotton et al. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. ECCV, 2006.
J. Verbeek et al. Scene Segmentation with Conditional Random Fields Learned from Partially Labeled Images. NIPS, 2007.
Y. Yang et al. Layered Object Detection for Multi-Class Segmentation. CVPR, 2010.
Supervised Learning: generic classes, burdensome annotation
Slide 8
Weakly Supervised Data: Bounding Boxes for Objects
PASCAL VOC Detection Datasets: thousands of images
Slide 9
Weakly Supervised Data: Image-Level Labels ("Car")
ImageNet, Caltech: thousands of images
Slide 10
B. Alexe et al. ClassCut for Unsupervised Class Segmentation. ECCV, 2010.
H. Arora et al. Unsupervised Segmentation of Objects Using Efficient Learning. CVPR, 2007.
L. Cao et al. Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes. ICCV, 2007.
J. Winn et al. LOCUS: Learning Object Classes with Unsupervised Segmentation. ICCV, 2005.
Weakly Supervised Learning: binary segmentation, limited data
Slide 11
Diverse Data Car
Slide 12
Diverse Data Learning
Avoid generic classes.
Take advantage of the cleanliness of supervised data and the vast availability of weakly supervised data.
Slide 13
Outline Model Energy Minimization Parameter Learning Results
Future Work
Slide 14
Region-Based Model (Gould, Fulton and Koller, ICCV 2009)
Pixels → Regions
Unary potential: θ_r(i) = w_i^T φ_r(x), where φ_r(x) are features extracted from region r of image x.
For example, φ_r(x) = average [R G B], with w_water = [0 0 -10] and w_grass = [0 -10 0].
Pairwise potential: θ_rr'(i,j) = w_ij^T φ_rr'(x).
For example, φ_rr'(x) = constant > 0, with w_{car above ground} > 0.
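The slide's unary example can be written out directly: the feature is the region's average RGB, and the potential is a dot product with the class weight vector. The weight vectors are the slide's illustrative values; the green "region" (RGB in [0, 1]) is invented for illustration:

```python
# theta_r(i) = w_i . phi_r(x), with phi_r(x) = average [R, G, B] of region r.
w = {
    "water": [0.0, 0.0, -10.0],  # low energy when the blue channel dominates
    "grass": [0.0, -10.0, 0.0],  # low energy when the green channel dominates
}

def unary_potential(region_pixels, label):
    """theta_r(label) = w_label^T * (average RGB of the region)."""
    n = len(region_pixels)
    phi = [sum(p[c] for p in region_pixels) / n for c in range(3)]  # avg RGB
    return sum(wc * fc for wc, fc in zip(w[label], phi))

greenish = [[0.1, 0.8, 0.2], [0.2, 0.9, 0.1]]  # a made-up green region
# "grass" assigns this region lower energy than "water":
assert unary_potential(greenish, "grass") < unary_potential(greenish, "water")
```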
Slide 15
Region-Based Model
E(x,y) = -log P(x,y) + constant = Unaries + Pairwise
E(x,y) = w^T φ(x,y)
Best segmentation y of an image x? Accurate parameters w?
Slide 16
Outline Model Energy Minimization Parameter Learning Results
Future Work Kumar and Koller, CVPR 2010
Slide 17
Move-Making:
Besag. On the Statistical Analysis of Dirty Pictures. JRSS, 1986.
Boykov et al. Fast Approximate Energy Minimization via Graph Cuts. PAMI, 2001.
Komodakis et al. Fast, Approximately Optimal Solutions for Single and Dynamic MRFs. CVPR, 2007.
Lempitsky et al. Fusion Moves for Markov Random Field Optimization. PAMI, 2010.
Message-Passing:
T. Minka. Expectation Propagation for Approximate Bayesian Inference. UAI, 2001.
Murphy. Loopy Belief Propagation: An Empirical Study. UAI, 1999.
J. Winn et al. Variational Message Passing. JMLR, 2005.
J. Yedidia et al. Generalized Belief Propagation. NIPS, 2001.
Convex Relaxations:
Chekuri et al. Approximation Algorithms for Metric Labeling. SODA, 2001.
M. Goemans et al. Improved Approximation Algorithms for Maximum Cut. JACM, 1995.
M. Muramatsu et al. A New SOCP Relaxation for Max-Cut. JORJ, 2003.
Ravikumar et al. QP Relaxations for Metric Labeling. ICML, 2006.
Hybrid Algorithms:
K. Alahari et al. Dynamic Hybrid Algorithms for MAP Inference. PAMI, 2010.
P. Kohli et al. On Partial Optimality in Multilabel MRFs. ICML, 2008.
C. Rother et al. Optimizing Binary MRFs via Extended Roof Duality. CVPR, 2007.
Which is the best relaxation?
Slide 18
Convex Relaxations
Over time: LP (1976), SOCP (2003), QP (2006). We expect the later relaxations to be tighter.
But the LP relaxation is provably better than the QP and SOCP relaxations (Kumar, Kolmogorov and Torr, NIPS, 2007). Use the LP!!
Slide 19
Energy Minimization Find Regions Find Labels Fixed Regions LP
Relaxation
Slide 20
Energy Minimization: Find Regions, then Find Labels.
Good region: homogeneous appearance and texture. Bad region: inhomogeneous appearance and texture.
The space of regions is super-exponential in the number of pixels. Can we prune it?
Use low-level segmentation to generate candidate regions.
Slide 21
Energy Minimization Spatial Bandwidth = 10 Mean-Shift
Segmentation
Slide 22
Energy Minimization Spatial Bandwidth = 20 Mean-Shift
Segmentation
Slide 23
Energy Minimization Spatial Bandwidth = 30 Mean-Shift
Segmentation
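The role of the bandwidth in these mean-shift slides can be illustrated with a minimal 1-D flat-kernel sketch (the point set is invented; real mean-shift segmentation runs on joint spatial-color features). A small bandwidth keeps nearby points as separate modes; a large one merges everything:

```python
def mean_shift_1d(points, bandwidth, iters=50):
    """Flat-kernel mean shift on 1-D points; returns the distinct modes."""
    modes = list(points)
    for _ in range(iters):
        new_modes = []
        for m in modes:
            # Shift each mode to the mean of the points inside its window.
            neighbors = [p for p in points if abs(p - m) <= bandwidth]
            new_modes.append(sum(neighbors) / len(neighbors))
        modes = new_modes
    # Collapse modes that converged to (numerically) the same value.
    clusters = []
    for m in sorted(modes):
        if not clusters or abs(m - clusters[-1]) > 1e-6:
            clusters.append(m)
    return clusters

pts = [1.0, 1.2, 1.1, 5.0, 5.2]
assert len(mean_shift_1d(pts, bandwidth=0.5)) == 2   # two separate modes
assert len(mean_shift_1d(pts, bandwidth=10.0)) == 1  # everything merges
```

Running the same data at several bandwidths is exactly the trick the slides use to build a diverse dictionary of candidate regions.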
Slide 24
Energy Minimization Combine Multiple Segmentations Car
Slide 25
Dictionary of Regions: select regions and assign classes (Kumar and Koller, CVPR 2010).
y_r(i) ∈ {0,1}, for i = 0, 1, 2, ..., C (i = 0 means not selected).
Constraints: selected regions cover the entire image; no two selected regions overlap.
min Σ θ_r(i) y_r(i) + Σ θ_rr'(i,j) y_r(i) y_r'(j)
Solved efficiently by dual decomposition: Komodakis and Paragios, CVPR, 2009.
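The selection problem on this slide can be sketched by brute force on a toy dictionary. The regions, costs, and 4-"pixel" image below are invented; pairwise terms are omitted, and the real solver uses the LP relaxation with dual decomposition rather than enumeration:

```python
import itertools

PIXELS = {0, 1, 2, 3}
dictionary = {           # region id -> (pixel set, best unary cost); made up
    "a": ({0, 1}, 1.0),
    "b": ({2, 3}, 1.0),
    "c": ({0, 1, 2, 3}, 2.5),
    "d": ({1, 2}, 0.5),
}

def best_selection():
    best, best_cost = None, float("inf")
    for k in range(1, len(dictionary) + 1):
        for combo in itertools.combinations(dictionary, k):
            covered = [p for r in combo for p in dictionary[r][0]]
            # Cover every pixel exactly once: full coverage, no overlap.
            if sorted(covered) != sorted(PIXELS):
                continue
            cost = sum(dictionary[r][1] for r in combo)
            if cost < best_cost:
                best, best_cost = set(combo), cost
    return best, best_cost

sel, cost = best_selection()
assert sel == {"a", "b"} and cost == 2.0  # beats the single big region "c"
```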
Slide 26
Comparison: Energy and Accuracy (columns: Image, Gould, Ours)
Parameters learned using Gould, Fulton and Koller, ICCV 2009.
Statistically significant improvement (paired t-test).
Slide 27
Outline Model Energy Minimization Parameter Learning Results
Future Work Kumar, Turki, Preston and Koller, In Submission
Slide 28
Supervised Learning
Training pairs: (x_1, y_1), (x_2, y_2)
P(x,y) ∝ exp(-E(x,y)) = exp(-w^T φ(x,y))
[Figure: P(y|x_1) peaked at y_1; P(y|x_2) peaked at y_2]
Well-studied problem, efficient solutions.
Slide 29
Diverse Data Learning: image x, annotation a, latent variables h. Generic Class Annotation.
Slide 30
Diverse Data Learning: image x, annotation a, latent variables h. Bounding Box Annotation.
Slide 31
Diverse Data Learning: image x, annotation a = Cow, latent variables h. Image-Level Annotation.
Slide 32
Learning with Missing Information
Expectation Maximization (computationally inefficient):
A. Dempster et al. Maximum Likelihood from Incomplete Data via the EM Algorithm. JRSS, 1977.
M. Jamshidian et al. Acceleration of the EM Algorithm by Using Quasi-Newton Methods. JRSS, 1997.
R. Neal et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. LGM, 1999.
R. Sundberg. Maximum Likelihood Theory for Incomplete Data from an Exponential Family. SJS, 1974.
Latent Support Vector Machine (hard EM; only requires an energy minimization algorithm):
P. Felzenszwalb et al. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008.
C.-N. Yu et al. Learning Structural SVMs with Latent Variables. ICML, 2009.
Slide 33
Latent SVM (Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)
min_w λ||w||² + Σ_i ξ_i
s.t. for all (a, h):  min_{h_i} w^T Φ(x_i, a_i, h_i) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
Energy of the ground truth ≤ energy of other labelings, with margin given by the user-defined loss Δ(a_i, a, h) (number of disagreements).
Difference-of-convex objective → CCCP.
Slide 34
CCCP (Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)
Start with an initial estimate w_0.
Impute the latent variables by energy minimization: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} by solving the convex problem:
min_w λ||w||² + Σ_i ξ_i
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
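The CCCP alternation can be sketched on a toy latent-variable model. Everything below is an invented simplification: a scalar w, "bag of numbers" examples whose latent h picks one element, and subgradient steps on a hinge loss standing in for the convex QP update:

```python
# Each example: x is a bag of numbers, a in {-1, +1}; latent h indexes
# which element of the bag represents the example.
data = [([0.2, 3.0, 0.1], +1), ([-2.5, -0.3], -1), ([2.1, 0.4], +1)]

def score(w, x, h):
    return w * x[h]

def impute(w, x, a):
    """Step 1: fill in the latent variable using the current parameters."""
    return max(range(len(x)), key=lambda h: a * score(w, x, h))

def cccp(iters=20, lr=0.1):
    w = 0.01
    for _ in range(iters):
        hs = [impute(w, x, a) for x, a in data]   # latent completion
        for (x, a), h in zip(data, hs):           # convex step: hinge loss
            if a * score(w, x, h) < 1.0:          # margin violated
                w += lr * a * x[h]                # subgradient step
    return w

w = cccp()
# With the learned w, every example is classified correctly under its
# own imputed latent variable.
assert all(a * score(w, x, impute(w, x, a)) > 0 for x, a in data)
```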
Slide 35
Generic Class Annotation
Replace the generic background class with specific background classes; replace the generic foreground class with specific foreground classes.
Slide 36
Bounding Box Annotation Every row contains the object Every
column contains the object
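These two constraints are easy to state in code. The mask, box coordinates, and helper name below are invented for illustration:

```python
def consistent_with_box(mask, box):
    """Inside box = (r0, r1, c0, c1) (inclusive), every row and every
    column must contain at least one foreground pixel."""
    r0, r1, c0, c1 = box
    rows_ok = all(any(mask[r][c] for c in range(c0, c1 + 1))
                  for r in range(r0, r1 + 1))
    cols_ok = all(any(mask[r][c] for r in range(r0, r1 + 1))
                  for c in range(c0, c1 + 1))
    return rows_ok and cols_ok

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
assert consistent_with_box(mask, (1, 2, 1, 2))       # object fills the box
assert not consistent_with_box(mask, (0, 2, 1, 2))   # row 0 is empty
```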
Slide 37
Image Level Annotation The image contains the object Cow
Slide 38
CCCP (Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)
Start with an initial estimate w_0.
Impute the latent variables by energy minimization: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} by solving the convex problem:
min_w λ||w||² + Σ_i ξ_i
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
Bad local minimum!!
Slide 39
White sky, grey road, green grass: EASY
Slide 40
White sky, blue water, green grass: EASY
Slide 41
Cow? Cat? Horse? HARD
Slide 42
Red sky? Black mountain? HARD. Not all images are equal.
Slide 43
Real Numbers → Imaginary Numbers → e^(iπ) + 1 = 0: Math is for losers!!
Slide 44
Real Numbers → Imaginary Numbers → e^(iπ) + 1 = 0: Euler was a genius!!
Self-Paced Learning
Slide 45
Easy vs. Hard
Easy for a human ≠ easy for a machine.
Simultaneously estimate easiness and parameters.
Slide 46
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010)
Start with an initial estimate w_0.
Impute the latent variables: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} and v by solving:
min_{w,v} λ||w||² + Σ_i v_i ξ_i - Σ_i v_i / K
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
v_i ∈ {0,1}, relaxed to v_i ∈ [0,1]: v_i = 1 for easy examples, v_i = 0 for hard examples.
Biconvex optimization → alternate convex search.
Slide 47
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010)
Start with an initial estimate w_0.
Impute the latent variables: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} by solving the biconvex problem:
min_{w,v} λ||w||² + Σ_i v_i ξ_i - Σ_i v_i / K
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
Decrease K over the iterations (K ← K/μ).
As simple as CCCP!!
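The SPL alternation and the annealing of K can be sketched on a toy least-squares problem. The data, the factor mu = 1.3, and the closed-form refit below are invented simplifications of the biconvex update; the selection rule v_i = 1 iff loss_i < 1/K is the optimum of the slide's objective for fixed w:

```python
# Toy self-paced learning: alternate (a) select the "easy" examples,
# v_i = 1 iff loss_i < 1/K, and (b) refit on them; anneal K <- K/mu so
# that harder examples enter over time.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.1), (1.0, 9.0)]  # last is an outlier

def loss(w, x, y):
    return (w * x - y) ** 2

def self_paced_fit(K=1.0, mu=1.3, rounds=10):
    w = 0.0
    for _ in range(rounds):
        easy = [(x, y) for x, y in data if loss(w, x, y) < 1.0 / K]
        if easy:  # closed-form least squares on the selected examples only
            w = sum(x * y for x, y in easy) / sum(x * x for x, y in easy)
        K /= mu  # decrease K: the easiness threshold 1/K grows
    return w

w = self_paced_fit()
# The outlier stays above the threshold, so w tracks the inlier slope ~2.
assert abs(w - 2.0) < 0.2
```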
Slide 48
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010)
Image classification: annotation a = Deer, latent variable h. [Test-error plot]
Motif finding: annotation a = -1 or +1, latent h = motif position. [Test-error plot]
Slide 49
Learning to Segment CCCP SPL
Slide 50
Learning to Segment CCCP SPL Iteration 1
Slide 51
Learning to Segment CCCP SPL Iteration 3
Slide 52
Learning to Segment CCCP SPL Iteration 6
Slide 53
Learning to Segment CCCP SPL
Slide 54
Learning to Segment CCCP SPL Iteration 1
Slide 55
Learning to Segment CCCP SPL Iteration 2
Slide 56
Learning to Segment CCCP SPL Iteration 4
Slide 57
Outline Model Energy Minimization Parameter Learning Results
Future Work
Slide 58
Dataset
Stanford Background: 7 background classes + generic foreground class.
PASCAL VOC 2009: 20 foreground classes + generic background class.
Stanford Background + PASCAL VOC 2009.
Baseline Results for SBD (Gould, Fulton and Koller, ICCV 2009)
Overlap scores: Foreground 36.0%, Road 70.1%, Mountain 0%.
CLL Average: 53.1%.
Slide 61
Improvement for SBD (columns: Input, CLL, SPL)
Road: 75.5% (+5.4). Foreground: 39.1% (+3.1).
CLL Average: 53.1%. SPL Average: 54.3%.
Slide 62
Baseline Results for VOC (Gould, Fulton and Koller, ICCV 2009)
Overlap scores: Aeroplane 32.1%, Bird 9.5%, TV 23.6%.
CLL Average: 24.7%.
Slide 63
Improvement for VOC (columns: Input, CLL, SPL)
Aeroplane: 41.4% (+9.3). TV: 31.3% (+7.7).
CLL Average: 24.7%. SPL Average: 26.9%.
Slide 64
Weakly Supervised Dataset
VOC Detection 2009: Train - 1564 images (bounding-box data).
ImageNet: Train - 1000 images (image-level data).
Slide 65
Improvement for SBD (columns: Input, Generic, All)
Foreground: 41.3% (+2.2). Water: 60.1% (+5.0).
Generic Average: 54.3%. All Average: 55.3%.
Slide 66
Improvement for VOC (columns: Input, Generic, All)
Motorbike: 40.4% (+6.9). Person: 42.2% (+4.9).
Generic Average: 26.9%. All Average: 28.8%.
Slide 67
Improvement over CCCP
VOC: CCCP 24.7%, SPL 28.8%. SBD: CCCP 53.8%, SPL 55.3%.
No improvement with CCCP. SPL is essential!!
Slide 68
Summary
Energy minimization for the region-based model: tight LP relaxation of an integer program.
Self-paced learning: simultaneously select examples and learn parameters.
Even weak annotation is useful.
Slide 69
Outline Model Energy Minimization Parameter Learning Results
Future Work
Slide 70
Learning with Diverse Data: noise in labels, size of problem.
Slide 71
Learning Diverse Tasks Object Detection Action Recognition Pose
Estimation 3D Reconstruction
Slide 72
Daphne Koller, Stephen Gould, Ben Packer, Haithem Turki, Dan Preston, Andrew Zisserman, Phil Torr, Vladimir Kolmogorov
Slide 73
Summary
Energy minimization for the region-based model: tight LP relaxation of an integer program.
Self-paced learning: simultaneously select examples and learn parameters.
Even weak annotation is useful.
Questions?