Learning to Segment with Diverse Data
M. Pawan Kumar, Stanford University
Slide 2
Semantic Segmentation car road grass tree sky
Slide 3
Segmentation Models: car, road, grass, tree, sky
MODEL with parameters w: image x → labeling y
P(x,y; w) ∝ exp(-E(x,y; w))
Learn accurate parameters w; predict y* = argmax_y P(x,y; w) = argmin_y E(x,y; w)
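The prediction rule on this slide can be sketched on a toy problem. The two-pixel "image", the unary costs, and the Potts smoothness term below are invented for illustration (the talk's actual model is region-based); the point is only that argmax of P ∝ exp(-E) equals argmin of E:

```python
import itertools
import math

LABELS = ["car", "road", "grass", "tree", "sky"]
unary = {  # unary[pixel][label] = cost (made-up numbers)
    0: {"car": 2.0, "road": 0.5, "grass": 1.0, "tree": 3.0, "sky": 4.0},
    1: {"car": 2.5, "road": 0.4, "grass": 1.2, "tree": 3.0, "sky": 4.0},
}

def energy(y):
    """E(x,y): unary terms plus a Potts penalty for disagreeing neighbours."""
    e = sum(unary[p][y[p]] for p in unary)
    e += 0.0 if y[0] == y[1] else 1.0
    return e

# y* = argmin_y E(x,y) is the same labeling as argmax_y P(x,y) with P ∝ exp(-E)
labelings = list(itertools.product(LABELS, repeat=2))
y_star = min(labelings, key=energy)
assert y_star == max(labelings, key=lambda y: math.exp(-energy(y)))
print(y_star)  # ('road', 'road')
```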
Slide 4
Fully Supervised Data
Slide 5
Fully Supervised Data: specific foreground classes, generic background class
PASCAL VOC Segmentation Datasets
Slide 6
Fully Supervised Data: specific background classes, generic foreground class
Stanford Background Datasets
Slide 7
J. Gonfaus et al. Harmony Potentials for Joint Classification and Segmentation. CVPR, 2010.
S. Gould et al. Multi-Class Segmentation with Relative Location Prior. IJCV, 2008.
S. Gould et al. Decomposing a Scene into Geometric and Semantically Consistent Regions. ICCV, 2009.
X. He et al. Multiscale Conditional Random Fields for Image Labeling. CVPR, 2004.
S. Konishi et al. Statistical Cues for Domain Specific Image Segmentation with Performance Analysis. CVPR, 2000.
L. Ladicky et al. Associative Hierarchical CRFs for Object Class Image Segmentation. ICCV, 2009.
F. Li et al. Object Recognition as Ranking Holistic Figure-Ground Hypotheses. CVPR, 2010.
J. Shotton et al. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. ECCV, 2006.
J. Verbeek et al. Scene Segmentation with Conditional Random Fields Learned from Partially Labeled Images. NIPS, 2007.
Y. Yang et al. Layered Object Detection for Multi-Class Segmentation. CVPR, 2010.
Supervised Learning: generic classes, burdensome annotation
Slide 8
Weakly Supervised Data: Bounding Boxes for Objects
PASCAL VOC Detection Datasets: thousands of images
Slide 9
Weakly Supervised Data: Image-Level Labels ("Car")
ImageNet, Caltech: thousands of images
Slide 10
B. Alexe et al. ClassCut for Unsupervised Class Segmentation. ECCV, 2010.
H. Arora et al. Unsupervised Segmentation of Objects Using Efficient Learning. CVPR, 2007.
L. Cao et al. Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes. ICCV, 2007.
J. Winn et al. LOCUS: Learning Object Classes with Unsupervised Segmentation. ICCV, 2005.
Weakly Supervised Learning: binary segmentation, limited data
Slide 11
Diverse Data Car
Slide 12
Diverse Data Learning
Avoid generic classes.
Take advantage of the cleanliness of supervised data and the vast availability of weakly supervised data.
Slide 13
Outline Model Energy Minimization Parameter Learning Results
Future Work
Slide 14
Region-Based Model (Gould, Fulton and Koller, ICCV 2009)
Pixels → Regions
Unary potential: θ_r(i) = w_i^T φ_r(x), where φ_r(x) are features extracted from region r of image x.
For example, φ_r(x) = average [R G B], with w_water = [0 0 -10] and w_grass = [0 -10 0].
Pairwise potential: θ_rr'(i,j) = w_ij^T φ_rr'(x).
For example, φ_rr'(x) = constant > 0, with w_{car above ground} > 0.
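The slide's unary example can be written out directly: the feature is the region's average RGB, and the potential is a dot product with the class weight vector. The weight vectors are the slide's illustrative values; the green "region" (RGB in [0, 1]) is invented for illustration:

```python
# theta_r(i) = w_i . phi_r(x), with phi_r(x) = average [R, G, B] of region r.
w = {
    "water": [0.0, 0.0, -10.0],  # low energy when the blue channel dominates
    "grass": [0.0, -10.0, 0.0],  # low energy when the green channel dominates
}

def unary_potential(region_pixels, label):
    """theta_r(label) = w_label^T * (average RGB of the region)."""
    n = len(region_pixels)
    phi = [sum(p[c] for p in region_pixels) / n for c in range(3)]  # avg RGB
    return sum(wc * fc for wc, fc in zip(w[label], phi))

greenish = [[0.1, 0.8, 0.2], [0.2, 0.9, 0.1]]  # a made-up green region
# "grass" assigns this region lower energy than "water":
assert unary_potential(greenish, "grass") < unary_potential(greenish, "water")
```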
Slide 15
Region-Based Model
E(x,y) = -log P(x,y) + constant = Unaries + Pairwise
E(x,y) = w^T φ(x,y)
Best segmentation y of an image x? Accurate parameters w?
Slide 16
Outline Model Energy Minimization Parameter Learning Results
Future Work Kumar and Koller, CVPR 2010
Slide 17
Move-Making:
Besag. On the Statistical Analysis of Dirty Pictures. JRSS, 1986.
Boykov et al. Fast Approximate Energy Minimization via Graph Cuts. PAMI, 2001.
Komodakis et al. Fast, Approximately Optimal Solutions for Single and Dynamic MRFs. CVPR, 2007.
Lempitsky et al. Fusion Moves for Markov Random Field Optimization. PAMI, 2010.
Message-Passing:
T. Minka. Expectation Propagation for Approximate Bayesian Inference. UAI, 2001.
Murphy. Loopy Belief Propagation: An Empirical Study. UAI, 1999.
J. Winn et al. Variational Message Passing. JMLR, 2005.
J. Yedidia et al. Generalized Belief Propagation. NIPS, 2001.
Convex Relaxations:
Chekuri et al. Approximation Algorithms for Metric Labeling. SODA, 2001.
M. Goemans et al. Improved Approximation Algorithms for Maximum Cut. JACM, 1995.
M. Muramatsu et al. A New SOCP Relaxation for Max-Cut. JORJ, 2003.
Ravikumar et al. QP Relaxations for Metric Labeling. ICML, 2006.
Hybrid Algorithms:
K. Alahari et al. Dynamic Hybrid Algorithms for MAP Inference. PAMI, 2010.
P. Kohli et al. On Partial Optimality in Multilabel MRFs. ICML, 2008.
C. Rother et al. Optimizing Binary MRFs via Extended Roof Duality. CVPR, 2007.
Which is the best relaxation?
Slide 18
Convex Relaxations
Over time: LP (1976), SOCP (2003), QP (2006). We expect the later relaxations to be tighter.
But the LP relaxation is provably better than the QP and SOCP relaxations (Kumar, Kolmogorov and Torr, NIPS, 2007). Use the LP!!
Slide 19
Energy Minimization Find Regions Find Labels Fixed Regions LP
Relaxation
Slide 20
Energy Minimization: Find Regions, then Find Labels.
Good region: homogeneous appearance and texture. Bad region: inhomogeneous appearance and texture.
The space of regions is super-exponential in the number of pixels. Can we prune it?
Use low-level segmentation to generate candidate regions.
Slide 21
Energy Minimization Spatial Bandwidth = 10 Mean-Shift
Segmentation
Slide 22
Energy Minimization Spatial Bandwidth = 20 Mean-Shift
Segmentation
Slide 23
Energy Minimization Spatial Bandwidth = 30 Mean-Shift
Segmentation
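The role of the bandwidth in these mean-shift slides can be illustrated with a minimal 1-D flat-kernel sketch (the point set is invented; real mean-shift segmentation runs on joint spatial-color features). A small bandwidth keeps nearby points as separate modes; a large one merges everything:

```python
def mean_shift_1d(points, bandwidth, iters=50):
    """Flat-kernel mean shift on 1-D points; returns the distinct modes."""
    modes = list(points)
    for _ in range(iters):
        new_modes = []
        for m in modes:
            # Shift each mode to the mean of the points inside its window.
            neighbors = [p for p in points if abs(p - m) <= bandwidth]
            new_modes.append(sum(neighbors) / len(neighbors))
        modes = new_modes
    # Collapse modes that converged to (numerically) the same value.
    clusters = []
    for m in sorted(modes):
        if not clusters or abs(m - clusters[-1]) > 1e-6:
            clusters.append(m)
    return clusters

pts = [1.0, 1.2, 1.1, 5.0, 5.2]
assert len(mean_shift_1d(pts, bandwidth=0.5)) == 2   # two separate modes
assert len(mean_shift_1d(pts, bandwidth=10.0)) == 1  # everything merges
```

Running the same data at several bandwidths is exactly the trick the slides use to build a diverse dictionary of candidate regions.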
Slide 24
Energy Minimization Combine Multiple Segmentations Car
Slide 25
Dictionary of Regions: select regions and assign classes (Kumar and Koller, CVPR 2010).
y_r(i) ∈ {0,1}, for i = 0, 1, 2, ..., C (i = 0 means not selected).
Constraints: selected regions cover the entire image; no two selected regions overlap.
min Σ θ_r(i) y_r(i) + Σ θ_rr'(i,j) y_r(i) y_r'(j)
Solved efficiently by dual decomposition: Komodakis and Paragios, CVPR, 2009.
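The selection problem on this slide can be sketched by brute force on a toy dictionary. The regions, costs, and 4-"pixel" image below are invented; pairwise terms are omitted, and the real solver uses the LP relaxation with dual decomposition rather than enumeration:

```python
import itertools

PIXELS = {0, 1, 2, 3}
dictionary = {           # region id -> (pixel set, best unary cost); made up
    "a": ({0, 1}, 1.0),
    "b": ({2, 3}, 1.0),
    "c": ({0, 1, 2, 3}, 2.5),
    "d": ({1, 2}, 0.5),
}

def best_selection():
    best, best_cost = None, float("inf")
    for k in range(1, len(dictionary) + 1):
        for combo in itertools.combinations(dictionary, k):
            covered = [p for r in combo for p in dictionary[r][0]]
            # Cover every pixel exactly once: full coverage, no overlap.
            if sorted(covered) != sorted(PIXELS):
                continue
            cost = sum(dictionary[r][1] for r in combo)
            if cost < best_cost:
                best, best_cost = set(combo), cost
    return best, best_cost

sel, cost = best_selection()
assert sel == {"a", "b"} and cost == 2.0  # beats the single big region "c"
```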
Slide 26
Comparison: Energy and Accuracy (columns: Image, Gould, Ours)
Parameters learned using Gould, Fulton and Koller, ICCV 2009.
Statistically significant improvement (paired t-test).
Slide 27
Outline Model Energy Minimization Parameter Learning Results
Future Work Kumar, Turki, Preston and Koller, In Submission
Slide 28
Supervised Learning
Training pairs: (x_1, y_1), (x_2, y_2)
P(x,y) ∝ exp(-E(x,y)) = exp(-w^T φ(x,y))
[Figure: P(y|x_1) peaked at y_1; P(y|x_2) peaked at y_2]
Well-studied problem, efficient solutions.
Slide 29
Diverse Data Learning: image x, annotation a, latent variables h. Generic Class Annotation.
Slide 30
Diverse Data Learning: image x, annotation a, latent variables h. Bounding Box Annotation.
Slide 31
Diverse Data Learning: image x, annotation a = Cow, latent variables h. Image-Level Annotation.
Slide 32
Learning with Missing Information
Expectation Maximization (computationally inefficient):
A. Dempster et al. Maximum Likelihood from Incomplete Data via the EM Algorithm. JRSS, 1977.
M. Jamshidian et al. Acceleration of the EM Algorithm by Using Quasi-Newton Methods. JRSS, 1997.
R. Neal et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. LGM, 1999.
R. Sundberg. Maximum Likelihood Theory for Incomplete Data from an Exponential Family. SJS, 1974.
Latent Support Vector Machine (hard EM; only requires an energy minimization algorithm):
P. Felzenszwalb et al. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008.
C.-N. Yu et al. Learning Structural SVMs with Latent Variables. ICML, 2009.
Slide 33
Latent SVM (Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)
min_w λ||w||² + Σ_i ξ_i
s.t. for all (a, h):  min_{h_i} w^T Φ(x_i, a_i, h_i) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
Energy of the ground truth ≤ energy of other labelings, with margin given by the user-defined loss Δ(a_i, a, h) (number of disagreements).
Difference-of-convex objective → CCCP.
Slide 34
CCCP (Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)
Start with an initial estimate w_0.
Impute the latent variables by energy minimization: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} by solving the convex problem:
min_w λ||w||² + Σ_i ξ_i
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
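The CCCP alternation can be sketched on a toy latent-variable model. Everything below is an invented simplification: a scalar w, "bag of numbers" examples whose latent h picks one element, and subgradient steps on a hinge loss standing in for the convex QP update:

```python
# Each example: x is a bag of numbers, a in {-1, +1}; latent h indexes
# which element of the bag represents the example.
data = [([0.2, 3.0, 0.1], +1), ([-2.5, -0.3], -1), ([2.1, 0.4], +1)]

def score(w, x, h):
    return w * x[h]

def impute(w, x, a):
    """Step 1: fill in the latent variable using the current parameters."""
    return max(range(len(x)), key=lambda h: a * score(w, x, h))

def cccp(iters=20, lr=0.1):
    w = 0.01
    for _ in range(iters):
        hs = [impute(w, x, a) for x, a in data]   # latent completion
        for (x, a), h in zip(data, hs):           # convex step: hinge loss
            if a * score(w, x, h) < 1.0:          # margin violated
                w += lr * a * x[h]                # subgradient step
    return w

w = cccp()
# With the learned w, every example is classified correctly under its
# own imputed latent variable.
assert all(a * score(w, x, impute(w, x, a)) > 0 for x, a in data)
```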
Slide 35
Generic Class Annotation
Replace the generic background class with specific background classes; replace the generic foreground class with specific foreground classes.
Slide 36
Bounding Box Annotation Every row contains the object Every
column contains the object
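These two constraints are easy to state in code. The mask, box coordinates, and helper name below are invented for illustration:

```python
def consistent_with_box(mask, box):
    """Inside box = (r0, r1, c0, c1) (inclusive), every row and every
    column must contain at least one foreground pixel."""
    r0, r1, c0, c1 = box
    rows_ok = all(any(mask[r][c] for c in range(c0, c1 + 1))
                  for r in range(r0, r1 + 1))
    cols_ok = all(any(mask[r][c] for r in range(r0, r1 + 1))
                  for c in range(c0, c1 + 1))
    return rows_ok and cols_ok

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
assert consistent_with_box(mask, (1, 2, 1, 2))       # object fills the box
assert not consistent_with_box(mask, (0, 2, 1, 2))   # row 0 is empty
```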
Slide 37
Image Level Annotation The image contains the object Cow
Slide 38
CCCP (Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)
Start with an initial estimate w_0.
Impute the latent variables by energy minimization: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} by solving the convex problem:
min_w λ||w||² + Σ_i ξ_i
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
Bad local minimum!!
Slide 39
White sky, grey road, green grass: EASY
Slide 40
White sky, blue water, green grass: EASY
Slide 41
Cow? Cat? Horse? HARD
Slide 42
Red sky? Black mountain? HARD. Not all images are equal.
Slide 43
Real Numbers → Imaginary Numbers → e^(iπ) + 1 = 0: Math is for losers!!
Slide 44
Real Numbers → Imaginary Numbers → e^(iπ) + 1 = 0: Euler was a genius!!
Self-Paced Learning
Slide 45
Easy vs. Hard
Easy for a human ≠ easy for a machine.
Simultaneously estimate easiness and parameters.
Slide 46
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010)
Start with an initial estimate w_0.
Impute the latent variables: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} and v by solving:
min_{w,v} λ||w||² + Σ_i v_i ξ_i - Σ_i v_i / K
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
v_i ∈ {0,1}, relaxed to v_i ∈ [0,1]: v_i = 1 for easy examples, v_i = 0 for hard examples.
Biconvex optimization → alternate convex search.
Slide 47
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010)
Start with an initial estimate w_0.
Impute the latent variables: h_i* = argmin_h w_t^T Φ(x_i, a_i, h).
Update w_{t+1} by solving the biconvex problem:
min_{w,v} λ||w||² + Σ_i v_i ξ_i - Σ_i v_i / K
s.t. w^T Φ(x_i, a_i, h_i*) ≤ w^T Φ(x_i, a, h) - Δ(a_i, a, h) + ξ_i
Decrease K over the iterations (K ← K/μ).
As simple as CCCP!!
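The SPL alternation and the annealing of K can be sketched on a toy least-squares problem. The data, the factor mu = 1.3, and the closed-form refit below are invented simplifications of the biconvex update; the selection rule v_i = 1 iff loss_i < 1/K is the optimum of the slide's objective for fixed w:

```python
# Toy self-paced learning: alternate (a) select the "easy" examples,
# v_i = 1 iff loss_i < 1/K, and (b) refit on them; anneal K <- K/mu so
# that harder examples enter over time.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.1), (1.0, 9.0)]  # last is an outlier

def loss(w, x, y):
    return (w * x - y) ** 2

def self_paced_fit(K=1.0, mu=1.3, rounds=10):
    w = 0.0
    for _ in range(rounds):
        easy = [(x, y) for x, y in data if loss(w, x, y) < 1.0 / K]
        if easy:  # closed-form least squares on the selected examples only
            w = sum(x * y for x, y in easy) / sum(x * x for x, y in easy)
        K /= mu  # decrease K: the easiness threshold 1/K grows
    return w

w = self_paced_fit()
# The outlier stays above the threshold, so w tracks the inlier slope ~2.
assert abs(w - 2.0) < 0.2
```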
Slide 48
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010)
Image classification: annotation a = Deer, latent variable h. [Test-error plot]
Motif finding: annotation a = -1 or +1, latent h = motif position. [Test-error plot]
Slide 49
Learning to Segment CCCP SPL
Slide 50
Learning to Segment CCCP SPL Iteration 1
Slide 51
Learning to Segment CCCP SPL Iteration 3
Slide 52
Learning to Segment CCCP SPL Iteration 6
Slide 53
Learning to Segment CCCP SPL
Slide 54
Learning to Segment CCCP SPL Iteration 1
Slide 55
Learning to Segment CCCP SPL Iteration 2
Slide 56
Learning to Segment CCCP SPL Iteration 4
Slide 57
Outline Model Energy Minimization Parameter Learning Results
Future Work
Slide 58
Dataset
Stanford Background: 7 background classes + generic foreground class.
PASCAL VOC 2009: 20 foreground classes + generic background class.
Stanford Background + PASCAL VOC 2009.
Baseline Results for SBD (Gould, Fulton and Koller, ICCV 2009)
Overlap scores: Foreground 36.0%, Road 70.1%, Mountain 0%.
CLL Average: 53.1%.
Slide 61
Improvement for SBD (columns: Input, CLL, SPL)
Road: 75.5% (+5.4). Foreground: 39.1% (+3.1).
CLL Average: 53.1%. SPL Average: 54.3%.
Slide 62
Baseline Results for VOC (Gould, Fulton and Koller, ICCV 2009)
Overlap scores: Aeroplane 32.1%, Bird 9.5%, TV 23.6%.
CLL Average: 24.7%.
Slide 63
Improvement for VOC (columns: Input, CLL, SPL)
Aeroplane: 41.4% (+9.3). TV: 31.3% (+7.7).
CLL Average: 24.7%. SPL Average: 26.9%.
Slide 64
Weakly Supervised Dataset
VOC Detection 2009: Train - 1564 images (bounding-box data).
ImageNet: Train - 1000 images (image-level data).
Slide 65
Improvement for SBD (columns: Input, Generic, All)
Foreground: 41.3% (+2.2). Water: 60.1% (+5.0).
Generic Average: 54.3%. All Average: 55.3%.
Slide 66
Improvement for VOC (columns: Input, Generic, All)
Motorbike: 40.4% (+6.9). Person: 42.2% (+4.9).
Generic Average: 26.9%. All Average: 28.8%.
Slide 67
Improvement over CCCP
VOC: CCCP 24.7%, SPL 28.8%. SBD: CCCP 53.8%, SPL 55.3%.
No improvement with CCCP. SPL is essential!!
Slide 68
Summary
Energy minimization for the region-based model: tight LP relaxation of an integer program.
Self-paced learning: simultaneously select examples and learn parameters.
Even weak annotation is useful.
Slide 69
Outline Model Energy Minimization Parameter Learning Results
Future Work
Slide 70
Learning with Diverse Data: noise in labels, size of problem.
Slide 71
Learning Diverse Tasks Object Detection Action Recognition Pose
Estimation 3D Reconstruction
Slide 72
Daphne Koller, Stephen Gould, Ben Packer, Haithem Turki, Dan Preston, Andrew Zisserman, Phil Torr, Vladimir Kolmogorov
Slide 73
Summary
Energy minimization for the region-based model: tight LP relaxation of an integer program.
Self-paced learning: simultaneously select examples and learn parameters.
Even weak annotation is useful.
Questions?