Matrix Factorization with Unknown Noise

Deyu Meng

References:
Deyu Meng, Fernando De la Torre. Robust Matrix Factorization with Unknown Noise. International Conference on Computer Vision (ICCV), 2013.
Qian Zhao, Deyu Meng, Zongben Xu, Wangmeng Zuo, Lei Zhang. Robust Principal Component Analysis with Complex Noise. International Conference on Machine Learning (ICML), 2014.

My lecture is about matrix factorization with unknown noise.

Structure from Motion (e.g., Eriksson and Hengel, 2010)

Photometric Stereo (e.g., Zheng et al., 2012)

Face Modeling (e.g., Candes et al., 2012)

Background Subtraction (e.g., Candes et al., 2012)

Low-rank matrix factorization is widely used in computer vision. It aims to factorize the given data matrix into two smaller matrices that encode the subspace information underlying the data. Such subspace information is often very useful for practical computer vision tasks, such as structure from motion, photometric stereo, face modeling, and background subtraction.
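For reference, the factorization the note describes can be written as follows (standard notation, not taken from the slides):

\[
X \approx U V^{\top}, \qquad X \in \mathbb{R}^{m\times n},\; U \in \mathbb{R}^{m\times r},\; V \in \mathbb{R}^{n\times r},\; r \ll \min(m, n),
\]

so that the columns of U span the low-dimensional subspace underlying the data.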

Complete, clean data (or with Gaussian noise): SVD gives the global solution.

There are always missing data

There is always heavy and complex noise

L2 norm model
Young diagram (CVPR, 2008), L2 Wiberg (IJCV, 2007), LM_S/LM_M (IJCV, 2008), SALS (CVIU, 2010), LRSDP (NIPS, 2010), Damped Wiberg (ICCV, 2011), Weighted SVD (Technometrics, 1979), WLRA (ICML, 2003), Damped Newton (CVPR, 2005)
Pros: smooth model, faster algorithms, global optimum for data with no missing entries
Cons: not robust to heavy outliers

L1 norm model
Torre & Black (ICCV, 2001), R1PCA (ICML, 2006), PCAL1 (PAMI, 2008), ALP/AQP (CVPR, 2005), L1Wiberg (CVPR, 2010, best paper award), RegL1ALM (CVPR, 2012), CWM (AAAI, 2013), Reg-ALM-L1 (CVPR, 2013)
Pros: robust to extreme outliers
Cons: non-smooth model, slow algorithms, performs badly on Gaussian-noise data

The matrix factorization task is mainly solved through two optimization models, built on the L2 norm and the L1 norm respectively; here W denotes the weight matrix indicating the positions of the missing entries. A number of methods have been proposed for both models. The L2-model methods are generally faster, thanks to the smoothness of the model, and when the data contain no missing entries SVD finds the global optimum; however, they are not robust to outliers and heavy non-Gaussian noise, and suffer from local minima in the missing-data case. The L1-model methods have attracted much attention recently due to their robustness to outliers; however, the L1 model is non-smooth, the related methods are generally slow, and they often perform poorly on data with Gaussian noise.
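The two models themselves are not reproduced in the extracted text; in standard weighted form (my notation, using the W mentioned in the note) they read:

\[
\min_{U,V}\; \bigl\| W \odot (X - U V^{\top}) \bigr\|_{F}^{2}
\quad\text{(L2 model)},
\qquad
\min_{U,V}\; \bigl\| W \odot (X - U V^{\top}) \bigr\|_{1}
\quad\text{(L1 model)},
\]

where \odot is the element-wise product and W_{ij} = 1 if x_{ij} is observed and 0 if it is missing.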

The L2 model is optimal for Gaussian noise

The L1 model is optimal for Laplacian noise

But real noise is generally neither Gaussian nor Laplacian

Actually, both models have intrinsic limitations. The L2 model is statistically optimal only for Gaussian noise, and the L1 model is optimal only for Laplacian noise. But the real noise in data is generally neither Gaussian nor Laplacian, which means both models tend to be improper in real applications.
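The statistical claim here is the usual maximum-likelihood correspondence (a standard argument, spelled out for completeness rather than taken from the slide): if the residuals e_{ij} = x_{ij} - (UV^{\top})_{ij} are i.i.d., then

\[
e_{ij} \sim \mathcal{N}(0,\sigma^{2}) \;\Rightarrow\;
-\log p(X \mid U, V) = \tfrac{1}{2\sigma^{2}} \sum_{i,j} \bigl(x_{ij} - (UV^{\top})_{ij}\bigr)^{2} + \text{const},
\]
\[
e_{ij} \sim \mathrm{Laplace}(0, b) \;\Rightarrow\;
-\log p(X \mid U, V) = \tfrac{1}{b} \sum_{i,j} \bigl|x_{ij} - (UV^{\top})_{ij}\bigr| + \text{const},
\]

so minimizing the L2 or L1 objective is exactly maximum-likelihood estimation under Gaussian or Laplacian noise, respectively.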

Saturation and shadow noise; camera noise

Yale B faces: we propose a Mixture of Gaussians (MoG) noise model

Universal approximation property of MoG: any continuous distribution can be approximated by a Mixture of Gaussians (Maz'ya and Schmidt, 1996). E.g., a Laplace distribution can be equivalently expressed as a scaled MoG (Andrews and Mallows, 1974).

Our idea is to model the noise as a Mixture of Gaussians. This is motivated by the universal approximation ability of MoG for any continuous distribution; for example, a Laplacian can be equivalently expressed as a scaled MoG. By doing this, we expect to extend the effective range of the current L2 and L1 matrix factorization methods.

MLE Model
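The formulas on this slide are not in the extracted text; a sketch of the MoG-noise likelihood model, in my own notation but following the setup described above, is:

\[
x_{ij} = (U V^{\top})_{ij} + e_{ij}, \qquad
e_{ij} \sim \sum_{k=1}^{K} \pi_{k}\, \mathcal{N}(0, \sigma_{k}^{2}),
\]

with the parameters estimated by maximizing the log-likelihood over the observed entries \Omega:

\[
\max_{U,\, V,\, \{\pi_k\},\, \{\sigma_k^2\}}\;
\sum_{(i,j)\in\Omega} \log \sum_{k=1}^{K} \pi_{k}\,
\mathcal{N}\bigl(x_{ij} - (U V^{\top})_{ij} \mid 0, \sigma_{k}^{2}\bigr),
\qquad \pi_k \ge 0,\; \textstyle\sum_k \pi_k = 1.
\]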

Use the EM algorithm to solve it!

E step; M step (the update equations are given on the original slide).
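Since the E-step and M-step formulas are missing from the extracted text, the following is a minimal numpy sketch of an EM scheme of this kind, not the authors' released implementation: the E step computes per-entry responsibilities, and the M step re-estimates the mixture weights and variances and then updates U and V by weighted alternating least squares. All names (mog_lrmf, etc.) and initialization choices are mine.

    import numpy as np

    def mog_lrmf(X, W, rank, K=3, n_iters=50, eps=1e-10):
        """EM sketch for low-rank matrix factorization with MoG noise.
        X: (m, n) data matrix (missing entries may hold any finite placeholder, e.g. 0);
        W: (m, n) 0/1 mask marking observed entries."""
        m, n = X.shape
        rng = np.random.default_rng(0)
        U = rng.standard_normal((m, rank))
        V = rng.standard_normal((n, rank))
        pi = np.full(K, 1.0 / K)                   # mixing proportions
        sigma2 = np.linspace(0.1, 1.0, K)          # component variances
        obs = W > 0

        for _ in range(n_iters):
            # E step: responsibility of each Gaussian for each observed residual.
            R = X - U @ V.T
            r = R[obs]                             # residuals at observed entries
            dens = np.stack([
                pi[k] * np.exp(-r**2 / (2 * sigma2[k])) / np.sqrt(2 * np.pi * sigma2[k])
                for k in range(K)
            ], axis=1)                             # shape (n_obs, K)
            gamma = dens / (dens.sum(axis=1, keepdims=True) + eps)

            # M step, part 1: mixture parameters.
            Nk = gamma.sum(axis=0) + eps
            pi = Nk / Nk.sum()
            sigma2 = (gamma * r[:, None]**2).sum(axis=0) / Nk + eps

            # M step, part 2: factors via weighted least squares.
            # Each observed entry gets weight sum_k gamma_k / sigma2_k, which turns
            # the likelihood term into a weighted L2 factorization.
            w = np.zeros(X.shape)
            w[obs] = (gamma / sigma2).sum(axis=1)
            for i in range(m):                     # update rows of U
                wi = w[i]
                A = (V * wi[:, None]).T @ V + 1e-8 * np.eye(rank)
                U[i] = np.linalg.solve(A, (V * wi[:, None]).T @ X[i])
            for j in range(n):                     # update rows of V
                wj = w[:, j]
                A = (U * wj[:, None]).T @ U + 1e-8 * np.eye(rank)
                V[j] = np.linalg.solve(A, (U * wj[:, None]).T @ X[:, j])

        return U, V, pi, sigma2

The design point this illustrates is that, once the responsibilities are fixed, the MoG likelihood reduces to a weighted L2 factorization with per-entry weights sum_k gamma_ijk / sigma_k^2, so existing weighted-L2 solvers can be reused inside the M step.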

Synthetic experiments

Six error measurements: the first two are what the L2 and L1 methods optimize; the other four are good measures of how well the ground-truth subspace is estimated.
Three noise cases: Gaussian noise, sparse noise, mixture noise.

Now we show some performance comparisons on synthetic data. We designed three series of experiments, on data with Gaussian noise, sparse noise, and mixture noise, respectively. Six measurements are used for performance assessment. The first two are actually the objective functions of the L2 and L1 models; however, the last four measures are what we really want to use, since they assess the accuracy of the output against the ground-truth information.

Gaussian noise experiments

Sparse noise experiments

Mixture noise experiments

(The plots compare the L2 methods, the L1 methods, and our method.)
Gaussian noise: MoG performs similarly to the L2 methods and better than the L1 methods.
Sparse noise: MoG performs as well as the best L1 method and better than the L2 methods.
Mixture noise: MoG performs better than all competing L2 and L1 methods.

Our results show that in the Gaussian noise case, our method performs as well as the other L2 methods and better than the L1 methods; in the sparse noise case, it performs as well as the best L1 method and better than the L2 methods; and in the mixture noise case, it is always the best among all competing methods.

Why is MoG robust to outliers?

L1 methods perform well in outlier and heavy-noise cases because the Laplacian is a heavy-tailed distribution. By fitting the noise with two Gaussians, the obtained MoG distribution is also heavy-tailed.
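A quick numerical illustration of this heavy-tail point (toy numbers of my own, not from the talk): a two-component zero-mean Gaussian mixture with one broad component puts far more probability on large residuals than a single Gaussian with the same overall variance, so extreme residuals barely distort the fit.

    import math

    def gauss_tail(t, var):
        """P(|e| > t) for a zero-mean Gaussian with variance var."""
        return math.erfc(t / math.sqrt(2.0 * var))

    # Hypothetical mixture: 90% narrow "inlier" noise, 10% broad "outlier" component.
    pi = [0.9, 0.1]
    var = [1.0, 25.0]
    mix_var = sum(p * v for p, v in zip(pi, var))   # matched overall variance (3.4)

    t = 10.0
    single = gauss_tail(t, mix_var)                              # single Gaussian
    mog = sum(p * gauss_tail(t, v) for p, v in zip(pi, var))     # two-component MoG
    print(f"P(|e| > {t}): single Gaussian = {single:.1e}, MoG = {mog:.1e}")
    # The mixture places several orders of magnitude more mass on |e| > 10, so such
    # residuals are absorbed by the broad component instead of dragging the fit.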

Face modeling experiments

We also ran some face modeling experiments. Like the other methods, our method can remove some occlusions and saturations from the face, and it performs better at extracting the diffuse component of the face. In such cases the small light spots on the face act like outliers, while the facial details hidden in shadow are like a Gaussian blur over the face. Because of this relatively complicated noise, our method performs better, as expected.

Explanation

Saturation and shadow noise; camera noise

Background Subtraction

Summary

We propose an LRMF model with Mixture of Gaussians (MoG) noise.
The new method handles outliers as well as L1-norm methods do, but in a more efficient way.
The extracted noise components have certain physical meanings.

Thanks!