
Accelerating Gibbs sampling of Gaussians using matrix decompositions
Al Parker, January 18, 2009


Colin Fox, Physics, University of Otago
New Zealand Institute of Mathematics, University of Auckland
Center for Biofilm Engineering, Bozeman

Acknowledgements

The multivariate Gaussian distribution
y = Σ^{1/2} z + μ ~ N(μ, Σ)

How to sample from a Gaussian N(μ, Σ)?
Sample z ~ N(0, I), then take y = Σ^{1/2} z + μ (e.g. y = W Λ^{1/2} z + μ).

Example: from 64 faces, modeling face space with a Gaussian process N(μ, Σ).
The pixel intensity at the ith row and jth column is y(s(i,j)), with
y(s) ∈ R^{112 × 112}, μ(s) ∈ R^{112 × 112}, Σ(s,s') ∈ R^{12544 × 12544}, y ~ N(μ, Σ).

A bigger example: data = SPHERE + ε, ε ~ N(0, σ² I).
Sample from π(SPHERE | data).

The problem
To generate a sample y = Σ^{1/2} z + μ ~ N(μ, Σ), how do we calculate a factorization Σ = Σ^{1/2} (Σ^{1/2})^T?
Σ^{1/2} = W Λ^{1/2} by eigen-decomposition, (10/3) n³ flops.
Σ^{1/2} = C by Cholesky factorization, (1/3) n³ flops.
For LARGE Gaussians (n > 10^5, e.g. in image analysis and global data sets), these approaches are not possible:
n³ flops is computationally TOO EXPENSIVE, and storing an n × n matrix requires TOO MUCH MEMORY.

Some solutions
Work with sparse precision matrix Σ^{-1} models (Rue, 2001).
Circulant embeddings (Gneiting et al., 2005).
Iterative methods.
Advantages: COST, n² flops per iteration; MEMORY, only vectors of size n × 1 need be stored.
Disadvantage: if the method runs for n iterations, there is no cost saving over a direct method.

Gibbs: an iterative sampler of N(0, A) and N(0, A^{-1})
Let A = Σ or A = Σ^{-1}.
1. Split A into D = diag(A), L = lower(A), L^T = upper(A).
2. Sample z ~ N(0, I).
3. Take conditional samples in each coordinate direction, so that a full sweep of all n coordinates is
y^k = -D^{-1} L y^k - D^{-1} L^T y^{k-1} + D^{-1/2} z.
y^k converges in distribution geometrically to N(0, A^{-1}).
A y^k converges in distribution geometrically to N(0, A).

Gibbs: an iterative sampler
Gibbs sampling from N(μ, Σ) starting from (0,0). [figures]
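As a concrete rendering of the direct approach in "The problem" slide above, here is a minimal numpy sketch of Cholesky sampling; the 2 × 2 Sigma and mu are illustrative stand-ins I chose, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins: any symmetric positive definite Sigma works.
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
mu = np.array([1.0, -1.0])

# Sigma = C C^T by Cholesky factorization (~ n^3/3 flops),
# so y = mu + C z has covariance C I C^T = Sigma.
C = np.linalg.cholesky(Sigma)
z = rng.standard_normal(mu.size)   # z ~ N(0, I)
y = mu + C @ z                     # y ~ N(mu, Sigma)
```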
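The Gibbs sweep described above admits an equally short sketch. This is a hedged, dense-matrix illustration of the coordinate-wise conditional updates (my own minimal implementation, not the talk's code):

```python
import numpy as np

def gibbs_sweep(A, y, rng):
    """One full Gibbs sweep targeting N(0, A^{-1}) for SPD A.

    Coordinate i is drawn from its conditional
        y_i | y_{-i} ~ N( -(1/A_ii) * sum_{j != i} A_ij y_j , 1/A_ii ),
    which in matrix form is y^k = -D^{-1} L y^k - D^{-1} L^T y^{k-1} + D^{-1/2} z.
    """
    n = A.shape[0]
    for i in range(n):
        mean = -(A[i, :i] @ y[:i] + A[i, i + 1:] @ y[i + 1:]) / A[i, i]
        y[i] = mean + rng.standard_normal() / np.sqrt(A[i, i])
    return y

# usage: repeated sweeps converge in distribution to N(0, A^{-1}),
# where A plays the role of the precision matrix (A = Sigma^{-1}).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.8], [0.8, 1.0]])
y = np.zeros(2)
for _ in range(100):
    y = gibbs_sweep(A, y, rng)
```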
What's the link to Ax = b?
Solving Ax = b is equivalent to minimizing an n-dimensional quadratic f(x) = (1/2) x^T A x - b^T x (when A is symmetric positive definite). A Gaussian is sufficiently specified by the same quadratic (with A = Σ^{-1} and b = Aμ).

Gauss-Seidel linear solve of Ax = b
1. Split A into D = diag(A), L = lower(A), L^T = upper(A).
2. Minimize the quadratic f(x) in each coordinate direction, so that a full sweep of all n coordinates is
x^k = -D^{-1} L x^k - D^{-1} L^T x^{k-1} + D^{-1} b.
x^k converges geometrically to A^{-1} b:
(x^k - A^{-1} b) = G^k (x^0 - A^{-1} b), where ρ(G) < 1.

Theorem: a Gibbs sampler is a Gauss-Seidel linear solver.
Proof: a Gibbs sampler is
y^k = -D^{-1} L y^k - D^{-1} L^T y^{k-1} + D^{-1/2} z,
and a Gauss-Seidel linear solve of Ax = b is
x^k = -D^{-1} L x^k - D^{-1} L^T x^{k-1} + D^{-1} b.

Gauss-Seidel is a stationary linear solver
Gauss-Seidel can be written as M x^k = N x^{k-1} + b, where M = D + L, N = -L^T, and A = M - N: the general form of a stationary linear solver.

Stationary linear solvers of Ax = b
1. Split A = M - N.
2. Iterate M x^k = N x^{k-1} + b, i.e.
x^k = M^{-1} N x^{k-1} + M^{-1} b = G x^{k-1} + M^{-1} b.
x^k converges geometrically to A^{-1} b:
(x^k - A^{-1} b) = G^k (x^0 - A^{-1} b) when ρ(G) = ρ(M^{-1} N) < 1.

Stationary samplers from stationary solvers
Solving Ax = b: split A = M - N, then iterate M x^k = N x^{k-1} + b; x^k → A^{-1} b if ρ(M^{-1} N) < 1.
Sampling from N(0, A) and N(0, A^{-1}): split A = M - N, then iterate M y^k = N y^{k-1} + c^{k-1}, where c^{k-1} ~ N(0, M^T + N);
y^k → N(0, A^{-1}) if ρ(M^{-1} N) < 1,
A y^k → N(0, A) if ρ(M^{-1} N) < 1.

How to sample c^{k-1} ~ N(0, M^T + N)?
Gauss-Seidel: M = D + L, c^{k-1} ~ N(0, D).
SOR (successive over-relaxation): M = (1/ω) D + L, c^{k-1} ~ N(0, ((2 - ω)/ω) D).
Richardson: M = I, c^{k-1} ~ N(0, 2I - A).
Jacobi: M = D, c^{k-1} ~ N(0, 2D - A).

Theorem: a stationary linear solver converges iff the corresponding stationary sampler converges, and the convergence is geometric.
Proof: they have the same iteration operator.
For linear solves: x^k = G x^{k-1} + M^{-1} b, so that (x^k - A^{-1} b) = G^k (x^0 - A^{-1} b).
For sampling: y^k = G y^{k-1} + M^{-1} c^{k-1}, so that
E(y^k) = G^k E(y^0) and Var(y^k) = A^{-1} - G^k A^{-1} (G^k)^T.
The proof for Gaussians is given by Barone and Frigessi; for arbitrary distributions, by Duflo, 1997.

Acceleration schemes for stationary linear solvers can be used to accelerate stationary samplers.

Polynomial acceleration of a stationary solver of Ax = b:
1. Split A = M - N.
2. Iterate x^{k+1} = (1 - v_k) x^{k-1} + v_k x^k + v_k u_k M^{-1} (b - A x^k),
which replaces (x^k - A^{-1} b) = G^k (x^0 - A^{-1} b) with a kth-order polynomial,
(x^k - A^{-1} b) = p_k(G) (x^0 - A^{-1} b).

Chebyshev acceleration
x^{k+1} = (1 - v_k) x^{k-1} + v_k x^k + v_k u_k M^{-1} (b - A x^k),
where v_k, u_k are functions of the 2 extreme eigenvalues of G (estimates of these eigenvalues are not very expensive to obtain).
[figures: Gauss-Seidel convergence vs. convergence (geometric-like) with Chebyshev acceleration]

Conjugate gradient (CG) acceleration
x^{k+1} = (1 - v_k) x^{k-1} + v_k x^k + v_k u_k M^{-1} (b - A x^k),
where v_k, u_k are functions of the residuals b - A x^k.
Convergence is guaranteed in at most n (finite) steps with CG acceleration.
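To make the splitting recipe above concrete, here is a hedged numpy sketch of a stationary sampler built from the SOR splitting M = (1/ω)D + L with noise c^{k-1} ~ N(0, ((2 - ω)/ω) D); the function name and the small test matrix are my own illustrative choices, not the talk's code:

```python
import numpy as np

def sor_sampler(A, omega, n_iters, n_chains, rng):
    """Stationary sampler for N(0, A^{-1}) from the SOR splitting.

    Iterates M y^k = N y^{k-1} + c^{k-1} with M = D/omega + L, N = M - A,
    and c^{k-1} ~ N(0, ((2 - omega)/omega) D), so that Var(c) = M^T + N.
    Converges in distribution when rho(M^{-1} N) < 1 (0 < omega < 2 for SPD A).
    omega = 1 recovers the Gauss-Seidel (Gibbs) sampler, with c ~ N(0, D).
    """
    n = A.shape[0]
    D = np.diag(A)                       # diagonal of A as a vector
    M = np.diag(D) / omega + np.tril(A, -1)
    N = M - A
    c_std = np.sqrt((2.0 - omega) / omega * D)
    Y = np.zeros((n_chains, n))
    for _ in range(n_iters):
        C = rng.standard_normal((n_chains, n)) * c_std
        # solve M y^k = N y^{k-1} + c for every chain at once
        Y = np.linalg.solve(M, (Y @ N.T + C).T).T
    return Y

# usage: the empirical covariance of many chains approaches A^{-1}
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.8], [0.8, 1.0]])
Y = sor_sampler(A, omega=1.0, n_iters=200, n_chains=20000, rng=rng)
# np.cov(Y.T) should be close to np.linalg.inv(A)
```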
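For Chebyshev acceleration, one standard way to realize the recursion is the textbook Chebyshev iteration for SPD systems, which needs only estimates of the two extreme eigenvalues. The sketch below assumes the unpreconditioned case M = I and is written in residual form rather than the slide's (v_k, u_k) form:

```python
import numpy as np

def chebyshev_solve(A, b, lam_min, lam_max, n_iters):
    """Chebyshev iteration for SPD A x = b, assuming the M = I case
    and eigenvalue bounds 0 < lam_min < lam_max for A."""
    theta = 0.5 * (lam_max + lam_min)   # center of the spectrum
    delta = 0.5 * (lam_max - lam_min)   # half-width (must be > 0)
    x = np.zeros_like(b)
    r = b - A @ x
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    d = r / theta
    for _ in range(n_iters):
        x = x + d
        r = r - A @ d
        rho_next = 1.0 / (2.0 * sigma1 - rho)
        d = rho_next * rho * d + (2.0 * rho_next / delta) * r
        rho = rho_next
    return x

# usage: with the exact extreme eigenvalues, the iterates approach A^{-1} b
A = np.array([[2.0, 0.8], [0.8, 1.0]])
b = np.array([1.0, 1.0])
lam = np.linalg.eigvalsh(A)
x = chebyshev_solve(A, b, lam[0], lam[-1], n_iters=25)
# x should be close to np.linalg.solve(A, b)
```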
Polynomial accelerated sampler of N(0, A^{-1}) (if v_k, u_k are independent of the iterates y^k, c^k):
1. Split A = M - N.
2. Iterate y^{k+1} = (1 - v_k) y^{k-1} + v_k y^k + v_k u_k M^{-1} (c^k - A y^k),
where c^k ~ N(0, ((2 - v_k)/v_k) (((2 - u_k)/u_k) M + N)).

[figures: Gibbs sampler vs. Chebyshev accelerated Gibbs]

Chebyshev accelerated Gibbs sampler in 2D [figure]

Chebyshev accelerated Gibbs sampler in 100D
Covariance matrix convergence, ||A^{-1} - S^k||_2. [figure]

Chebyshev accelerated Gibbs sampler in 10^6 D
~ N(0, Laplacian^{-1}) [figure]

Conjugate gradient (CG) acceleration
The polynomial accelerated sampler presented here does not apply, since the parameters v_k, u_k are functions of the residuals c^k - A y^k. Colin devised an approach called the conjugate direction (CD) sampler.

Lanczos sampler
Iterative eigen-solvers can also be hijacked to produce samples as well as eigenvalues.

One extremely effective sampler for LARGE Gaussians
Use a combination of the ideas presented:
Use the Lanczos or CD sampler to generate samples and estimates of the extreme eigenvalues of G.
Seed these samples and extreme eigenvalues into a Chebyshev accelerated SOR sampler.

Conclusions
Common techniques from numerical linear algebra can be used to sample from Gaussians:
Cholesky factorization (precise but expensive);
any stationary linear solver can be used as a stationary sampler (inexpensive, but with geometric convergence);
polynomial accelerated samplers: Chebyshev, conjugate gradients, the Lanczos sampler.

Estimation of (·, r) from the data using a Markov chain. Marginal posteriors. [figures]

Simulating the process: samples from N(μ, Σ | data)
x = Σ^{1/2} z + μ ~ N(μ, Σ). TOO COARSE A GRID. [figure]

Why is CG so fast?
Gauss-Seidel's coordinate directions vs. CG's conjugate directions. [figure]

Simulating the process: samples from N(μ, Σ)
y = Σ^{1/2} z + μ ~ N(μ, Σ). [figure]

Chebyshev accelerated Gibbs sampler in 10^6 D
b = SPHERE + ε, ε ~ N(0, σ² I), SNR = 0.5. [figure]

A GOOD THING: the CG algorithm is a great linear solver! If the eigenvalues of A fall in c clusters, then a solution to Ax = b is found in c iterations.
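The clustering claim is easy to check numerically. This hedged sketch builds a synthetic SPD matrix with c = 3 tight eigenvalue clusters and counts iterations of a plain CG solver; the construction, cluster widths, and tolerance are illustrative assumptions, not values from the talk:

```python
import numpy as np

def cg(A, b, tol=1e-8, max_iters=500):
    """Plain conjugate gradients for SPD A; returns (solution, iterations)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(1, max_iters + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_next = r @ r
        if np.sqrt(rs_next) < tol:
            return x, k
        p = r + (rs_next / rs) * p
        rs = rs_next
    return x, max_iters

# SPD test matrix whose eigenvalues sit in 3 tight clusters (near 1, 10, 100)
rng = np.random.default_rng(1)
n = 200
eigs = np.concatenate([1.0 + 1e-10 * rng.random(n // 3 + n % 3),
                       10.0 + 1e-10 * rng.random(n // 3),
                       100.0 + 1e-10 * rng.random(n // 3)])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T
b = rng.standard_normal(n)
x, iters = cg(A, b)
# iters comes out at (or within an iteration or two of) the c = 3 clusters,
# far fewer than the worst-case n = 200 steps
```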