Applied Bayesian Nonparametrics
Special Topics in Machine Learning, Brown University CSCI 2950-P, Fall 2011
October 11: Variational Methods



Page 2

Convexity & Jensen's Inequality

[Figure: a convex function f(x); the chord between points a and b lies above the graph of f.]
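The figure's content can be stated in equation form (standard statement, added here; it is not reproduced from the slide):

```latex
% Convexity: the chord lies above the graph
f(\lambda a + (1-\lambda) b) \le \lambda f(a) + (1-\lambda) f(b),
\qquad 0 \le \lambda \le 1
% Jensen's inequality for convex f and random variable X:
f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]
% For the concave logarithm the inequality reverses:
\ln \mathbb{E}[X] \ge \mathbb{E}[\ln X]
```

The reversed inequality for the logarithm is what yields the lower bounds on the marginal likelihood used in the following slides.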

Page 3

Lower Bounds on Marginal Likelihood

ln p(X|θ) = L(q, θ) + KL(q||p)

where L(q, θ) = ∑_Z q(Z) ln[ p(X, Z|θ) / q(Z) ] and KL(q||p) = −∑_Z q(Z) ln[ p(Z|X, θ) / q(Z) ] ≥ 0, so L(q, θ) is a lower bound on the log marginal likelihood.

C. Bishop, Pattern Recognition & Machine Learning

Page 4

Expectation Maximization Algorithm

C. Bishop, Pattern Recognition & Machine Learning

E step at θ_old: setting q(Z) = p(Z|X, θ_old) makes KL(q||p) = 0, so the bound is tight:

ln p(X|θ_old) = L(q, θ_old)

M step: maximizing L(q, θ) over θ yields θ_new; the new KL(q||p) is again nonnegative, so

ln p(X|θ_new) ≥ L(q, θ_new) ≥ L(q, θ_old) = ln p(X|θ_old)

E Step: Optimize distribution on hidden variables given parameters

M Step: Optimize parameters given distribution on hidden variables
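The two steps above can be sketched for a two-component 1-D Gaussian mixture (a minimal illustration; the data, initialization, and variable names here are mine, not from the lecture):

```python
import numpy as np

# Synthetic data from two well-separated Gaussians (illustrative, not from the slides)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])

pi = np.array([0.5, 0.5])      # mixture weights
mu = np.array([-0.5, 0.5])     # component means (deliberately poor start)
var = np.array([1.0, 1.0])     # component variances

def components(x, pi, mu, var):
    """Weighted component densities pi_k N(x | mu_k, var_k), shape (N, 2)."""
    return pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

log_liks = []
for _ in range(50):
    # E step: responsibilities q(z_n = k) under the current parameters;
    # this makes KL(q||p) = 0, so the bound touches ln p(X|theta).
    comp = components(x, pi, mu, var)
    r = comp / comp.sum(axis=1, keepdims=True)
    # M step: maximize the bound L(q, theta) over the parameters.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    log_liks.append(np.log(components(x, pi, mu, var).sum(axis=1)).sum())

print(np.round(np.sort(mu), 1))  # means recovered near -2 and 2
```

Each iteration provably cannot decrease the log likelihood, which is the monotonicity argument pictured on this slide.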

Page 5

EM: A Sequence of Lower Bounds

C. Bishop, Pattern Recognition & Machine Learning

[Figure: the bound L(q, θ) touches ln p(X|θ) at θ_old; maximizing the bound gives θ_new, where a new, tight bound is formed.]

Page 6

Fitting Gaussian Mixtures

C. Bishop, Pattern Recognition & Machine Learning

[Figure: two scatter plots of the same points on the unit square, panels (a) and (b):]

Complete Data Labeled by True Cluster Assignments

Incomplete Data: Points to be Clustered

Page 7

Posterior Assignment Probabilities

C. Bishop, Pattern Recognition & Machine Learning

[Figure: panel (b), incomplete data (points to be clustered); panel (c), the same points colored by posterior probabilities of assignment to each cluster.]

Page 8

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

[Figure: panel (a), initial mixture configuration; axes from −2 to 2.]

Page 9

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

[Figure: panel (b), responsibilities after the first E step; axes from −2 to 2.]

Page 10

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

[Figure: panel (c), after L = 1 complete EM cycle; axes from −2 to 2.]

Page 11

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

[Figure: panel (d), after L = 2 EM cycles; axes from −2 to 2.]

Page 12

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

[Figure: panel (e), after L = 5 EM cycles; axes from −2 to 2.]

Page 13

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

[Figure: panel (f), after L = 20 EM cycles; axes from −2 to 2.]

Page 14

Pairwise Markov Random Fields

• Product of arbitrary positive clique potential functions

• Guaranteed Markov with respect to the corresponding graph

p(x) = (1/Z) ∏_{s∈V} ψ_s(x_s) ∏_{(s,t)∈E} ψ_st(x_s, x_t)

V — set of nodes
E — set of edges connecting nodes
Z — normalization constant (partition function)

Page 15

Markov Chain Factorizations
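One standard factorization this slide likely illustrated (a reconstruction in the pairwise MRF notation above, not copied from the slide):

```latex
% A first-order Markov chain is a pairwise MRF on a chain graph:
p(x) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1})
% which matches the pairwise factorization with, for example,
\psi_1(x_1) = p(x_1), \qquad
\psi_{t-1,t}(x_{t-1}, x_t) = p(x_t \mid x_{t-1}), \qquad Z = 1
```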

Page 16

Energy Functions

Interpretation and terminology from statistical physics
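In the standard statistical-physics convention (a sketch using the pairwise MRF notation from the earlier slide; the equations here are reconstructed, not copied):

```latex
% Potentials rewritten in exponential (Boltzmann) form:
E_s(x_s) = -\ln \psi_s(x_s), \qquad
E_{st}(x_s, x_t) = -\ln \psi_{st}(x_s, x_t)
% so the pairwise MRF becomes a Boltzmann distribution
% (edge set written \mathcal{E} to avoid clashing with the energy E):
p(x) = \frac{1}{Z} \exp\Big\{ -\sum_{s \in V} E_s(x_s)
       - \sum_{(s,t) \in \mathcal{E}} E_{st}(x_s, x_t) \Big\}
```

Low-energy configurations are the most probable, which is where the "energy" terminology comes from.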

Page 17

Approximate Inference Framework

• Choose a tractable family of approximating distributions. The simplest example: fully factored distributions q(x) = ∏_s q_s(x_s)

• Define a distance to measure the quality of different approximations. Two possibilities: KL(p||q) and KL(q||p)

• Find the approximation in the family minimizing this distance

Page 18

Fully Factored Approximations

• For fully factored q, KL(p||q) is trivially minimized by setting each factor to the true marginal, q_s(x_s) = p_s(x_s); the minimum value is the sum of the marginal entropies minus the joint entropy:

min_q KL(p||q) = ∑_s H_s(p_s) − H(p)

• But the true marginals are exactly what inference must compute, so this doesn't provide a computational method!

Page 19

Variational Approximations

ln p(x) = ln ∑_z p(x, z) = ln ∑_z q(z) [p(x, z) / q(z)]   (multiply by one)
        ≥ ∑_z q(z) ln [p(x, z) / q(z)] = L(q)             (Jensen's inequality)

• Minimizing the KL divergence KL(q||p) maximizes this lower bound on the data likelihood

Page 20

Free Energies

For p(x) = exp{−E(x)} / Z, the KL divergence expands as

KL(q||p) = ∑_x q(x) ln q(x)  +  ∑_x q(x) E(x)  +  ln Z
           [negative entropy]   [average energy]   [normalization]

The first two terms define the Gibbs free energy F(q) = ∑_x q(x) E(x) − H(q).

• Free energies are equivalent to the KL divergence, up to a normalization constant: KL(q||p) = F(q) + ln Z

Page 21

Mean Field Free Energy
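For a pairwise MRF with a fully factored q(x) = ∏_s q_s(x_s), the Gibbs free energy reduces to the mean field free energy; a standard form (reconstructed, not copied from the slide) is:

```latex
F_{MF}(q) = \sum_{s} \sum_{x_s} q_s(x_s) \ln q_s(x_s)
  \;-\; \sum_{s} \sum_{x_s} q_s(x_s) \ln \psi_s(x_s)
  \;-\; \sum_{(s,t)} \sum_{x_s, x_t} q_s(x_s)\, q_t(x_t) \ln \psi_{st}(x_s, x_t)
```

The entropy of a factored distribution separates into a sum of marginal entropies, and edge expectations factor into products of marginals, which is what makes this objective tractable.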

Page 22

Mean Field Equations

• Add Lagrange multipliers to enforce the normalization constraints ∑_{x_s} q_s(x_s) = 1

• Taking derivatives and simplifying, we find a set of fixed point equations:

q_s(x_s) ∝ ψ_s(x_s) exp{ ∑_{t∈N(s)} ∑_{x_t} q_t(x_t) ln ψ_st(x_s, x_t) }

• Updating one marginal at a time gives a convergent coordinate descent algorithm
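These coordinate updates can be sketched on a tiny binary pairwise MRF and checked against brute-force marginals (an illustrative example; the chain graph and random potentials below are mine, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                   # nodes 0-1-2-3 on a chain
edges = [(0, 1), (1, 2), (2, 3)]
log_psi_s = rng.normal(0.0, 0.5, size=(n, 2))                       # ln psi_s(x_s)
log_psi_st = {e: rng.normal(0.0, 0.5, size=(2, 2)) for e in edges}  # ln psi_st

# Fixed point updates q_s(x_s) ∝ psi_s(x_s) exp{ sum_t E_{q_t}[ln psi_st] },
# applied one marginal at a time (coordinate descent on the free energy).
q = np.full((n, 2), 0.5)
for _ in range(100):
    for s in range(n):
        logits = log_psi_s[s].copy()
        for (a, b), lp in log_psi_st.items():
            if a == s:
                logits += lp @ q[b]      # E_{q_b}[ln psi_ab(x_s, x_b)]
            elif b == s:
                logits += lp.T @ q[a]    # E_{q_a}[ln psi_ab(x_a, x_s)]
        q[s] = np.exp(logits - logits.max())
        q[s] /= q[s].sum()

# Exact marginals by enumerating all 2^n joint configurations.
joint = np.zeros((2,) * n)
for idx in np.ndindex(*joint.shape):
    lp = sum(log_psi_s[s][idx[s]] for s in range(n))
    lp += sum(log_psi_st[(a, b)][idx[a], idx[b]] for (a, b) in edges)
    joint[idx] = np.exp(lp)
joint /= joint.sum()
exact = np.array([joint.sum(axis=tuple(t for t in range(n) if t != s))
                  for s in range(n)])
print(np.abs(q - exact).max())  # small but nonzero approximation error
```

With weak potentials the fixed point lands close to the true marginals; stronger couplings would make the gap between the mean field answer and the exact one more visible.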

Page 23

Mean Field Message Passing

Want products of messages to be simple

Want expectations of log potential functions to be simple

Page 24

Exponential Families

• Natural or canonical parameters determine a log-linear combination of sufficient statistics:

p(x|θ) = exp{ θᵀφ(x) − A(θ) }

• The log partition function normalizes to produce a valid probability distribution:

A(θ) = ln ∑_x exp{ θᵀφ(x) }
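As a concrete instance (an added example, not from the slide), the Bernoulli distribution fits this form:

```latex
% Bernoulli as an exponential family:
\mathrm{Bern}(x \mid \mu) = \mu^x (1-\mu)^{1-x}
  = \exp\Big\{ x \ln\frac{\mu}{1-\mu} + \ln(1-\mu) \Big\}
% natural parameter, sufficient statistic, log partition function:
\theta = \ln\frac{\mu}{1-\mu}, \qquad \phi(x) = x, \qquad
A(\theta) = \ln(1 + e^{\theta}) = -\ln(1-\mu)
```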

Page 25

Directed Mean Field

• Can derive updates using the exponential family form of the conditional distribution of each variable, given its parents

• Can also just take derivatives, collect terms, and simplify!

Variational Message Passing, Winn & Bishop, JMLR 2005

Page 26

Structured Mean Field

[Figure: three graphs — the original graph, the naïve (fully factored) mean field approximation, and a structured mean field subgraph.]

• Any subgraph for which inference is tractable leads to a mean-field-style approximation for which the update equations are tractable