Expectation-Maximization (EM)
Chapter 3 (Duda et al.) – Section 3.9
CS479/679 Pattern Recognition
Dr. George Bebis
Expectation-Maximization (EM)
• EM is an iterative ML estimation method:
– Starts with an initial estimate for θ.
– Refines the current estimate iteratively to increase the likelihood of the observed data:
p(D | θ)
Expectation-Maximization (cont’d)
• EM is a general framework: it works best in situations where the data is incomplete (or can be thought of as incomplete).
– Some creativity is required to recognize where the EM algorithm can be used.
– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).
Incomplete Data
• Many times, it is impossible to apply ML estimation because certain features cannot be measured directly.
• The EM algorithm is ideal for problems with unobserved (missing) data.
Example (Moon, 1996)
Assume a trinomial distribution over x = (x_1, x_2, x_3), with x_1 + x_2 + x_3 = k:

p(x | θ) = \frac{k!}{x_1! x_2! x_3!} p_1^{x_1} p_2^{x_2} p_3^{x_3}

where the cell probabilities p_1, p_2, p_3 are functions of the parameter θ.
Example (Moon, 1996) (cont’d)
Observed (incomplete) data: y = (y_1, y_2).
EM: Main Idea
• If x were available, we could estimate θ using ML:

θ̂ = arg max_θ ln p(D_x | θ)

• Given that only y is available, estimate θ by maximizing the expectation of ln p(D_x | θ) with respect to the unknown variables, given D_y and the current estimate θ^t:

Q(θ; θ^t) = E_unobserved[ ln p(D_x | θ) | D_y, θ^t ]
EM Steps
(1) Initialization
(2) E-Step: Expectation
(3) M-Step: Maximization
(4) Test for convergence
EM Steps (cont’d)
(1) Initialization Step: initialize the algorithm with a guess θ0
(2) Expectation Step: taken with respect to the unobserved variables, using the current parameter estimate and conditioned on the observations:

Q(θ; θ^t) = E_unobserved[ ln p(D_x | θ) | D_y, θ^t ]

– Note: if ln p(D_x | θ) is a linear function of the unobserved variables, the expectation step is equivalent to computing

E_unobserved[ x | D_y, θ^t ]
EM Steps (cont’d)
(3) Maximization Step: provides a new estimate of the parameters:

θ^{t+1} = arg max_θ Q(θ; θ^t)

(4) Test for Convergence: if |θ^{t+1} - θ^t| < ε, stop; otherwise, go to Step 2.
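The four steps above can be sketched as a generic loop. This is a minimal sketch: `e_step` and `m_step` are hypothetical placeholders to be filled in for a specific model, and `eps` plays the role of the convergence threshold ε:

```python
def run_em(theta0, e_step, m_step, eps=1e-6, max_iter=100):
    """Generic EM loop: alternate E- and M-steps until the
    parameter change falls below eps (the convergence test)."""
    theta = theta0
    for _ in range(max_iter):
        expectations = e_step(theta)      # (2) E-step: E[unobserved | D_y, theta^t]
        new_theta = m_step(expectations)  # (3) M-step: argmax_theta Q(theta; theta^t)
        if abs(new_theta - theta) < eps:  # (4) convergence: |theta^{t+1} - theta^t| < eps
            return new_theta
        theta = new_theta
    return theta
```

For a concrete model, `e_step` would compute the conditional expectations of the unobserved data and `m_step` would maximize Q with respect to θ.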
Example (Moon, 1996) (cont’d)
For N independent samples, the complete-data likelihood is

p(D_x | θ) = \prod_{i=1}^{N} \frac{k!}{x_{i1}! x_{i2}! x_{i3}!} p_1^{x_{i1}} p_2^{x_{i2}} p_3^{x_{i3}}

where x_i = (x_{i1}, x_{i2}, x_{i3}).
Example (Moon, 1996) (cont’d)
Let’s look at the M-step before completing the E-step …
• Take the expected value of the complete-data log-likelihood; the multinomial coefficient k!/(x_{i1}! x_{i2}! x_{i3}!) does not depend on θ and can be ignored in the maximization.
Example (Moon, 1996) (cont’d)
Let’s complete the E-step now …
• We only need to estimate the conditional expectations of the unobserved counts, given the observed data and the current estimate θ^t.
Example (Moon, 1996) (cont’d)
(see Moon’s paper, page 53)
Example (Moon, 1996) (cont’d)
• Initialization: θ0
• Expectation Step:
• Maximization Step:
• Convergence Step: stop when |θ^{t+1} - θ^t| < ε.
Example (Moon, 1996) (cont’d)
(figure: the estimate θ^t plotted against iteration t, showing convergence)
Convergence properties of EM
• The solution depends on the initial estimate θ0
• At each iteration, a value of θ is computed so that the likelihood function does not decrease.
• The algorithm is guaranteed to be stable (i.e., does not oscillate).
• There is no guarantee that it will converge to a global maximum.
• EM is the standard method for estimating the parameters of “mixture models”.
Mixture Models
Example:
mixture of 2D Gaussians
Mixture Model (cont’d)
(figure: a mixture model combines K components with prior probabilities π_1, π_2, …, π_K)
Mixture of 1D Gaussians - Example
π_1 = 0.3, π_2 = 0.2, π_3 = 0.5
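With the weights above, the 1D mixture density is a weighted sum of Gaussian pdfs. A minimal sketch; the weights come from the slide, while the component means and standard deviations below are illustrative assumptions:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, pis, mus, sigmas):
    """p(x) = sum_k pi_k * N(x; mu_k, sigma_k^2)."""
    return sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

# Weights from the slide; means and sigmas are made-up for illustration.
pis = [0.3, 0.2, 0.5]
mus = [-2.0, 0.0, 3.0]
sigmas = [1.0, 0.5, 1.5]
```

Because the weights sum to one, the mixture is itself a valid density.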
Mixture Model (cont’d)
Estimating the parameters of a Mixture Model
• Two fundamental problems:
(1) Estimate the number of mixture components K
(2) Estimate mixture parameters (πk, θk), k=1,2,…,K
Mixtures of Gaussians(Chapter 10)
p(x | θ) = \sum_{k=1}^{K} π_k p(x | θ_k), where p(x | θ_k) = N(x; μ_k, Σ_k)

• In this case, θ_k = (μ_k, Σ_k).
Data Generation Process Using Mixtures of Gaussians
(figure: to generate a sample, first select component k with probability π_k, then draw x from p(x | θ_k))
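The generation process (pick a component k with probability π_k, then sample from that component's Gaussian) can be sketched as follows; the parameter values used in the example are assumptions for illustration:

```python
import random

def sample_mixture(pis, mus, sigmas, n, rng=random):
    """Draw n samples from a 1D Gaussian mixture:
    choose component k with probability pis[k], then x ~ N(mus[k], sigmas[k]^2)."""
    samples = []
    for _ in range(n):
        k = rng.choices(range(len(pis)), weights=pis)[0]  # pick a component
        samples.append(rng.gauss(mus[k], sigmas[k]))      # draw from that Gaussian
    return samples
```

This two-stage process is exactly why the hidden component label becomes the "missing data" that EM exploits.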
Estimating Mixture Parameters Using ML – not easy!
• ML works by maximizing the likelihood of the data:

p(D | θ) = \prod_{i=1}^{N} p(x_i | θ)

• The density function is a mixture:

p(x_i | θ) = \sum_{k=1}^{K} π_k p(x_i | θ_k)

• Using ML, we must therefore maximize

ln p(D | θ) = \sum_{i=1}^{N} ln \sum_{k=1}^{K} π_k p(x_i | θ_k)

which involves the logarithm of a sum and has no closed-form solution.
Estimating Mixture Parameters Using EM: Case of Unknown Means
• Assumptions: the number of components K, the priors π_k, and the covariances are known; only the means μ_k are unknown.
• Observation: if we knew which component generated each sample, estimating each mean would be an ordinary ML problem … but we don't!
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Introduce hidden (unobserved) variables z_ik, where z_ik = 1 if sample x_i was generated by component k, and 0 otherwise.
• Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Expectation Step: substitute the hidden variables into the complete-data log-likelihood and take its expectation given the data and the current estimate θ^t.
E(z_ik) is just the probability that x_i was generated by the k-th component:

E(z_ik) = \frac{π_k p(x_i | θ_k^t)}{\sum_{j=1}^{K} π_j p(x_i | θ_j^t)}
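This membership probability (often called the responsibility) can be computed directly. A minimal sketch for one sample, assuming 1D components with a shared known variance (the unknown-means setting); the normalizing constants cancel, so only the exponentials are needed:

```python
import math

def responsibilities(x, pis, mus, sigma=1.0):
    """E(z_ik) = pi_k N(x; mu_k, sigma^2) / sum_j pi_j N(x; mu_j, sigma^2)
    for one 1D sample x, with a shared known sigma (unknown-means case).
    The Gaussian normalizing constant is identical for all k and cancels."""
    weighted = [p * math.exp(-0.5 * ((x - m) / sigma) ** 2) for p, m in zip(pis, mus)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

The responsibilities for each sample always sum to one, since they form a posterior over the K components.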
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Maximization Step: re-estimate each mean using the expected memberships:

μ_k^{t+1} = \frac{\sum_i E(z_ik) x_i}{\sum_i E(z_ik)}
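One full EM iteration for the unknown-means case can be sketched as below (1D samples, known priors, shared known variance; all of these assumptions follow the setting of this section):

```python
import math

def em_step_means(xs, pis, mus, sigma=1.0):
    """One EM iteration for the unknown-means case.
    E-step: compute E(z_ik) for every sample.
    M-step: mu_k = sum_i E(z_ik) x_i / sum_i E(z_ik)."""
    K = len(mus)
    num = [0.0] * K  # running sums of E(z_ik) * x_i
    den = [0.0] * K  # running sums of E(z_ik)
    for x in xs:
        w = [p * math.exp(-0.5 * ((x - m) / sigma) ** 2) for p, m in zip(pis, mus)]
        total = sum(w)
        for k in range(K):
            r = w[k] / total  # responsibility E(z_ik)
            num[k] += r * x
            den[k] += r
    return [num[k] / den[k] for k in range(K)]
```

Iterating this update until the means stop changing gives the full algorithm for this special case.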
Estimating Mixture Parameters Using EM: General Case
• Need to review Lagrange Optimization first …
Lagrange Optimization
Maximize f(x) subject to the constraint g(x) = 0. Form the Lagrangian

L(x, λ) = f(x) + λ g(x)

set ∇_x L = 0 and ∂L/∂λ = g(x) = 0, then solve for x and λ:
n+1 equations / n+1 unknowns
Lagrange Optimization (cont’d)
• Example
Maximize f(x1,x2)=x1x2 subject to the constraint
g(x1,x2)=x1+x2-1=0
L(x_1, x_2, λ) = f(x_1, x_2) + λ g(x_1, x_2) = x_1 x_2 + λ (x_1 + x_2 - 1)

∂L/∂x_1 = x_2 + λ = 0
∂L/∂x_2 = x_1 + λ = 0
∂L/∂λ = x_1 + x_2 - 1 = 0

3 equations / 3 unknowns; solving gives x_1 = x_2 = 1/2 (and λ = -1/2).
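The result can be sanity-checked numerically by scanning the constraint line, where x_2 = 1 - x_1:

```python
def constrained_max():
    """Maximize f(x1, x2) = x1 * x2 on the line x1 + x2 = 1 by brute force,
    substituting x2 = 1 - x1 from the constraint."""
    best_x1, best_f = 0.0, float("-inf")
    for i in range(10001):
        x1 = i / 10000.0
        f = x1 * (1.0 - x1)  # f evaluated on the constraint line
        if f > best_f:
            best_x1, best_f = x1, f
    return best_x1, best_f
```

The maximum lands at x_1 = 0.5 with f = 0.25, matching the Lagrange solution.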
Estimating Mixture Parameters Using EM: General Case
• Introduce hidden or unobserved variables zi
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Expectation Step
substitute
Estimating Mixture Parameters Using EM: General Case (cont'd)
• Maximization Step: use Lagrange optimization, with the constraint that the priors sum to one:

g(π) = \sum_{k=1}^{K} π_k - 1 = 0
Estimating Mixture Parameters Using EM: General Case (cont'd)
• Maximization Step (cont'd): the resulting closed-form updates are

π_k = \frac{1}{N} \sum_{i=1}^{N} E(z_ik)

μ_k = \frac{\sum_i E(z_ik) x_i}{\sum_i E(z_ik)}

Σ_k = \frac{\sum_i E(z_ik) (x_i - μ_k)(x_i - μ_k)^T}{\sum_i E(z_ik)}
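Putting the E- and M-steps together, the general case can be sketched compactly with NumPy for a 1D mixture (all of π_k, μ_k, σ_k² are updated; K and the initial values are assumed given):

```python
import numpy as np

def em_gmm_1d(xs, pis, mus, vars_, n_iter=50):
    """EM for a 1D Gaussian mixture: returns updated (pis, mus, vars)."""
    xs = np.asarray(xs, dtype=float)
    pis, mus, vars_ = (np.asarray(a, dtype=float) for a in (pis, mus, vars_))
    for _ in range(n_iter):
        # E-step: responsibilities E(z_ik), shape (N, K)
        dens = np.exp(-0.5 * (xs[:, None] - mus) ** 2 / vars_) / np.sqrt(2 * np.pi * vars_)
        resp = pis * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: closed-form updates from the Lagrange-constrained maximization
        nk = resp.sum(axis=0)
        pis = nk / len(xs)                           # pi_k = (1/N) sum_i E(z_ik)
        mus = (resp * xs[:, None]).sum(axis=0) / nk  # weighted sample mean
        vars_ = (resp * (xs[:, None] - mus) ** 2).sum(axis=0) / nk  # weighted variance
    return pis, mus, vars_
```

Each iteration increases (or leaves unchanged) the data likelihood, consistent with the convergence properties discussed earlier.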
Estimating the Number of Components K