Expectation-Maximization (EM)
Chapter 3 (Duda et al.) – Section 3.9
CS479/679 Pattern Recognition
Dr. George Bebis
Expectation-Maximization (EM)
• EM is an iterative ML estimation method:
– Starts with an initial estimate for θ.
– Refines the current estimate iteratively to increase the likelihood of the observed data:
p(D | θ)
Expectation-Maximization (cont’d)
• EM is a general framework: it works best in situations where the data is incomplete (or can be thought of as incomplete).
– Some creativity is required to recognize where the EM algorithm can be used.
– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).
Incomplete Data
• Many times, it is impossible to apply ML estimation because certain features cannot be measured directly.
• The EM algorithm is ideal for problems with unobserved (missing) data.
Example (Moon, 1996)
Assume a trinomial distribution over x = (x_1, x_2, x_3), with x_1 + x_2 + x_3 = k:

p(x | θ) = \frac{k!}{x_1! x_2! x_3!} p_1^{x_1} p_2^{x_2} p_3^{x_3}

where the cell probabilities p_1, p_2, p_3 are functions of the parameter θ.
Example (Moon, 1996) (cont’d)
Observed (incomplete) data: y = (y_1, y_2).
EM: Main Idea
• If x were available, we could estimate θ using ML:

θ̂ = arg max_θ ln p(D_x | θ)

• Given that only y is available, estimate θ by maximizing the expectation of ln p(D_x | θ) with respect to the unknown variables, given D_y and the current estimate θ^t:

Q(θ; θ^t) = E_unobserved[ ln p(D_x | θ) | D_y, θ^t ]
EM Steps
(1) Initialization
(2) E-Step: Expectation
(3) M-Step: Maximization
(4) Test for convergence
EM Steps (cont’d)
(1) Initialization Step: initialize the algorithm with a guess θ0
(2) Expectation Step: taken with respect to the unobserved variables, using the current parameter estimate and conditioned on the observations:

Q(θ; θ^t) = E_unobserved[ ln p(D_x | θ) | D_y, θ^t ]

– Note: if ln p(D_x | θ) is a linear function of the unobserved variables, the expectation step is equivalent to computing

E_unobserved[ x | D_y, θ^t ]
EM Steps (cont’d)
(3) Maximization Step: provides a new estimate of the parameters:

θ^{t+1} = arg max_θ Q(θ; θ^t)

(4) Test for Convergence: if |θ^{t+1} - θ^t| < ε, stop; otherwise, go to Step 2.
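The four steps above can be sketched as a generic loop. This is a minimal sketch: `e_step` and `m_step` are hypothetical placeholders to be filled in for a specific model, and `eps` plays the role of the convergence threshold ε:

```python
def run_em(theta0, e_step, m_step, eps=1e-6, max_iter=100):
    """Generic EM loop: alternate E- and M-steps until the
    parameter change falls below eps (the convergence test)."""
    theta = theta0
    for _ in range(max_iter):
        expectations = e_step(theta)      # (2) E-step: E[unobserved | D_y, theta^t]
        new_theta = m_step(expectations)  # (3) M-step: argmax_theta Q(theta; theta^t)
        if abs(new_theta - theta) < eps:  # (4) convergence: |theta^{t+1} - theta^t| < eps
            return new_theta
        theta = new_theta
    return theta
```

For a concrete model, `e_step` would compute the conditional expectations of the unobserved data and `m_step` would maximize Q with respect to θ.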
Example (Moon, 1996) (cont’d)
For N independent samples, the complete-data likelihood is

p(D_x | θ) = \prod_{i=1}^{N} \frac{k!}{x_{i1}! x_{i2}! x_{i3}!} p_1^{x_{i1}} p_2^{x_{i2}} p_3^{x_{i3}}

where x_i = (x_{i1}, x_{i2}, x_{i3}).
Example (Moon, 1996) (cont’d)
Let’s look at the M-step before completing the E-step …
• Take the expected value of the complete-data log-likelihood; the multinomial coefficient k!/(x_{i1}! x_{i2}! x_{i3}!) does not depend on θ and can be ignored in the maximization.
Example (Moon, 1996) (cont’d)
Let’s complete the E-step now …
• We only need to estimate the conditional expectations of the unobserved counts, given the observed data and the current estimate θ^t.
Example (Moon, 1996) (cont’d)
(see Moon’s paper, page 53)
Example (Moon, 1996) (cont’d)
• Initialization: θ0
• Expectation Step:
• Maximization Step:
• Convergence Step: stop when |θ^{t+1} - θ^t| < ε.
Example (Moon, 1996) (cont’d)
(figure: the estimate θ^t plotted against iteration t, showing convergence)
Convergence properties of EM
• The solution depends on the initial estimate θ0
• At each iteration, a value of θ is computed so that the likelihood function does not decrease.
• The algorithm is guaranteed to be stable (i.e., does not oscillate).
• There is no guarantee that it will converge to a global maximum.
• EM is the standard method for estimating the parameters of “mixture models”.
Mixture Models
Example:
mixture of 2D Gaussians
Mixture Model (cont’d)
(figure: a mixture model combines K components with prior probabilities π_1, π_2, …, π_K)
Mixture of 1D Gaussians - Example
π_1 = 0.3, π_2 = 0.2, π_3 = 0.5
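With the weights above, the 1D mixture density is a weighted sum of Gaussian pdfs. A minimal sketch; the weights come from the slide, while the component means and standard deviations below are illustrative assumptions:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, pis, mus, sigmas):
    """p(x) = sum_k pi_k * N(x; mu_k, sigma_k^2)."""
    return sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

# Weights from the slide; means and sigmas are made-up for illustration.
pis = [0.3, 0.2, 0.5]
mus = [-2.0, 0.0, 3.0]
sigmas = [1.0, 0.5, 1.5]
```

Because the weights sum to one, the mixture is itself a valid density.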
Mixture Model (cont’d)
Estimating the parameters of a Mixture Model
• Two fundamental problems:
(1) Estimate the number of mixture components K
(2) Estimate mixture parameters (πk, θk), k=1,2,…,K
Mixtures of Gaussians(Chapter 10)
p(x | θ) = \sum_{k=1}^{K} π_k p(x | θ_k), where p(x | θ_k) = N(x; μ_k, Σ_k)

• In this case, θ_k = (μ_k, Σ_k).
Data Generation Process Using Mixtures of Gaussians
(figure: to generate a sample, first select component k with probability π_k, then draw x from p(x | θ_k))
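The generation process (pick a component k with probability π_k, then sample from that component's Gaussian) can be sketched as follows; the parameter values used in the example are assumptions for illustration:

```python
import random

def sample_mixture(pis, mus, sigmas, n, rng=random):
    """Draw n samples from a 1D Gaussian mixture:
    choose component k with probability pis[k], then x ~ N(mus[k], sigmas[k]^2)."""
    samples = []
    for _ in range(n):
        k = rng.choices(range(len(pis)), weights=pis)[0]  # pick a component
        samples.append(rng.gauss(mus[k], sigmas[k]))      # draw from that Gaussian
    return samples
```

This two-stage process is exactly why the hidden component label becomes the "missing data" that EM exploits.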
Estimating Mixture Parameters Using ML – not easy!
• ML works by maximizing the likelihood of the data:

p(D | θ) = \prod_{i=1}^{N} p(x_i | θ)

• The density function is a mixture:

p(x_i | θ) = \sum_{k=1}^{K} π_k p(x_i | θ_k)

• Using ML, we must therefore maximize

ln p(D | θ) = \sum_{i=1}^{N} ln \sum_{k=1}^{K} π_k p(x_i | θ_k)

which involves the logarithm of a sum and has no closed-form solution.
Estimating Mixture Parameters Using EM: Case of Unknown Means
• Assumptions: the number of components K, the priors π_k, and the covariances are known; only the means μ_k are unknown.
• Observation: if we knew which component generated each sample, estimating each mean would be an ordinary ML problem … but we don't!
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Introduce hidden (unobserved) variables z_ik, where z_ik = 1 if sample x_i was generated by component k, and 0 otherwise.
• Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Expectation Step: substitute the hidden variables into the complete-data log-likelihood and take its expectation given the data and the current estimate θ^t.
E(z_ik) is just the probability that x_i was generated by the k-th component:

E(z_ik) = \frac{π_k p(x_i | θ_k^t)}{\sum_{j=1}^{K} π_j p(x_i | θ_j^t)}
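This membership probability (often called the responsibility) can be computed directly. A minimal sketch for one sample, assuming 1D components with a shared known variance (the unknown-means setting); the normalizing constants cancel, so only the exponentials are needed:

```python
import math

def responsibilities(x, pis, mus, sigma=1.0):
    """E(z_ik) = pi_k N(x; mu_k, sigma^2) / sum_j pi_j N(x; mu_j, sigma^2)
    for one 1D sample x, with a shared known sigma (unknown-means case).
    The Gaussian normalizing constant is identical for all k and cancels."""
    weighted = [p * math.exp(-0.5 * ((x - m) / sigma) ** 2) for p, m in zip(pis, mus)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

The responsibilities for each sample always sum to one, since they form a posterior over the K components.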
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Maximization Step: re-estimate each mean using the expected memberships:

μ_k^{t+1} = \frac{\sum_i E(z_ik) x_i}{\sum_i E(z_ik)}
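One full EM iteration for the unknown-means case can be sketched as below (1D samples, known priors, shared known variance; all of these assumptions follow the setting of this section):

```python
import math

def em_step_means(xs, pis, mus, sigma=1.0):
    """One EM iteration for the unknown-means case.
    E-step: compute E(z_ik) for every sample.
    M-step: mu_k = sum_i E(z_ik) x_i / sum_i E(z_ik)."""
    K = len(mus)
    num = [0.0] * K  # running sums of E(z_ik) * x_i
    den = [0.0] * K  # running sums of E(z_ik)
    for x in xs:
        w = [p * math.exp(-0.5 * ((x - m) / sigma) ** 2) for p, m in zip(pis, mus)]
        total = sum(w)
        for k in range(K):
            r = w[k] / total  # responsibility E(z_ik)
            num[k] += r * x
            den[k] += r
    return [num[k] / den[k] for k in range(K)]
```

Iterating this update until the means stop changing gives the full algorithm for this special case.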
Estimating Mixture Parameters Using EM: General Case
• Need to review Lagrange Optimization first …
Lagrange Optimization
Maximize f(x) subject to the constraint g(x) = 0. Form the Lagrangian

L(x, λ) = f(x) + λ g(x)

set ∇_x L = 0 and ∂L/∂λ = g(x) = 0, then solve for x and λ:
n+1 equations / n+1 unknowns
Lagrange Optimization (cont’d)
• Example
Maximize f(x1,x2)=x1x2 subject to the constraint
g(x1,x2)=x1+x2-1=0
L(x_1, x_2, λ) = f(x_1, x_2) + λ g(x_1, x_2) = x_1 x_2 + λ (x_1 + x_2 - 1)

∂L/∂x_1 = x_2 + λ = 0
∂L/∂x_2 = x_1 + λ = 0
∂L/∂λ = x_1 + x_2 - 1 = 0

3 equations / 3 unknowns; solving gives x_1 = x_2 = 1/2 (and λ = -1/2).
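The result can be sanity-checked numerically by scanning the constraint line, where x_2 = 1 - x_1:

```python
def constrained_max():
    """Maximize f(x1, x2) = x1 * x2 on the line x1 + x2 = 1 by brute force,
    substituting x2 = 1 - x1 from the constraint."""
    best_x1, best_f = 0.0, float("-inf")
    for i in range(10001):
        x1 = i / 10000.0
        f = x1 * (1.0 - x1)  # f evaluated on the constraint line
        if f > best_f:
            best_x1, best_f = x1, f
    return best_x1, best_f
```

The maximum lands at x_1 = 0.5 with f = 0.25, matching the Lagrange solution.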
Estimating Mixture Parameters Using EM: General Case
• Introduce hidden or unobserved variables zi
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Expectation Step
substitute
Estimating Mixture Parameters Using EM: General Case (cont'd)
• Maximization Step: use Lagrange optimization, with the constraint that the priors sum to one:

g(π) = \sum_{k=1}^{K} π_k - 1 = 0
Estimating Mixture Parameters Using EM: General Case (cont'd)
• Maximization Step (cont'd): the resulting closed-form updates are

π_k = \frac{1}{N} \sum_{i=1}^{N} E(z_ik)

μ_k = \frac{\sum_i E(z_ik) x_i}{\sum_i E(z_ik)}

Σ_k = \frac{\sum_i E(z_ik) (x_i - μ_k)(x_i - μ_k)^T}{\sum_i E(z_ik)}
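Putting the E- and M-steps together, the general case can be sketched compactly with NumPy for a 1D mixture (all of π_k, μ_k, σ_k² are updated; K and the initial values are assumed given):

```python
import numpy as np

def em_gmm_1d(xs, pis, mus, vars_, n_iter=50):
    """EM for a 1D Gaussian mixture: returns updated (pis, mus, vars)."""
    xs = np.asarray(xs, dtype=float)
    pis, mus, vars_ = (np.asarray(a, dtype=float) for a in (pis, mus, vars_))
    for _ in range(n_iter):
        # E-step: responsibilities E(z_ik), shape (N, K)
        dens = np.exp(-0.5 * (xs[:, None] - mus) ** 2 / vars_) / np.sqrt(2 * np.pi * vars_)
        resp = pis * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: closed-form updates from the Lagrange-constrained maximization
        nk = resp.sum(axis=0)
        pis = nk / len(xs)                           # pi_k = (1/N) sum_i E(z_ik)
        mus = (resp * xs[:, None]).sum(axis=0) / nk  # weighted sample mean
        vars_ = (resp * (xs[:, None] - mus) ** 2).sum(axis=0) / nk  # weighted variance
    return pis, mus, vars_
```

Each iteration increases (or leaves unchanged) the data likelihood, consistent with the convergence properties discussed earlier.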
Estimating the Number of Components K