EM ALGORITHM
• EM algorithm is a general iterative method of maximum likelihood estimation for incomplete data
• Used to tackle a wide variety of problems, some of which would not usually be viewed as incomplete data problems
• Natural situations
– Missing data problems
– Grouped data problems
– Truncated and censored data problems
• Not so obvious situations
– Variance component estimation
– Latent variable situations and random effects models
– Mixture models
• Areas of applications
– Image analysis
– Epidemiology and Medicine
– Engineering
– Genetics and Biology
• Seminal Paper
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). JRSS B 39: 1-38
The EM algorithm is closely related to the following ad hoc process for handling missing data:
1. Fill in the missing values by their estimated values
2. Estimate the parameters for this completed dataset
3. Use the estimated parameters to re-estimate the missing values
4. Re-estimate the parameters from this updated completed dataset
Alternate between steps 3 and 4 until convergence of the parameter estimates
• The EM algorithm formalises this approach
The essential idea behind the EM algorithm is to obtain maximum likelihood estimates for an incomplete data problem via the complete data likelihood, because the observed likelihood may be complicated or numerically infeasible to maximise.
To do this, we augment the observed data with manufactured data so as to create a complete likelihood that is computationally more tractable. At each iteration, we replace the incomplete data, as they enter the sufficient statistics for the parameters in the complete data likelihood, by their conditional expectations given the observed data and the current parameter estimates (Expectation step: E-step)
New parameter estimates are then obtained from these completed sufficient statistics as though they had come from a complete sample (Maximisation step: M-step)
Alternating E- and M-steps produces a sequence of estimates that, under very general conditions, converges to the MLEs
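The E/M alternation can be sketched as a generic loop (an illustrative skeleton, not from the slides; `e_step` and `m_step` are placeholders for the problem-specific conditional expectation and complete-data MLE):

```python
# Generic EM skeleton: alternate imputation (E-step) and estimation
# (M-step) until the parameter estimates stabilise.
def em(theta0, e_step, m_step, n_iter=100, tol=1e-8):
    theta = theta0
    for _ in range(n_iter):
        filled = e_step(theta)      # E-step: impute missing data given theta
        new = m_step(filled)        # M-step: complete-data MLE from imputed data
        if abs(new - theta) < tol:  # stop once estimates have converged
            return new
        theta = new
    return theta
```

For instance, the genetic linkage example below corresponds to e_step(θ) = 125θ/(2 + θ) and m_step(x2) = (x2 + 34)/(x2 + 72).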
EXAMPLES
1. Genetic Linkage Model
2. Censored (survival) data
3. Mixture of two univariate normals
• Genetic Linkage Model
197 animals distributed into four categories
Y is postulated to have arisen from a multinomial distribution with cell probabilities
(1/2 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4)
y = (y1, y2, y3, y4) = (125, 18, 20, 34)
Split y1 = 125 into unobserved counts x1 + x2 corresponding to cell probabilities 1/2 and θ/4, so that E[X2 | Y; θ] = 125θ/(2 + θ)
Take θ(0) = 0.5 as the initial estimate
E-step: replace x2 by its conditional expectation:
x2(0) = E[X2 | Y; θ(0)] = 125 × 0.5/(0.5 + 2) = 25
M-step: treat x2(0) = 25 as 'real' data to obtain an updated estimate
Update the estimate of θ:
θ(1) = (x2(0) + y4)/(x2(0) + y2 + y3 + y4) = (25 + 34)/(25 + 34 + 18 + 20) = 0.608
Obtain an improved estimate of x2:
x2(1) = E[X2 | Y; θ(1)] = 125 × 0.608/(0.608 + 2) = 29.14
Alternate E and M-steps
step θ(m)
0 0.5
1 0.6082
2 0.6243
3 0.6265
4 0.6268
5 0.6268
6 0.6268
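The iterations in the table can be reproduced in a few lines of code (a sketch based on the E- and M-step formulas above):

```python
# EM for the genetic linkage model: y = (125, 18, 20, 34) with cell
# probabilities (1/2 + t/4, (1-t)/4, (1-t)/4, t/4).  The first cell is
# split into counts with probabilities 1/2 and t/4; x2 is unobserved.
y1, y2, y3, y4 = 125, 18, 20, 34
theta = 0.5                                   # initial estimate theta(0)
for m in range(6):
    x2 = y1 * theta / (2 + theta)             # E-step: E[X2 | y; theta(m)]
    theta = (x2 + y4) / (x2 + y4 + y2 + y3)   # M-step: complete-data MLE
    print(m + 1, round(theta, 4))             # matches the table above
```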
• Survival time data: right censored exponential(θ) data
3 uncensored observations: t1 = 0.5, t2 = 1.5 and t3 = 4
2 right-censored observations: t4 = 1* and t5 = 3*
Recall lack of memory property
E(T | T > t) = t + 1/θ
Ignore the censored data to obtain an initial estimate of the rate:
θ(0) = 3/(0.5 + 1.5 + 4) = 0.5
E-step: replace the censored data by their conditional expectations
Obtain an initial estimate of t4:
t4(0) = E(T4 | T4 > 1; θ(0)) = 1 + 1/θ(0) = 1 + 2 = 3
Obtain an initial estimate of t5:
t5(0) = E(T5 | T5 > 3; θ(0)) = 3 + 1/θ(0) = 3 + 2 = 5
M-step: treat 3 and 5 as 'real' data
Update the estimate of the rate:
θ(1) = 5/(0.5 + 1.5 + 4 + 3 + 5) = 5/14 = 0.3571
Obtain an improved estimate of t4:
t4(1) = E(T4 | T4 > 1; θ(1)) = 1 + 1/θ(1) = 3.8
Obtain an improved estimate of t5:
t5(1) = E(T5 | T5 > 3; θ(1)) = 3 + 1/θ(1) = 5.8
step θ(m)
0 0.5
1 0.3571
2 0.3205
3 0.3079
4 0.3031
5 0.3012
6 0.3005
7 0.3002
8 0.3001
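These iterations can likewise be sketched in code (using the memoryless-property imputation E(T | T > c) = c + 1/θ):

```python
# EM for right-censored exponential(theta) data: impute each censored
# time by its conditional expectation c + 1/theta, then refit the rate.
uncensored = [0.5, 1.5, 4.0]
censored = [1.0, 3.0]                          # censoring times for t4*, t5*
theta = len(uncensored) / sum(uncensored)      # theta(0) = 0.5, ignoring censored data
for m in range(8):
    imputed = [c + 1 / theta for c in censored]       # E-step
    theta = 5 / (sum(uncensored) + sum(imputed))      # M-step: n / total time
    print(m + 1, round(theta, 4))
```

The fixed point solves θ = 5/(10 + 2/θ), i.e. θ = 0.3, consistent with the table.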
• Mixture of two univariate normals (Old Faithful’s eruptions)
step pi mu[1] mu[2] sigma2[1] sigma2[2]
0 0.35 2 4.3 0.1 0.2
1 0.34920810 2.02049523 4.27511436 0.05694792 0.18870948
2 0.34889072 2.01974514 4.27441727 0.05637577 0.18961652
3 0.34869541 2.01928712 4.27398638 0.05602933 0.19018038
4 0.34857750 2.01901127 4.27372586 0.05582123 0.19052192
5 0.34850700 2.01884660 4.27357000 0.05569720 0.19072650
6 0.34846512 2.01874885 4.27347731 0.05562365 0.19084823
7 0.34844032 2.01869101 4.27342243 0.05558015 0.19092034
8 0.34842567 2.01865686 4.27339000 0.05555447 0.19096296
9 0.34841703 2.01863671 4.27337087 0.05553933 0.19098811
10 0.34841194 2.01862484 4.27335959 0.05553041 0.19100294
11 0.34840893 2.01861784 4.27335294 0.05552515 0.19101167
12 0.34840717 2.01861372 4.27334903 0.05552205 0.19101682
13 0.34840613 2.01861129 4.27334672 0.05552023 0.19101985
14 0.34840551 2.01860986 4.27334537 0.05551916 0.19102164
15 0.34840515 2.01860902 4.27334457 0.05551852 0.19102269
16 0.34840494 2.01860853 4.27334410 0.05551815 0.19102331
17 0.34840481 2.01860823 4.27334382 0.05551793 0.19102367
18 0.34840470 2.01860810 4.27334370 0.05551780 0.19102390
19 0.34840470 2.01860796 4.27334356 0.05551773 0.19102401
20 0.34840467 2.01860790 4.27334350 0.05551768 0.19102409
[Figure: density of faithful$eruptions (x-axis 1 to 6, y-axis 0.0 to 0.8) with the fitted normal mixture and a kernel density estimate overlaid]
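A two-component version of the updates can be sketched as follows (illustrative only; run here on a small synthetic sample, since the faithful data belong to R):

```python
import math

# EM for a two-component normal mixture.  Starting values mirror the
# slide (pi = 0.35, means 2 and 4.3, variances 0.1 and 0.2); the data
# below are a synthetic stand-in for faithful$eruptions.
data = [1.8, 2.0, 2.2, 1.9, 4.1, 4.3, 4.5, 4.2, 4.4, 2.1]

def norm_pdf(x, mu, s2):
    """Normal density with mean mu and variance s2."""
    return math.exp(-(x - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

pi1, mu, s2 = 0.35, [2.0, 4.3], [0.1, 0.2]
for m in range(100):
    # E-step: responsibility of component 1 for each observation
    w = [pi1 * norm_pdf(x, mu[0], s2[0]) /
         (pi1 * norm_pdf(x, mu[0], s2[0]) + (1 - pi1) * norm_pdf(x, mu[1], s2[1]))
         for x in data]
    # M-step: weighted maximum likelihood estimates
    n1 = sum(w)
    n2 = len(data) - n1
    pi1 = n1 / len(data)
    mu = [sum(wi * x for wi, x in zip(w, data)) / n1,
          sum((1 - wi) * x for wi, x in zip(w, data)) / n2]
    s2 = [sum(wi * (x - mu[0]) ** 2 for wi, x in zip(w, data)) / n1,
          sum((1 - wi) * (x - mu[1]) ** 2 for wi, x in zip(w, data)) / n2]
```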
EM ALGORITHM FOR THE REGULAR EXPONENTIAL FAMILY
Let X^T = (Y^T, Z^T) be distributed (wlog) from
g(x; θ) = b(x) exp(c(θ)^T t(x))/a(θ)
The E-step requires the computation of
E[t(X) | y; θ(m)] = t(m), say
The M-step requires solving
E_θ[t(X)] = t(m)
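As a concrete instance (an illustration, not from the original slides): in the censored exponential example, the complete data are T1, …, T5 ~ exponential(θ), an exponential family with sufficient statistic t(X) = T1 + … + T5. The E-step computes
t(m) = 0.5 + 1.5 + 4 + (1 + 1/θ(m)) + (3 + 1/θ(m))
and the M-step solves E_θ[t(X)] = 5/θ = t(m), i.e. θ(m+1) = 5/t(m), which is exactly the update used in the example above.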
EM ALGORITHM FOR THE FINITE MIXTURE PROBLEM
Let XT = (YT, ZT) be the complete data vector. Y is the observed data vector and Z the unobserved data vector
The observed likelihood is
L(θ | y) = ∏_{i=1}^n ∑_{j=1}^k πj gj(yi; ψj)
which is difficult to maximise
Define Z^T = (z1^T, …, zn^T), where zi^T = (zi1, …, zik) and zij = I{yi ∈ jth component}
The complete likelihood is
L_C(θ | x) = ∏_{i=1}^n ∏_{j=1}^k [πj gj(yi; ψj)]^{zij}
Thus,
l_C(θ | x) = log L_C(θ | x) = ∑_{i=1}^n zi^T {v(π) + ui(ψ)}
where
v(π) = (log π1, …, log πk)^T
and
ui(ψ) = (log g1(yi; ψ1), …, log gk(yi; ψk))^T
In the E-step, we compute
Q(θ, θ(m)) = ∑_{i=1}^n wi(θ(m))^T v(π) + ∑_{i=1}^n wi(θ(m))^T ui(ψ)
where
wi(θ(m)) = E[zi | y; θ(m)]
and
wij(θ(m)) = πj(m) gj(yi; ψj(m)) / ∑_{l=1}^k πl(m) gl(yi; ψl(m))
In the M-step, we simply maximise Q(θ, θ(m))
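The E-step weights and the π part of the M-step can be sketched generically (hypothetical helper functions; the component densities g_j are supplied as callables):

```python
# E-step weights for a k-component mixture:
#   w[i][j] = pi_j g_j(y_i; psi_j) / sum_l pi_l g_l(y_i; psi_l)
def e_step_weights(y, pis, gs, params):
    w = []
    for yi in y:
        num = [p * g(yi, psi) for p, g, psi in zip(pis, gs, params)]
        tot = sum(num)
        w.append([v / tot for v in num])
    return w

# Maximising the v(pi) part of Q subject to sum_j pi_j = 1 gives
#   pi_j = (1/n) sum_i w_ij
def m_step_pis(w):
    n, k = len(w), len(w[0])
    return [sum(w[i][j] for i in range(n)) / n for j in range(k)]
```

Maximising the ui(ψ) part depends on the chosen component densities; for normal components it yields the weighted means and variances used in the Old Faithful example.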
PROPERTIES OF THE EM ALGORITHM
• Stability/Monotonicity
• Under suitable regularity conditions, if the θ(m)'s converge then they converge to a stationary point of l(θ | y)
• The EM algorithm converges at a linear rate, with the rate depending on the proportion of information about θ in the observed density
STANDARD ERRORS OF PARAMETERS
Louis (1982) showed that
I(θ̂; y) = I_C(θ̂; y) − E[ ∂log L_C(θ | x)/∂θ · ∂log L_C(θ | x)/∂θ^T | y; θ̂ ]|_{θ=θ̂}
= I_C(θ̂; y) − cov{ ∂log L_C(θ | x)/∂θ | y; θ̂ }|_{θ=θ̂}
Invert to get approximate covariance matrix for the parameter estimates
Returning to Example 1 (Genetic Linkage),
∂log L_C(θ | x)/∂θ = (x2 + y4)/θ − (y2 + y3)/(1 − θ)
−∂²log L_C(θ | x)/∂θ∂θ^T = (x2 + y4)/θ² + (y2 + y3)/(1 − θ)²
Therefore,
I_C(θ̂; y) = E[ (x2 + y4)/θ̂² + (y2 + y3)/(1 − θ̂)² | y; θ̂ ]
= (125θ̂/(2 + θ̂) + y4)/θ̂² + (y2 + y3)/(1 − θ̂)² = 435.5
and
cov{ ∂log L_C(θ | x)/∂θ | y; θ̂ }|_{θ=θ̂} = var(x2 | y; θ̂)/θ̂² = 125 (θ̂/(2 + θ̂)) (2/(2 + θ̂)) / θ̂² = 57.8
Thus,
I(θ̂; y) = 435.5 − 57.8 = 377.7
and the standard error of θ̂ is equal to 1/√377.7 = 0.05
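The arithmetic can be checked numerically (a sketch using the closed-form expressions just derived; θ̂ is taken, rounded, from the EM iteration table, so the results match the slide up to rounding):

```python
# Louis (1982) standard error for the genetic linkage example.
y1, y2, y3, y4 = 125, 18, 20, 34
t = 0.6268                                    # MLE from the EM iterations
Ex2 = y1 * t / (2 + t)                        # E[X2 | y; t]
IC = (Ex2 + y4) / t**2 + (y2 + y3) / (1 - t)**2        # complete information
var_score = y1 * (t / (2 + t)) * (2 / (2 + t)) / t**2  # cov of complete score
I_obs = IC - var_score                        # observed information
se = I_obs ** -0.5                            # standard error of the MLE
```

With θ̂ = 0.6268 this gives I_C ≈ 435.3, a covariance term ≈ 57.8, observed information ≈ 377.5 and a standard error of about 0.05.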