Likelihood Theory with Score Function
Dongyi Sun 250656653
Fan Xiao 250656655
Outline
• Definition of Score Function
• Properties
• Likelihood Estimate
• Examples
Definition of Score Function
• The score function, $u(\theta)$, is the derivative of $\log f(y|\theta)$ with respect to the parameter:
$$u(\theta) = \frac{\partial}{\partial\theta}\log f(y|\theta) = \frac{1}{f(y|\theta)}\frac{\partial}{\partial\theta}f(y|\theta)$$
• The score $u(\theta)$ indicates how sensitive the log-likelihood $\log f(y|\theta)$ is to changes in the parameter $\theta$.
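As a quick sketch of this definition, the analytic score of a Poisson log-density (our illustrative choice, not a distribution used in these slides) can be compared with a numerical derivative in R:

```r
# Score of a Poisson(theta) log-density: log f(y|theta) = y*log(theta) - theta - log(y!),
# so u(theta) = y/theta - 1. (Poisson is our example here, not from the slides.)
y <- 3; theta <- 2
log_f <- function(th) dpois(y, th, log = TRUE)
u_analytic <- y / theta - 1
eps <- 1e-6
u_numeric <- (log_f(theta + eps) - log_f(theta - eps)) / (2 * eps)
c(u_analytic, u_numeric)   # both approximately 0.5
```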
Properties
• Mean
• Fisher's Information

The mean of the score is zero: under regularity conditions that allow differentiation under the integral sign,
$$E[u(\theta)] = \int_{-\infty}^{+\infty} \frac{1}{f(y|\theta)}\frac{\partial f(y|\theta)}{\partial\theta}\, f(y|\theta)\, dy = \int_{-\infty}^{+\infty} \frac{\partial f(y|\theta)}{\partial\theta}\, dy = \frac{\partial}{\partial\theta}\int_{-\infty}^{+\infty} f(y|\theta)\, dy = \frac{\partial}{\partial\theta}(1) = 0$$
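A quick Monte Carlo sketch of the mean-zero property, again using a Poisson model as our illustrative example:

```r
# E[u(theta)] = 0: draw many Poisson(theta) observations and average the
# score u(theta) = y/theta - 1 (Poisson example is our choice).
set.seed(1)
theta <- 2
y <- rpois(1e5, theta)
u <- y / theta - 1
mean(u)   # close to 0
```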
Properties
• Fisher's Information: since $E[u(\theta)] = 0$, the variance of the score is
$$\mathrm{Var}[u(\theta)] = E\!\left[\left(\frac{\partial}{\partial\theta}\log f(y|\theta)\right)^{2}\,\Big|\,\theta\right] = \int_{-\infty}^{+\infty}\left(\frac{\partial}{\partial\theta}\log f(y|\theta)\right)^{2} f(y|\theta)\, dy = I(\theta)$$
This quantity $I(\theta)$ is called the Fisher information.
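The same Monte Carlo sketch confirms that the variance of the score matches the Fisher information; for our Poisson example, $I(\theta) = 1/\theta$:

```r
# Var(u(theta)) = I(theta): for Poisson(theta), I(theta) = 1/theta
# (Poisson is our illustrative example, not one used in the slides).
set.seed(2)
theta <- 2
y <- rpois(1e5, theta)
u <- y / theta - 1
var(u)   # approximately 1/theta = 0.5
```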
Properties
• If $\log f(y|\theta)$ is twice differentiable with respect to $\theta$, then under certain regularity conditions the Fisher information may also be written as
$$I(\theta) = -E\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(y|\theta)\,\Big|\,\theta\right]$$
• Now consider $n$ independent random variables, $Y_1, \dots, Y_n$, each with probability density function $f(y|\theta)$, where $\theta$ is the (possibly vector-valued) parameter.
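The second-derivative form can also be checked numerically with the same Poisson example: $\partial^2 \log f/\partial\theta^2 = -y/\theta^2$, so $-E[\cdot] = \theta/\theta^2 = 1/\theta$, matching $\mathrm{Var}[u(\theta)]$ above.

```r
# -E[d^2/dtheta^2 log f] for Poisson(theta): the second derivative is
# -y/theta^2, so the negated expectation is E[y]/theta^2 = 1/theta
# (our example; it agrees with Var(u) = 1/theta).
set.seed(7)
theta <- 2
y <- rpois(1e5, theta)
mean(y / theta^2)   # approximately 1/theta = 0.5
```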
Maximum Likelihood Estimate
• If the observation is $\boldsymbol{y} = (y_1, y_2, \dots, y_n)^{T}$, then the maximum likelihood estimate (MLE) is the value of the parameter(s) that maximizes the likelihood function
$$L(\theta|\boldsymbol{y}) = \prod_{i=1}^{n} f(y_i|\theta)$$
Maximum Likelihood Estimate
• The score function for the likelihood is
$$u(\theta) = \frac{\partial}{\partial\theta}\log L(\theta|\boldsymbol{y}) = \frac{1}{L(\theta|\boldsymbol{y})}\frac{\partial}{\partial\theta}L(\theta|\boldsymbol{y})$$
• Set $u(\theta) = 0$ and solve this equation to obtain the MLE of $\theta$.
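When the score equation has a closed form, the MLE drops out directly. For an i.i.d. Poisson sample (our illustrative example), the score is $\sum y_i/\theta - n$, so the MLE is the sample mean:

```r
# Solving u(theta) = 0 in closed form for a Poisson sample (our example):
# u(theta) = sum(y)/theta - n vanishes at theta_hat = mean(y).
set.seed(3)
y <- rpois(50, 2)
u <- function(th) sum(y) / th - length(y)
theta_hat <- mean(y)
u(theta_hat)   # zero at the MLE
```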
Maximum Likelihood Estimate
• Variance of $\hat{\theta}$: asymptotically,
$$\mathrm{Var}(\hat{\theta}) \approx I^{-1}(\theta) = \left[-E\!\left(\frac{\partial u(\theta)}{\partial\theta^{T}}\right)\right]^{-1}$$
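A Monte Carlo sketch of this variance formula, using the Poisson example again (our choice): the MLE is the sample mean, $I(\theta) = n/\theta$, so $\mathrm{Var}(\hat{\theta}) \approx \theta/n$.

```r
# Var(theta_hat) ~ I^{-1}(theta): simulate many Poisson samples of size n,
# compute the MLE (sample mean) for each, and compare the empirical
# variance with theta/n (Poisson example is our choice).
set.seed(8)
theta <- 2; n <- 50
theta_hat <- replicate(2e4, mean(rpois(n, theta)))
c(var(theta_hat), theta / n)   # both approximately 0.04
```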
Maximum Likelihood Estimate
• Newton-Raphson method: expand the score in a first-order Taylor series about an initial estimate $\theta_0$:
$$u(\theta) \approx u(\theta_0) + \frac{\partial u(\theta)}{\partial\theta}(\theta - \theta_0)$$
• Denote
$$H(\theta) = \frac{\partial u(\theta)}{\partial\theta}$$
Numerical Method
• Setting $u(\theta) = 0$ and rearranging, we get
$$\theta^{(1)} = \theta_0 - H^{-1}(\theta_0)\,u(\theta_0)$$
• Iterating gives the recursive formula
$$\theta^{(i+1)} = \theta^{(i)} - H^{-1}(\theta^{(i)})\,u(\theta^{(i)})$$
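The recursion above can be sketched in a few lines of R; we use a Poisson sample as our illustrative example, with its analytic score and derivative:

```r
# Newton-Raphson on the score of a Poisson sample (our example):
# u(th) = sum(y)/th - n and H(th) = du/dth = -sum(y)/th^2.
set.seed(4)
y <- rpois(50, 2)
u <- function(th) sum(y) / th - length(y)
H <- function(th) -sum(y) / th^2
th <- 0.5                              # initial estimate
for (i in 1:20) th <- th - u(th) / H(th)
c(th, mean(y))                         # Newton converges to the MLE
```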
Numerical Method
• Fisher-scoring method: replacing $-H(\theta)$ by the expected information $I(\theta)$, we get
$$\theta^{(i+1)} = \theta^{(i)} + I^{-1}(\theta^{(i)})\,u(\theta^{(i)})$$
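The same Poisson sketch with the expected information in place of $-H(\theta)$; for this model $I(\theta) = n/\theta$, and the update happens to land on the MLE in a single step:

```r
# Fisher scoring on the same Poisson example (our choice): I(th) = n/th.
# Here th + u(th)/I(th) = th + mean(y) - th, so one step reaches the MLE.
set.seed(5)
y <- rpois(50, 2)
u <- function(th) sum(y) / th - length(y)
I <- function(th) length(y) / th
th <- 0.5
for (i in 1:20) th <- th + u(th) / I(th)
c(th, mean(y))   # Fisher scoring also converges to the MLE
```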
Geometric Example
• We use R to simulate a sample of 50 values from a geometric distribution.
• Here we assume that each observation has probability density function
$$f(y_i|\beta) = \frac{\beta^{y_i}}{(1+\beta)^{y_i+1}}$$
Geometric Example
• Since there are 50 data points, we have the following likelihood function:
$$L(\beta|\boldsymbol{y}) = \prod_{i=1}^{50} f(y_i|\beta) = \frac{\beta^{\sum_{i=1}^{50} y_i}}{(1+\beta)^{\sum_{i=1}^{50}(y_i+1)}}$$
• Then we obtain the score function
$$u(\beta) = \frac{\sum_{i=1}^{50} y_i}{\beta} - \frac{\sum_{i=1}^{50}(y_i+1)}{1+\beta}$$
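The score formula above can be checked against a numerical derivative of the log-likelihood. We draw our own sample here; the slides' 50 simulated values are not reproduced.

```r
# Geometric score check: f(y|b) = b^y/(1+b)^(y+1) is rgeom with success
# probability 1/(1+b). Compare the analytic score with a central difference.
set.seed(9)
y <- rgeom(50, prob = 1 / (1 + 3.92))
loglik <- function(b) sum(y) * log(b) - sum(y + 1) * log(1 + b)
u <- function(b) sum(y) / b - sum(y + 1) / (1 + b)
b <- 2; eps <- 1e-6
c(u(b), (loglik(b + eps) - loglik(b - eps)) / (2 * eps))   # agree
```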
Geometric Example
• Now that we have the score function, there are two methods to obtain the MLE.
• First, we calculate it directly by solving $u(\beta) = 0$:
$$\hat{\beta} = \left[\frac{\sum_{i=1}^{50}(y_i+1)}{\sum_{i=1}^{50} y_i} - 1\right]^{-1} = \frac{\sum_{i=1}^{50} y_i}{50} = 3.92$$
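The closed-form solution $\hat{\beta} = \bar{y}$ can be verified directly: the sample mean zeroes the score. (Again we draw our own sample rather than reproducing the slides' data, so the numerical MLE here need not equal 3.92.)

```r
# beta_hat = mean(y) zeroes the geometric score from the slides:
# u(beta) = sum(y)/beta - sum(y+1)/(1+beta).
set.seed(6)
y <- rgeom(50, prob = 1 / (1 + 3.92))
u <- function(b) sum(y) / b - sum(y + 1) / (1 + b)
beta_hat <- mean(y)
u(beta_hat)   # zero at the MLE
```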
Geometric Example
• Second, by using the Newton-Raphson method, we have
$$\beta^{(i+1)} = \beta^{(i)} - H^{-1}(\beta^{(i)})\,u(\beta^{(i)})$$
where
$$H(\beta) = \frac{\partial u(\beta)}{\partial\beta} = -\frac{\sum_{i=1}^{50} y_i}{\beta^2} + \frac{\sum_{i=1}^{50}(y_i+1)}{(1+\beta)^2}$$
Geometric Example
• Now using the Fisher-scoring method:
$$\beta^{(i+1)} = \beta^{(i)} + I^{-1}(\beta^{(i)})\,u(\beta^{(i)})$$
where, using $E[y_i] = \beta$,
$$I(\beta) = -E\!\left[\frac{\partial u(\beta)}{\partial\beta}\right] = -E\!\left[-\frac{\sum_{i=1}^{50} y_i}{\beta^2} + \frac{\sum_{i=1}^{50}(y_i+1)}{(1+\beta)^2}\right] = \frac{50\beta}{\beta^2} - \frac{50(1+\beta)}{(1+\beta)^2} = \frac{50}{\beta} - \frac{50}{1+\beta}$$
Geometric Example
• At the initial estimate, the score is $u(\beta^{(0)}) = 19356.44$.
• Newton-Raphson: [iteration plot omitted]
Geometric Example
• The maximum that Newton-Raphson finds depends on the choice of initial estimate.
• The maximum likelihood may occur on the boundary of the parameter space, in which case $u(\hat{\theta}) \neq 0$, which will confuse the Newton-Raphson method.
• When the likelihood has a large number of parameters and is quite flat in the neighborhood of the maximum, the Newton-Raphson method may take a long time to converge.
Geometric Example
#Geometric
# X holds the simulated sample of 50 values (the simulation call is not
# shown in the original slides); f(y|b) = b^y/(1+b)^(y+1) is the geometric
# distribution that rgeom() draws with success probability 1/(1+b), e.g.
# X<-rgeom(50,prob=1/(1+3.92))
l<-function(b) sum(X)*log(b)-sum(X+1)*log(b+1)   # log-likelihood
u<-function(th,m) sum(X)/th-sum(X+1)/(th+1)      # score
# Note: H() returns the *inverse* Hessian and I() returns *minus* the
# inverse expected information, so both recursions below use subtraction.
H<-function(th,m) 1/(-sum(X)/th^2+sum(X+1)/(th+1)^2)
I<-function(th,m) 1/(-m*th/th^2+m*(th+1)/(th+1)^2)
#Newton
n<-50
m<-length(X)
s7<-c()
s7[1]<-0.00000000001
for(j in 1:n)
{
s7[j+1]<-s7[j]-H(s7[j],m)*u(s7[j],m)
}
s7[n+1]   # final iterate (the loop fills indices 2 to n+1)
#Fisher
n<-50
m<-length(X)
d<-c()
d[1]<-0.01
for(j in 1:n)
{
d[j+1]<-d[j]-I(d[j],m)*u(d[j],m)
}
d[n+1]   # final iterate (the loop fills indices 2 to n+1)
beta<-seq(0,5,0.01)
# s and s1-s6 are Newton iterate vectors from runs with the other initial
# values (those runs are not shown in the slides)
comp<-data.frame(IniVal0.00000000001=s7,IniVal0.01=s,
IniVal2=s4,IniVal3=s1,IniVal4=s5,IniVal5=s6,IniVal6=s2,IniVal7=s3)
comp
Appendix
par(mfrow=c(1,2))
#Newton-Raphson
plot(beta,u(beta,m),type="l",ylim=c(-100,1500),xlab="beta",ylab="u(beta)",main="Newton-Raphson")
abline(0,0)
for(k in 1:n)
{
points(s[k],u(s[k],m))
}
#Fisher-Scoring
plot(beta,u(beta,m),type="l",ylim=c(-100,1500),xlab="beta",ylab="u(beta)",main="Fisher-Scoring")
abline(0,0)
for(k in 1:n)
{
points(d[k],u(d[k],m))
}
# hyphens are not valid in bare argument names, so use backticks
result<-data.frame(Step=seq(1,n+1),`Newton-Raphson`=s,`Fisher-Scoring`=d)
result