Likelihood Theory with Score Function
Dongyi Sun 250656653
Fan Xiao 250656655
Outline
• Definition of Score Function
• Properties
• Likelihood Estimate
• Examples
Definition of Score Function
• The score function, $u(\theta)$, is the derivative of $\log f(y|\theta)$ with respect to the parameter:
$$u(\theta) = \frac{\partial}{\partial\theta}\log f(y|\theta) = \frac{1}{f(y|\theta)}\frac{\partial}{\partial\theta}f(y|\theta)$$
• The score $u(\theta)$ indicates how sensitive the log-likelihood $\log f(y|\theta)$ is to changes in the parameter $\theta$.
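As a quick sketch of this definition, the analytic score of a Poisson log-density (our illustrative choice, not a distribution used in these slides) can be compared with a numerical derivative in R:

```r
# Score of a Poisson(theta) log-density: log f(y|theta) = y*log(theta) - theta - log(y!),
# so u(theta) = y/theta - 1. (Poisson is our example here, not from the slides.)
y <- 3; theta <- 2
log_f <- function(th) dpois(y, th, log = TRUE)
u_analytic <- y / theta - 1
eps <- 1e-6
u_numeric <- (log_f(theta + eps) - log_f(theta - eps)) / (2 * eps)
c(u_analytic, u_numeric)   # both approximately 0.5
```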
Properties
• Mean
• Fisher's Information

The mean of the score is zero: under regularity conditions that allow differentiation under the integral sign,
$$E[u(\theta)] = \int_{-\infty}^{+\infty} \frac{1}{f(y|\theta)}\frac{\partial f(y|\theta)}{\partial\theta}\, f(y|\theta)\, dy = \int_{-\infty}^{+\infty} \frac{\partial f(y|\theta)}{\partial\theta}\, dy = \frac{\partial}{\partial\theta}\int_{-\infty}^{+\infty} f(y|\theta)\, dy = \frac{\partial}{\partial\theta}(1) = 0$$
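A quick Monte Carlo sketch of the mean-zero property, again using a Poisson model as our illustrative example:

```r
# E[u(theta)] = 0: draw many Poisson(theta) observations and average the
# score u(theta) = y/theta - 1 (Poisson example is our choice).
set.seed(1)
theta <- 2
y <- rpois(1e5, theta)
u <- y / theta - 1
mean(u)   # close to 0
```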
Properties
• Fisher's Information: since $E[u(\theta)] = 0$, the variance of the score is
$$\mathrm{Var}[u(\theta)] = E\!\left[\left(\frac{\partial}{\partial\theta}\log f(y|\theta)\right)^{2}\,\Big|\,\theta\right] = \int_{-\infty}^{+\infty}\left(\frac{\partial}{\partial\theta}\log f(y|\theta)\right)^{2} f(y|\theta)\, dy = I(\theta)$$
This quantity $I(\theta)$ is called the Fisher information.
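The same Monte Carlo sketch confirms that the variance of the score matches the Fisher information; for our Poisson example, $I(\theta) = 1/\theta$:

```r
# Var(u(theta)) = I(theta): for Poisson(theta), I(theta) = 1/theta
# (Poisson is our illustrative example, not one used in the slides).
set.seed(2)
theta <- 2
y <- rpois(1e5, theta)
u <- y / theta - 1
var(u)   # approximately 1/theta = 0.5
```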
Properties
• If $\log f(y|\theta)$ is twice differentiable with respect to $\theta$, then under certain regularity conditions the Fisher information may also be written as
$$I(\theta) = -E\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(y|\theta)\,\Big|\,\theta\right]$$
• Now consider $n$ independent random variables, $Y_1, \dots, Y_n$, each with probability density function $f(y|\theta)$, where $\theta$ is the (possibly vector-valued) parameter.
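The second-derivative form can also be checked numerically with the same Poisson example: $\partial^2 \log f/\partial\theta^2 = -y/\theta^2$, so $-E[\cdot] = \theta/\theta^2 = 1/\theta$, matching $\mathrm{Var}[u(\theta)]$ above.

```r
# -E[d^2/dtheta^2 log f] for Poisson(theta): the second derivative is
# -y/theta^2, so the negated expectation is E[y]/theta^2 = 1/theta
# (our example; it agrees with Var(u) = 1/theta).
set.seed(7)
theta <- 2
y <- rpois(1e5, theta)
mean(y / theta^2)   # approximately 1/theta = 0.5
```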
Maximum Likelihood Estimate
• If the observation is $\boldsymbol{y} = (y_1, y_2, \dots, y_n)^{T}$, then the maximum likelihood estimate (MLE) is the value of the parameter(s) that maximizes the likelihood function
$$L(\theta|\boldsymbol{y}) = \prod_{i=1}^{n} f(y_i|\theta)$$
Maximum Likelihood Estimate
• The score function for the likelihood is
$$u(\theta) = \frac{\partial}{\partial\theta}\log L(\theta|\boldsymbol{y}) = \frac{1}{L(\theta|\boldsymbol{y})}\frac{\partial}{\partial\theta}L(\theta|\boldsymbol{y})$$
• Set $u(\theta) = 0$ and solve this equation to obtain the MLE of $\theta$.
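When the score equation has a closed form, the MLE drops out directly. For an i.i.d. Poisson sample (our illustrative example), the score is $\sum y_i/\theta - n$, so the MLE is the sample mean:

```r
# Solving u(theta) = 0 in closed form for a Poisson sample (our example):
# u(theta) = sum(y)/theta - n vanishes at theta_hat = mean(y).
set.seed(3)
y <- rpois(50, 2)
u <- function(th) sum(y) / th - length(y)
theta_hat <- mean(y)
u(theta_hat)   # zero at the MLE
```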
Maximum Likelihood Estimate
• Variance of $\hat{\theta}$: asymptotically,
$$\mathrm{Var}(\hat{\theta}) \approx I^{-1}(\theta) = \left[-E\!\left(\frac{\partial u(\theta)}{\partial\theta^{T}}\right)\right]^{-1}$$
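A Monte Carlo sketch of this variance formula, using the Poisson example again (our choice): the MLE is the sample mean, $I(\theta) = n/\theta$, so $\mathrm{Var}(\hat{\theta}) \approx \theta/n$.

```r
# Var(theta_hat) ~ I^{-1}(theta): simulate many Poisson samples of size n,
# compute the MLE (sample mean) for each, and compare the empirical
# variance with theta/n (Poisson example is our choice).
set.seed(8)
theta <- 2; n <- 50
theta_hat <- replicate(2e4, mean(rpois(n, theta)))
c(var(theta_hat), theta / n)   # both approximately 0.04
```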
Maximum Likelihood Estimate
• Newton-Raphson method: expand the score in a first-order Taylor series about an initial estimate $\theta_0$:
$$u(\theta) \approx u(\theta_0) + \frac{\partial u(\theta)}{\partial\theta}(\theta - \theta_0)$$
• Denote
$$H(\theta) = \frac{\partial u(\theta)}{\partial\theta}$$
Numerical Method
• Setting $u(\theta) = 0$ and rearranging, we get
$$\theta^{(1)} = \theta_0 - H^{-1}(\theta_0)\,u(\theta_0)$$
• Iterating gives the recursive formula
$$\theta^{(i+1)} = \theta^{(i)} - H^{-1}(\theta^{(i)})\,u(\theta^{(i)})$$
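The recursion above can be sketched in a few lines of R; we use a Poisson sample as our illustrative example, with its analytic score and derivative:

```r
# Newton-Raphson on the score of a Poisson sample (our example):
# u(th) = sum(y)/th - n and H(th) = du/dth = -sum(y)/th^2.
set.seed(4)
y <- rpois(50, 2)
u <- function(th) sum(y) / th - length(y)
H <- function(th) -sum(y) / th^2
th <- 0.5                              # initial estimate
for (i in 1:20) th <- th - u(th) / H(th)
c(th, mean(y))                         # Newton converges to the MLE
```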
Numerical Method
• Fisher-scoring method: replacing $-H(\theta)$ by the expected information $I(\theta)$, we get
$$\theta^{(i+1)} = \theta^{(i)} + I^{-1}(\theta^{(i)})\,u(\theta^{(i)})$$
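The same Poisson sketch with the expected information in place of $-H(\theta)$; for this model $I(\theta) = n/\theta$, and the update happens to land on the MLE in a single step:

```r
# Fisher scoring on the same Poisson example (our choice): I(th) = n/th.
# Here th + u(th)/I(th) = th + mean(y) - th, so one step reaches the MLE.
set.seed(5)
y <- rpois(50, 2)
u <- function(th) sum(y) / th - length(y)
I <- function(th) length(y) / th
th <- 0.5
for (i in 1:20) th <- th + u(th) / I(th)
c(th, mean(y))   # Fisher scoring also converges to the MLE
```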
Geometric Example
• We use R to simulate a sample of 50 values from a geometric distribution.
• Here we assume that each observation has probability density function
$$f(y_i|\beta) = \frac{\beta^{y_i}}{(1+\beta)^{y_i+1}}$$
Geometric Example
• Since there are 50 data points, we have the following likelihood function:
$$L(\beta|\boldsymbol{y}) = \prod_{i=1}^{50} f(y_i|\beta) = \frac{\beta^{\sum_{i=1}^{50} y_i}}{(1+\beta)^{\sum_{i=1}^{50}(y_i+1)}}$$
• Then we obtain the score function
$$u(\beta) = \frac{\sum_{i=1}^{50} y_i}{\beta} - \frac{\sum_{i=1}^{50}(y_i+1)}{1+\beta}$$
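The score formula above can be checked against a numerical derivative of the log-likelihood. We draw our own sample here; the slides' 50 simulated values are not reproduced.

```r
# Geometric score check: f(y|b) = b^y/(1+b)^(y+1) is rgeom with success
# probability 1/(1+b). Compare the analytic score with a central difference.
set.seed(9)
y <- rgeom(50, prob = 1 / (1 + 3.92))
loglik <- function(b) sum(y) * log(b) - sum(y + 1) * log(1 + b)
u <- function(b) sum(y) / b - sum(y + 1) / (1 + b)
b <- 2; eps <- 1e-6
c(u(b), (loglik(b + eps) - loglik(b - eps)) / (2 * eps))   # agree
```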
Geometric Example
• Now that we have the score function, there are two methods to obtain the MLE.
• First, we calculate it directly by solving $u(\beta) = 0$:
$$\hat{\beta} = \left[\frac{\sum_{i=1}^{50}(y_i+1)}{\sum_{i=1}^{50} y_i} - 1\right]^{-1} = \frac{\sum_{i=1}^{50} y_i}{50} = 3.92$$
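The closed-form solution $\hat{\beta} = \bar{y}$ can be verified directly: the sample mean zeroes the score. (Again we draw our own sample rather than reproducing the slides' data, so the numerical MLE here need not equal 3.92.)

```r
# beta_hat = mean(y) zeroes the geometric score from the slides:
# u(beta) = sum(y)/beta - sum(y+1)/(1+beta).
set.seed(6)
y <- rgeom(50, prob = 1 / (1 + 3.92))
u <- function(b) sum(y) / b - sum(y + 1) / (1 + b)
beta_hat <- mean(y)
u(beta_hat)   # zero at the MLE
```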
Geometric Example
• Second, by using the Newton-Raphson method, we have
$$\beta^{(i+1)} = \beta^{(i)} - H^{-1}(\beta^{(i)})\,u(\beta^{(i)})$$
where
$$H(\beta) = \frac{\partial u(\beta)}{\partial\beta} = -\frac{\sum_{i=1}^{50} y_i}{\beta^2} + \frac{\sum_{i=1}^{50}(y_i+1)}{(1+\beta)^2}$$
Geometric Example
• Now using the Fisher-scoring method:
$$\beta^{(i+1)} = \beta^{(i)} + I^{-1}(\beta^{(i)})\,u(\beta^{(i)})$$
where, using $E[y_i] = \beta$,
$$I(\beta) = -E\!\left[\frac{\partial u(\beta)}{\partial\beta}\right] = -E\!\left[-\frac{\sum_{i=1}^{50} y_i}{\beta^2} + \frac{\sum_{i=1}^{50}(y_i+1)}{(1+\beta)^2}\right] = \frac{50\beta}{\beta^2} - \frac{50(1+\beta)}{(1+\beta)^2} = \frac{50}{\beta} - \frac{50}{1+\beta}$$
Geometric Example
• At the initial estimate, the score is $u(\beta^{(0)}) = 19356.44$.
• Newton-Raphson: [iteration plot omitted]
Geometric Example
• The maximum that Newton-Raphson finds depends on the choice of initial estimate.
• The maximum likelihood may occur on the boundary of the parameter space, in which case $u(\hat{\theta}) \neq 0$, which will confuse the Newton-Raphson method.
• When the likelihood has a large number of parameters and is quite flat in the neighborhood of the maximum, the Newton-Raphson method may take a long time to converge.
Geometric Example
#Geometric
# X holds the simulated sample of 50 values (the simulation call is not
# shown in the original slides); f(y|b) = b^y/(1+b)^(y+1) is the geometric
# distribution that rgeom() draws with success probability 1/(1+b), e.g.
# X<-rgeom(50,prob=1/(1+3.92))
l<-function(b) sum(X)*log(b)-sum(X+1)*log(b+1)   # log-likelihood
u<-function(th,m) sum(X)/th-sum(X+1)/(th+1)      # score
# Note: H() returns the *inverse* Hessian and I() returns *minus* the
# inverse expected information, so both recursions below use subtraction.
H<-function(th,m) 1/(-sum(X)/th^2+sum(X+1)/(th+1)^2)
I<-function(th,m) 1/(-m*th/th^2+m*(th+1)/(th+1)^2)
#Newton
n<-50
m<-length(X)
s7<-c()
s7[1]<-0.00000000001
for(j in 1:n)
{
s7[j+1]<-s7[j]-H(s7[j],m)*u(s7[j],m)
}
s7[n+1]   # final iterate (the loop fills indices 2 to n+1)
#Fisher
n<-50
m<-length(X)
d<-c()
d[1]<-0.01
for(j in 1:n)
{
d[j+1]<-d[j]-I(d[j],m)*u(d[j],m)
}
d[n+1]   # final iterate (the loop fills indices 2 to n+1)
beta<-seq(0,5,0.01)
# s and s1-s6 are Newton iterate vectors from runs with the other initial
# values (those runs are not shown in the slides)
comp<-data.frame(IniVal0.00000000001=s7,IniVal0.01=s,
IniVal2=s4,IniVal3=s1,IniVal4=s5,IniVal5=s6,IniVal6=s2,IniVal7=s3)
comp
Appendix
par(mfrow=c(1,2))
#Newton-Raphson
plot(beta,u(beta,m),type="l",ylim=c(-100,1500),xlab="beta",ylab="u(beta)",main="Newton-Raphson")
abline(0,0)
for(k in 1:n)
{
points(s[k],u(s[k],m))
}
#Fisher-Scoring
plot(beta,u(beta,m),type="l",ylim=c(-100,1500),xlab="beta",ylab="u(beta)",main="Fisher-Scoring")
abline(0,0)
for(k in 1:n)
{
points(d[k],u(d[k],m))
}
# hyphens are not valid in bare argument names, so use backticks
result<-data.frame(Step=seq(1,n+1),`Newton-Raphson`=s,`Fisher-Scoring`=d)
result