Liero Chapter 4 Estimation I
박창이
Department of Statistics, University of Seoul
Contents
Methods of Estimation
Method of moments
Maximum likelihood estimators
Unbiasedness and Mean Squared Error (MSE)
Best unbiased estimators (BUE)
Cramer-Rao lower bound
Multi-dimensional parameters (next week)
Rao-Blackwell and Lehmann-Scheffe theorems
Asymptotic Properties of Estimators
Estimation I
g : Θ → Γ
To derive conclusions about γ = g(θ)
Specifying a plausible value for γ ⇒ point estimation
Determining a subset of Γ of plausible values for γ ⇒ confidence sets
Definition 4.1 (Estimator)
A function T : X → Γ is an estimator of γ = g(θ). The value T(x) is called the estimate of g(θ).
Estimation II
Example 4.1 (Ballpoint pens)
The probability of producing a defective pen is g(θ) = θ and Γ = (0, 1).
T(x) = (1/7) ∑_{i=1}^7 x_i/n: the mean of the daily ratios of defective pens
T(x) = x_1/n: the ratio of defective pens on Monday
Example 4.2 (Pendulum)
The variance γ = σ² (the precision of the measurements) of the assumed normal distribution.
T(x) = (1/(n−1)) ∑_{i=1}^n (x_i − x̄)²
Methods of Estimation: method of moments I
X = (X_1, ..., X_n): a sample of IID r.v.'s from F.
Assume the model
P = {P_F^{⊗n} : F ∈ Θ},  (1)
where Θ = {F : ∫ x^r dF < ∞}.
Denote m_j = E(X^j) = ∫ x^j dF.
γ is assumed to depend on θ = F via the moments m_j and can be expressed as γ = h(m_1(F), ..., m_r(F)), where h is a known function.
Methods of Estimation: method of moments II
Definition 4.2 (Method of moments estimator)
Suppose the model (1). The moment estimate for γ is defined by γ̂_MME = h(m̂_1, ..., m̂_r), where m̂_j is the empirical moment of order j: m̂_j = (1/n) ∑_i x_i^j.
The distribution function is defined by F(t) = P_F((−∞, t]) and we define the empirical distribution function as F̂_n(t) = F̂_n(t; x) = (1/n) ∑_i I(x_i ∈ (−∞, t]). Then we can write the empirical moment in the form m̂_j = m_j(F̂_n) = ∫ x^j dF̂_n.
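Definition 4.2 can be sketched in a few lines; this is my own stdlib-only illustration, and the helper names `empirical_moment` and `edf` are not from the text.

```python
# A minimal sketch of the empirical moments and the empirical
# distribution function from Definition 4.2 (stdlib only).
def empirical_moment(xs, j):
    """m_hat_j = (1/n) * sum_i x_i^j, the empirical moment of order j."""
    return sum(x ** j for x in xs) / len(xs)

def edf(xs, t):
    """F_hat_n(t) = (1/n) * sum_i I(x_i in (-inf, t])."""
    return sum(1 for x in xs if x <= t) / len(xs)

data = [1.0, 2.0, 2.0, 5.0]
m1_hat = empirical_moment(data, 1)   # = 2.5, the sample mean
m2_hat = empirical_moment(data, 2)   # = 8.5
```

A method of moments estimate is then h applied to these empirical moments.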
Methods of Estimation: method of moments III
Special case 4.1 (Mean)
For the case r = 1, we estimate γ = ∫ x dF by the sample mean (1/n) ∑_i x_i.
Special case 4.2 (Standard deviation)
Suppose the model (1) with r = 2. Since σ² = m_2 − m_1², we have σ = √(m_2 − m_1²) = h(m_1, m_2). Thus
σ̂_MME = √(m̂_2 − m̂_1²) = √((1/n) ∑_i (x_i − x̄)²).
Special case 4.6 (Poisson distribution)
X_i ~ Poi(λ), IID, i = 1, ..., n.
Since E(X_i) = Var(X_i) = λ, both λ̂_MME = (1/n) ∑_i x_i and λ̃_MME = (1/n) ∑_i (x_i − x̄)² are method of moments estimators.
The method of moments estimator may not be unique.
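The non-uniqueness is easy to see numerically. A small simulation of my own (the Poisson sampler and all names below are mine, not from the text): both estimators target λ, yet they give different values on any finite sample.

```python
import math
import random

random.seed(0)

def sample_poisson(lam):
    """Knuth's multiplication-of-uniforms sampler for Poisson(lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam, n = 3.0, 20000
xs = [sample_poisson(lam) for _ in range(n)]
xbar = sum(xs) / n
lam_hat_mean = xbar                                  # moment estimator from m_1 = lambda
lam_hat_var = sum((x - xbar) ** 2 for x in xs) / n   # moment estimator from m_2 - m_1^2 = lambda
```

Both values are close to λ = 3 but not equal to each other.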
Methods of Estimation: maximum likelihood estimators I
The distribution of X of independent r.v.'s belongs to P = {P_θ = P_{1,θ} ⊗ · · · ⊗ P_{n,θ} : θ ∈ Θ ⊂ R^k} for some k.
L(θ; x) = p(x; θ) and l(θ; x) = ln L(θ; x).
Definition 4.3 (Maximum likelihood estimator)
An estimator T is called a maximum likelihood estimator (MLE) of θ if L(T(x); x) = max_{θ∈Θ} L(θ; x) for all x ∈ X.
Methods of Estimation: maximum likelihood estimators II
Theorem 4.1
If γ = g(θ) and g is bijective, i.e., θ = g^{−1}(γ), then θ̂ is an MLE for θ iff γ̂ = g(θ̂) is an MLE for γ.
(Proof)
Since g is bijective, L(γ; x) = p(x; g^{−1}(γ)). Furthermore, we have L(γ̂; x) ≥ L(γ; x) for all γ iff p(x; g^{−1}(γ̂)) ≥ p(x; g^{−1}(γ)) for all γ iff p(x; θ̂) ≥ p(x; θ) for all θ.
Definition 4.4 (MLE for g(θ))
The MLE of γ = g(θ) is defined by γ̂_MLE = g(θ̂_MLE), where θ̂_MLE is the MLE of θ.
Methods of Estimation: maximum likelihood estimators III
Special case 4.7 (Binomial distribution)
X ~ B(n, θ), θ ∈ Θ = (0, 1)
L(θ; x) = (n choose x) θ^x (1−θ)^{n−x} and l(θ; x) = ln (n choose x) + x ln θ + (n − x) ln(1−θ).
l′(θ; x) = x/θ − (n−x)/(1−θ) = 0
If x ≠ 0, n, the solution exists and we have θ̂(x) = x/n.
Special case 4.8 (Normal distribution)
X_1, ..., X_n ~ N(µ, σ²), IID
l(µ, σ²; x) ∝ −(n/2) ln(σ²) − (1/(2σ²)) ∑_i (x_i − µ)²
From 0 = ∂l/∂µ = (1/σ²) ∑_i (x_i − µ) and 0 = ∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑_i (x_i − µ)², we have µ̂_MLE(x) = x̄ and σ̂²_MLE(x) = (1/n) ∑_i (x_i − x̄)².
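The closed-form normal MLEs can be sanity-checked numerically. A sketch of mine (data and names are illustrative, not from the text): the closed forms should beat nearby parameter values in log-likelihood.

```python
import math

# Check that the closed-form normal MLEs from Special case 4.8
# maximize the log-likelihood, by perturbing them.
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
n = len(data)

mu_hat = sum(data) / n                                  # x bar
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n   # (1/n) sum (x_i - xbar)^2

def loglik(mu, s2):
    """l(mu, sigma^2; x) up to the additive constant -(n/2) ln(2 pi)."""
    return -n / 2 * math.log(s2) - sum((x - mu) ** 2 for x in data) / (2 * s2)

best = loglik(mu_hat, sigma2_hat)
```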
Methods of Estimation: maximum likelihood estimators IV
The equations obtained by differentiating the log-likelihood function
may not have an explicit solution. To solve the nonlinear system of
equations, we may adopt iterative procedures such as the
Newton-Raphson algorithm.
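As a toy illustration of mine (not the book's), Newton-Raphson applied to the binomial score equation of Special case 4.7; the closed form θ̂ = x/n is known here, so convergence can be checked.

```python
def newton_binomial_mle(x, n, theta=0.5, tol=1e-12, max_iter=50):
    """Newton-Raphson on the score l'(theta) = x/theta - (n - x)/(1 - theta)."""
    for _ in range(max_iter):
        score = x / theta - (n - x) / (1 - theta)
        hess = -x / theta ** 2 - (n - x) / (1 - theta) ** 2   # l''(theta)
        step = score / hess
        theta -= step            # theta_{t+1} = theta_t - l'(theta_t) / l''(theta_t)
        if abs(step) < tol:
            break
    return theta

theta_hat = newton_binomial_mle(x=7, n=10)   # converges to 7/10
```

In real use the same loop is applied to score equations without a closed-form root.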
Example 4.8 (Nonuniqueness of the MLE)
X_1, X_2: independent Cauchy r.v.'s with location θ.
L(θ; x) = ∏_{i=1}^2 1/(π(1 + (x_i − θ)²)), θ ∈ R, attains its maximum at two distinct points (near x_1 and x_2 when they are far apart), so the MLE is not unique.
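A short calculation (mine, not spelled out in the text): with m the midpoint and d half the distance between the observations, setting the score to zero gives critical points θ = m and θ = m ± √(d² − 1) when d > 1. The sketch below checks that the two outer points are equal-height maxima.

```python
import math

x1, x2 = 0.0, 6.0                      # two Cauchy observations, far apart
m, d = (x1 + x2) / 2, abs(x2 - x1) / 2

def lik(theta):
    """L(theta; x) = prod_i 1 / (pi * (1 + (x_i - theta)^2))."""
    return math.prod(1 / (math.pi * (1 + (x - theta) ** 2)) for x in (x1, x2))

# the two symmetric critical points (local maxima) of the likelihood
mode_lo = m - math.sqrt(d ** 2 - 1)
mode_hi = m + math.sqrt(d ** 2 - 1)
```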
Methods of Estimation: maximum likelihood estimators V
Example 4.9 (Nonexistence of the MLE)
X_1, ..., X_n ~ Poi(λ), IID, and Y_i = I(X_i > 0), so P_λ(Y_i = 1) = 1 − e^{−λ}.
l(λ; y) = ∑_i y_i ln(1 − e^{−λ}) − λ(n − ∑_i y_i)
If ∑_i y_i ≠ n, l′(λ; y) = 0 implies (∑_i y_i) e^{−λ}/(1 − e^{−λ}) = n − ∑_i y_i and we have λ̂(y) = −ln(1 − ȳ).
For ∑_i y_i = n, l(λ; y) = n ln(1 − e^{−λ}) is monotone increasing in λ. Thus a maximum does not exist.
P_λ(∑_i Y_i = n) = P_λ(Y_1 = 1, ..., Y_n = 1) = (1 − e^{−λ})^n → 1 as λ → ∞. Hence there exists λ s.t. the probability that an MLE does not exist is near 1. On the other hand, for fixed λ and n → ∞, we have P_λ(∑_i Y_i = n) = (1 − e^{−λ})^n → 0.
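A small simulation of Example 4.9 (variable names mine): when ȳ < 1 the MLE is λ̂ = −ln(1 − ȳ), and the probability that no MLE exists is (1 − e^{−λ})^n, which is negligible for fixed λ and large n.

```python
import math
import random

random.seed(1)
lam, n = 1.5, 10000
p1 = 1 - math.exp(-lam)                     # P(Y_i = 1) = P(X_i > 0)
ys = [1 if random.random() < p1 else 0 for _ in range(n)]
ybar = sum(ys) / n

# the MLE exists only when ybar < 1
lam_hat = -math.log(1 - ybar)

# probability that no MLE exists (all Y_i = 1)
p_no_mle = p1 ** n
```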
Unbiasedness and MSE I
Definition 4.6 (MSE)
Let P = {P_θ : θ ∈ Θ} be a statistical model for a r.v. X on X, g : Θ → Γ a function, and T : X → Γ an estimator for γ = g(θ). The mean squared error (MSE) of T is given by
MSE(T, θ) = E_θ(T − g(θ))².
MSE(T, θ) = Bias²(T, θ) + Var_θ(T), where Bias(T, θ) = E_θ(T) − g(θ) is the bias of T at θ.
Definition 4.7 (Unbiasedness)
An estimator T for γ = g(θ) is called unbiased if Bias(T, θ) = 0 for all θ ∈ Θ.
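The decomposition MSE = Bias² + Var can be checked exactly for X ~ B(n, θ) and T = X/n by summing over the pmf (an illustration of mine, not from the text):

```python
import math

n, theta = 10, 0.3
pmf = [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]

T = [k / n for k in range(n + 1)]            # estimator T(X) = X/n for g(theta) = theta
mean_T = sum(p * t for p, t in zip(pmf, T))
var_T = sum(p * (t - mean_T) ** 2 for p, t in zip(pmf, T))
bias = mean_T - theta
mse = sum(p * (t - theta) ** 2 for p, t in zip(pmf, T))
```

Here T is unbiased, so the MSE reduces to the variance θ(1−θ)/n.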
Unbiasedness and MSE II
Example 4.16 (Nonexistence of an unbiased estimator)
X ~ B(n, θ)
γ = 1/θ has no unbiased estimator.
(Proof)
If T were an unbiased estimator for γ, then E_θ(T) = ∑_{k=0}^n T(k) (n choose k) θ^k (1−θ)^{n−k} = 1/θ for all θ ∈ (0, 1). As θ → 0, the LHS T(0)(1−θ)^n + ∑_{k=1}^n T(k) (n choose k) θ^k (1−θ)^{n−k} → T(0), but the RHS → ∞, a contradiction.
Unbiasedness and MSE III
Special case 4.12 (Sample variance)
X_1, ..., X_n: IID with finite variance and γ = σ².
σ̂²_MME = (1/n) ∑_{i=1}^n (X_i − X̄)² is biased:
E_θ(X̄²) = (1/n²) ∑_i E_θ(X_i²) + (1/n²) ∑_i ∑_{j≠i} E_θ(X_i X_j) = (1/n) m_2 + ((n−1)/n) m_1²
E_θ(σ̂²_MME) = E_θ((1/n) ∑_i X_i²) − E_θ(X̄²) = m_2 − (1/n) m_2 − ((n−1)/n) m_1² = ((n−1)/n) σ²
Bias(σ̂²_MME, θ) = −σ²/n.
S² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄)² is unbiased.
Suppose that the X_i's are normally distributed.
Note that V = σ^{−2} ∑_{i=1}^n (X_i − X̄)² ~ χ²(n−1) and thus E_θ(V) = n − 1 and Var_θ(V) = 2(n−1).
The MME is the MLE. MSE(σ̂²_MLE, θ) = (−σ²/n)² + (2(n−1)/n²) σ⁴ = ((2n−1)/n²) σ⁴.
MSE(S², θ) = 0 + (2/(n−1)) σ⁴ = (2/(n−1)) σ⁴.
MSE(S², θ) > MSE(σ̂²_MLE, θ).
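The two MSE formulas from Special case 4.12 in code (function names mine), checking that the biased MLE has the smaller MSE for every n ≥ 2:

```python
def mse_mle(n, sigma2):
    """MSE of the MLE (= MME): (2n - 1) / n^2 * sigma^4."""
    return (2 * n - 1) / n ** 2 * sigma2 ** 2

def mse_s2(n, sigma2):
    """MSE of the unbiased S^2: 2 / (n - 1) * sigma^4."""
    return 2 / (n - 1) * sigma2 ** 2
```

Algebraically, 2/(n−1) > (2n−1)/n² iff 2n² > 2n² − 3n + 1, which holds for all n ≥ 1.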
Unbiasedness and MSE IV
Example 4.18 (Binomial distribution)
X ~ B(n, θ).
θ̂_MLE = X/n is unbiased and MSE(θ̂_MLE, θ) = Var_θ(θ̂_MLE) = θ(1−θ)/n, which is maximal at θ = 1/2.
θ̃ = (X+1)/(n+2) with Bias(θ̃, θ) = (nθ+1)/(n+2) − θ = (1−2θ)/(n+2) and MSE(θ̃, θ) = (nθ(1−θ) + (1−2θ)²)/(n+2)².
Figure 4.9. MSE(θ̂_MLE, θ) > MSE(θ̃, θ) for all θ satisfying (θ − 1/2)²/(θ(1−θ)) ≤ 1 + 1/n, i.e., approximately |θ − 1/2| ≤ 1/√8. If θ is around 1/2, then θ̃ is a better estimator than θ̂_MLE.
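The MSE formulas of Example 4.18 in code (function names mine); the crossover between the two estimators matches the condition (θ − 1/2)²/(θ(1−θ)) < 1 + 1/n:

```python
def mse_mle(n, t):
    """MSE of X/n."""
    return t * (1 - t) / n

def mse_shrink(n, t):
    """MSE of the shrinkage estimator (X + 1) / (n + 2)."""
    return (n * t * (1 - t) + (1 - 2 * t) ** 2) / (n + 2) ** 2

n = 20
better_near_half = mse_shrink(n, 0.5) < mse_mle(n, 0.5)   # shrinkage wins at theta = 1/2
better_near_edge = mse_mle(n, 0.01) < mse_shrink(n, 0.01) # MLE wins near the boundary
```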
Unbiasedness and MSE: BUE
Definition 4.8 (Best unbiased estimator)
Given a statistical model P = {Pθ : θ ∈ Θ}, an unbiased estimator
T ∗ for a parameter γ = g(θ) ∈ R is called best unbiased estimator
(BUE), if for any other unbiased estimator T , Varθ(T ∗) ≤ Varθ(T )
for all θ ∈ Θ.
While looking for best estimators, it is natural to ask whether there is a lower bound for the variance of unbiased estimators of γ. An answer to this question is given by the Cramer-Rao inequality. If an unbiased estimator attains this bound, it is a BUE.
Unbiasedness and MSE: Cramer-Rao lower bound I
Θ ⊂ R
A = {x : p(x ; θ) > 0}: common support of Pθ ∈ P
The estimator T is a regular unbiased estimator if
∫_A T(x) (∂/∂θ) L(θ; x) dx = (∂/∂θ) ∫_A T(x) L(θ; x) dx.
Unbiasedness and MSE: Cramer-Rao lower bound II
Theorem 4.2 (Cramer-Rao bound)
Suppose that Reg 3 and Reg 4 are satisfied and 0 < I_X(θ) < ∞. Let γ = g(θ), where g is a continuously differentiable real-valued function with g′ ≠ 0. If T is a regular unbiased estimator for γ, then
Var_θ(T) ≥ [g′(θ)]²/I_X(θ) for all θ ∈ Θ.
The equality holds iff for x ∈ A and for all θ ∈ Θ,
T(x) − g(θ) = g′(θ) V(θ; x)/I_X(θ),  (2)
where V(·; x) is the score function.
Unbiasedness and MSE: Cramer-Rao lower bound III
(Proof)
Since E_θ(V(θ; X)) = 0, we have Cov_θ(T(X), V(θ; X)) = E_θ(T(X) V(θ; X)) = ∫_A T(x) (∂ ln p(x; θ)/∂θ) p(x; θ) dx = ∫_A T(x) (∂p(x; θ)/∂θ) dx. Since T is regular, we can interchange integration and differentiation:
Cov_θ(T(X), V(θ; X)) = (∂/∂θ) ∫_A T(x) p(x; θ) dx = (∂/∂θ) E_θ(T(X)) = g′(θ) because T is unbiased. Now set c(θ) = g′(θ)/I_X(θ). Then we have
0 ≤ Var_θ(T(X) − c(θ) V(θ; X))
= Var_θ(T(X)) − 2c(θ) Cov_θ(T(X), V(θ; X)) + c²(θ) Var_θ(V(θ; X))
= Var_θ(T(X)) − 2c(θ) g′(θ) + c²(θ) I_X(θ)
= Var_θ(T(X)) − [g′(θ)]²/I_X(θ)
Unbiasedness and MSE: Cramer-Rao lower bound IV
and the desired inequality follows.
The equality holds if T (X )− c(θ)V (θ; X ) is constant. Since T is
unbiased, the constant is g(θ). So we have
T (x)− c(θ)V (θ; x) = g(θ) or
T (x)− g(θ) = V (θ; x)g ′(θ)/IX (θ).
Implications of Theorem 4.2
If we can find an unbiased estimator attaining the equality, the
estimator is the BUE.
The larger the Fisher information, the smaller the bound for the
variance. So we can estimate more accurately if X contains more
information.
Unbiasedness and MSE: Cramer-Rao lower bound V
For a sample of IID r.v.'s with P = {P_θ = P_{1,θ}^{⊗n} : θ ∈ Θ}, we have I_X(θ) = n I_{X_1}(θ) and thus Var_θ(T(X)) ≥ [g′(θ)]²/(n I_{X_1}(θ)) for all θ ∈ Θ.
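The additivity I_X(θ) = n I_{X_1}(θ) can be checked exactly for Bernoulli trials, where I_{X_1}(θ) = 1/(θ(1−θ)) (a standard fact; the code below is my own illustration):

```python
import math

n, theta = 8, 0.3
pmf = [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]

def score(x):
    """V(theta; x) = x/theta - (n - x)/(1 - theta), the binomial score."""
    return x / theta - (n - x) / (1 - theta)

info = sum(p * score(k) ** 2 for k, p in enumerate(pmf))   # I_X(theta) = E_theta V^2
info_one = 1 / (theta * (1 - theta))                       # Fisher info of one Bernoulli trial
```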
Definition 4.9 (Efficiency)
The efficiency of an unbiased estimator T is defined by e(T, θ) = [g′(θ)]²/(I_X(θ) Var_θ(T(X))). An unbiased estimator attaining the C-R bound is called an efficient estimator.
Example 4.19 (Binomial distribution)
X ~ B(n, θ)
γ = θ(1−θ)
C-R lower bound: (1 − 2θ)² θ(1−θ)/n.
T(x) = (1/(n−1)) x (1 − x/n) is unbiased with variance θ/n − [θ²(5n−7) − 4θ³(2n−3) + θ⁴(4n−6)]/(n(n−1)).
T is not efficient, but e(T, θ) → 1 as n → ∞.
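The variance formula and the efficiency claim of Example 4.19 can be verified by exact enumeration over the binomial pmf (code mine):

```python
import math

def var_T(n, theta):
    """Exact variance of T(x) = x (1 - x/n) / (n - 1) under B(n, theta)."""
    pmf = [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]
    tvals = [k * (1 - k / n) / (n - 1) for k in range(n + 1)]
    mean = sum(p * t for p, t in zip(pmf, tvals))
    return sum(p * (t - mean) ** 2 for p, t in zip(pmf, tvals))

def var_formula(n, t):
    """The closed-form variance stated in Example 4.19."""
    return t / n - (t ** 2 * (5 * n - 7) - 4 * t ** 3 * (2 * n - 3)
                    + t ** 4 * (4 * n - 6)) / (n * (n - 1))

def cr_bound(n, t):
    """C-R lower bound (1 - 2t)^2 t (1 - t) / n."""
    return (1 - 2 * t) ** 2 * t * (1 - t) / n

eff_10 = cr_bound(10, 0.3) / var_formula(10, 0.3)     # efficiency at n = 10
eff_400 = cr_bound(400, 0.3) / var_formula(400, 0.3)  # closer to 1 at n = 400
```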
Unbiasedness and MSE: Cramer-Rao lower bound VI
Theorem 4.3
Suppose that the distribution of X = (X1, . . . ,Xn) belongs to a
1-parameter exponential family in ζ and T . Then the sufficient
statistic T is an efficient estimator for γ = g(θ) = Eθ(T (X )).
(Proof)
p(x; θ) = A(θ) exp(T(x) ζ(θ)) h(x) and V(θ; x) = (∂/∂θ) ln p(x; θ) = A′(θ)/A(θ) + ζ′(θ) T(x). Since V(θ; x) is a linear function of T(x), Cor_θ(V(θ; X), T(X))² = 1. From the proof of Theorem 4.2, Cov_θ(V(θ; X), T(X)) = g′(θ) and thus [g′(θ)]²/(Var_θ(T(X)) Var_θ(V(θ; X))) = 1. Var_θ(V(θ; X)) = I_X(θ) implies that Var_θ(T(X)) = [g′(θ)]²/I_X(θ), meaning that T attains the C-R bound.
Unbiasedness and MSE: Cramer-Rao lower bound VII
Special case 4.15 (Normal distribution, known mean)
X_i ~ N(µ_0, σ²), where µ_0 is known.
This is a 1-parameter exponential family with T(x) = ∑_i (x_i − µ_0)² and ζ(σ²) = −1/(2σ²).
I_X(σ²) = n/(2σ⁴) and the variance of σ̂² = (1/n) ∑_i (X_i − µ_0)² is 2σ⁴/n, so σ̂² attains the bound.
Special case 4.16 (Uniform distribution)
X_i ~ U[0, θ]
T(X) = ((n+1)/n) X_(n) is an unbiased estimator with variance θ²/(n(n+2)).
The order of the variance is n^{−2}, but this is not a contradiction to the C-R bound (of order n^{−1}) because the uniform distributions do not form a regular model.
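The n^{−2} variance in Special case 4.16 follows from the exact moments E(X_(n)^k) = θ^k n/(n + k) of the sample maximum (a standard fact; the code and names are my own sketch):

```python
def var_unbiased_max(n, theta):
    """Var of T = (n+1)/n * X_(n) via E[X_(n)^k] = theta^k * n / (n + k)."""
    e1 = theta * n / (n + 1)            # E X_(n)
    e2 = theta ** 2 * n / (n + 2)       # E X_(n)^2
    var_max = e2 - e1 ** 2
    return ((n + 1) / n) ** 2 * var_max
```

A little algebra shows this equals θ²/(n(n+2)), which shrinks like n^{−2} rather than n^{−1}.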
Unbiasedness and MSE: Cramer-Rao lower bound VIII
Special case 4.17 (Exponential distribution)
X_i ~ Exp(λ), λ > 0.
γ = g(λ) = 1/λ
X̄ is an unbiased estimator of γ with variance 1/(nλ²).
I_X(λ) = n/λ², g′(λ) = −1/λ², and [g′(λ)]²/I_X(λ) = λ^{−4}/(nλ^{−2}) = 1/(nλ²).
Thus X̄ is an efficient estimator.
MLE's are not necessarily unbiased. But, as n → ∞, the variance of their limiting distribution attains the Cramer-Rao lower bound. So, under mild regularity conditions, MLE's are asymptotically efficient.