Liero Chapter 4 Estimation I
박창이
Department of Statistics, University of Seoul
Contents
Methods of Estimation
Method of moments
Maximum likelihood estimators
Unbiasedness and Mean Squared Error (MSE)
Best unbiased estimators (BUE)
Cramer-Rao lower bound
Multi-dimensional parameters (next week)
Rao-Blackwell and Lehmann-Scheffe theorems
Asymptotic Properties of Estimators
Estimation I
g : Θ → Γ
To derive conclusions about γ = g(θ)
Specifying a plausible value for γ ⇒ point estimation
Determining a subset of Γ of plausible values for γ ⇒ confidence sets
Definition 4.1 (Estimator)
A function T : X → Γ is an estimator of γ = g(θ). The value T(x) is called the estimate of g(θ).
Estimation II
Example 4.1 (Ballpoint pens)
The probability of producing a defective pen is g(θ) = θ and Γ = (0, 1).
T(x) = (1/7) ∑_{i=1}^7 x_i/n: the mean of the daily ratios of defective pens
T(x) = x_1/n: the ratio of defective pens on Monday
Example 4.2 (Pendulum)
The variance γ = σ² (the precision of the measurements) of the assumed normal distribution.
T(x) = (1/(n−1)) ∑_{i=1}^n (x_i − x̄)²
Methods of Estimation: method of moments I
X = (X_1, ..., X_n): a sample of IID r.v.'s from F.
Assume the model
P = {P_F^{⊗n} : F ∈ Θ},  (1)
where Θ = {F : ∫ x^r dF < ∞}.
Denote m_j = E(X^j) = ∫ x^j dF.
γ is assumed to depend on θ = F via the moments m_j and can be expressed as γ = h(m_1(F), ..., m_r(F)), where h is a known function.
Methods of Estimation: method of moments II
Definition 4.2 (Method of moments estimator)
Suppose the model (1). The moment estimate for γ is defined by γ̂_MME = h(m̂_1, ..., m̂_r), where m̂_j is the empirical moment of order j: m̂_j = (1/n) ∑_i x_i^j.
The distribution function is defined by F(t) = P_F((−∞, t]) and we define the empirical distribution function as F̂_n(t) = F̂_n(t; x) = (1/n) ∑_i I(x_i ∈ (−∞, t]). Then we can write the empirical moment in the form m̂_j = m_j(F̂_n) = ∫ x^j dF̂_n.
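Definition 4.2 can be sketched in a few lines; this is my own stdlib-only illustration, and the helper names `empirical_moment` and `edf` are not from the text.

```python
# A minimal sketch of the empirical moments and the empirical
# distribution function from Definition 4.2 (stdlib only).
def empirical_moment(xs, j):
    """m_hat_j = (1/n) * sum_i x_i^j, the empirical moment of order j."""
    return sum(x ** j for x in xs) / len(xs)

def edf(xs, t):
    """F_hat_n(t) = (1/n) * sum_i I(x_i in (-inf, t])."""
    return sum(1 for x in xs if x <= t) / len(xs)

data = [1.0, 2.0, 2.0, 5.0]
m1_hat = empirical_moment(data, 1)   # = 2.5, the sample mean
m2_hat = empirical_moment(data, 2)   # = 8.5
```

A method of moments estimate is then h applied to these empirical moments.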
Methods of Estimation: method of moments III
Special case 4.1 (Mean)
For the case r = 1, we estimate γ = ∫ x dF by the sample mean (1/n) ∑_i x_i.
Special case 4.2 (Standard deviation)
Suppose the model (1) with r = 2. Since σ² = m_2 − m_1², we have σ = √(m_2 − m_1²) = h(m_1, m_2). Thus
σ̂_MME = √(m̂_2 − m̂_1²) = √((1/n) ∑_i (x_i − x̄)²).
Special case 4.6 (Poisson distribution)
X_i ~ Poi(λ), IID, i = 1, ..., n.
Since E(X_i) = Var(X_i) = λ, both λ̂_MME = (1/n) ∑_i x_i and λ̃_MME = (1/n) ∑_i (x_i − x̄)² are method of moments estimators.
The method of moments estimator may not be unique.
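The non-uniqueness is easy to see numerically. A small simulation of my own (the Poisson sampler and all names below are mine, not from the text): both estimators target λ, yet they give different values on any finite sample.

```python
import math
import random

random.seed(0)

def sample_poisson(lam):
    """Knuth's multiplication-of-uniforms sampler for Poisson(lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam, n = 3.0, 20000
xs = [sample_poisson(lam) for _ in range(n)]
xbar = sum(xs) / n
lam_hat_mean = xbar                                  # moment estimator from m_1 = lambda
lam_hat_var = sum((x - xbar) ** 2 for x in xs) / n   # moment estimator from m_2 - m_1^2 = lambda
```

Both values are close to λ = 3 but not equal to each other.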
Methods of Estimation: maximum likelihood estimators I
The distribution of X of independent r.v.'s belongs to P = {P_θ = P_{1,θ} ⊗ · · · ⊗ P_{n,θ} : θ ∈ Θ ⊂ R^k} for some k.
L(θ; x) = p(x; θ) and l(θ; x) = ln L(θ; x).
Definition 4.3 (Maximum likelihood estimator)
An estimator T is called a maximum likelihood estimator (MLE) of θ if L(T(x); x) = max_{θ∈Θ} L(θ; x) for all x ∈ X.
Methods of Estimation: maximum likelihood estimators II
Theorem 4.1
If γ = g(θ) and g is bijective, i.e., θ = g^{−1}(γ), then θ̂ is an MLE for θ iff γ̂ = g(θ̂) is an MLE for γ.
(Proof)
Since g is bijective, L(γ; x) = p(x; g^{−1}(γ)). Furthermore, we have L(γ̂; x) ≥ L(γ; x) for all γ iff p(x; g^{−1}(γ̂)) ≥ p(x; g^{−1}(γ)) for all γ iff p(x; θ̂) ≥ p(x; θ) for all θ.
Definition 4.4 (MLE for g(θ))
The MLE of γ = g(θ) is defined by γ̂_MLE = g(θ̂_MLE), where θ̂_MLE is the MLE of θ.
Methods of Estimation: maximum likelihood estimators III
Special case 4.7 (Binomial distribution)
X ~ B(n, θ), θ ∈ Θ = (0, 1)
L(θ; x) = (n choose x) θ^x (1−θ)^{n−x} and l(θ; x) = ln (n choose x) + x ln θ + (n − x) ln(1−θ).
l′(θ; x) = x/θ − (n−x)/(1−θ) = 0
If x ≠ 0, n, the solution exists and we have θ̂(x) = x/n.
Special case 4.8 (Normal distribution)
X_1, ..., X_n ~ N(µ, σ²), IID
l(µ, σ²; x) ∝ −(n/2) ln(σ²) − (1/(2σ²)) ∑_i (x_i − µ)²
From 0 = ∂l/∂µ = (1/σ²) ∑_i (x_i − µ) and 0 = ∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑_i (x_i − µ)², we have µ̂_MLE(x) = x̄ and σ̂²_MLE(x) = (1/n) ∑_i (x_i − x̄)².
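The closed-form normal MLEs can be sanity-checked numerically. A sketch of mine (data and names are illustrative, not from the text): the closed forms should beat nearby parameter values in log-likelihood.

```python
import math

# Check that the closed-form normal MLEs from Special case 4.8
# maximize the log-likelihood, by perturbing them.
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
n = len(data)

mu_hat = sum(data) / n                                  # x bar
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n   # (1/n) sum (x_i - xbar)^2

def loglik(mu, s2):
    """l(mu, sigma^2; x) up to the additive constant -(n/2) ln(2 pi)."""
    return -n / 2 * math.log(s2) - sum((x - mu) ** 2 for x in data) / (2 * s2)

best = loglik(mu_hat, sigma2_hat)
```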
Methods of Estimation: maximum likelihood estimators IV
The equations obtained by differentiating the log-likelihood function
may not have an explicit solution. To solve the nonlinear system of
equations, we may adopt iterative procedures such as the
Newton-Raphson algorithm.
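As a toy illustration of mine (not the book's), Newton-Raphson applied to the binomial score equation of Special case 4.7; the closed form θ̂ = x/n is known here, so convergence can be checked.

```python
def newton_binomial_mle(x, n, theta=0.5, tol=1e-12, max_iter=50):
    """Newton-Raphson on the score l'(theta) = x/theta - (n - x)/(1 - theta)."""
    for _ in range(max_iter):
        score = x / theta - (n - x) / (1 - theta)
        hess = -x / theta ** 2 - (n - x) / (1 - theta) ** 2   # l''(theta)
        step = score / hess
        theta -= step            # theta_{t+1} = theta_t - l'(theta_t) / l''(theta_t)
        if abs(step) < tol:
            break
    return theta

theta_hat = newton_binomial_mle(x=7, n=10)   # converges to 7/10
```

In real use the same loop is applied to score equations without a closed-form root.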
Example 4.8 (Nonuniqueness of the MLE)
X_1, X_2: independent Cauchy r.v.'s with location θ.
L(θ; x) = ∏_{i=1}^2 1/(π(1 + (x_i − θ)²)), θ ∈ R, attains its maximum at two distinct points (near x_1 and x_2 when they are far apart), so the MLE is not unique.
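A short calculation (mine, not spelled out in the text): with m the midpoint and d half the distance between the observations, setting the score to zero gives critical points θ = m and θ = m ± √(d² − 1) when d > 1. The sketch below checks that the two outer points are equal-height maxima.

```python
import math

x1, x2 = 0.0, 6.0                      # two Cauchy observations, far apart
m, d = (x1 + x2) / 2, abs(x2 - x1) / 2

def lik(theta):
    """L(theta; x) = prod_i 1 / (pi * (1 + (x_i - theta)^2))."""
    return math.prod(1 / (math.pi * (1 + (x - theta) ** 2)) for x in (x1, x2))

# the two symmetric critical points (local maxima) of the likelihood
mode_lo = m - math.sqrt(d ** 2 - 1)
mode_hi = m + math.sqrt(d ** 2 - 1)
```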
Methods of Estimation: maximum likelihood estimators V
Example 4.9 (Nonexistence of the MLE)
X_1, ..., X_n ~ Poi(λ), IID, and Y_i = I(X_i > 0), so P_λ(Y_i = 1) = 1 − e^{−λ}.
l(λ; y) = ∑_i y_i ln(1 − e^{−λ}) − λ(n − ∑_i y_i)
If ∑_i y_i ≠ n, l′(λ; y) = 0 implies (∑_i y_i) e^{−λ}/(1 − e^{−λ}) = n − ∑_i y_i and we have λ̂(y) = −ln(1 − ȳ).
For ∑_i y_i = n, l(λ; y) = n ln(1 − e^{−λ}) is monotone increasing in λ. Thus a maximum does not exist.
P_λ(∑_i Y_i = n) = P_λ(Y_1 = 1, ..., Y_n = 1) = (1 − e^{−λ})^n → 1 as λ → ∞. Hence there exists λ s.t. the probability that an MLE does not exist is near 1. On the other hand, for fixed λ and n → ∞, we have P_λ(∑_i Y_i = n) = (1 − e^{−λ})^n → 0.
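A small simulation of Example 4.9 (variable names mine): when ȳ < 1 the MLE is λ̂ = −ln(1 − ȳ), and the probability that no MLE exists is (1 − e^{−λ})^n, which is negligible for fixed λ and large n.

```python
import math
import random

random.seed(1)
lam, n = 1.5, 10000
p1 = 1 - math.exp(-lam)                     # P(Y_i = 1) = P(X_i > 0)
ys = [1 if random.random() < p1 else 0 for _ in range(n)]
ybar = sum(ys) / n

# the MLE exists only when ybar < 1
lam_hat = -math.log(1 - ybar)

# probability that no MLE exists (all Y_i = 1)
p_no_mle = p1 ** n
```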
Unbiasedness and MSE I
Definition 4.6 (MSE)
Let P = {P_θ : θ ∈ Θ} be a statistical model for a r.v. X on X, g : Θ → Γ a function, and T : X → Γ an estimator for γ = g(θ). The mean squared error (MSE) of T is given by
MSE(T, θ) = E_θ(T − g(θ))².
MSE(T, θ) = Bias²(T, θ) + Var_θ(T), where Bias(T, θ) = E_θ(T) − g(θ) is the bias of T at θ.
Definition 4.7 (Unbiasedness)
An estimator T for γ = g(θ) is called unbiased if Bias(T, θ) = 0 for all θ ∈ Θ.
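The decomposition MSE = Bias² + Var can be checked exactly for X ~ B(n, θ) and T = X/n by summing over the pmf (an illustration of mine, not from the text):

```python
import math

n, theta = 10, 0.3
pmf = [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]

T = [k / n for k in range(n + 1)]            # estimator T(X) = X/n for g(theta) = theta
mean_T = sum(p * t for p, t in zip(pmf, T))
var_T = sum(p * (t - mean_T) ** 2 for p, t in zip(pmf, T))
bias = mean_T - theta
mse = sum(p * (t - theta) ** 2 for p, t in zip(pmf, T))
```

Here T is unbiased, so the MSE reduces to the variance θ(1−θ)/n.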
Unbiasedness and MSE II
Example 4.16 (Nonexistence of an unbiased estimator)
X ~ B(n, θ)
γ = 1/θ has no unbiased estimator.
(Proof)
If T were an unbiased estimator for γ, then E_θ(T) = ∑_{k=0}^n T(k) (n choose k) θ^k (1−θ)^{n−k} = 1/θ for all θ ∈ (0, 1). As θ → 0, the LHS T(0)(1−θ)^n + ∑_{k=1}^n T(k) (n choose k) θ^k (1−θ)^{n−k} → T(0), but the RHS → ∞, a contradiction.
Unbiasedness and MSE III
Special case 4.12 (Sample variance)
X_1, ..., X_n: IID with finite variance and γ = σ².
σ̂²_MME = (1/n) ∑_{i=1}^n (X_i − X̄)² is biased:
E_θ(X̄²) = (1/n²) ∑_i E_θ(X_i²) + (1/n²) ∑_i ∑_{j≠i} E_θ(X_i X_j) = (1/n) m_2 + ((n−1)/n) m_1²
E_θ(σ̂²_MME) = E_θ((1/n) ∑_i X_i²) − E_θ(X̄²) = m_2 − (1/n) m_2 − ((n−1)/n) m_1² = ((n−1)/n) σ²
Bias(σ̂²_MME, θ) = −σ²/n.
S² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄)² is unbiased.
Suppose that the X_i's are normally distributed.
Note that V = σ^{−2} ∑_{i=1}^n (X_i − X̄)² ~ χ²(n−1) and thus E_θ(V) = n − 1 and Var_θ(V) = 2(n−1).
The MME is the MLE. MSE(σ̂²_MLE, θ) = (−σ²/n)² + (2(n−1)/n²) σ⁴ = ((2n−1)/n²) σ⁴.
MSE(S², θ) = 0 + (2/(n−1)) σ⁴ = (2/(n−1)) σ⁴.
MSE(S², θ) > MSE(σ̂²_MLE, θ).
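The two MSE formulas from Special case 4.12 in code (function names mine), checking that the biased MLE has the smaller MSE for every n ≥ 2:

```python
def mse_mle(n, sigma2):
    """MSE of the MLE (= MME): (2n - 1) / n^2 * sigma^4."""
    return (2 * n - 1) / n ** 2 * sigma2 ** 2

def mse_s2(n, sigma2):
    """MSE of the unbiased S^2: 2 / (n - 1) * sigma^4."""
    return 2 / (n - 1) * sigma2 ** 2
```

Algebraically, 2/(n−1) > (2n−1)/n² iff 2n² > 2n² − 3n + 1, which holds for all n ≥ 1.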
Unbiasedness and MSE IV
Example 4.18 (Binomial distribution)
X ~ B(n, θ).
θ̂_MLE = X/n is unbiased and MSE(θ̂_MLE, θ) = Var_θ(θ̂_MLE) = θ(1−θ)/n, which is maximal at θ = 1/2.
θ̃ = (X+1)/(n+2) with Bias(θ̃, θ) = (nθ+1)/(n+2) − θ = (1−2θ)/(n+2) and MSE(θ̃, θ) = (nθ(1−θ) + (1−2θ)²)/(n+2)².
Figure 4.9. MSE(θ̂_MLE, θ) > MSE(θ̃, θ) for all θ satisfying (θ − 1/2)²/(θ(1−θ)) ≤ 1 + 1/n, i.e., approximately |θ − 1/2| ≤ 1/√8. If θ is around 1/2, then θ̃ is a better estimator than θ̂_MLE.
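The MSE formulas of Example 4.18 in code (function names mine); the crossover between the two estimators matches the condition (θ − 1/2)²/(θ(1−θ)) < 1 + 1/n:

```python
def mse_mle(n, t):
    """MSE of X/n."""
    return t * (1 - t) / n

def mse_shrink(n, t):
    """MSE of the shrinkage estimator (X + 1) / (n + 2)."""
    return (n * t * (1 - t) + (1 - 2 * t) ** 2) / (n + 2) ** 2

n = 20
better_near_half = mse_shrink(n, 0.5) < mse_mle(n, 0.5)   # shrinkage wins at theta = 1/2
better_near_edge = mse_mle(n, 0.01) < mse_shrink(n, 0.01) # MLE wins near the boundary
```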
Unbiasedness and MSE: BUE
Definition 4.8 (Best unbiased estimator)
Given a statistical model P = {Pθ : θ ∈ Θ}, an unbiased estimator
T ∗ for a parameter γ = g(θ) ∈ R is called best unbiased estimator
(BUE), if for any other unbiased estimator T , Varθ(T ∗) ≤ Varθ(T )
for all θ ∈ Θ.
While looking for best estimators, it is natural to ask whether there is a lower bound for the variance of unbiased estimators of γ. An answer to this question is given by the Cramer-Rao inequality. If an unbiased estimator attains this bound, it is a BUE.
Unbiasedness and MSE: Cramer-Rao lower bound I
Θ ⊂ R
A = {x : p(x ; θ) > 0}: common support of Pθ ∈ P
The estimator T is a regular unbiased estimator if
∫_A T(x) (∂/∂θ) L(θ; x) dx = (∂/∂θ) ∫_A T(x) L(θ; x) dx.
Unbiasedness and MSE: Cramer-Rao lower bound II
Theorem 4.2 (Cramer-Rao bound)
Suppose that Reg 3 and Reg 4 are satisfied and 0 < I_X(θ) < ∞. Let γ = g(θ), where g is a continuously differentiable real-valued function with g′ ≠ 0. If T is a regular unbiased estimator for γ, then
Var_θ(T) ≥ [g′(θ)]²/I_X(θ) for all θ ∈ Θ.
The equality holds iff for x ∈ A and for all θ ∈ Θ,
T(x) − g(θ) = g′(θ) V(θ; x)/I_X(θ),  (2)
where V(·; x) is the score function.
Unbiasedness and MSE: Cramer-Rao lower bound III
(Proof)
Since E_θ(V(θ; X)) = 0, we have Cov_θ(T(X), V(θ; X)) = E_θ(T(X) V(θ; X)) = ∫_A T(x) (∂ ln p(x; θ)/∂θ) p(x; θ) dx = ∫_A T(x) (∂p(x; θ)/∂θ) dx. Since T is regular, we can interchange integration and differentiation:
Cov_θ(T(X), V(θ; X)) = (∂/∂θ) ∫_A T(x) p(x; θ) dx = (∂/∂θ) E_θ(T(X)) = g′(θ) because T is unbiased. Now set c(θ) = g′(θ)/I_X(θ). Then we have
0 ≤ Var_θ(T(X) − c(θ) V(θ; X))
= Var_θ(T(X)) − 2c(θ) Cov_θ(T(X), V(θ; X)) + c²(θ) Var_θ(V(θ; X))
= Var_θ(T(X)) − 2c(θ) g′(θ) + c²(θ) I_X(θ)
= Var_θ(T(X)) − [g′(θ)]²/I_X(θ)
Unbiasedness and MSE: Cramer-Rao lower bound IV
and the desired inequality follows.
The equality holds if T (X )− c(θ)V (θ; X ) is constant. Since T is
unbiased, the constant is g(θ). So we have
T (x)− c(θ)V (θ; x) = g(θ) or
T (x)− g(θ) = V (θ; x)g ′(θ)/IX (θ).
Implications of Theorem 4.2
If we can find an unbiased estimator attaining the equality, the
estimator is the BUE.
The larger the Fisher information, the smaller the bound for the
variance. So we can estimate more accurately if X contains more
information.
Unbiasedness and MSE: Cramer-Rao lower bound V
For a sample of IID r.v.'s with P = {P_θ = P_{1,θ}^{⊗n} : θ ∈ Θ}, we have I_X(θ) = n I_{X_1}(θ) and thus Var_θ(T(X)) ≥ [g′(θ)]²/(n I_{X_1}(θ)) for all θ ∈ Θ.
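The additivity I_X(θ) = n I_{X_1}(θ) can be checked exactly for Bernoulli trials, where I_{X_1}(θ) = 1/(θ(1−θ)) (a standard fact; the code below is my own illustration):

```python
import math

n, theta = 8, 0.3
pmf = [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]

def score(x):
    """V(theta; x) = x/theta - (n - x)/(1 - theta), the binomial score."""
    return x / theta - (n - x) / (1 - theta)

info = sum(p * score(k) ** 2 for k, p in enumerate(pmf))   # I_X(theta) = E_theta V^2
info_one = 1 / (theta * (1 - theta))                       # Fisher info of one Bernoulli trial
```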
Definition 4.9 (Efficiency)
The efficiency of an unbiased estimator T is defined by e(T, θ) = [g′(θ)]²/(I_X(θ) Var_θ(T(X))). An unbiased estimator attaining the C-R bound is called an efficient estimator.
Example 4.19 (Binomial distribution)
X ~ B(n, θ)
γ = θ(1−θ)
C-R lower bound: (1 − 2θ)² θ(1−θ)/n.
T(x) = (1/(n−1)) x (1 − x/n) is unbiased with variance θ/n − [θ²(5n−7) − 4θ³(2n−3) + θ⁴(4n−6)]/(n(n−1)).
T is not efficient, but e(T, θ) → 1 as n → ∞.
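The variance formula and the efficiency claim of Example 4.19 can be verified by exact enumeration over the binomial pmf (code mine):

```python
import math

def var_T(n, theta):
    """Exact variance of T(x) = x (1 - x/n) / (n - 1) under B(n, theta)."""
    pmf = [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]
    tvals = [k * (1 - k / n) / (n - 1) for k in range(n + 1)]
    mean = sum(p * t for p, t in zip(pmf, tvals))
    return sum(p * (t - mean) ** 2 for p, t in zip(pmf, tvals))

def var_formula(n, t):
    """The closed-form variance stated in Example 4.19."""
    return t / n - (t ** 2 * (5 * n - 7) - 4 * t ** 3 * (2 * n - 3)
                    + t ** 4 * (4 * n - 6)) / (n * (n - 1))

def cr_bound(n, t):
    """C-R lower bound (1 - 2t)^2 t (1 - t) / n."""
    return (1 - 2 * t) ** 2 * t * (1 - t) / n

eff_10 = cr_bound(10, 0.3) / var_formula(10, 0.3)     # efficiency at n = 10
eff_400 = cr_bound(400, 0.3) / var_formula(400, 0.3)  # closer to 1 at n = 400
```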
Unbiasedness and MSE: Cramer-Rao lower bound VI
Theorem 4.3
Suppose that the distribution of X = (X1, . . . ,Xn) belongs to a
1-parameter exponential family in ζ and T . Then the sufficient
statistic T is an efficient estimator for γ = g(θ) = Eθ(T (X )).
(Proof)
p(x; θ) = A(θ) exp(T(x) ζ(θ)) h(x) and V(θ; x) = (∂/∂θ) ln p(x; θ) = A′(θ)/A(θ) + ζ′(θ) T(x). Since V(θ; x) is a linear function of T(x), Cor_θ(V(θ; X), T(X))² = 1. From the proof of Theorem 4.2, Cov_θ(V(θ; X), T(X)) = g′(θ) and thus [g′(θ)]²/(Var_θ(T(X)) Var_θ(V(θ; X))) = 1. Var_θ(V(θ; X)) = I_X(θ) implies that Var_θ(T(X)) = [g′(θ)]²/I_X(θ), meaning that T attains the C-R bound.
Unbiasedness and MSE: Cramer-Rao lower bound VII
Special case 4.15 (Normal distribution, known mean)
X_i ~ N(µ_0, σ²), where µ_0 is known.
This is a 1-parameter exponential family with T(x) = ∑_i (x_i − µ_0)² and ζ(σ²) = −1/(2σ²).
I_X(σ²) = n/(2σ⁴) and the variance of σ̂² = (1/n) ∑_i (X_i − µ_0)² is 2σ⁴/n, so σ̂² attains the bound.
Special case 4.16 (Uniform distribution)
X_i ~ U[0, θ]
T(X) = ((n+1)/n) X_(n) is an unbiased estimator with variance θ²/(n(n+2)).
The order of the variance is n^{−2}, but this is not a contradiction to the C-R bound (of order n^{−1}) because the uniform distributions do not form a regular model.
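The n^{−2} variance in Special case 4.16 follows from the exact moments E(X_(n)^k) = θ^k n/(n + k) of the sample maximum (a standard fact; the code and names are my own sketch):

```python
def var_unbiased_max(n, theta):
    """Var of T = (n+1)/n * X_(n) via E[X_(n)^k] = theta^k * n / (n + k)."""
    e1 = theta * n / (n + 1)            # E X_(n)
    e2 = theta ** 2 * n / (n + 2)       # E X_(n)^2
    var_max = e2 - e1 ** 2
    return ((n + 1) / n) ** 2 * var_max
```

A little algebra shows this equals θ²/(n(n+2)), which shrinks like n^{−2} rather than n^{−1}.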
Unbiasedness and MSE: Cramer-Rao lower bound VIII
Special case 4.17 (Exponential distribution)
X_i ~ Exp(λ), λ > 0.
γ = g(λ) = 1/λ
X̄ is an unbiased estimator of γ with variance 1/(nλ²).
I_X(λ) = n/λ², g′(λ) = −1/λ², and [g′(λ)]²/I_X(λ) = λ^{−4}/(nλ^{−2}) = 1/(nλ²).
Thus X̄ is an efficient estimator.
MLE's are not necessarily unbiased. But, as n → ∞, the variance of their limiting distribution attains the Cramer-Rao lower bound. So, under mild regularity conditions, MLE's are asymptotically efficient.