Mathematical Statistics, Fall 2015 Chapter 7: Sufficiency and Comparison Byeong U. Park Department of Statistics, Seoul National University


7.1 Optimality

7.2 Sufficiency and Completeness

7.3 Exponential Family

Comparison of Estimators

Let X1, ..., Xn be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. Below, we write X = (X1, ..., Xn). How should we compare different estimators of a parameter of interest, say g(θ) ∈ R?

• Loss function: L(·, ·) : Θ × A → R+, where A ⊂ R is an action space for the estimation of g(θ), such that L(θ, g(θ)) = 0. For example,

L(θ, a) = (a − g(θ))^2, L(θ, a) = |a − g(θ)|.

• Risk function: For an estimator δ(X) of g(θ),

R(θ, δ) = E_θ L(θ, δ(X)).

In the case of the squared error loss, the risk R(θ, δ) is the mean squared error of δ(X), i.e., R(θ, δ) = E_θ(δ(X) − g(θ))^2.

Difficulty with Uniform Comparison

• One would prefer δ1 to δ2 if

R(θ, δ1) ≤ R(θ, δ2) for all θ ∈ Θ, and R(θ, δ1) < R(θ, δ2) for some θ ∈ Θ.

• The difficulty is that, in general, no estimator is best in this sense, i.e., there exists no δ0 such that

R(θ, δ0) ≤ R(θ, δ) for all δ and for all θ ∈ Θ.

• Why? For each θ0 ∈ Θ, the constant estimator δ ≡ g(θ0) has zero risk at θ0. If such a δ0 existed, it would therefore satisfy R(θ, δ0) = 0 for all θ ∈ Θ, which leads to a contradiction.

Optimal Estimation

• Restricted class of estimators: One may find an estimator δ0 in the class of unbiased estimators of g(θ) such that

E_θ(δ0(X) − g(θ))^2 ≤ E_θ(δ(X) − g(θ))^2 for all θ ∈ Θ

for any unbiased estimator δ(X). If it exists, such an estimator is called the UMVUE (Uniformly Minimum Variance Unbiased Estimator).

• Global measures of performance: The estimator that minimizes the maximum risk r(δ) = max_{θ∈Θ} R(θ, δ) is called a minimax estimator. The estimator that minimizes the average risk

$$ r(\delta; \pi) = \int_\Theta R(\theta, \delta)\, \pi(\theta)\, d\mu(\theta) $$

for a weight function π is called a Bayes estimator.

Example: Minimax Estimation

Let X1, ..., Xn be a random sample from N(µ, σ^2), θ = (µ, σ^2) ∈ R × R+. Consider the loss function L(θ, a) = (a/σ^2 − 1)^2 for the estimation of g(θ) = σ^2. Suppose we want to find an estimator that minimizes the maximum risk among all estimators in the class

$$ \Big\{ \delta_c : \delta_c(x) = c \sum_{i=1}^n (x_i - \bar{x})^2,\ c > 0 \Big\}. $$

• Let S^2 = Σ_{i=1}^n (X_i − X̄)^2. Since S^2/σ^2 has the χ^2(n − 1) distribution, it follows that

$$ R(\theta, \delta_c) = E_\theta\big(cS^2/\sigma^2 - 1\big)^2 = c^2\,\mathrm{Var}\big(\chi^2(n-1)\big) + \big(c\,E(\chi^2(n-1)) - 1\big)^2 = 2(n-1)c^2 + \big(c(n-1) - 1\big)^2, $$

which does not depend on θ and therefore equals max_θ R(θ, δ_c).

• Since arg min_{c>0} max_θ R(θ, δ_c) = 1/(n + 1), we get σ̂^2 = Σ_{i=1}^n (X_i − X̄)^2/(n + 1) as the minimax estimator within the class.
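The minimization over c can be checked with exact arithmetic. The sketch below is only an illustration: the function name `risk` and the choice n = 10 are assumptions made here, not part of the slides. It evaluates the θ-free risk 2(n − 1)c^2 + (c(n − 1) − 1)^2 at the three natural candidates c = 1/(n + 1), 1/n, 1/(n − 1).

```python
from fractions import Fraction


def risk(c, n):
    """Risk of delta_c under the loss (a / sigma^2 - 1)^2.

    Uses Var(chi2(n-1)) = 2(n-1) and E(chi2(n-1)) = n-1, so the value
    does not depend on theta = (mu, sigma^2).
    """
    return c * c * 2 * (n - 1) + (c * (n - 1) - 1) ** 2


n = 10  # illustrative sample size (an assumption of this sketch)
candidates = {
    "1/(n+1)": Fraction(1, n + 1),
    "1/n": Fraction(1, n),
    "1/(n-1)": Fraction(1, n - 1),
}
risks = {name: risk(c, n) for name, c in candidates.items()}
for name, value in risks.items():
    print(name, value)
```

Because the risk is a convex quadratic in c, the value at c = 1/(n + 1) is the global minimum, and it works out to 2/(n + 1).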

7.2 Sufficiency and Completeness

The Idea of Sufficiency

Suppose that we observe X1 and X2 that are i.i.d. Bernoulli(θ) random

variables, where 0 < θ < 1.

• The distribution of Y = X1 + X2:

P_θ(Y = 0) = (1 − θ)^2, P_θ(Y = 1) = 2θ(1 − θ), P_θ(Y = 2) = θ^2.

• When we observe Y, we may produce (X1*, X2*), without knowledge of the true θ, that has the same distribution as the original (X1, X2), as follows: (i) put (X1*, X2*) = (0, 0) when Y = 0; (ii) put (X1*, X2*) = (1, 1) when Y = 2; (iii) when Y = 1, conduct a randomized experiment and put (X1*, X2*) = (1, 0) or (X1*, X2*) = (0, 1), each with probability 1/2.

• Thus, for any estimator δ(X1, X2) of g(θ) one may find an estimator that depends only on Y rather than on (X1, X2) but has the same risk as δ(X1, X2), namely the randomized estimator δ(X1*, X2*).
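The reconstruction argument can be verified by direct enumeration. The following sketch (the helper names `original_joint` and `reconstructed_joint` are hypothetical, introduced only for this illustration) computes the joint pmf of (X1, X2) and that of (X1*, X2*) built from Y alone, using exact fractions.

```python
from fractions import Fraction
from itertools import product


def original_joint(theta):
    """Joint pmf of two i.i.d. Bernoulli(theta) variables."""
    return {
        (x1, x2): theta ** (x1 + x2) * (1 - theta) ** (2 - x1 - x2)
        for x1, x2 in product((0, 1), repeat=2)
    }


def reconstructed_joint(theta):
    """Joint pmf of (X1*, X2*) generated from Y = X1 + X2 without using theta
    in the randomization step."""
    p_y = {0: (1 - theta) ** 2, 1: 2 * theta * (1 - theta), 2: theta ** 2}
    half = Fraction(1, 2)
    # Conditional law of (X1*, X2*) given Y = y; it is free of theta.
    cond = {
        0: {(0, 0): Fraction(1)},
        1: {(1, 0): half, (0, 1): half},
        2: {(1, 1): Fraction(1)},
    }
    joint = {}
    for y, py in p_y.items():
        for pair, q in cond[y].items():
            joint[pair] = joint.get(pair, 0) + py * q
    return joint


theta = Fraction(1, 3)
print(original_joint(theta) == reconstructed_joint(theta))
```

The two pmfs agree for every θ, which is exactly the statement that the conditional law of (X1, X2) given Y does not involve θ.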

Sufficient Statistic

Let X1, ..., Xn be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. We write X = (X1, ..., Xn) below.

• Sufficient statistic for θ ∈ Θ: A statistic Y = u(X) is called a sufficient statistic if the conditional distribution of X given Y does not depend on θ ∈ Θ, i.e., if

P_θ1(X ∈ A | Y = y) = P_θ2(X ∈ A | Y = y) for all θ1, θ2 ∈ Θ,

for all A and for all y.

• This definition is more general than those in the textbook.

• Read Remark 7.2.1 on p. 391.

Factorization Theorem

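A statistic Y = u(X) is a sufficient statistic for θ ∈ Θ if and only if there exist functions f1 and f2 such that

$$ \prod_{i=1}^n f(x_i; \theta) = f_1(u(x), \theta) \cdot f_2(x) \quad \text{for all } x \text{ and for all } \theta \in \Theta. $$

Proof: We give a proof for the case of discrete Xi. For the necessity part,

$$ \prod_{i=1}^n f(x_i; \theta) = P_\theta\big(X = x,\ Y = u(x)\big) = P\big(X = x \mid Y = u(x)\big) \cdot P_\theta\big(Y = u(x)\big). $$

The first factor on the right-hand side of the second equality does not depend on θ ∈ Θ, so it plays the role of f2(x) and the second factor that of f1(u(x), θ). For the sufficiency part,

$$ P_\theta(X = x \mid Y = y) = \frac{P_\theta(X = x,\ u(X) = y)}{P_\theta(u(X) = y)} = \frac{P_\theta(X = x)\, I(u(x) = y)}{\sum_{z: u(z) = y} P_\theta(X = z)} = \frac{f_2(x)\, I(u(x) = y)}{\sum_{z: u(z) = y} f_2(z)}, $$

where the common factor f1(y, θ) cancels in the last step, so the conditional probability is free of θ.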

Sufficient Statistic: Examples

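• Bernoulli(θ), θ ∈ (0, 1):

$$ \prod_{i=1}^n f(x_i; \theta) = \prod_{i=1}^n \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}, $$

so that Y = Σ_{i=1}^n X_i is a sufficient statistic for θ ∈ (0, 1).

• Gamma(2, θ), θ > 0:

$$ \prod_{i=1}^n f(x_i; \theta) = \prod_{i=1}^n \theta^{-2} \cdot x_i \cdot e^{-x_i/\theta} \cdot I_{(0,\infty)}(x_i) = \theta^{-2n} \exp\Big(-\frac{1}{\theta}\sum_{i=1}^n x_i\Big) \cdot \prod_{i=1}^n x_i\, I_{(0,\infty)}(x_i), $$

so that Y = Σ_{i=1}^n X_i is a sufficient statistic for θ > 0.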

Sufficient Statistic: Examples

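• Gamma(α, β), θ = (α, β) ∈ R+ × R+:

$$ \prod_{i=1}^n f(x_i; \theta) = \prod_{i=1}^n \Gamma(\alpha)^{-1}\beta^{-\alpha} \cdot x_i^{\alpha - 1} \cdot e^{-x_i/\beta} \cdot I_{(0,\infty)}(x_i) = \Gamma(\alpha)^{-n}\beta^{-n\alpha} \Big(\prod_{i=1}^n x_i\Big)^{\alpha - 1} \exp\Big(-\frac{1}{\beta}\sum_{i=1}^n x_i\Big) \prod_{i=1}^n I_{(0,\infty)}(x_i), $$

so that Y = (Π_{i=1}^n X_i, Σ_{i=1}^n X_i) is a sufficient statistic for θ ∈ R+ × R+.

• Uniform(θ1 − θ2, θ1 + θ2), θ = (θ1, θ2) ∈ R × R+: Assume n ≥ 2.

$$ \prod_{i=1}^n f(x_i; \theta) = (2\theta_2)^{-n}\, I_{(\theta_1 - \theta_2,\, \infty)}(x_{(1)})\, I_{(-\infty,\, \theta_1 + \theta_2)}(x_{(n)}), $$

so that Y = (X(1), X(n)) is a sufficient statistic for θ ∈ R × R+.

• Exp(θ, 1), θ ∈ R (location θ, scale 1, i.e., density e^{−(x−θ)} for x ≥ θ): Y = X(1) is a sufficient statistic for θ ∈ R.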

Minimal Sufficient Statistic

• There are many sufficient statistics for a given model. As an extreme example, X = (X1, ..., Xn) itself is a sufficient statistic. As another example, consider the case where X1, X2, X3 are i.i.d. Bernoulli(θ) random variables and θ ∈ (0, 1). For the latter model, sufficient statistics include (i) (X1, X2, X3); (ii) (X1 + X2, X3); (iii) (X1 + X3, X2); (iv) (X1, X2 + X3); (v) X1 + X2 + X3.

• Minimal sufficient statistic: A sufficient statistic is called a minimal sufficient statistic (MSS) if it is a function of every sufficient statistic.

Properties of Sufficient Statistic

• Any statistic that is a 1-1 function of a sufficient statistic for θ ∈ Θ is also a sufficient statistic for θ ∈ Θ.

Proof: Let Y be a sufficient statistic for θ ∈ Θ and let W = g(Y) for a known 1-1 function g. Then, the event (W = w) equals (Y = g^{-1}(w)) for any w, so that for any A, w and θ ∈ Θ it holds that

P_θ(X ∈ A | W = w) = P(X ∈ A | Y = g^{-1}(w)),

the latter not depending on θ ∈ Θ.

• Any statistic that is a 1-1 function of an MSS is also an MSS.

• MLE and SS: The unique MLE of θ, when it exists, is a function of any sufficient statistic for θ ∈ Θ.

Properties of Sufficient Statistic

Proof: Let u(X) be a sufficient statistic for θ ∈ Θ. Then, by the factorization theorem, the unique MLE θ̂ satisfies

$$ \hat\theta = \arg\max_{\theta \in \Theta} \prod_{i=1}^n f(x_i; \theta) = \arg\max_{\theta \in \Theta} f_1(u(x), \theta)\, f_2(x) = \arg\max_{\theta \in \Theta} f_1(u(x), \theta) = (\text{function of } u(x)). $$

• Existence of MSS: If the MLE of θ ∈ Θ is unique and it is a sufficient statistic for θ ∈ Θ, then it is a minimal sufficient statistic for θ ∈ Θ.

Proof: Immediate from the preceding property, since the MLE is then a sufficient statistic that is a function of every sufficient statistic.

• Suppose that there exists θ0 ∈ Θ such that supp(f(·; θ)) ⊂ supp(f(·; θ0)) for all θ ∈ Θ. Then, the statistic T(X), as a (random) function defined on Θ in such a way that

$$ T(X)(\theta) = \prod_{i=1}^n \frac{f(X_i; \theta)}{f(X_i; \theta_0)}, $$

is an MSS.

MSS: An Example

Let X1, ..., Xn (n ≥ 2) be a random sample from Uniform(θ1 − θ2, θ1 + θ2), θ = (θ1, θ2) ∈ R × R+. For this model, Y = (X(1), X(n)) is an MSS.

Proof: The sufficiency was established before. Reparametrize the model by η = (η1, η2), where η1 = θ1 − θ2 and η2 = θ1 + θ2. Then,

$$ \prod_{i=1}^n f(x_i; \eta) = (\eta_2 - \eta_1)^{-n}\, I_{(-\infty,\, x_{(1)}]}(\eta_1)\, I_{[x_{(n)},\, \infty)}(\eta_2). $$

Clearly, (η̂1, η̂2) = (X(1), X(n)) is the unique MLE, so that (θ̂1, θ̂2) defined by

θ̂1 = (X(1) + X(n))/2, θ̂2 = (X(n) − X(1))/2

is the unique MLE of (θ1, θ2) = ((η1 + η2)/2, (η2 − η1)/2). Since (θ̂1, θ̂2) is a 1-1 function of the SS (X(1), X(n)), it is also an SS and thus an MSS. This establishes that (X(1), X(n)) is an MSS since it is a 1-1 function of (θ̂1, θ̂2).

Rao-Blackwell Theorem

• Rao-Blackwell theorem: Let X1, ..., Xn be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. Let Y = u(X) be a sufficient statistic. Then, for any estimator η̂(X) of η = g(θ) with finite second moment,

η̂*(Y) = E(η̂(X) | Y)

is a statistic with the properties that (i) E_θ(η̂*) = E_θ(η̂); (ii) var_θ(η̂*) ≤ var_θ(η̂); (iii) MSE_θ(η̂*) ≤ MSE_θ(η̂).

• The Rao-Blackwell theorem says that, in terms of MSE, conditioning on a sufficient statistic never makes an estimator worse. By the theorem, if η̂ is an unbiased estimator of η, then η̂* is also an unbiased estimator, with variance no larger than that of η̂.

• Proof of the theorem: Sufficiency guarantees that E(η̂(X) | Y) does not depend on θ, so η̂* is a statistic. The rest follows from E(W) = E(E(W | V)) and var(W) = var(E(W | V)) + E(var(W | V)).

Rao-Blackwell theorem

[Figure: geometric view of conditional expectation. Labels: space of random variables; space of functions of V; space of constants; E(W | V); E(W).]

Example: Rao-Blackwellization

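Let X1, ..., Xn (n ≥ 2) be a random sample from U[0, θ], θ > 0. Take θ̂ = 2X̄ as an unbiased estimator of θ. We know that X(n) is a sufficient statistic for θ > 0. By the Rao-Blackwell theorem, θ̂* ≡ E(2X̄ | X(n)) is a UE of θ with variance less than or equal to that of θ̂. For 1 ≤ r ≤ n − 1, we note that

$$ \mathrm{pdf}_{X_{(r)} \mid X_{(n)}}(x \mid y) = \frac{(n-1)!}{(r-1)!\,(n-r-1)!} \cdot \frac{1}{y}\Big(\frac{x}{y}\Big)^{r-1}\Big(1 - \frac{x}{y}\Big)^{n-r-1} I_{(0,y)}(x), $$

and that, recalling the pdf of Beta(r + 1, n − r),

$$ \int_0^y x \cdot \frac{(n-1)!}{(r-1)!\,(n-r-1)!} \cdot \frac{1}{y}\Big(\frac{x}{y}\Big)^{r-1}\Big(1 - \frac{x}{y}\Big)^{n-r-1} dx = \frac{r}{n}\, y \int_0^1 \frac{n!}{r!\,(n-r-1)!}\, t^r (1-t)^{n-r-1}\, dt = (r/n)\, y. $$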

Example: Rao-Blackwellization

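Thus, we get

$$ 2E(\bar X \mid X_{(n)} = y) = 2n^{-1}\Big(y + \sum_{r=1}^{n-1} E(X_{(r)} \mid X_{(n)} = y)\Big) = 2n^{-1}\Big(y + \sum_{r=1}^{n-1} \frac{r}{n} \cdot y\Big) = \frac{n+1}{n}\, y. $$

Indeed, θ̂* = (n + 1)X(n)/n and

$$ \mathrm{var}_\theta(\hat\theta^*) = \Big(\frac{n+1}{n}\Big)^2 \cdot \mathrm{var}_\theta(X_{(n)}) = \frac{1}{n(n+2)}\,\theta^2 \le \frac{1}{3n}\,\theta^2 = \mathrm{var}_\theta(\hat\theta), $$

where the inequality is strict since n ≥ 2.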
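A small Monte Carlo experiment illustrates the variance reduction. This is a sketch under assumptions chosen here for illustration (n = 5, θ = 1, 20,000 replications, a fixed seed, and the hypothetical function name `simulate`); the theoretical variances in that setting are θ^2/(3n) = 1/15 for 2X̄ and θ^2/(n(n + 2)) = 1/35 for (n + 1)X(n)/n.

```python
import random
import statistics


def simulate(n=5, theta=1.0, reps=20_000, seed=0):
    """Draw `reps` samples of size n from U[0, theta] and return the values of
    the crude estimator 2 * mean and of its Rao-Blackwellized version."""
    rng = random.Random(seed)
    crude, rao_blackwell = [], []
    for _ in range(reps):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        crude.append(2.0 * sum(xs) / n)
        rao_blackwell.append((n + 1) * max(xs) / n)
    return crude, rao_blackwell


crude, rb = simulate()
print("2 * mean       :", statistics.fmean(crude), statistics.pvariance(crude))
print("(n+1)/n * max  :", statistics.fmean(rb), statistics.pvariance(rb))
```

Both sample means should be close to θ, while the sample variance of the conditioned estimator should be markedly smaller.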

Uniformly Minimum Variance Unbiased Estimator

We have seen that taking the conditional expectation of a given unbiased estimator, given a sufficient statistic, never increases its variance.

• UMVUE: An estimator η̂ of η = g(θ) is called the uniformly minimum variance unbiased estimator if it is itself unbiased and var_θ(η̂) ≤ var_θ(η̃) for all θ ∈ Θ and for any unbiased estimator η̃ of η.

• Uniqueness of UMVUE: If there exists an unbiased estimator with finite variance, then the UMVUE is unique. Let η̂1 and η̂2 be UMVUEs of η. Since (η̂1 + η̂2)/2 is also unbiased, var_θ(η̂1) = var_θ(η̂2) ≤ var_θ((η̂1 + η̂2)/2) < ∞ for all θ ∈ Θ. Combining this with the identity var((η̂1 + η̂2)/2) + var((η̂1 − η̂2)/2) = (var(η̂1) + var(η̂2))/2, we get

var_θ(η̂1 − η̂2) ≤ 0 (and thus = 0) for all θ ∈ Θ.

This implies P_θ(η̂1 − η̂2 = c) = 1 for all θ ∈ Θ for some constant c. That constant equals zero since both η̂1 and η̂2 are unbiased.

Complete Statistic

Let X1, ..., Xn be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. The following notion of completeness facilitates the derivation of the UMVUE.

• Complete statistic for θ ∈ Θ: A statistic Y = u(X) is called a complete statistic for θ ∈ Θ, and {pdf_Y(·; θ) : θ ∈ Θ} is called a complete family of distributions, if

E_θ ϕ(Y) = 0 for all θ ∈ Θ implies P_θ(ϕ(Y) = 0) = 1 for all θ ∈ Θ.

• A complete statistic Y is "complete" in the sense that any non-constant function of Y has a non-constant expected value (as a function of θ).

• Complete sufficient statistic: A statistic is called a complete sufficient statistic (CSS) for θ ∈ Θ if it is sufficient and complete for θ ∈ Θ.

Rao-Blackwell-Lehmann-Scheffe Theorem

Let X1, ..., Xn be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. Let Y = u(X) be a CSS for θ ∈ Θ. Assume that there exists an unbiased estimator with finite variance. Then, (i) for any unbiased estimator η̂0 of η = g(θ) with finite variance, η̂ = E(η̂0 | Y) is the UMVUE; (ii) any function of Y, say ϕ(Y), is the UMVUE if it is unbiased.

Proof: (i) Clearly, η̂ is a UE. For an arbitrary unbiased estimator η̃ with finite variance,

var_θ(η̃) ≥ var_θ(E(η̃ | Y)) = var_θ(η̂) for all θ ∈ Θ,

the inequality holding due to the Rao-Blackwell theorem and the equality holding due to the completeness of Y [P_θ(η̂ = E(η̃ | Y)) = 1 for all θ ∈ Θ, since both are unbiased functions of Y].

(ii) Replace η̂ by ϕ(Y) in the proof of (i).

Remark: A UE that is a function of a CSS is unique and it is the UMVUE.

Methods of Finding UMVUE

• Method 1: Rao-Blackwellization with a CSS.

(1) Find a CSS Y = u(X).

(2) Find an 'easy' UE η̂0.

(3) Compute the conditional expectation E(η̂0 | Y).

• Method 2: Trial and error.

(1) Find a CSS Y = u(X).

(2) Solve E_θ ϕ(Y) = g(θ) for all θ ∈ Θ with respect to ϕ, or try some ϕ(Y) and check unbiasedness.

CSS and UMVUE: Bernoulli Model

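Let X1, ..., Xn be a random sample from Bernoulli(θ), θ ∈ [0, 1].

• Finding a CSS: By the factorization theorem, Y = Σ_{i=1}^n X_i is an SS. To check whether it is also complete for θ ∈ [0, 1], suppose that E_θ ϕ(Y) = 0 for all θ ∈ [0, 1] for a function ϕ. Then, since Y has the Binomial(n, θ) distribution, dividing by (1 − θ)^n for θ ∈ (0, 1) and putting λ = θ/(1 − θ) gives

$$ \sum_{y=0}^n \phi(y)\binom{n}{y}\theta^y(1-\theta)^{n-y} = 0 \ \text{ for all } \theta \ \Rightarrow\ \sum_{y=0}^n \phi(y)\binom{n}{y}\lambda^y = 0 \ \text{ for all } \lambda > 0 $$

$$ \Rightarrow\ \phi(y)\binom{n}{y} = 0 \ \text{ for all } y \ \ (\text{uniqueness of polynomial coefficients}) \ \Rightarrow\ \phi(y) = 0 \ \text{ for all } y. $$

• Estimation of the mean: Since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.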

CSS and UMVUE: Bernoulli Model

• Estimation of the variance (Method 1): For the estimation of η = θ(1 − θ), consider η̂0 = X1(1 − X2) as a UE of η. Then, we get

$$ E(\hat\eta_0 \mid Y = y) = P\Big(X_1 = 1, X_2 = 0 \,\Big|\, \sum_{i=1}^n X_i = y\Big) = \binom{n-2}{y-1} \Big/ \binom{n}{y} = \frac{y(n-y)}{n(n-1)}, $$

so that η̂ = nX̄(1 − X̄)/(n − 1) is the UMVUE.

• Estimation of the variance (Method 2): Try the MLE of η: η̂_MLE = X̄(1 − X̄). We get

$$ E_\theta\big(\bar X(1 - \bar X)\big) = E_\theta \bar X - \big[\mathrm{var}_\theta(\bar X) + (E_\theta \bar X)^2\big] = (n-1)\theta(1-\theta)/n. $$

Thus, η̂ = nX̄(1 − X̄)/(n − 1) is a UE that is a function of the CSS, and we conclude that it is the UMVUE.
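Both methods can be checked exactly for a small sample size. The sketch below is illustrative only: the function names and the values n = 6, θ = 1/3 are assumptions of this example. It confirms that the conditional probability in Method 1 equals y(n − y)/(n(n − 1)) and that the resulting estimator has expectation θ(1 − θ).

```python
from fractions import Fraction
from math import comb


def umvue(y, n):
    """n * xbar * (1 - xbar) / (n - 1) written in terms of y = sum of x_i."""
    return Fraction(y * (n - y), n * (n - 1))


def conditional_prob(y, n):
    """P(X1 = 1, X2 = 0 | sum X_i = y) = C(n-2, y-1) / C(n, y)."""
    if y < 1 or y > n - 1:
        return Fraction(0)
    return Fraction(comb(n - 2, y - 1), comb(n, y))


def expected_umvue(theta, n):
    """Exact expectation of the estimator under Y ~ Binomial(n, theta)."""
    return sum(
        umvue(y, n) * comb(n, y) * theta ** y * (1 - theta) ** (n - y)
        for y in range(n + 1)
    )


n, theta = 6, Fraction(1, 3)
print(expected_umvue(theta, n), theta * (1 - theta))
print(all(conditional_prob(y, n) == umvue(y, n) for y in range(n + 1)))
```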

CSS and UMVUE: Poisson Model

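Let X1, ..., Xn be a random sample from Poisson(θ), θ > 0.

• Finding a CSS: By the factorization theorem, Y = Σ_{i=1}^n X_i is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since Y has the Poisson(nθ) distribution, putting λ = nθ gives

$$ \sum_{y=0}^\infty \phi(y)\, e^{-n\theta}\frac{(n\theta)^y}{y!} = 0 \ \text{ for all } \theta > 0 \ \Rightarrow\ \sum_{y=0}^\infty \frac{\phi(y)}{y!}\,\lambda^y = 0 \ \text{ for all } \lambda > 0 $$

$$ \Rightarrow\ \frac{\phi(y)}{y!} = 0 \ \text{ for all } y \ \ (\text{uniqueness of power series coefficients}) \ \Rightarrow\ \phi(y) = 0 \ \text{ for all } y. $$

• Estimation of the mean: Since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.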

CSS and UMVUE: Poisson Model

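• Estimation of η = e^{−2θ}: We solve E_θ ϕ(Y) = e^{−2θ} for all θ > 0 with respect to ϕ. Note that, for n > 2,

$$ \sum_{y=0}^\infty \phi(y)\, e^{-n\theta}\frac{(n\theta)^y}{y!} = e^{-2\theta} \ \text{ for all } \theta \ \Leftrightarrow\ \sum_{y=0}^\infty \phi(y)\Big(\frac{n}{n-2}\Big)^y \frac{((n-2)\theta)^y}{y!} = e^{(n-2)\theta} \ \text{ for all } \theta $$

$$ \Leftrightarrow\ \sum_{y=0}^\infty \phi(y)\Big(\frac{n}{n-2}\Big)^y \frac{((n-2)\theta)^y}{y!} = \sum_{y=0}^\infty \frac{((n-2)\theta)^y}{y!} \ \text{ for all } \theta. $$

This gives ϕ(y) = (1 − 2/n)^y, so that the UMVUE of η is given by

$$ \hat\eta = \Big(\frac{n-2}{n}\Big)^{\sum_{i=1}^n X_i}. $$

• Remark: As n ↑ ∞, the UMVUE η̂ is approximated by the MLE e^{−2X̄}, so one may expect the UMVUE to behave well for large n.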
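Unbiasedness of ϕ(Y) = (1 − 2/n)^Y is easy to confirm numerically by summing the Poisson(nθ) series. The sketch below uses a truncated series; the function name `expected_phi`, the truncation at 200 terms, and the values n = 10, θ = 0.7 are assumptions made for illustration.

```python
import math


def expected_phi(theta, n, terms=200):
    """E[(1 - 2/n)^Y] for Y ~ Poisson(n * theta), via a truncated series.

    The pmf is updated recursively: P(Y = y+1) = P(Y = y) * lam / (y + 1).
    """
    lam = n * theta
    pmf = math.exp(-lam)  # P(Y = 0)
    total = 0.0
    for y in range(terms):
        total += (1.0 - 2.0 / n) ** y * pmf
        pmf *= lam / (y + 1)
    return total


n, theta = 10, 0.7
print(expected_phi(theta, n), math.exp(-2 * theta))
```

The two printed numbers should agree up to floating-point error, and (1 − 2/n)^{n x̄} approaches e^{−2x̄} as n grows, in line with the remark on the MLE.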

CSS and UMVUE: Exponential Model

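Let X1, ..., Xn be a random sample from Exponential(θ), θ > 0, with pdf f(x; θ) = θ^{−1} e^{−x/θ}, x > 0. By the factorization theorem, Y = Σ_{i=1}^n X_i is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since Y has the Gamma(n, θ) distribution, putting λ = 1/θ gives

$$ \int_0^\infty \phi(y)\, \frac{1}{\Gamma(n)\theta^n}\, y^{n-1} e^{-y/\theta}\, dy = 0 \ \text{ for all } \theta > 0 \ \Rightarrow\ \int_0^\infty \phi(y)\, y^{n-1} e^{-\lambda y}\, dy = 0 \ \text{ for all } \lambda > 0 $$

$$ \Rightarrow\ \phi(y)\, y^{n-1} = 0 \ \text{ for all } y > 0 \ \ (\text{uniqueness of Laplace transform}) \ \Rightarrow\ \phi(y) = 0 \ \text{ for all } y > 0. $$

Since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.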

CSS and UMVUE: Uniform[0, θ] Model

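Let X1, ..., Xn be a random sample from Uniform[0, θ], θ > 0. By the factorization theorem, it can be verified that Y = X(n) is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since the pdf of Y is given by pdf_Y(y; θ) = nθ^{−n} y^{n−1} I_{[0,θ]}(y), we have

$$ \int_0^\theta \phi(y)\, n\theta^{-n} y^{n-1}\, dy = 0 \ \text{ for all } \theta > 0 \ \Rightarrow\ \int_0^\theta \phi(y)\, y^{n-1}\, dy = 0 \ \text{ for all } \theta > 0 $$

$$ \Rightarrow\ \phi(\theta)\, \theta^{n-1} = 0 \ \text{ for all } \theta > 0 \ \ (\text{Fundamental Theorem of Calculus}) \ \Rightarrow\ \phi(y) = 0 \ \text{ for all } y > 0. $$

Since E_θ(Y) = nθ/(n + 1) for all θ, we conclude that η̂ = (n + 1)X(n)/n is the UMVUE of θ.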

CSS and UMVUE: Uniform[−θ, θ] Model

Let X1, . . . , Xn (n ≥ 2) be a random sample from Uniform[−θ, θ], θ > 0.

• By the factorization theorem, it can be verified that (X(1), X(n)) is an SS. But it is not a complete statistic for θ > 0: one may find a non-trivial function ϕ of (X(1), X(n)) such that E_θ ϕ(X(1), X(n)) is identically zero. For example, ϕ(X(1), X(n)) = X(1) + X(n) has E_θ ϕ(X(1), X(n)) = 0 for all θ > 0 by the symmetry of Uniform[−θ, θ] about 0, although ϕ is not identically zero.

• In fact, for this model, Y = max_{1≤i≤n} |X_i| is a CSS and η̂ = (n + 1) max_{1≤i≤n} |X_i|/n is the UMVUE of θ. To see this, note that the joint density of X at x equals (2θ)^{−n} I_{[0,θ]}(max_{1≤i≤n} |x_i|), which shows that Y is an SS. Since the |X_i| are i.i.d. Uniform[0, θ], the argument for the Uniform[0, θ] model shows that Y is also complete and thus η̂ is the UMVUE.

Ancillary Statistic

Let X1, ..., Xn be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d.

• Ancillary statistic: A statistic Z = v(X) is called an ancillary statistic for θ ∈ Θ if P_θ(Z ∈ A) does not depend on θ ∈ Θ for all A, i.e.,

P_θ1(Z ∈ A) = P_θ2(Z ∈ A) for all θ1, θ2 ∈ Θ,

for all A.

• Basu's Theorem (Independence of CSS and AS): If Y = u(X) is a CSS and Z = v(X) is an AS for θ ∈ Θ, then Y and Z are independent under P_θ for all θ ∈ Θ, i.e.,

P_θ(Y ∈ A, Z ∈ B) = P_θ(Y ∈ A) P(Z ∈ B) for all A, B and for all θ ∈ Θ.

Proof of Basu’s Theorem

Let A and B be arbitrary sets. Since Y is a sufficient statistic for θ ∈ Θ, ϕ(y) ≡ P(Z ∈ B | Y = y) does not depend on θ ∈ Θ for all y. Also, we note that E_θ(ϕ(Y)) = P(Z ∈ B), which does not depend on θ either because of the ancillarity of Z. Thus,

E_θ(ϕ(Y) − P(Z ∈ B)) = 0 for all θ ∈ Θ.

Due to the completeness of Y, this entails ϕ(y) = P(Z ∈ B) for (almost) all y, so that

$$ P_\theta(Y \in A, Z \in B) = \int_A P(Z \in B \mid Y = y) \cdot \mathrm{pdf}_Y(y; \theta)\, d\mu(y) = P(Z \in B) \cdot P_\theta(Y \in A) $$

for all θ ∈ Θ.

Ancillary Statistic: Examples

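In each example below, the displayed statistic has the same distribution as the corresponding statistic of the Z_i, which is free of the parameter, so it is ancillary.

• N(θ, 1), θ ∈ R:

(X1 − X̄, ..., Xn − X̄) =d (Z1 − Z̄, ..., Zn − Z̄)

for Z_i being i.i.d. from N(0, 1).

• Exp(θ, 1), θ ∈ R:

(X1 − X(1), ..., Xn − X(1)) =d (Z1 − Z(1), ..., Zn − Z(1))

for Z_i being i.i.d. from Exp(0, 1).

• Gamma(α, β), β > 0 with α known:

$$ \Big(\frac{X_1}{\sum_{i=1}^{n+1} X_i}, \ldots, \frac{X_n}{\sum_{i=1}^{n+1} X_i}\Big) \overset{d}{=} \Big(\frac{Z_1}{\sum_{i=1}^{n+1} Z_i}, \ldots, \frac{Z_n}{\sum_{i=1}^{n+1} Z_i}\Big) $$

for Z_i being i.i.d. from Gamma(α, 1).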

Ancillary Statistic: Examples

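• Uniform(0, θ), θ > 0: X(n)/X(1) =d Z(n)/Z(1) for Z_i being i.i.d. from Uniform(0, 1).

• N(µ, σ^2), (µ, σ^2) ∈ R × R+:

$$ \Big(\frac{X_1 - \bar X}{S_X}, \ldots, \frac{X_n - \bar X}{S_X}\Big) \overset{d}{=} \Big(\frac{Z_1 - \bar Z}{S_Z}, \ldots, \frac{Z_n - \bar Z}{S_Z}\Big) $$

for Z_i being i.i.d. from N(0, 1), where S_X^2 = Σ_{i=1}^n (X_i − X̄)^2/(n − 1) and S_Z^2 = Σ_{i=1}^n (Z_i − Z̄)^2/(n − 1).

• Exp(µ, σ), (µ, σ) ∈ R × R+:

$$ \frac{X_1 - X_{(1)}}{\sum_{i=1}^n (X_i - X_{(1)})} \overset{d}{=} \frac{Z_1 - Z_{(1)}}{\sum_{i=1}^n (Z_i - Z_{(1)})} $$

for Z_i being i.i.d. from Exp(0, 1).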

Basu’s Theorem: Examples

• Let X1, ..., Xn be a random sample from Exp(µ, σ), µ ∈ R, σ ∈ R+. Then, X(1) and Σ_{i=1}^n (X_i − X(1)) are independent.

Proof: Fix σ = σ0. Then, X(1) is a CSS for µ ∈ R and Σ_{i=1}^n (X_i − X(1)) is an AS for µ ∈ R. Thus, they are independent under P_{µ,σ0} for all µ ∈ R. Since this holds for any choice of σ0, they are independent under the entire model.

• Let X1, ..., X_{n+1} be a random sample from Gamma(α, β), α > 0, β > 0. Then, Σ_{i=1}^{n+1} X_i is independent of

$$ Z \equiv \Big(\frac{X_1}{X_1 + \cdots + X_{n+1}}, \ldots, \frac{X_n}{X_1 + \cdots + X_{n+1}}\Big). $$

Proof: Fix α = α0. Then, Σ_{i=1}^{n+1} X_i is a CSS for β > 0 (Check this!). Since Z is an AS for β > 0, the two statistics are independent under P_{α0,β} for all β > 0, and thus under P_{α,β} for all α > 0 and β > 0.

Use Of Ancillary Statistic To Find UMVUE

Let X1, ..., Xn be a random sample from Exp(θ), θ > 0. For a given a > 0 we want to find the UMVUE of (the system reliability) η = P_θ(X1 > a) = e^{−a/θ}. We have seen that Y = Σ_{i=1}^n X_i is a CSS for θ > 0. An easy UE of η is η̂0 = I_{(a,∞)}(X1). Thus, η̂ = E(η̂0 | Y) is the UMVUE. Now, we note that X1/Y has the Beta(1, n − 1) distribution and is an AS, so that it is independent of Y by Basu's theorem. From this we get that

$$ E(\hat\eta_0 \mid Y = y) = P(X_1 > a \mid Y = y) = P(X_1/Y > a/y \mid Y = y) = P(X_1/Y > a/y) $$

$$ = \int_{a/y}^1 \frac{\Gamma(n)}{\Gamma(1)\Gamma(n-1)}\, z^0 (1 - z)^{n-2}\, dz \cdot I_{(a,\infty)}(y) = (1 - a/y)^{n-1} I_{(a,\infty)}(y). $$

Thus, η̂ = (1 − a/Y)^{n−1} I_{(a,∞)}(Y) is the UMVUE.
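As a sanity check, the unbiasedness of (1 − a/Y)^{n−1} I_{(a,∞)}(Y) can be confirmed by numerical integration against the Gamma(n, θ) density of Y. The following is a rough sketch, not a definitive implementation: the midpoint rule, the truncation of the integral at a + 60θ, and the values n = 5, θ = 2, a = 1 are all assumptions chosen here for illustration.

```python
import math


def gamma_pdf(y, n, theta):
    """Density of Gamma(n, theta) with shape n and scale theta."""
    return y ** (n - 1) * math.exp(-y / theta) / (math.gamma(n) * theta ** n)


def umvue(y, a, n):
    """(1 - a/y)^(n-1) on {y > a}, zero otherwise."""
    return (1.0 - a / y) ** (n - 1) if y > a else 0.0


def expected_umvue(a, n, theta, steps=200_000):
    """Midpoint-rule approximation of E_theta[umvue(Y)] over (a, a + 60*theta)."""
    upper = a + 60.0 * theta  # the neglected tail mass is of order exp(-60)
    h = (upper - a) / steps
    total = 0.0
    for i in range(steps):
        y = a + (i + 0.5) * h
        total += umvue(y, a, n) * gamma_pdf(y, n, theta)
    return total * h


a, n, theta = 1.0, 5, 2.0
print(expected_umvue(a, n, theta), math.exp(-a / theta))
```

The agreement reflects the identity ∫_a^∞ (y − a)^{n−1} e^{−y/θ} dy = Γ(n) θ^n e^{−a/θ}.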

7.3 Exponential Family

Exponential Family

A family of distributions {f(·; θ) : θ ∈ Θ} for Θ ⊂ R^d is called an exponential family if

(i) the support of the density f(·; θ) does not depend on θ ∈ Θ;

(ii) the density has the following form:

f(x; θ) = exp(η(θ)^⊤ T(x) − B(θ)) · h(x)

for some known functions η = (η1, ..., ηk)^⊤, T = (T1, ..., Tk)^⊤, B and h.

An exponential family is called a k-parameter regular exponential family if

(iii) η(Θ) ≡ {η(θ) : θ ∈ Θ} ⊂ R^k contains a k-dimensional open rectangle.

Canonical Form: Reparametrization by Natural Parameter

• The d-variate real valued function B depends on θ only through η ≡ η(θ), i.e., B(θ) = A(η(θ)) for some k-variate real valued function A:

$$ 1 = \int \exp\big(\eta(\theta)^\top T(x) - B(\theta)\big) h(x)\, d\mu(x) = e^{-B(\theta)} \int \exp\big(\eta(\theta)^\top T(x)\big) h(x)\, d\mu(x) =: e^{-B(\theta)} \cdot e^{A(\eta(\theta))}. $$

• This means that the density f(·; θ) actually depends on θ only through η:

f(x; η) = exp(η^⊤ T(x) − A(η)) h(x).

The parameter η is called a natural parameter.

• Natural parameter space: The largest possible set of η is given by

$$ \Big\{ \eta \in \mathbb{R}^k : \int \exp\big(\eta^\top T(x)\big) h(x)\, d\mu(x) < \infty \Big\}. $$



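Exponential Family: Examples

• Bernoulli(θ), 0 < θ < 1: For x ∈ {0, 1},

$$ f(x; \theta) = \exp\Big(x \log\frac{\theta}{1 - \theta} + \log(1 - \theta)\Big) = \exp(\eta x - A(\eta)), $$

where η = log(θ/(1 − θ)) and A(η) = log(1 + e^η). Thus, it is a 1-parameter regular exponential family.

• N(µ, σ^2), θ ≡ (µ, σ^2) ∈ R × R+: For x ∈ R,

$$ f(x; \theta) = \exp\Big(-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\Big) = \exp\big(\eta_1 T_1(x) + \eta_2 T_2(x) - A(\eta_1, \eta_2)\big) \cdot (2\pi)^{-1/2}, $$

where η1 = −1/(2σ^2), η2 = µ/σ^2, A(η1, η2) = −η1^{−1} η2^2/4 + log(−η1^{−1}/2)/2 and T1(x) = x^2, T2(x) = x. Thus, it is a 2-parameter regular exponential family.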


Exponential Family: Examples

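• Two independent normal populations with a common mean: Let X and Y be independent and have N(µ, σ1^2) and N(µ, σ2^2) distributions, respectively, for θ ≡ (µ, σ1^2, σ2^2) ∈ R × R+^2. In this case, for (x, y) ∈ R^2,

$$ f(x, y; \theta) = \exp\Big(-\frac{1}{2\sigma_1^2}x^2 + \frac{\mu}{\sigma_1^2}x - \frac{1}{2\sigma_2^2}y^2 + \frac{\mu}{\sigma_2^2}y - \frac{\mu^2}{2\sigma_1^2} - \frac{\mu^2}{2\sigma_2^2} - \frac{1}{2}\log(2\pi\sigma_1^2) - \frac{1}{2}\log(2\pi\sigma_2^2)\Big) $$

$$ = \exp\big(\eta_1 T_1(x, y) + \eta_2 T_2(x, y) + \eta_3 T_3(x, y) + \eta_4 T_4(x, y) - A(\eta_1, \ldots, \eta_4)\big) \cdot h(x, y), $$

where T1(x, y) = x^2, T2(x, y) = x, T3(x, y) = y^2, T4(x, y) = y, and η1 = −1/(2σ1^2), η3 = −1/(2σ2^2), η2 = µ/σ1^2 and η4 = µ/σ2^2. In fact, this is not a 4-parameter regular exponential family since η4 = (η3/η1) · η2, so that η(Θ) lies on a 3-dimensional surface in R^4.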

Random Sample from Exponential Family

Let X1, ..., Xn be a random sample from an exponential family of pdf's f(·; θ), θ ∈ Θ, with f(x; θ) = exp(η(θ)^⊤ T(x) − B(θ)) · h(x), where η(θ) is a k-vector. Let 𝒳 denote the common support of f(·; θ). Then,

(1) the joint densities of (X1, ..., Xn) also form an exponential family with 𝒳^n as the common support and

$$ \prod_{i=1}^n f(x_i; \theta) = \exp\Big(\eta(\theta)^\top \sum_{i=1}^n T(x_i) - nB(\theta)\Big) \cdot \prod_{i=1}^n h(x_i) $$

as the joint density;

(2) if η(Θ) contains a k-dimensional open rectangle, then Y = Σ_{i=1}^n T(X_i) is a CSS for θ ∈ Θ.

Proof of (2)

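A proof is given only for the case where the X_i are discrete random variables. Clearly, Y is an SS. Let η = η(θ). Then, with Σ*_y denoting the sum over all (x1, ..., xn) with Σ_{i=1}^n T(x_i) = y, we get

$$ \mathrm{pdf}_Y(y; \eta) = {\sum}^*_y \exp\Big(\eta^\top \sum_{i=1}^n T(x_i) - nA(\eta)\Big) \prod_{i=1}^n h(x_i) = \exp\big(\eta^\top y - nA(\eta)\big) \cdot {\sum}^*_y \prod_{i=1}^n h(x_i) =: \exp\big(\eta^\top y - A^*(\eta)\big) \cdot h^*(y). $$

Thus, it follows that E_θ ϕ(Y) = 0 for all θ implies

$$ \sum_{\text{all } y} \phi(y)\, h^*(y) \cdot e^{\eta^\top y} = 0 \ \text{ for all } \eta \in \eta(\Theta) \ \Rightarrow\ \phi(y) = 0 \ \text{ for all } y \text{ with } h^*(y) > 0, $$

where the second implication is from the uniqueness of the Laplace transform.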

MGF/CGF of T (X) in Exponential Family

Let X be a random variable having a pdf f(·; η), η ∈ N ⊂ R^k. Assume f(x; η) = exp(η^⊤ T(x) − A(η)) · h(x) and that N contains a k-dimensional open rectangle. Below, Ȧ(η) denotes the gradient (k-vector of first partial derivatives) and Ä(η) the Hessian (k × k matrix of second partial derivatives) of A. Then,

(3) the cumulant generating function of T(X) is given by

cgf_{T(X)}(u; η) ≡ log E_η e^{u^⊤ T(X)} = A(η + u) − A(η)

for all η ∈ Int(N) and all u in a neighborhood of 0;

(4) the mean and variance of T(X) under P_η with η ∈ Int(N) are then given by

E_η T(X) = Ȧ(η), var_η(T(X)) = Ä(η).
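For the Bernoulli family, where T(x) = x and A(η) = log(1 + e^η), facts (3) and (4) can be checked numerically. The sketch below is illustrative: the helper names, the finite-difference step h, and the values θ = 0.3, u = 0.25 are assumptions of this example. It compares A(η + u) − A(η) with the directly computed cgf log(1 − θ + θe^u), and compares numerical derivatives of A with θ and θ(1 − θ).

```python
import math


def A(eta):
    """Log-partition function of the Bernoulli family in natural form."""
    return math.log1p(math.exp(eta))


def cgf_direct(u, theta):
    """log E exp(u X) for X ~ Bernoulli(theta), computed from the definition."""
    return math.log(1.0 - theta + theta * math.exp(u))


theta = 0.3
eta = math.log(theta / (1.0 - theta))  # natural parameter
u = 0.25
h = 1e-4  # finite-difference step (illustrative choice)

# Central differences approximating the first and second derivatives of A.
a_dot = (A(eta + h) - A(eta - h)) / (2 * h)
a_ddot = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h ** 2

print(cgf_direct(u, theta), A(eta + u) - A(eta))
print(a_dot, theta)
print(a_ddot, theta * (1 - theta))
```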



Proofs of (3) and (4)

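To prove (3), let η ∈ Int(N). We can find a small ε > 0 such that η + u belongs to N for all u with ‖u‖ ≤ ε. For such u, we get

$$ E_\eta e^{u^\top T(X)} = \int e^{(\eta + u)^\top T(x) - A(\eta)} h(x)\, d\mu(x) = e^{A(\eta + u) - A(\eta)} \int e^{(\eta + u)^\top T(x) - A(\eta + u)} h(x)\, d\mu(x) = e^{A(\eta + u) - A(\eta)}. $$

The fact (4) is immediate from the fact that

$$ E_\eta T(X) = \frac{\partial}{\partial u}\, \mathrm{cgf}_{T(X)}(u; \eta)\Big|_{u = 0}, \qquad \mathrm{var}_\eta(T(X)) = \frac{\partial^2}{\partial u\, \partial u^\top}\, \mathrm{cgf}_{T(X)}(u; \eta)\Big|_{u = 0}. $$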

Differentiation Under Integral Sign

Let X be a random variable having a pdf f(·; η), η ∈ N ⊂ R^k. Assume f(x; η) = exp(η^⊤ T(x) − A(η)) · h(x) and that N contains a k-dimensional open rectangle. Suppose that E_η φ(X) exists for all η ∈ N. Then,

(5) E_η φ(X), as a function of η, is infinitely many times differentiable at η ∈ Int(N), and the differentiation can be carried out under the integral sign. For example,

$$ \frac{\partial}{\partial \eta} E_\eta\, \varphi(X) = \int \varphi(x)\, \frac{\partial}{\partial \eta} f(x; \eta)\, d\mu(x). $$

Remark: Applying the theorem to φ ≡ 1 gives

$$ 0 = E_\eta\big(T(X) - \dot A(\eta)\big) \quad \text{and} \quad 0 = E_\eta\big((T(X) - \dot A(\eta))(T(X) - \dot A(\eta))^\top - \ddot A(\eta)\big), $$

so that E_η T(X) = Ȧ(η) and var_η(T(X)) = Ä(η).

MLE and Exponential Family

Let X1, ..., Xn be a random sample from an exponential family of pdf's f(·; η), η ∈ N ⊂ R^k, with f(x; η) = exp(η^⊤ T(x) − A(η)) · h(x). Assume that N contains a k-dimensional open rectangle. Then,

(6) (i) the log-likelihood is strictly concave, and the unique MLE of η is determined by the likelihood equation

$$ \frac{1}{n}\sum_{i=1}^n T(x_i) = \dot A(\eta), $$

provided that it has a solution η ∈ N;

(ii) the Fisher information is given by

I1(η) = Ä(η).
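Because the log-likelihood is strictly concave with gradient n(T̄ − Ȧ(η)) and Hessian −nÄ(η), the likelihood equation can be solved by Newton's method. The sketch below treats the one-parameter Poisson family in natural form, where T(x) = x and A(η) = e^η, so the solution is known in closed form to be log x̄. The function name `newton_mle`, the starting point, the tolerance, and the data are hypothetical choices made for this illustration.

```python
import math


def newton_mle(t_bar, a_dot, a_ddot, eta0=0.0, tol=1e-12, max_iter=50):
    """Solve t_bar = A'(eta) by Newton's method for a one-parameter family.

    The update eta + (t_bar - A'(eta)) / A''(eta) is the Newton step for
    maximizing the concave function eta * t_bar - A(eta).
    """
    eta = eta0
    for _ in range(max_iter):
        step = (t_bar - a_dot(eta)) / a_ddot(eta)
        eta += step
        if abs(step) < tol:
            break
    return eta


data = [2, 0, 3, 1, 4]  # made-up counts for illustration
t_bar = sum(data) / len(data)

# Poisson in natural form: A(eta) = exp(eta), so A' = A'' = exp.
eta_hat = newton_mle(t_bar, math.exp, math.exp)
print(eta_hat, math.log(t_bar))
```

In the multiparameter case the same iteration applies with the scalar division replaced by solving a linear system in Ä(η).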

Proof of (6): Positive Definiteness of Ä(η)

Suppose that there exists c ≡ (c1, ..., ck)^⊤ ≠ 0 in R^k such that c^⊤ Ä(η0) c = 0 for some η0 ∈ N. Since c^⊤ Ä(η0) c = var_η0(c^⊤ T(X)), it follows that P_η0(c^⊤ T(X) = c^⊤ E_η0 T(X)) = 1. This implies that there exists a constant c0 such that

c^⊤ T(x) = c0 a.e. [µ] for x ∈ 𝒳 = {x : h(x) > 0}.

Without loss of generality, assume c1 ≠ 0. Then,

T1(x) = c0/c1 − (c2/c1) T2(x) − · · · − (ck/c1) Tk(x)

a.e. [µ] for x ∈ 𝒳 = {x : h(x) > 0}, which gives that, for all η ∈ R^k,

η^⊤ T(x) = η^⊤ M^⊤ T_{−1}(x) + (c0/c1) η1

a.e. [µ] for x ∈ 𝒳 = {x : h(x) > 0},



Proof of (6): Continued


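where

$$ M^\top = \begin{pmatrix} -(c_2/c_1) & \cdots & -(c_k/c_1) \\ 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}, \qquad T_{-1}(x) = \begin{pmatrix} T_2(x) \\ \vdots \\ T_k(x) \end{pmatrix}. $$

Thus, it holds that, a.e. [µ],

$$ f(x; \eta) = \exp\big((M\eta)^\top T_{-1}(x) + (c_0/c_1)\eta_1 - A(\eta)\big) \cdot h(x). $$

This implies

$$ (c_0/c_1)\eta_1 - A(\eta) = -\log \int \exp\big((M\eta)^\top T_{-1}(x)\big) h(x)\, d\mu(x) =: -B(M\eta), $$

so that f(x; η) = exp[(Mη)^⊤ T_{−1}(x) − B(Mη)] · h(x). The density thus depends on η only through the (k − 1)-vector Mη, so η is clearly not identifiable. Contradiction.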

Multinomial Experiments

Let X_i = (X_{i,1}, ..., X_{i,k−1})^⊤, i = 1, ..., n, be i.i.d. Multinomial(1, p), p ≡ (p1, ..., p_{k−1})^⊤, p_j > 0, p1 + · · · + p_{k−1} < 1.

• Let p_k = 1 − p1 − · · · − p_{k−1}. Then, the common density of X_i is given by

f(x; p) = exp(x1 log(p1/p_k) + · · · + x_{k−1} log(p_{k−1}/p_k) + log p_k),

so that the distributions of X_i form a (k − 1)-parameter regular exponential family.

• Y = Σ_{i=1}^n X_i = (Σ_{i=1}^n X_{i,1}, ..., Σ_{i=1}^n X_{i,k−1})^⊤ is a CSS for p.

• MLE of η: The MLE of η ≡ (log(p1/p_k), ..., log(p_{k−1}/p_k))^⊤ =: h(p) solves the equation

Y/n = E_η X1, i.e., Y/n = h^{−1}(η).

Thus, the MLE of η is given by η̂ = h(Y/n).



Multinomial Experiments

• MLE of p: The MLE of p is then p̂ = h^{−1}(η̂) = Y/n.

• UMVUE of p: Y/n is a UE of p and is a function of the CSS Y, so that p̂ = Y/n is also the UMVUE of p.

• UMVUE of Σ ≡ diag(p) − pp^⊤: Here, an estimator Σ̂ of Σ is called the UMVUE of Σ if it is unbiased and var_p(Σ̂_UE) − var_p(Σ̂) is nonnegative definite for all unbiased estimators Σ̂_UE and for all p, with Σ̂ and Σ̂_UE being the vectorized versions. Note that

Σ̂_MLE = diag(Y/n) − (Y/n)(Y/n)^⊤.

Computing the expected value of Σ̂_MLE, we get

E_p(Σ̂_MLE) = diag(p) − var_p(Y/n) − E_p(Y/n) E_p(Y/n)^⊤ = diag(p) − n^{−1}Σ − pp^⊤ = (1 − 1/n)Σ.

Thus, Σ̂_UMVUE = n · Σ̂_MLE/(n − 1).

Multivariate Normal Population

Let X_i = (X_{i,1}, ..., X_{i,k})^⊤, i = 1, ..., n (n ≥ 2), be i.i.d. Normal(µ, Σ), with µ ∈ R^k and Σ in the set of k × k positive definite matrices.

• With θ ≡ (µ, Σ) ∈ R^d for d = k + k(k + 1)/2,

$$ f(x; \theta) = \det(2\pi\Sigma)^{-1/2} \exp\big(-(x - \mu)^\top \Sigma^{-1}(x - \mu)/2\big) = \exp\big(-\mathrm{tr}(\Sigma^{-1} x x^\top)/2 + \mu^\top \Sigma^{-1} x - \mu^\top \Sigma^{-1}\mu/2 - \log(\det(2\pi\Sigma))/2\big). $$

• Y = (Σ_{i=1}^n X_i, Σ_{i=1}^n X_i X_i^⊤) is a CSS for θ, where Σ_{i=1}^n X_i X_i^⊤ is understood to be a k(k + 1)/2-vector.

• MLE of η ≡ (Σ^{−1}µ, Σ^{−1}): It is the solution of

Y/n = E_η(X1, X1 X1^⊤), i.e., Y/n = (µ, Σ + µµ^⊤) =: g(µ, Σ).

Let h(µ, Σ) = (Σ^{−1}µ, Σ^{−1}). Then, η̂_MLE = h ∘ g^{−1}(Y/n).



Multivariate Normal Population

• MLE of µ and Σ: The MLE of (µ, Σ) = h^{−1}(η) is then

(µ̂_MLE, Σ̂_MLE) = h^{−1} ∘ h ∘ g^{−1}(Y/n) = g^{−1}(Y/n).

By the definition of the function g : R^d → R^d, this means

$$ n^{-1}\sum_{i=1}^n X_i = \hat\mu_{\mathrm{MLE}} \quad \text{and} \quad n^{-1}\sum_{i=1}^n X_i X_i^\top = \hat\Sigma_{\mathrm{MLE}} + \hat\mu_{\mathrm{MLE}}\, \hat\mu_{\mathrm{MLE}}^\top, $$

so that µ̂_MLE = X̄ and Σ̂_MLE = n^{−1} Σ_{i=1}^n (X_i − X̄)(X_i − X̄)^⊤.

• UMVUE of µ and Σ: Since (X̄, (n − 1)^{−1} Σ_{i=1}^n (X_i − X̄)(X_i − X̄)^⊤) is a 1-1 function of Y, it is also a CSS for (µ, Σ). Since X̄ and (n − 1)^{−1} Σ_{i=1}^n (X_i − X̄)(X_i − X̄)^⊤ are UEs of µ and Σ, respectively, they are the UMVUEs of the respective parameters.

UMVUE of Normal Probabilities

Let X1, ..., Xn (n ≥ 2) be a random sample from N(θ, 1), θ ∈ R. We want to find the UMVUE of the normal probability η = P_θ(X1 ≤ c) = Φ(c − θ). Note that X̄ is a CSS for this model. An easy UE of η is I_{(−∞,c]}(X1). Thus, by the R-B-L-S Theorem, the UMVUE of η is given by η̂ = E(I_{(−∞,c]}(X1) | X̄) = P(X1 ≤ c | X̄). We compute P(X1 ≤ c | X̄ = y) below. Recall that (X1 − X̄, ..., Xn − X̄) is an ancillary statistic and thus independent of X̄ (Basu's theorem). Since X1 − X̄ has the N(0, (n − 1)/n) distribution,

$$ P(X_1 \le c \mid \bar X = y) = P(X_1 - \bar X \le c - y \mid \bar X = y) = P(X_1 - \bar X \le c - y) = \Phi\Big(\sqrt{\frac{n}{n-1}}\,(c - y)\Big). $$

Thus, the UMVUE of η = Φ(c − θ) is given by

$$ \hat\eta = \Phi\Big(\sqrt{\frac{n}{n-1}}\,(c - \bar X)\Big). $$



Independence of Normal Sample Mean and Variance

Let X1, ..., Xn (n ≥ 2) be a random sample from N(µ, σ^2), θ ≡ (µ, σ^2) ∈ R × R+. Then, X̄ and Σ_{i=1}^n (X_i − X̄)^2 are independent under P_θ for all θ ∈ R × R+.

Proof: Fix σ^2 = σ0^2. Then, X̄ is a CSS and (X1 − X̄, ..., Xn − X̄) is an ancillary statistic for the submodel Θ(σ0^2) ≡ {(µ, σ0^2) : µ ∈ R}. Thus, they are independent under P_θ for all θ ∈ Θ(σ0^2), due to Basu's theorem. Since the choice of σ0^2 is arbitrary within R+, this implies that X̄ and (X1 − X̄, ..., Xn − X̄) are independent under P_θ for all

$$ \theta \in \bigcup_{\sigma_0^2 > 0} \Theta(\sigma_0^2) = \mathbb{R} \times \mathbb{R}_+. $$

The independence of X̄ and Σ_{i=1}^n (X_i − X̄)^2 is immediate from this, since the latter is a function of (X1 − X̄, ..., Xn − X̄).