Minimal sufficient statistic

April 21, 2023 1

Department of Computer and Information Science (IDA) Linköpings universitet, Sweden

Minimal sufficient statistic

If $U_1, U_2, \ldots, U_m$ are all available sufficient statistics for $\theta$ (possibly an infinite number) and

$$T = g_1(U_1) = g_2(U_2) = \cdots = g_m(U_m)$$

where $g_1, g_2, \ldots, g_m$ are functions, then $T$ is minimal sufficient.

Note! If $h$ is a one-to-one function and $V = h(T)$, then $V$ is also minimal sufficient.



A statistic $T$ defines a partition of the sample space of $(X_1, \ldots, X_n)$ into classes satisfying $T(x_1, \ldots, x_n) = t$ for different values of $t$.

If such a partition puts the samples $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ into the same class if and only if

$$\frac{L(\theta; \mathbf{x})}{L(\theta; \mathbf{y})} \text{ does not depend on } \theta$$

then $T$ is minimal sufficient for $\theta$.



Example

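Assume again that $x_1, \ldots, x_n$ is a sample from $Exp(\theta)$, taking $\theta$ as the mean so that $f(x;\theta) = \theta^{-1} e^{-x/\theta}$.

Let $T(X_1, \ldots, X_n) = n^{-1}\sum_{i=1}^n X_i$, and let $T(x_1, \ldots, x_n) = t$ and $T(y_1, \ldots, y_n) = t$, i.e. $(y_1, \ldots, y_n)$ belongs to the same class as $(x_1, \ldots, x_n)$. Then

$$L(\theta; \mathbf{x}) = \theta^{-n} e^{-\theta^{-1}\sum_{i=1}^n x_i} = \theta^{-n} e^{-\theta^{-1} n\, T(x_1, \ldots, x_n)}$$

$$L(\theta; \mathbf{y}) = \theta^{-n} e^{-\theta^{-1}\sum_{i=1}^n y_i} = \theta^{-n} e^{-\theta^{-1} n\, T(y_1, \ldots, y_n)}$$

$$\frac{L(\theta; \mathbf{x})}{L(\theta; \mathbf{y})} = \frac{e^{-\theta^{-1} n\, T(x_1, \ldots, x_n)}}{e^{-\theta^{-1} n\, T(y_1, \ldots, y_n)}} = \frac{e^{-\theta^{-1} n t}}{e^{-\theta^{-1} n t}} = 1, \text{ not depending on } \theta$$

$\Rightarrow T$ is minimal sufficient for $\theta$.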
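The partition criterion can be checked numerically. The following is a minimal sketch only: it assumes the exponential density is parameterised by its mean, f(x; θ) = exp(−x/θ)/θ, and the helper name `exp_likelihood` and the three toy samples are illustrative choices, not part of the lecture material.

```python
import math


def exp_likelihood(theta, sample):
    """Likelihood of an i.i.d. sample under the ASSUMED density exp(-x/theta)/theta."""
    n = len(sample)
    return theta ** (-n) * math.exp(-sum(sample) / theta)


x = [0.5, 1.5, 4.0]  # sample mean 2.0
y = [2.0, 2.0, 2.0]  # same sample mean as x, so same class of the partition
z = [1.0, 1.0, 1.0]  # different sample mean, so a different class

thetas = [0.5, 1.0, 2.0, 5.0]

# Ratio for two samples in the same class: should be constant in theta.
ratio_same = [exp_likelihood(t, x) / exp_likelihood(t, y) for t in thetas]
# Ratio for two samples in different classes: should vary with theta.
ratio_diff = [exp_likelihood(t, x) / exp_likelihood(t, z) for t in thetas]

print(ratio_same)
print(ratio_diff)
```

The first list stays constant across the values of θ, while the second changes with θ, which is the behaviour the criterion requires of a minimal sufficient statistic.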



Rao-Blackwell theorem

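Let $x_1, \ldots, x_n$ be a random sample from a distribution with p.d.f. $f(x; \theta)$.

Let $T$ be a sufficient statistic for $\theta$ and $\hat{\theta}$ an unbiased point estimator of $\theta$.

Let $\hat{\theta}_T = E(\hat{\theta} \mid T)$.

Then

1) $\hat{\theta}_T$ is a function of $T$ alone

2) $E(\hat{\theta}_T) = \theta$

3) $Var(\hat{\theta}_T) \le Var(\hat{\theta})$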
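A small simulation can illustrate the three statements. This is a sketch under assumptions that are not in the slides: the sample is taken from N(θ, 1), the crude unbiased estimator is the first observation X_1, and the sufficient statistic is T = ΣX_i, for which E(X_1 | T) = T/n. The seed and sizes are arbitrary.

```python
import random
import statistics

random.seed(1)
theta, n, reps = 2.0, 10, 5000

crude = []          # theta-hat = X_1: unbiased, but uses one observation only
rao_blackwell = []  # E(X_1 | T) = T / n, a function of T alone

for _ in range(reps):
    sample = [random.gauss(theta, 1.0) for _ in range(n)]
    crude.append(sample[0])
    rao_blackwell.append(sum(sample) / n)

mean_crude = statistics.mean(crude)
mean_rb = statistics.mean(rao_blackwell)
var_crude = statistics.variance(crude)
var_rb = statistics.variance(rao_blackwell)

print(mean_crude, mean_rb)  # both close to theta
print(var_crude, var_rb)    # the conditioned estimator has the smaller variance
```

Both estimators centre on θ, while conditioning on T reduces the variance from roughly 1 to roughly 1/n in this setting.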



The Exponential family of distributions

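A random variable $X$ belongs to the ($k$-parameter) exponential family of probability distributions if the p.d.f. of $X$ can be written

$$f(x; \theta_1, \ldots, \theta_k) = \exp\left( \sum_{j=1}^k A_j(\theta_1, \ldots, \theta_k)\, B_j(x) + C(x) + D(\theta_1, \ldots, \theta_k) \right)$$

where

$A_1, \ldots, A_k$ and $D$ are functions of $\theta_1, \ldots, \theta_k$ alone

$B_1, \ldots, B_k$ and $C$ are functions of $x$ alone

What about $N(\mu, \sigma^2)$? $Po(\lambda)$? $U(0, \theta)$?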



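For a random sample $\mathbf{x} = (x_1, \ldots, x_n)$ from a distribution belonging to the exponential family

$$L(\theta_1, \ldots, \theta_k; \mathbf{x}) = \prod_{i=1}^n \exp\left( \sum_{j=1}^k A_j(\theta_1, \ldots, \theta_k) B_j(x_i) + C(x_i) + D(\theta_1, \ldots, \theta_k) \right)$$

$$= \exp\left( \sum_{j=1}^k A_j(\theta_1, \ldots, \theta_k) \sum_{i=1}^n B_j(x_i) + \sum_{i=1}^n C(x_i) + n D(\theta_1, \ldots, \theta_k) \right)$$

$$= \exp\left( \sum_{j=1}^k A_j(\theta_1, \ldots, \theta_k) \sum_{i=1}^n B_j(x_i) + n D(\theta_1, \ldots, \theta_k) \right) \cdot \exp\left( \sum_{i=1}^n C(x_i) \right)$$

$$= K_1\left( \sum_{i=1}^n B_1(x_i), \ldots, \sum_{i=1}^n B_k(x_i); \theta_1, \ldots, \theta_k \right) \cdot K_2(x_1, \ldots, x_n)$$

$\Rightarrow T = \left( \sum_{i=1}^n B_1(x_i), \ldots, \sum_{i=1}^n B_k(x_i) \right)$ is sufficient for $\theta_1, \ldots, \theta_k$.

In addition $T$ is minimal sufficient (Lemma 2.5).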



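Exponential family written in canonical form:

Let $\phi_1 = A_1(\theta_1, \ldots, \theta_k), \ldots, \phi_k = A_k(\theta_1, \ldots, \theta_k)$, the so-called natural or canonical parameters. Then

$$f(x; \phi_1, \ldots, \phi_k) = \exp\left( \sum_{j=1}^k \phi_j B_j(x) + C(x) + D^*(\phi_1, \ldots, \phi_k) \right)$$

where $D^*(\phi_1, \ldots, \phi_k) = D(\theta_1, \ldots, \theta_k)$.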



Completeness

Let $x_1, \ldots, x_n$ be a random sample from a distribution with p.d.f. $f(x; \theta)$ and $T = T(x_1, \ldots, x_n)$ a statistic.

Then $T$ is complete for $\theta$ if, whenever $h_T(T)$ is a function of $T$ such that $E[h_T(T)] = 0$ for all values of $\theta$, then $\Pr(h_T(T) = 0) = 1$.

Important lemmas from this definition:

Lemma 2.6: If $T$ is a complete sufficient statistic for $\theta$ and $h(T)$ is a function of $T$ such that $E[h(T)] = \theta$, then $h$ is unique (there is at most one such function).

Lemma 2.7: If there exists a Minimum Variance Unbiased Estimator (MVUE) for $\theta$ and $h(T)$ is an unbiased estimator for $\theta$, where $T$ is a complete minimal sufficient statistic for $\theta$, then $h(T)$ is the MVUE.

Lemma 2.8: If a sample is from a distribution belonging to the exponential family, then $\left( \sum_{i=1}^n B_1(x_i), \ldots, \sum_{i=1}^n B_k(x_i) \right)$ is complete and minimal sufficient for $\theta_1, \ldots, \theta_k$.



Maximum-Likelihood estimation

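Consider as usual a random sample $\mathbf{x} = (x_1, \ldots, x_n)$ from a distribution with p.d.f. $f(x; \theta)$ (and c.d.f. $F(x; \theta)$).

The maximum likelihood point estimator of $\theta$ is the value of $\theta$ that maximizes $L(\theta; \mathbf{x})$ or, equivalently, maximizes $l(\theta; \mathbf{x})$.

Useful notation:

$$\hat{\theta}_{ML} = \arg\max_{\theta} L(\theta; \mathbf{x})$$

With a $k$-dimensional parameter:

$$\hat{\boldsymbol{\theta}}_{ML} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta}; \mathbf{x})$$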



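Complete sample case:

If all sample values are explicitly known, then

$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^n f(x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^n \ln f(x_i; \theta)$$

Censored data case:

If some (say $n_c$) of the sample values are censored, e.g. $x_i < k_1$ or $x_i > k_2$, then

$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n - n_c} f(x_i; \theta) \cdot \left[ \Pr(X < k_1) \right]^{n_{c,l}} \cdot \left[ \Pr(X > k_2) \right]^{n_{c,u}}$$

where

$n_{c,l}$ = number of values known only as being below $k_1$ (left-censored)

$n_{c,u}$ = number of values known only as being above $k_2$ (right-censored)

$n_{c,l} + n_{c,u} = n_c$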



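When the sample comes from a continuous distribution, the censored data case can be written

$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n - n_c} f(x_i; \theta) \cdot \left[ F(k_1; \theta) \right]^{n_{c,l}} \cdot \left[ 1 - F(k_2; \theta) \right]^{n_{c,u}}$$

When the distribution is discrete, the use of $F$ is also possible: if $k_1$ and $k_2$ are values that can be attained by the random variables, then we may write

$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n - n_c} f(x_i; \theta) \cdot \left[ F(k_1^-; \theta) \right]^{n_{c,l}} \cdot \left[ 1 - F(k_2; \theta) \right]^{n_{c,u}}$$

where

$k_1^-$ is a value $< k_1$ but the attainable value closest to the left of $k_1$, so that $\Pr(X < k_1) = F(k_1^-; \theta)$

$k_2^+$ is a value $> k_2$ but the attainable value closest to the right of $k_2$, so that $\Pr(X > k_2) = \Pr(X \ge k_2^+) = 1 - F(k_2; \theta)$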



Example

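$x_1, \ldots, x_n$ random sample (R.S.) from the Rayleigh distribution with p.d.f.

$$f(x; \theta) = \frac{x}{\theta}\, e^{-x^2/(2\theta)}, \quad x > 0,\ \theta > 0$$

$$l(\theta; \mathbf{x}) = \ln \prod_{i=1}^n \frac{x_i}{\theta}\, e^{-x_i^2/(2\theta)} = \sum_{i=1}^n \ln x_i - n \ln \theta - \frac{1}{2\theta} \sum_{i=1}^n x_i^2$$

$$\frac{\partial l}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{2\theta^2} \sum_{i=1}^n x_i^2 = 0 \ \Rightarrow\ \theta = \frac{1}{2n} \sum_{i=1}^n x_i^2 \quad (\text{case } \theta^{-1} = 0 \text{ excluded})$$

$$\frac{\partial^2 l}{\partial \theta^2} = \frac{n}{\theta^2} - \frac{1}{\theta^3} \sum_{i=1}^n x_i^2$$

Evaluated at $\theta = \frac{1}{2n} \sum_{i=1}^n x_i^2$, where $\sum_{i=1}^n x_i^2 = 2n\theta$:

$$\frac{\partial^2 l}{\partial \theta^2} = \frac{n}{\theta^2} - \frac{2n\theta}{\theta^3} = -\frac{n}{\theta^2} < 0$$

$\Rightarrow \hat{\theta}_{ML} = \frac{1}{2n} \sum_{i=1}^n x_i^2$ defines a (local) maximum.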



Example

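4, 3, 5, 3, ">5" R.S. from the Poisson distribution with p.d.f. (mass function)

$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$

Note! One of the sample values is right-censored: $> 5$.

$$\hat{\lambda}_{ML} = \arg\max_{\lambda} \prod_{i=1}^4 f(x_i; \lambda) \cdot \Pr(X > 5) = \arg\max_{\lambda} \left[ \sum_{i=1}^4 \ln f(x_i; \lambda) + \ln \Pr(X > 5) \right]$$

$$= \arg\max_{\lambda} \prod_{i=1}^4 \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \cdot \sum_{y=6}^{\infty} \frac{\lambda^y e^{-\lambda}}{y!}$$

$$= \arg\max_{\lambda} \left[ \sum_{i=1}^4 \left( x_i \ln \lambda - \lambda - \ln x_i! \right) + \ln\left( 1 - \sum_{y=0}^{5} \frac{\lambda^y e^{-\lambda}}{y!} \right) \right]$$

Too complicated to find an analytical solution.

Solve by a numerical routine!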
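One possible numerical routine, sketched under assumptions: the censored value is taken to mean "greater than 5", the parameter is called λ, and a plain golden-section search (a stand-in chosen for this illustration, not a routine prescribed by the course) is applied to the log-likelihood on an arbitrary bracket [0.5, 15], assuming the function is unimodal there.

```python
import math

observed = [4, 3, 5, 3]  # fully observed sample values
censor_point = 5         # the fifth value is only known to exceed 5


def log_lik(lam):
    """Poisson log-likelihood: observed terms plus ln Pr(X > censor_point)."""
    ll = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in observed)
    cdf = sum(math.exp(-lam) * lam ** y / math.factorial(y)
              for y in range(censor_point + 1))
    return ll + math.log(1.0 - cdf)


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximise f on [lo, hi] by golden-section search (assumes f is unimodal)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


lam_hat = golden_section_max(log_lik, 0.5, 15.0)
print(lam_hat)
```

The estimate lands above the mean of the four observed values, as expected when the fifth observation is known only to exceed 5.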



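Exponential family distributions:

Use the canonical form (natural parameterization):

$$f(x; \boldsymbol{\phi}) = \exp\left( \sum_{j=1}^k \phi_j B_j(x) + C(x) + D^*(\boldsymbol{\phi}) \right)$$

Let

$$T_j(\mathbf{X}) = \sum_{i=1}^n B_j(X_i), \quad j = 1, \ldots, k$$

and assume $T_j(\mathbf{x}) = t_j$, $j = 1, \ldots, k$.

Then the maximum likelihood estimators (MLEs) of $\phi_1, \ldots, \phi_k$ are found by solving the system of equations

$$E\left[ T_j(\mathbf{X}) \right] = t_j, \quad j = 1, \ldots, k$$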



Example

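$x_1, \ldots, x_n$ R.S. from the Poisson distribution

$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = e^{x \ln \lambda - \lambda - \ln x!} = e^{\phi x - e^{\phi} - \ln x!}$$

i.e. $\phi = \ln \lambda$, $B(x) = x$ and $T(\mathbf{X}) = \sum_{i=1}^n X_i$.

The MLE $\hat{\phi}_{ML}$ is found by solving

$$E\left( \sum_{i=1}^n X_i \right) = \sum_{i=1}^n x_i$$

$$E(X_i) = \sum_{x=0}^{\infty} x\, f(x; \phi) = \sum_{x=0}^{\infty} x\, \frac{e^{\phi x} e^{-e^{\phi}}}{x!} = e^{-e^{\phi}} \sum_{x=1}^{\infty} \frac{e^{\phi x}}{(x-1)!}$$

$$= [\,y = x - 1\,] = e^{-e^{\phi}} e^{\phi} \sum_{y=0}^{\infty} \frac{(e^{\phi})^y}{y!} = e^{-e^{\phi}} e^{\phi} e^{e^{\phi}} = e^{\phi}$$

$$\Rightarrow \sum_{i=1}^n E(X_i) = n e^{\phi} = \sum_{i=1}^n x_i \ \Rightarrow\ e^{\phi} = \bar{x} \ \Rightarrow\ \hat{\phi}_{ML} = \ln \bar{x}$$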



Computational aspects

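When the MLEs can be found by evaluating

$$\frac{\partial l}{\partial \boldsymbol{\theta}} = \mathbf{0}$$

numerical routines for solving the generic equation $g(\theta) = 0$ can be used.

• Newton-Raphson method

• Fisher’s method of scoring (makes use of the fact that under regularity conditions:

$$E\left[ \frac{\partial^2 l(\boldsymbol{\theta}; \mathbf{x})}{\partial \theta_i\, \partial \theta_j} \right] = -E\left[ \frac{\partial l(\boldsymbol{\theta}; \mathbf{x})}{\partial \theta_i} \cdot \frac{\partial l(\boldsymbol{\theta}; \mathbf{x})}{\partial \theta_j} \right]$$

)

This is the multidimensional analogue of Lemma 2.1 (see page 17).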
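The two routines differ only in which curvature they divide the score by. The sketch below contrasts them on an example chosen for this illustration and not taken from the slides: the location parameter of a Cauchy distribution with unit scale, whose expected Fisher information per observation is 1/2. The data, tolerance and helper names are hypothetical.

```python
import statistics

data = [-0.5, 0.2, 0.9, 1.4, 2.1]  # hypothetical sample


def score(theta):
    """First derivative of the Cauchy(theta, 1) log-likelihood."""
    return sum(2 * (x - theta) / (1 + (x - theta) ** 2) for x in data)


def observed_hessian(theta):
    """Second derivative of the log-likelihood (observed curvature)."""
    return sum(2 * ((x - theta) ** 2 - 1) / (1 + (x - theta) ** 2) ** 2
               for x in data)


def iterate(step, start, tol=1e-10, max_iter=200):
    """Apply theta <- theta + step(theta) until the step is negligible."""
    theta = start
    for _ in range(max_iter):
        delta = step(theta)
        theta += delta
        if abs(delta) < tol:
            break
    return theta


start = statistics.median(data)
expected_info = len(data) / 2.0  # n times the per-observation information 1/2

# Newton-Raphson: divide the score by the observed second derivative.
theta_nr = iterate(lambda t: -score(t) / observed_hessian(t), start)
# Fisher scoring: divide the score by the expected information instead.
theta_fs = iterate(lambda t: score(t) / expected_info, start)

print(theta_nr, theta_fs)
```

Starting from the sample median, both iterations settle on the same root of the score equation here; Fisher scoring takes more, but cheaper and more stable, steps.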



When the MLEs cannot be found in the above way, other numerical routines must be used:

• Simplex method

• EM-algorithm

For a description of the numerical routines, see the textbook.

Maximum likelihood estimation comes into its natural use not in the standard case, i.e. a complete random sample from a distribution within the exponential family, but when estimators are needed in non-standard and more complex situations.



Example

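$x_1, \ldots, x_n$ R.S. from $U(a, b)$

$$f(x; a, b) = \begin{cases} (b - a)^{-1}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

$$L(a, b; \mathbf{x}) = \begin{cases} (b - a)^{-n}, & a \le x_{(1)} \le \cdots \le x_{(n)} \le b \\ 0, & \text{otherwise} \end{cases}$$

$L(a, b; \mathbf{x})$ is largest when $b = a$ (degenerate case).

No local maxima or minima exist.

$L(a, b; \mathbf{x})$ is maximized with respect to the sample when $b - a$ is as small as possible.

Choose the smallest possible approximation of $b$ and the largest possible approximation of $a$ from the values in the sample

$$\Rightarrow \hat{a}_{ML} = x_{(1)} \text{ and } \hat{b}_{ML} = x_{(n)}$$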



Properties of MLEs

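Invariance:

If $\boldsymbol{\theta}$ and $\boldsymbol{\phi}$ represent two alternative parameterizations and $\boldsymbol{\phi}$ is a one-to-one function of $\boldsymbol{\theta}$: $\boldsymbol{\phi} = g(\boldsymbol{\theta})$, $\boldsymbol{\theta} = h(\boldsymbol{\phi})$, then if $\hat{\boldsymbol{\theta}}$ is the MLE of $\boldsymbol{\theta}$, $\hat{\boldsymbol{\phi}} = g(\hat{\boldsymbol{\theta}})$ is the MLE of $\boldsymbol{\phi}$.

Consistency:

Under some weak regularity conditions all MLEs are consistent.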



Efficiency:

Under the usual regularity conditions:

$\hat{\theta}_{ML}$ is asymptotically distributed as $N(\theta, I_{\theta}^{-1})$

(Asymptotically efficient and normally distributed)

Sufficiency:

$\hat{\theta}_{ML}$ unique MLE for $\theta$ $\Rightarrow$ $\hat{\theta}_{ML}$ is a function of the minimal sufficient statistic for $\theta$



Example

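$x_1, \ldots, x_n$ R.S. from $N(\mu, \sigma^2)$

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} = \exp\left( -\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\ln \sigma^2 - \frac{1}{2}\ln 2\pi \right)$$

$$L(\mu, \sigma^2; \mathbf{x}) = \exp\left( -\frac{\sum x_i^2}{2\sigma^2} + \frac{\mu \sum x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2} - \frac{n}{2}\ln \sigma^2 - \frac{n}{2}\ln 2\pi \right)$$

$$\Rightarrow \phi_1 = \frac{\mu}{\sigma^2} \text{ and } \phi_2 = -\frac{1}{2\sigma^2}; \quad T_1 = \sum X_i \text{ and } T_2 = \sum X_i^2$$

With $\mu = -\frac{\phi_1}{2\phi_2}$ and $\sigma^2 = -\frac{1}{2\phi_2}$:

$$E(T_1) = E\left( \sum X_i \right) = n\mu = -\frac{n\phi_1}{2\phi_2}$$

$$E(T_2) = E\left( \sum X_i^2 \right) = n(\sigma^2 + \mu^2) = n\left( -\frac{1}{2\phi_2} + \frac{\phi_1^2}{4\phi_2^2} \right)$$

$\hat{\phi}_{1,ML}$ and $\hat{\phi}_{2,ML}$ are obtained by solving

$$-\frac{n\phi_1}{2\phi_2} = \sum x_i, \qquad n\left( \frac{\phi_1^2}{4\phi_2^2} - \frac{1}{2\phi_2} \right) = \sum x_i^2$$

The first equation gives $\phi_1 = -2\phi_2 \bar{x}$. Substituting into the second:

$$n\bar{x}^2 - \frac{n}{2\phi_2} = \sum x_i^2 \ \Rightarrow\ -\frac{1}{2\phi_2} = \frac{1}{n}\left( \sum x_i^2 - n\bar{x}^2 \right) = \frac{1}{n}\sum (x_i - \bar{x})^2$$

$$\Rightarrow \hat{\phi}_{2,ML} = -\frac{n}{2\sum (x_i - \bar{x})^2}, \qquad \hat{\phi}_{1,ML} = -2\hat{\phi}_{2,ML}\, \bar{x} = \frac{n\bar{x}}{\sum (x_i - \bar{x})^2}$$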



Invariance property

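$(\mu, \sigma^2)$ has a one-to-one relationship with $(\phi_1, \phi_2)$ (as the system of relating equations has a unique solution)

$$\Rightarrow \hat{\mu}_{ML} = -\frac{\hat{\phi}_{1,ML}}{2\hat{\phi}_{2,ML}} = \bar{x}$$

$$\hat{\sigma}^2_{ML} = -\frac{1}{2\hat{\phi}_{2,ML}} = \frac{1}{n}\sum (x_i - \bar{x})^2$$

Note! $bias(\hat{\sigma}^2_{ML}) \ne 0$ but $\to 0$ as $n \to \infty$.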



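$$l(\mu, \sigma^2; \mathbf{x}) = -\frac{1}{2\sigma^2}\sum (x_i - \mu)^2 - \frac{n}{2}\ln \sigma^2 - \frac{n}{2}\ln 2\pi$$

$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2}\sum (x_i - \mu), \qquad \frac{\partial l}{\partial \sigma^2} = \frac{1}{2(\sigma^2)^2}\sum (x_i - \mu)^2 - \frac{n}{2\sigma^2}$$

$$\frac{\partial^2 l}{\partial \mu^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2 l}{\partial \mu\, \partial \sigma^2} = -\frac{1}{(\sigma^2)^2}\sum (x_i - \mu)$$

$$\frac{\partial^2 l}{\partial (\sigma^2)^2} = -\frac{1}{(\sigma^2)^3}\sum (x_i - \mu)^2 + \frac{n}{2(\sigma^2)^2}$$

Taking expectations:

$$E\left[ \frac{\partial^2 l(\mu, \sigma^2; \mathbf{X})}{\partial \mu^2} \right] = -\frac{n}{\sigma^2}, \qquad E\left[ \frac{\partial^2 l(\mu, \sigma^2; \mathbf{X})}{\partial \mu\, \partial \sigma^2} \right] = -\frac{1}{(\sigma^2)^2} \cdot 0 = 0$$

$$E\left[ \frac{\partial^2 l(\mu, \sigma^2; \mathbf{X})}{\partial (\sigma^2)^2} \right] = -\frac{n\sigma^2}{(\sigma^2)^3} + \frac{n}{2(\sigma^2)^2} = -\frac{n}{2\sigma^4}$$

$$\Rightarrow I_{\boldsymbol{\theta}} = -E\left[ \frac{\partial^2 l}{\partial \boldsymbol{\theta}^2} \right] = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{pmatrix} \quad \text{with } \boldsymbol{\theta} = (\mu, \sigma^2)$$

$$I_{\boldsymbol{\theta}}^{-1} = \frac{1}{\det I_{\boldsymbol{\theta}}} \begin{pmatrix} I_{\boldsymbol{\theta},2,2} & -I_{\boldsymbol{\theta},1,2} \\ -I_{\boldsymbol{\theta},2,1} & I_{\boldsymbol{\theta},1,1} \end{pmatrix} = \frac{2\sigma^6}{n^2} \begin{pmatrix} \dfrac{n}{2\sigma^4} & 0 \\ 0 & \dfrac{n}{\sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}$$

$$\Rightarrow \begin{pmatrix} \hat{\mu}_{ML} \\ \hat{\sigma}^2_{ML} \end{pmatrix} \text{ is asymptotically distributed as } N\left( \begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix};\ \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix} \right)$$

i.e. the two MLEs are asymptotically uncorrelated (and, by the normal distribution, independent).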



Modifications and extensions

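Ancillarity and conditional sufficiency:

Let $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ and let $T = (T_1, T_2)$ be a minimal sufficient statistic for $\boldsymbol{\theta}$. If

a) $f_{T_2}(t_2)$ depends on $\boldsymbol{\theta}_2$ but not on $\boldsymbol{\theta}_1$

b) $f_{T_1 \mid T_2 = t_2}(t_1 \mid t_2)$ depends on $\boldsymbol{\theta}_1$ but not on $\boldsymbol{\theta}_2$

then $T_2$ is said to be an ancillary statistic for $\boldsymbol{\theta}_1$ and $T_1$ is said to be conditionally independent of $\boldsymbol{\theta}_2$.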



Profile likelihood:

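With $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ and $L(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2; \mathbf{x})$, if $\hat{\boldsymbol{\theta}}_{2.1}$ is the MLE of $\boldsymbol{\theta}_2$ for a given value of $\boldsymbol{\theta}_1$, then $L(\boldsymbol{\theta}_1, \hat{\boldsymbol{\theta}}_{2.1}; \mathbf{x})$ is called the profile likelihood for $\boldsymbol{\theta}_1$.

This concept has its main use in cases where $\boldsymbol{\theta}_1$ contains the parameters of “interest” and $\boldsymbol{\theta}_2$ contains nuisance parameters.

The same ML point estimator for $\boldsymbol{\theta}_1$ is obtained by maximizing the profile likelihood as by maximizing the full likelihood function.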
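The claim that profiling gives the same point estimator can be checked on a small case. The sketch below assumes a normal sample with the mean μ as the parameter of interest and the variance σ² as the nuisance parameter; for a fixed μ the MLE of σ² is the average squared deviation from μ. The data and the grid are hypothetical.

```python
import math

x = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]  # hypothetical sample
n = len(x)


def sigma2_hat_given_mu(mu):
    """MLE of the nuisance parameter sigma^2 for a fixed value of mu."""
    return sum((xi - mu) ** 2 for xi in x) / n


def profile_loglik(mu):
    """Normal log-likelihood with sigma^2 replaced by its MLE given mu."""
    s2 = sigma2_hat_given_mu(mu)
    return -0.5 * n * math.log(2 * math.pi * s2) - 0.5 * n


# Maximise the profile log-likelihood over a fine grid of mu values.
grid = [3.0 + 0.001 * k for k in range(4001)]
mu_profile = max(grid, key=profile_loglik)

x_bar = sum(x) / n  # full-likelihood MLE of mu
print(mu_profile, x_bar)
```

Up to the grid resolution, the maximiser of the profile log-likelihood coincides with the sample mean, which is the MLE of μ from the full likelihood.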



Marginal and conditional likelihood:

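$L(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2; \mathbf{x})$ is equivalent to the joint p.d.f. of the sample, i.e. $f_{\mathbf{X}}(\mathbf{x}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$.

If $(\mathbf{u}, \mathbf{v})$ is a partitioning of $\mathbf{x}$, then $f_{\mathbf{X}}(\mathbf{x}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2) = f(\mathbf{u}, \mathbf{v}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$.

Now, if $f(\mathbf{u}, \mathbf{v}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ can be factorized as $f_1(\mathbf{u}; \boldsymbol{\theta}_1) \cdot f_2(\mathbf{v} \mid \mathbf{u}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ and $f_2(\mathbf{v} \mid \mathbf{u}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ does not depend on $\boldsymbol{\theta}_1$, then inferences about $\boldsymbol{\theta}_1$ can be based solely on $f_1(\mathbf{u}; \boldsymbol{\theta}_1)$, the marginal likelihood for $\boldsymbol{\theta}_1$.

If $f(\mathbf{u}, \mathbf{v}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ can be factorized as $f_1(\mathbf{u} \mid \mathbf{v}; \boldsymbol{\theta}_1) \cdot f_2(\mathbf{v}; \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$, then inferences about $\boldsymbol{\theta}_1$ can be based solely on $f_1(\mathbf{u} \mid \mathbf{v}; \boldsymbol{\theta}_1)$, the conditional likelihood for $\boldsymbol{\theta}_1$.

Again, these concepts have their main use in cases where $\boldsymbol{\theta}_1$ contains the parameters of “interest” and $\boldsymbol{\theta}_2$ contains nuisance parameters.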



Penalized likelihood:

MLEs can be derived subject to some criteria of smoothness.

In particular this is applicable when the parameter is no longer a single value (one- or multidimensional), but a function such as an unknown density function or a regression curve.

The penalized log-likelihood function is written

$$l_P(\boldsymbol{\theta}; \mathbf{x}) = l(\boldsymbol{\theta}; \mathbf{x}) - \lambda R(\boldsymbol{\theta})$$

where $R(\boldsymbol{\theta})$ is the penalty function and $\lambda$ is a fixed parameter controlling the influence of $R(\boldsymbol{\theta})$.

$\lambda$ is thus not estimated by maximizing $l_P(\boldsymbol{\theta}; \mathbf{x})$, but can be estimated by so-called cross-validation techniques (see ch. 9).

Note that $\mathbf{x}$ is not the usual random sample here, but can be a set of independent but non-identically distributed values.



Method of moments estimation (MM )

For a random variable $X$:

The $r$th population moment about the origin is

$$\mu'_r = E(X^r)$$

The $r$th population moment about the mean ($r$th central moment) is

$$\mu_r = E\left[ (X - \mu'_1)^r \right]$$

For a random sample $\mathbf{x} = (x_1, \ldots, x_n)$:

The $r$th sample moment about the origin is

$$m'_r = n^{-1}\sum_{i=1}^n x_i^r$$

The $r$th sample moment about the mean is

$$m_r = n^{-1}\sum_{i=1}^n (x_i - \bar{x})^r$$



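The method of moments point estimator of $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$ is obtained by solving for $\theta_1, \ldots, \theta_k$ the system of equations

$$\mu'_r = m'_r, \quad r = 1, \ldots, k$$

or

$$\mu_r = m_r, \quad r = 1, \ldots, k$$

or a mixture between these two.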



Example

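$x_1, \ldots, x_n$ R.S. from $U(a, b)$

First moment about the origin:

$$\mu'_1 = \frac{a + b}{2}$$

Second central moment:

$$\mu_2 = E(X^2) - (\mu'_1)^2 = \int_a^b \frac{y^2}{b - a}\, dy - \frac{(a + b)^2}{4} = \frac{b^3 - a^3}{3(b - a)} - \frac{(a + b)^2}{4}$$

$$= \frac{b^2 + ab + a^2}{3} - \frac{a^2 + 2ab + b^2}{4} = \frac{4b^2 + 4ab + 4a^2 - 3a^2 - 6ab - 3b^2}{12} = \frac{b^2 - 2ab + a^2}{12} = \frac{(b - a)^2}{12}$$

Solve for $a, b$ the system of equations

$$\frac{a + b}{2} = \bar{x}, \qquad \frac{(b - a)^2}{12} = n^{-1}\sum_{i=1}^n (x_i - \bar{x})^2 = \hat{\sigma}^2$$

$$\Rightarrow b = 2\bar{x} - a \ \Rightarrow\ (2\bar{x} - 2a)^2 = 12\hat{\sigma}^2 \ \Rightarrow\ (\bar{x} - a)^2 = 3\hat{\sigma}^2 \ \Rightarrow\ a = \bar{x} \pm \sqrt{3}\,\hat{\sigma}$$

$a = \bar{x} + \sqrt{3}\,\hat{\sigma} \Rightarrow b = 2\bar{x} - (\bar{x} + \sqrt{3}\,\hat{\sigma}) = \bar{x} - \sqrt{3}\,\hat{\sigma}$, which is not possible since $a < b$

$$\Rightarrow \hat{a}_{MM} = \bar{x} - \sqrt{3}\,\hat{\sigma}, \qquad \hat{b}_{MM} = \bar{x} + \sqrt{3}\,\hat{\sigma}$$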
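For the uniform example, the method of moments estimates can be computed directly and set beside the ML estimates (the sample extremes). A minimal sketch with a hypothetical data set; the variable names are illustrative.

```python
import math

x = [2.3, 4.1, 3.7, 5.9, 2.8, 4.6, 5.2, 3.1]  # hypothetical sample
n = len(x)

x_bar = sum(x) / n
# Second sample moment about the mean (divisor n, not n - 1).
sigma_hat = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)

# Method of moments: match the mean and the second central moment of U(a, b).
a_mm = x_bar - math.sqrt(3) * sigma_hat
b_mm = x_bar + math.sqrt(3) * sigma_hat

# Maximum likelihood: smallest and largest sample values.
a_ml, b_ml = min(x), max(x)

print(a_mm, b_mm)
print(a_ml, b_ml)
```

The two methods generally give different intervals. Unlike the ML estimates, the MM estimates are not tied to the sample extremes, so the MM interval need not contain every observation.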



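Method of Least Squares (LS)

First principles:

Assume a sample $\mathbf{x}$ where the random variable $X_i$ can be written

$$X_i = m(\theta) + \varepsilon_i$$

where $m(\theta)$ is the mean value (function) involving $\theta$ and $\varepsilon_i$ is a random variable with zero mean and constant variance $\sigma^2$.

The least-squares estimator of $\theta$ is the value of $\theta$ that minimizes

$$\sum_{i=1}^n \left( x_i - m(\theta) \right)^2$$

i.e.

$$\hat{\theta}_{LS} = \arg\min_{\theta} \sum_{i=1}^n \left( x_i - m(\theta) \right)^2$$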



A more general approach

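Assume the sample can be written $(\mathbf{x}, \mathbf{z})$ where $x_i$ represents the random variable of interest (endogenous variable) and $z_i$ represents either an auxiliary random variable (exogenous) or a given constant for sample point $i$:

$$X_i = m(\theta; z_i) + \varepsilon_i$$

where $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma_i^2$ and $Cov(\varepsilon_i, \varepsilon_j) = c_{ij}$, with $\sigma_i^2$, $c_{ij}$ possibly functions of $z_i$, $z_j$.

The least squares estimator of $\theta$ is then

$$\hat{\theta}_{LS} = \arg\min_{\theta}\ \boldsymbol{\varepsilon}^T W^{-1} \boldsymbol{\varepsilon}$$

where $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)^T$ and $W$ is the variance-covariance matrix of $\boldsymbol{\varepsilon}$.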



Special cases

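The ordinary linear regression model:

$$X_i = \beta_0 + \beta_1 z_{1,i} + \cdots + \beta_p z_{p,i} + \varepsilon_i$$

with $W = \sigma^2 I_n$ and $Z$ considered to be a constant matrix.

$$\hat{\boldsymbol{\beta}}_{LS} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^n \left( x_i - \beta_0 - \beta_1 z_{1,i} - \cdots - \beta_p z_{p,i} \right)^2 = \arg\min_{\boldsymbol{\beta}}\ (\mathbf{x} - Z\boldsymbol{\beta})^T (\mathbf{x} - Z\boldsymbol{\beta}) = (Z^T Z)^{-1} Z^T \mathbf{x}$$

The heteroscedastic regression model:

$$X_i = \beta_0 + \beta_1 z_{1,i} + \cdots + \beta_p z_{p,i} + \varepsilon_i, \quad i = 1, \ldots, n$$

with $W \ne \sigma^2 I_n$

$$\hat{\boldsymbol{\beta}}_{LS} = (Z^T W^{-1} Z)^{-1} Z^T W^{-1} \mathbf{x}$$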
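Both special cases reduce to the same normal equations, only with different weights. The sketch below covers one explanatory variable and a diagonal W, so the 2x2 system can be solved by hand without a matrix library; the function name, the data and the variances are hypothetical, and a general W would need a full matrix inverse.

```python
def weighted_least_squares(z, x, variances):
    """Solve (Z^T W^-1 Z) beta = Z^T W^-1 x for Z = [1, z] and diagonal W."""
    w = [1.0 / v for v in variances]  # diagonal of W^-1
    s_w = sum(w)
    s_wz = sum(wi * zi for wi, zi in zip(w, z))
    s_wzz = sum(wi * zi * zi for wi, zi in zip(w, z))
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wzx = sum(wi * zi * xi for wi, zi, xi in zip(w, z, x))
    # Invert the 2x2 matrix Z^T W^-1 Z explicitly.
    det = s_w * s_wzz - s_wz ** 2
    beta0 = (s_wzz * s_wx - s_wz * s_wzx) / det
    beta1 = (s_w * s_wzx - s_wz * s_wx) / det
    return beta0, beta1


z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly 1 + 2 z, so any weighting recovers it

# Ordinary case: W = sigma^2 I (the common factor cancels).
ols = weighted_least_squares(z, x, [1.0] * len(z))
# Heteroscedastic case: a different variance for each observation.
wls = weighted_least_squares(z, x, [0.5, 1.0, 2.0, 4.0, 8.0])

print(ols, wls)
```

With noise-free data both fits return the same coefficients. With noisy data the weighted fit would lean more heavily on the low-variance observations.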



The first-order auto-regressive model:

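$$X_t = \theta X_{t-1} + \varepsilon_t$$

Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{z} = (*, x_1, \ldots, x_{n-1})$, i.e. the first sample point (first time-point) for $\mathbf{z}$ is not available.

$$X_t = \theta z_t + \varepsilon_t, \quad t = 2, \ldots, n$$

with $W = \sigma^2 I$.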



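The conditional least-squares estimator of $\theta$ (given $x_1$) is

$$\hat{\theta}_{CLS} = \arg\min_{\theta} \sum_{i=2}^n \varepsilon_i^2 = \arg\min_{\theta} \sum_{i=2}^n (x_i - \theta z_i)^2 = \arg\min_{\theta}\ \boldsymbol{\varepsilon}^T W^{-1} \boldsymbol{\varepsilon}$$

where

$$W^{-1} = \sigma^{-2} \begin{pmatrix} 0 & \mathbf{0}^T \\ \mathbf{0} & I_{n-1} \end{pmatrix}$$

and $\mathbf{0}$ is the $(n-1)$-dimensional vector of zeros.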
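For an autoregression without intercept, X_t = θ X_{t-1} + ε_t, minimising the conditional sum of squares gives a closed form: the sum of x_t · x_{t-1} divided by the sum of x_{t-1}², both taken over t = 2, …, n. The sketch below checks this on simulated data; the model form without intercept, the true value 0.6, the seed and the series length are assumptions made for the illustration.

```python
import random

random.seed(0)
theta_true, n = 0.6, 2000

# Simulate X_t = theta * X_{t-1} + eps_t with standard normal errors.
series = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):
    series.append(theta_true * series[-1] + random.gauss(0.0, 1.0))

response = series[1:]  # x_2, ..., x_n
lagged = series[:-1]   # z_2, ..., z_n, i.e. x_1, ..., x_{n-1}

# Closed-form minimiser of the sum over t = 2..n of (x_t - theta * z_t)^2.
theta_cls = (sum(xt * zt for xt, zt in zip(response, lagged))
             / sum(zt * zt for zt in lagged))

print(theta_cls)
```

The first observation enters only as a regressor, never as a response, which matches the zero in the top-left corner of the weight matrix.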
