April 21, 2023 1
Department of Computer and Information Science (IDA) Linköpings universitet, Sweden
Minimal sufficient statistic
If $T, U_1, U_2, \ldots, U_m, \ldots$ are all available sufficient statistics (possibly an infinite number) for $\theta$ and
$$T = g_1(U_1) = g_2(U_2) = \cdots = g_m(U_m) = \cdots$$
where $g_1, g_2, \ldots$ are functions, then $T$ is minimal sufficient.
Note! If $h$ is a one-to-one function and $V = h(T)$, then $V$ is also minimal sufficient.
A statistic defines a partition of the sample space of (X1, … , Xn ) into classes satisfying T(x1, … , xn ) = t for different values of t.
If such a partition puts the samples $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ into the same class if and only if
$$\frac{L(\theta; \mathbf{y})}{L(\theta; \mathbf{x})} \ \text{does not depend on } \theta$$
then $T$ is minimal sufficient for $\theta$.
Example
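Assume again that $x_1, \ldots, x_n$ is a sample from $Exp(\theta)$, with p.d.f. $f(x;\theta) = \theta^{-1} e^{-\theta^{-1} x}$.
Let $T(X_1, \ldots, X_n) = \sum_{i=1}^n X_i$ and let $T(x_1, \ldots, x_n) = t = T(y_1, \ldots, y_n)$, i.e. $(y_1, \ldots, y_n)$ belongs to the same class as $(x_1, \ldots, x_n)$.
$$\frac{L(\theta; \mathbf{y})}{L(\theta; \mathbf{x})} = \frac{\theta^{-n} e^{-\theta^{-1} \sum_{i=1}^n y_i}}{\theta^{-n} e^{-\theta^{-1} \sum_{i=1}^n x_i}} = \frac{e^{-\theta^{-1} T(y_1, \ldots, y_n)}}{e^{-\theta^{-1} T(x_1, \ldots, x_n)}} = e^{-\theta^{-1}\left(T(y_1, \ldots, y_n) - T(x_1, \ldots, x_n)\right)} = e^{-\theta^{-1}(t - t)} = 1$$
not depending on $\theta$. If instead $T(\mathbf{y}) \neq T(\mathbf{x})$, the exponent is non-zero and the ratio does depend on $\theta$.
$\Rightarrow T = \sum_{i=1}^n X_i$ is minimal sufficient for $\theta$.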
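The likelihood-ratio criterion can be checked numerically. The following is a minimal sketch, assuming the exponential distribution is parameterized by its mean $\theta$; the samples and the function names are made up for illustration. It evaluates $\ln[L(\theta;\mathbf{y})/L(\theta;\mathbf{x})]$ over several values of $\theta$ for one $\mathbf{y}$ with the same sum as $\mathbf{x}$ and one with a different sum.

```python
import math

def exp_log_likelihood(theta, sample):
    """Log-likelihood of an Exp sample with mean theta (assumed parameterization)."""
    n = len(sample)
    return -n * math.log(theta) - sum(sample) / theta

def log_ratio(theta, x, y):
    """ln[L(theta; y) / L(theta; x)]."""
    return exp_log_likelihood(theta, y) - exp_log_likelihood(theta, x)

x = [1.0, 2.0, 3.0]
y_same = [0.5, 0.5, 5.0]   # same value of T = sum as x
y_diff = [1.0, 1.0, 1.0]   # different value of T

thetas = [0.5, 1.0, 2.0, 4.0]
same_class = [log_ratio(t, x, y_same) for t in thetas]
diff_class = [log_ratio(t, x, y_diff) for t in thetas]

print("same sum:     ", same_class)   # constant in theta
print("different sum:", diff_class)   # varies with theta
```

Only the pair with equal sums gives a ratio that is free of $\theta$, which is the partition induced by $T = \sum X_i$.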
Rao-Blackwell theorem
Let $x_1, \ldots, x_n$ be a random sample from a distribution with p.d.f. $f(x; \theta)$.
Let $T$ be a sufficient statistic for $\theta$ and $\hat{\theta}$ an unbiased point estimator of $\theta$.
Let $\hat{\theta}_T = E(\hat{\theta} \mid T)$.
Then
1) $\hat{\theta}_T$ is a function of $T$ alone
2) $E(\hat{\theta}_T) = \theta$
3) $Var(\hat{\theta}_T) \leq Var(\hat{\theta})$
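A small exact check of the theorem, as an illustration that is not taken from the slides: for a Bernoulli($p$) sample, $\hat{\theta} = X_1$ is unbiased for $p$, $T = \sum X_i$ is sufficient, and by symmetry $E(X_1 \mid T) = T/n$. The sketch below enumerates all $2^n$ outcomes, so no simulation noise is involved; the function name and the values $n = 4$, $p = 0.3$ are arbitrary choices.

```python
from itertools import product

def exact_moments(n, p):
    """Enumerate all Bernoulli(p) samples of size n and return the exact
    (mean, variance) of X_1 and of the Rao-Blackwellized estimator T/n."""
    rows = []
    for xs in product([0, 1], repeat=n):
        prob = 1.0
        for v in xs:
            prob *= p if v == 1 else 1.0 - p
        # (probability of outcome, crude estimator X_1, conditioned estimator T/n)
        rows.append((prob, xs[0], sum(xs) / n))

    def mean_var(index):
        mean = sum(r[0] * r[index] for r in rows)
        var = sum(r[0] * (r[index] - mean) ** 2 for r in rows)
        return mean, var

    return mean_var(1), mean_var(2)

(raw_mean, raw_var), (rb_mean, rb_var) = exact_moments(n=4, p=0.3)
print("X_1 :", raw_mean, raw_var)
print("T/n :", rb_mean, rb_var)
```

Both estimators have mean $p$, while the variance drops from $p(1-p)$ to $p(1-p)/n$.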
The Exponential family of distributions
A random variable X belongs to the (k-parameter) exponential family of probability distributions if the p.d.f. of X can be written
$$f(x; \theta_1, \ldots, \theta_k) = \exp\left\{ \sum_{j=1}^{k} A_j(\theta_1, \ldots, \theta_k)\, B_j(x) + C(x) + D(\theta_1, \ldots, \theta_k) \right\}$$
where
$A_1, \ldots, A_k$ and $D$ are functions of $\theta_1, \ldots, \theta_k$ alone
$B_1, \ldots, B_k$ and $C$ are functions of $x$ alone
What about $N(\mu, \sigma^2)$? $Po(\lambda)$? $U(0, \theta)$?
For a random sample x = (x1, … , xn ) from a distribution belonging to the exponential family
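$$\begin{aligned}
L(\theta_1, \ldots, \theta_k; \mathbf{x}) &= \prod_{i=1}^{n} \exp\left\{ \sum_{j=1}^{k} A_j(\theta_1, \ldots, \theta_k) B_j(x_i) + C(x_i) + D(\theta_1, \ldots, \theta_k) \right\} \\
&= \exp\left\{ \sum_{j=1}^{k} A_j(\theta_1, \ldots, \theta_k) \sum_{i=1}^{n} B_j(x_i) + \sum_{i=1}^{n} C(x_i) + n D(\theta_1, \ldots, \theta_k) \right\} \\
&= \exp\left\{ \sum_{j=1}^{k} A_j(\theta_1, \ldots, \theta_k) \sum_{i=1}^{n} B_j(x_i) + n D(\theta_1, \ldots, \theta_k) \right\} \cdot \exp\left\{ \sum_{i=1}^{n} C(x_i) \right\} \\
&= K_1\left( \sum_{i=1}^{n} B_1(x_i), \ldots, \sum_{i=1}^{n} B_k(x_i); \theta_1, \ldots, \theta_k \right) \cdot K_2(x_1, \ldots, x_n)
\end{aligned}$$
$\Rightarrow T = \left( \sum_{i=1}^{n} B_1(x_i), \ldots, \sum_{i=1}^{n} B_k(x_i) \right)$ is sufficient for $\theta_1, \ldots, \theta_k$.
In addition $T$ is minimal sufficient (Lemma 2.5).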
Exponential family written on the canonical form:
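Let $\phi_1 = A_1(\theta_1, \ldots, \theta_k), \ldots, \phi_k = A_k(\theta_1, \ldots, \theta_k)$, the so-called natural or canonical parameters.
$$f(x; \phi_1, \ldots, \phi_k) = \exp\left\{ \sum_{j=1}^{k} \phi_j B_j(x) + C(x) + D^*(\phi_1, \ldots, \phi_k) \right\}$$
where $D^*(\phi_1, \ldots, \phi_k) = D(\theta_1, \ldots, \theta_k)$.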
Completeness
Let $x_1, \ldots, x_n$ be a random sample from a distribution with p.d.f. $f(x; \theta)$ and $T = T(x_1, \ldots, x_n)$ a statistic.
Then $T$ is complete for $\theta$ if, whenever $h_T(T)$ is a function of $T$ such that $E[h_T(T)] = 0$ for all values of $\theta$, then $\Pr(h_T(T) \equiv 0) = 1$.
Important lemmas from this definition:
Lemma 2.6: If $T$ is a complete sufficient statistic for $\theta$ and $h(T)$ is a function of $T$ such that $E[h(T)] = \theta$, then $h$ is unique (there is at most one such function).
Lemma 2.7: If there exists a Minimum Variance Unbiased Estimator (MVUE) for $\theta$ and $h(T)$ is an unbiased estimator for $\theta$, where $T$ is a complete minimal sufficient statistic for $\theta$, then $h(T)$ is MVUE.
Lemma 2.8: If a sample is from a distribution belonging to the exponential family, then $\left( \sum B_1(x_i), \ldots, \sum B_k(x_i) \right)$ is complete and minimal sufficient for $\theta_1, \ldots, \theta_k$.
Maximum-Likelihood estimation
Consider as usual a random sample $\mathbf{x} = (x_1, \ldots, x_n)$ from a distribution with p.d.f. $f(x; \theta)$ (and c.d.f. $F(x; \theta)$).
The maximum likelihood point estimator of $\theta$ is the value of $\theta$ that maximizes $L(\theta; \mathbf{x})$ or, equivalently, maximizes $l(\theta; \mathbf{x})$.
Useful notation:
$$\hat{\theta}_{ML} = \arg\max_{\theta} L(\theta; \mathbf{x})$$
With a k-dimensional parameter:
$$\hat{\boldsymbol{\theta}}_{ML} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta}; \mathbf{x})$$
Complete sample case:
If all sample values are explicitly known, then
$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \ln f(x_i; \theta)$$
Censored data case:
If some (say $n_c$) of the sample values are censored, e.g. $x_i < k_1$ or $x_i > k_2$, then
$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n - n_c} f(x_i; \theta) \cdot \left[ \Pr(X < k_1) \right]^{n_{c,l}} \cdot \left[ \Pr(X > k_2) \right]^{n_{c,u}}$$
where
$n_{c,l}$ = number of values known only as being below $k_1$ (left-censored)
$n_{c,u}$ = number of values known only as being above $k_2$ (right-censored)
$n_c = n_{c,l} + n_{c,u}$
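When the sample comes from a continuous distribution the censored data case can be written
$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n - n_c} f(x_i; \theta) \cdot \left[ F(k_1; \theta) \right]^{n_{c,l}} \cdot \left[ 1 - F(k_2; \theta) \right]^{n_{c,u}}$$
If the distribution is discrete, the use of $F$ is also possible: if $k_1$ and $k_2$ are values that can be attained by the random variables, then we may write
$$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n - n_c} f(x_i; \theta) \cdot \left[ F(k_1^-; \theta) \right]^{n_{c,l}} \cdot \left[ 1 - F(k_2; \theta) \right]^{n_{c,u}}$$
where
$k_1^-$ is a value $< k_1$ but the attainable value closest to the left of $k_1$, so that $\Pr(X < k_1) = F(k_1^-; \theta)$
$k_2^+$ is a value $> k_2$ but the attainable value closest to the right of $k_2$, so that $\Pr(X > k_2) = \Pr(X \geq k_2^+) = 1 - F(k_2; \theta)$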
Example
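$x_1, \ldots, x_n$ random sample (R.S.) from the Rayleigh distribution with p.d.f.
$$f(x; \theta) = \frac{x}{\theta}\, e^{-x^2 / (2\theta)}, \quad x > 0, \ \theta > 0$$
$$l(\theta; \mathbf{x}) = \ln \prod_{i=1}^{n} \frac{x_i}{\theta}\, e^{-x_i^2 / (2\theta)} = \sum_{i=1}^{n} \ln x_i - n \ln \theta - \frac{1}{2\theta} \sum_{i=1}^{n} x_i^2$$
$$\frac{\partial l}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{2\theta^2} \sum_{i=1}^{n} x_i^2 = 0 \ \Leftrightarrow \ \theta = \frac{1}{2n} \sum_{i=1}^{n} x_i^2 \quad \text{(case } \theta = 0 \text{ excluded)}$$
$$\frac{\partial^2 l}{\partial \theta^2} = \frac{n}{\theta^2} - \frac{1}{\theta^3} \sum_{i=1}^{n} x_i^2$$
At $\theta = \frac{1}{2n} \sum_{i=1}^{n} x_i^2$, i.e. $\sum_{i=1}^{n} x_i^2 = 2n\theta$:
$$\frac{\partial^2 l}{\partial \theta^2} = \frac{n}{\theta^2} - \frac{2n\theta}{\theta^3} = -\frac{n}{\theta^2} < 0$$
$\Rightarrow \hat{\theta}_{ML} = \frac{1}{2n} \sum_{i=1}^{n} x_i^2$ defines a (local) maximum.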
Example
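R.S. 4, 3, 5, 3, ">5" from the Poisson distribution with p.d.f. (mass function)
$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$
One of the sample values is right-censored: $> 5$
$$\hat{\lambda}_{ML} = \arg\min_{\lambda} \left\{ -\prod_{i=1}^{4} f(x_i; \lambda) \cdot \Pr(X > 5) \right\} = \arg\min_{\lambda} \left\{ -\sum_{i=1}^{4} \ln f(x_i; \lambda) - \ln \Pr(X > 5) \right\}$$
$$= \arg\min_{\lambda} \left\{ -\sum_{i=1}^{4} \ln \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} - \ln \sum_{y=6}^{\infty} \frac{\lambda^y e^{-\lambda}}{y!} \right\}$$
$$= \arg\min_{\lambda} \left\{ -\sum_{i=1}^{4} \left( x_i \ln \lambda - \lambda - \ln x_i! \right) - \ln \left( 1 - \sum_{y=0}^{5} \frac{\lambda^y e^{-\lambda}}{y!} \right) \right\}$$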
Too complicated to find an analytical solution.
Solve by a numerical routine!
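One possible numerical routine is sketched below. It assumes the data are the four observed Poisson counts 4, 3, 5, 3 plus one value known only to exceed 5, and it minimizes the negative log-likelihood with a golden-section search. The search interval $[0.5, 20]$ and the choice of golden-section search are illustrative assumptions (the method presumes a single minimum on the interval); any one-dimensional optimizer could be substituted.

```python
import math

observed = [4, 3, 5, 3]   # fully observed counts
censor_limit = 5          # fifth value is only known to be > 5

def poisson_pmf(y, lam):
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def neg_log_lik(lam):
    """Negative log-likelihood: complete part plus right-censored tail."""
    complete = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in observed)
    tail = 1.0 - sum(poisson_pmf(y, lam) for y in range(censor_limit + 1))
    return -(complete + math.log(tail))

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return (a + b) / 2

lam_hat = golden_section_min(neg_log_lik, 0.5, 20.0)
print("ML estimate of lambda:", lam_hat)
```

The estimate ends up above the mean of the four uncensored values (3.75), as expected when the fifth observation is known to be large.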
Exponential family distributions:
Use the canonical form (natural parameterization):
$$f(x; \phi_1, \ldots, \phi_k) = \exp\left\{ \sum_{j=1}^{k} \phi_j B_j(x) + C(x) + D^*(\phi_1, \ldots, \phi_k) \right\}$$
Let
$$T_j(\mathbf{X}) = \sum_{i=1}^{n} B_j(X_i), \quad j = 1, \ldots, k$$
and assume $T_j(\mathbf{x}) = t_j$, $j = 1, \ldots, k$.
Then the maximum likelihood estimators (MLEs) of $\phi_1, \ldots, \phi_k$ are found by solving the system of equations
$$E\left[ T_j(\mathbf{X}) \right] = t_j, \quad j = 1, \ldots, k$$
Example
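$x_1, \ldots, x_n$ R.S. from the Poisson distribution
$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = e^{x \ln \lambda - \lambda - \ln x!}$$
i.e. $\phi = \ln \lambda$; $B(x) = x$; $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ and
$$f(x; \phi) = e^{\phi x - \ln x! - e^{\phi}}$$
The MLE is found by solving
$$E\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} x_i \ \Leftrightarrow \ \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} x_i$$
$$\begin{aligned}
E(X_i) &= \sum_{x=0}^{\infty} x f(x; \phi) = \sum_{x=0}^{\infty} x\, \frac{e^{\phi x} e^{-e^{\phi}}}{x!} = e^{-e^{\phi}} \sum_{x=1}^{\infty} \frac{e^{\phi x}}{(x-1)!} \\
&= e^{-e^{\phi}} e^{\phi} \sum_{y=0}^{\infty} \frac{e^{\phi y}}{y!} = e^{-e^{\phi}} e^{\phi} e^{e^{\phi}} = e^{\phi}
\end{aligned}$$
$$\sum_{i=1}^{n} e^{\phi} = \sum_{i=1}^{n} x_i \ \Rightarrow \ n e^{\phi} = \sum_{i=1}^{n} x_i \ \Rightarrow \ e^{\hat{\phi}} = \bar{x} \ \Rightarrow \ \hat{\phi}_{ML} = \ln \bar{x}$$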
Computational aspects
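When the MLEs can be found by evaluating
$$\frac{\partial l}{\partial \boldsymbol{\theta}} = \mathbf{0}$$
numerical routines for solving the generic equation $g(\theta) = 0$ can be used.
• Newton-Raphson method
• Fisher’s method of scoring (makes use of the fact that under regularity conditions:
$$E\left[ \frac{\partial^2 l(\boldsymbol{\theta}; \mathbf{x})}{\partial \theta_i\, \partial \theta_j} \right] = -E\left[ \frac{\partial l(\boldsymbol{\theta}; \mathbf{x})}{\partial \theta_i} \cdot \frac{\partial l(\boldsymbol{\theta}; \mathbf{x})}{\partial \theta_j} \right]$$
)
This is the multidimensional analogue of Lemma 2.1 (see page 17)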
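As an illustration of the Newton-Raphson iteration, here is a minimal one-parameter sketch. It assumes a Poisson sample written with the canonical parameter $\phi = \ln \lambda$, for which the score is $\sum x_i - n e^{\phi}$ and the second derivative is $-n e^{\phi}$. Because the second derivative does not involve the data, Fisher scoring gives the same iteration in this particular case. The sample values and function names are made up for the example.

```python
import math

def newton_raphson(score, hessian, start, tol=1e-12, max_iter=100):
    """Solve score(theta) = 0 by Newton-Raphson: theta <- theta - score/hessian."""
    theta = start
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

sample = [2, 0, 3, 1, 4]          # hypothetical Poisson counts
n, total = len(sample), sum(sample)

def score(phi):
    # d l / d phi for the Poisson log-likelihood in canonical form
    return total - n * math.exp(phi)

def hessian(phi):
    # d^2 l / d phi^2 (free of the data, so equal to its expectation)
    return -n * math.exp(phi)

phi_hat = newton_raphson(score, hessian, start=0.0)
print("phi_hat:", phi_hat, " ln(mean):", math.log(total / n))
```

The iteration converges to $\ln \bar{x}$, the closed-form solution of the likelihood equation for this model.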
When the MLEs cannot be found the above way other numerical routines must be used:
• Simplex method
• EM-algorithm
For a description of the numerical routines, see the textbook.
Maximum likelihood estimation comes into natural use not for handling the standard case, i.e. a complete random sample from a distribution within the exponential family, but for finding estimators in more non-standard and complex situations.
Example
$x_1, \ldots, x_n$ R.S. from $U(a, b)$
$$f(x; a, b) = \begin{cases} (b - a)^{-1}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}$$
$$L(a, b; \mathbf{x}) = \begin{cases} (b - a)^{-n}, & a \leq x_{(1)} \leq x_{(n)} \leq b \\ 0, & \text{otherwise} \end{cases}$$
$(b - a)^{-n}$ is largest ($\infty$) when $a = b$ (degenerate case).
No local maxima or minima exist.
$L(a, b; \mathbf{x})$ is maximized with respect to the sample when $b - a$ is as small as possible.
Choose the smallest possible approximation of $b$ and the largest possible approximation of $a$ from the values in the sample
$\Rightarrow \hat{a}_{ML} = x_{(1)}$ and $\hat{b}_{ML} = x_{(n)}$
Properties of MLEs
Invariance:
If $\boldsymbol{\theta}$ and $\boldsymbol{\phi}$ represent two alternative parameterizations and $\boldsymbol{\phi}$ is a one-to-one function of $\boldsymbol{\theta}$: $\boldsymbol{\phi} = g(\boldsymbol{\theta})$, $\boldsymbol{\theta} = h(\boldsymbol{\phi})$, then if $\hat{\boldsymbol{\theta}}$ is the MLE of $\boldsymbol{\theta}$, $g(\hat{\boldsymbol{\theta}})$ is the MLE of $\boldsymbol{\phi}$.
Consistency:
Under some weak regularity conditions all MLEs are consistent.
Efficiency:
Under the usual regularity conditions:
$\hat{\theta}_{ML}$ is asymptotically distributed as $N\left( \theta, I_{\theta}^{-1} \right)$
(Asymptotically efficient and normally distributed)
Sufficiency:
$\hat{\theta}_{ML}$ unique MLE for $\theta$ $\Rightarrow$ $\hat{\theta}_{ML}$ is a function of the minimal sufficient statistic for $\theta$
Example
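R.S. from $N(\mu, \sigma^2)$
$$f(x; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} = \exp\left\{ -\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2} - \frac{1}{2} \ln \sigma^2 - \frac{1}{2} \ln 2\pi \right\}$$
$$\phi_1 = \frac{\mu}{\sigma^2} \ \text{and} \ \phi_2 = -\frac{1}{2\sigma^2}; \quad T_1 = \sum_i X_i \ \text{and} \ T_2 = \sum_i X_i^2$$
$$f(x; \phi_1, \phi_2) = \exp\left\{ \phi_1 x + \phi_2 x^2 + \frac{\phi_1^2}{4\phi_2} + \frac{1}{2} \ln(-2\phi_2) - \frac{1}{2} \ln 2\pi \right\}$$
$$L(\phi_1, \phi_2; \mathbf{x}) = \exp\left\{ \phi_1 \sum_i x_i + \phi_2 \sum_i x_i^2 + \frac{n \phi_1^2}{4\phi_2} + \frac{n}{2} \ln(-2\phi_2) - \frac{n}{2} \ln 2\pi \right\}$$
$$E(T_1) = \sum_i E(X_i) = n\mu = -\frac{n \phi_1}{2\phi_2}$$
$$E(T_2) = \sum_i E(X_i^2) = n(\sigma^2 + \mu^2) = n\left( -\frac{1}{2\phi_2} + \frac{\phi_1^2}{4\phi_2^2} \right)$$
$\hat{\phi}_{1,ML}$ and $\hat{\phi}_{2,ML}$ are obtained by solving
$$-\frac{n \phi_1}{2\phi_2} = \sum_i x_i, \qquad n\left( -\frac{1}{2\phi_2} + \frac{\phi_1^2}{4\phi_2^2} \right) = \sum_i x_i^2$$
The first equation gives $\phi_1 = -2\phi_2 \bar{x}$. Substituting into the second:
$$-\frac{1}{2\phi_2} + \bar{x}^2 = \frac{1}{n} \sum_i x_i^2 \ \Rightarrow \ -\frac{1}{2\phi_2} = \frac{1}{n} \sum_i x_i^2 - \bar{x}^2 = \frac{1}{n} \sum_i (x_i - \bar{x})^2$$
$$\Rightarrow \ \hat{\phi}_{2,ML} = -\frac{n}{2 \sum_i (x_i - \bar{x})^2}, \qquad \hat{\phi}_{1,ML} = -2 \hat{\phi}_{2,ML}\, \bar{x} = \frac{n \bar{x}}{\sum_i (x_i - \bar{x})^2}$$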
Invariance property
$(\mu, \sigma^2)$ has a one-to-one relationship with $(\phi_1, \phi_2)$ (as the system of relating equations has a unique solution)
$$\Rightarrow \ \hat{\mu}_{ML} = -\frac{\hat{\phi}_{1,ML}}{2 \hat{\phi}_{2,ML}} = \bar{x}, \qquad \hat{\sigma}^2_{ML} = -\frac{1}{2 \hat{\phi}_{2,ML}} = \frac{1}{n} \sum_i (x_i - \bar{x})^2$$
Note! $bias(\hat{\sigma}^2_{ML}) \neq 0$ but $\to 0$ as $n \to \infty$
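$$l(\mu, \sigma^2; \mathbf{x}) = -\frac{1}{2\sigma^2} \left( \sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2 \right) - \frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln 2\pi$$
$$\frac{\partial l}{\partial \mu} = \frac{\sum_i x_i - n\mu}{\sigma^2}$$
$$\frac{\partial l}{\partial \sigma^2} = \frac{\sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2}{2 (\sigma^2)^2} - \frac{n}{2\sigma^2}$$
$$\frac{\partial^2 l}{\partial \mu^2} = -\frac{n}{\sigma^2}$$
$$\frac{\partial^2 l}{\partial \mu\, \partial \sigma^2} = -\frac{\sum_i x_i - n\mu}{(\sigma^2)^2}$$
$$\frac{\partial^2 l}{\partial (\sigma^2)^2} = -\frac{\sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2}{(\sigma^2)^3} + \frac{n}{2 (\sigma^2)^2}$$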
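$$E\left[ \frac{\partial^2 l(\mu, \sigma^2; \mathbf{X})}{\partial \mu^2} \right] = -\frac{n}{\sigma^2}$$
$$E\left[ \frac{\partial^2 l(\mu, \sigma^2; \mathbf{X})}{\partial \mu\, \partial \sigma^2} \right] = -\frac{n\mu - n\mu}{(\sigma^2)^2} = 0$$
$$E\left[ \frac{\partial^2 l(\mu, \sigma^2; \mathbf{X})}{\partial (\sigma^2)^2} \right] = -\frac{n(\sigma^2 + \mu^2) - 2n\mu^2 + n\mu^2}{(\sigma^2)^3} + \frac{n}{2 (\sigma^2)^2} = -\frac{n}{(\sigma^2)^2} + \frac{n}{2 (\sigma^2)^2} = -\frac{n}{2\sigma^4}$$
$$\Rightarrow \ I_{\boldsymbol{\theta}} = -E\left[ \frac{\partial^2 l(\boldsymbol{\theta}; \mathbf{X})}{\partial \boldsymbol{\theta}^2} \right] = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{pmatrix} \quad \text{with } \boldsymbol{\theta} = (\mu, \sigma^2)$$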
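$$I_{\boldsymbol{\theta}}^{-1} = \frac{1}{\det I_{\boldsymbol{\theta}}} \begin{pmatrix} I_{\boldsymbol{\theta},2,2} & -I_{\boldsymbol{\theta},1,2} \\ -I_{\boldsymbol{\theta},2,1} & I_{\boldsymbol{\theta},1,1} \end{pmatrix} = \frac{2\sigma^6}{n^2} \begin{pmatrix} \dfrac{n}{2\sigma^4} & 0 \\ 0 & \dfrac{n}{\sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}$$
$$\Rightarrow \ \begin{pmatrix} \hat{\mu}_{ML} \\ \hat{\sigma}^2_{ML} \end{pmatrix} \ \text{is asymptotically distributed as} \ N\left( \begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix}; \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix} \right)$$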
i.e. the two MLEs are asymptotically uncorrelated (and by the normal distribution independent)
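The normal example can be traced numerically. The sketch below is an illustration with a made-up sample: it computes the MLEs of the canonical parameters of $N(\mu, \sigma^2)$, maps them back to $(\mu, \sigma^2)$ through the invariance property, and inverts the diagonal information matrix with entries $n/\sigma^2$ and $n/(2\sigma^4)$. The function names are hypothetical.

```python
def normal_mle(sample):
    """Return (phi1_hat, phi2_hat, mu_hat, sigma2_hat) for a N(mu, sigma^2) sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
    phi1 = n * mean / ss                        # canonical parameter mu / sigma^2
    phi2 = -n / (2 * ss)                        # canonical parameter -1 / (2 sigma^2)
    mu_hat = -phi1 / (2 * phi2)                 # invariance: back to mu
    sigma2_hat = -1 / (2 * phi2)                # invariance: back to sigma^2
    return phi1, phi2, mu_hat, sigma2_hat

def asymptotic_covariance(sigma2, n):
    """Invert the 2x2 information matrix for theta = (mu, sigma^2)."""
    info = [[n / sigma2, 0.0], [0.0, n / (2 * sigma2 ** 2)]]
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    return [[info[1][1] / det, -info[0][1] / det],
            [-info[1][0] / det, info[0][0] / det]]

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
phi1_hat, phi2_hat, mu_hat, sigma2_hat = normal_mle(sample)
cov = asymptotic_covariance(sigma2_hat, len(sample))
print(phi1_hat, phi2_hat, mu_hat, sigma2_hat)
print(cov)
```

Here the information matrix is evaluated at the estimate $\hat{\sigma}^2_{ML}$, which is a common plug-in choice rather than something the slides prescribe.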
Modifications and extensions
Ancillarity and conditional sufficiency:
$\mathbf{T} = (T_1, T_2)$ a minimal sufficient statistic for $\boldsymbol{\theta} = (\theta_1, \theta_2)$
a) $f_{T_2}(t_2)$ depends on $\theta_2$ but not on $\theta_1$
b) $f_{T_1 \mid T_2 = t_2}(t_1)$ depends on $\theta_1$ but not on $\theta_2$
$\Rightarrow$ $T_2$ is said to be an ancillary statistic for $\theta_1$ and $T_1$ is said to be conditionally independent of $\theta_2$
Profile likelihood:
With $\boldsymbol{\theta} = (\theta_1, \theta_2)$ and $L(\theta_1, \theta_2; \mathbf{x})$, if $\hat{\theta}_{2.1}$ is the MLE of $\theta_2$ for a given value of $\theta_1$, then $L(\theta_1, \hat{\theta}_{2.1}; \mathbf{x})$ is called the profile likelihood for $\theta_1$.
This concept has its main use in cases where $\theta_1$ contains the parameters of “interest” and $\theta_2$ contains nuisance parameters.
The same ML point estimator for $\theta_1$ is obtained by maximizing the profile likelihood as by maximizing the full likelihood function.
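A minimal sketch of a profile likelihood, assuming a $N(\mu, \sigma^2)$ sample with $\mu$ as the parameter of interest and $\sigma^2$ as the nuisance parameter. For a given $\mu$ the MLE of $\sigma^2$ is $\frac{1}{n} \sum (x_i - \mu)^2$, and plugging it in gives the profile log-likelihood $-\frac{n}{2} \left[ \ln(2\pi \hat{\sigma}^2(\mu)) + 1 \right]$. The sample and the grid are made up for the example.

```python
import math

def profile_loglik_mu(mu, sample):
    """Profile log-likelihood of mu: sigma^2 replaced by its MLE for the given mu."""
    n = len(sample)
    sigma2_given_mu = sum((x - mu) ** 2 for x in sample) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2_given_mu) + 1)

sample = [1.2, 0.7, 2.9, 1.8, 2.4]
grid = [i / 1000 for i in range(0, 4001)]          # candidate values 0.000 .. 4.000
mu_profile = max(grid, key=lambda m: profile_loglik_mu(m, sample))
sample_mean = sum(sample) / len(sample)

print("maximizer of profile likelihood:", mu_profile)
print("full-likelihood MLE (sample mean):", sample_mean)
```

Up to the grid resolution, the maximizer of the profile likelihood coincides with the full-likelihood MLE of $\mu$.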
Marginal and conditional likelihood:
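$L(\theta_1, \theta_2; \mathbf{x})$ is equivalent to the joint p.d.f. of the sample, i.e. $f_{\mathbf{X}}(\mathbf{x}; \theta_1, \theta_2)$.
If $(\mathbf{u}, \mathbf{v})$ is a partitioning of $\mathbf{x}$ then $f_{\mathbf{X}}(\mathbf{x}; \theta_1, \theta_2) = f(\mathbf{u}, \mathbf{v}; \theta_1, \theta_2)$.
Now, if $f(\mathbf{u}, \mathbf{v}; \theta_1, \theta_2)$ can be factorized as $f_1(\mathbf{u}; \theta_1) \cdot f_2(\mathbf{v} \mid \mathbf{u}; \theta_1, \theta_2)$ and $f_2(\mathbf{v} \mid \mathbf{u}; \theta_1, \theta_2)$ does not depend on $\theta_1$, then inferences about $\theta_1$ can be based solely on $f_1(\mathbf{u}; \theta_1)$, the marginal likelihood for $\theta_1$.
If $f(\mathbf{u}, \mathbf{v}; \theta_1, \theta_2)$ can be factorized as $f_1(\mathbf{u} \mid \mathbf{v}; \theta_1) \cdot f_2(\mathbf{v}; \theta_1, \theta_2)$, then inferences about $\theta_1$ can be based solely on $f_1(\mathbf{u} \mid \mathbf{v}; \theta_1)$, the conditional likelihood for $\theta_1$.
Again, these concepts have their main use in cases where $\theta_1$ contains the parameters of “interest” and $\theta_2$ contains nuisance parameters.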
Penalized likelihood:
MLEs can be derived subjected to some criteria of smoothness.
In particular this is applicable when the parameter is no longer a single value (one- or multidimensional), but a function such as an unknown density function or a regression curve.
The penalized log-likelihood function is written
$$l_P(\boldsymbol{\theta}; \mathbf{x}) = l(\boldsymbol{\theta}; \mathbf{x}) - \lambda R(\boldsymbol{\theta})$$
where $R(\boldsymbol{\theta})$ is the penalty function and $\lambda$ is a fixed parameter controlling the influence of $R(\boldsymbol{\theta})$.
$\lambda$ is thus not estimated by maximizing $l_P(\boldsymbol{\theta}; \mathbf{x})$, but can be estimated by so-called cross-validation techniques (see ch. 9).
Note that $\mathbf{x}$ is not the usual random sample here, but can be a set of independent but non-identically distributed values.
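To make the role of the penalty weight concrete, here is a deliberately simple scalar sketch. It is an illustration only, since penalized likelihood is mainly aimed at function-valued parameters: a normal sample with unit variance and unknown mean $\theta$, with the assumed penalty $R(\theta) = \theta^2$. Setting the derivative of $-\frac{1}{2} \sum (x_i - \theta)^2 - \lambda \theta^2$ to zero gives $\hat{\theta} = \sum x_i / (n + 2\lambda)$.

```python
def penalized_loglik(theta, sample, lam):
    """l_P(theta) = l(theta) - lam * R(theta), with R(theta) = theta^2 (assumed penalty).
    Additive constants of the normal log-likelihood are omitted."""
    loglik = -0.5 * sum((x - theta) ** 2 for x in sample)
    return loglik - lam * theta ** 2

def penalized_estimate(sample, lam):
    """Closed-form maximizer of penalized_loglik for this toy model."""
    return sum(sample) / (len(sample) + 2 * lam)

sample = [1.0, 2.0, 3.0, 4.0]
unpenalized = penalized_estimate(sample, lam=0.0)   # ordinary MLE: the sample mean
shrunk = penalized_estimate(sample, lam=1.0)        # pulled towards zero
print(unpenalized, shrunk)
```

A larger penalty weight moves the estimate further from the ordinary MLE, which is why the weight has to be fixed by something other than maximizing $l_P$ itself.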
Method of moments estimation (MM )
For a random variable $X$:
The $r$th population moment about the origin is
$$\mu_r' = E(X^r)$$
The $r$th population moment about the mean ($r$th central moment) is
$$\mu_r = E\left[ (X - \mu_1')^r \right]$$
For a random sample $\mathbf{x} = (x_1, \ldots, x_n)$:
The $r$th sample moment about the origin is
$$m_r' = n^{-1} \sum_{i=1}^{n} x_i^r$$
The $r$th sample moment about the mean is
$$m_r = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^r$$
The method of moments point estimator of $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$ is obtained by solving for $\theta_1, \ldots, \theta_k$ the system of equations
$$\mu_r' = m_r', \quad r = 1, \ldots, k$$
or
$$\mu_r = m_r, \quad r = 1, \ldots, k$$
or a mixture between these two.
Example
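$x_1, \ldots, x_n$ R.S. from $U(a, b)$
First moment about the origin:
$$\mu_1' = E(X) = \frac{a + b}{2}$$
Second central moment:
$$\begin{aligned}
\mu_2 &= E\left[ \left( X - \frac{a + b}{2} \right)^2 \right] = \int_a^b \left( y - \frac{a + b}{2} \right)^2 \frac{1}{b - a}\, dy \\
&= \frac{1}{b - a} \left[ \frac{y^3}{3} - \frac{(a + b) y^2}{2} + \frac{(a + b)^2 y}{4} \right]_a^b \\
&= \frac{b^3 - a^3}{3(b - a)} - \frac{(a + b)(b^2 - a^2)}{2(b - a)} + \frac{(a + b)^2}{4} \\
&= \frac{b^2 + ab + a^2}{3} - \frac{(a + b)^2}{4} \\
&= \frac{4b^2 + 4ab + 4a^2 - 3a^2 - 6ab - 3b^2}{12} = \frac{a^2 - 2ab + b^2}{12} = \frac{(b - a)^2}{12}
\end{aligned}$$
Solve for $a$, $b$ the system of equations
$$\frac{a + b}{2} = \bar{x}, \qquad \frac{(b - a)^2}{12} = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \hat{\sigma}^2$$
$$b = 2\bar{x} - a \ \Rightarrow \ (2\bar{x} - 2a)^2 = 12 \hat{\sigma}^2 \ \Rightarrow \ (\bar{x} - a)^2 = 3 \hat{\sigma}^2 \ \Rightarrow \ a = \bar{x} \pm \sqrt{3}\, \hat{\sigma}$$
$a = \bar{x} + \sqrt{3}\, \hat{\sigma}$ is not possible, since it gives $b = \bar{x} - \sqrt{3}\, \hat{\sigma}$ and we need $a < b$.
$$\Rightarrow \ \hat{a}_{MM} = \bar{x} - \sqrt{3}\, \hat{\sigma}, \qquad \hat{b}_{MM} = \bar{x} + \sqrt{3}\, \hat{\sigma}$$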
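For comparison, the method-of-moments and maximum-likelihood estimates for $U(a, b)$ can be computed side by side. This is a small sketch with a made-up sample and hypothetical function names; the MM estimates are $\bar{x} \mp \sqrt{3}\, \hat{\sigma}$ with $\hat{\sigma}^2 = n^{-1} \sum (x_i - \bar{x})^2$, and the ML estimates are the sample minimum and maximum.

```python
import math

def uniform_mm(sample):
    """Method-of-moments estimates (a_hat, b_hat) for U(a, b)."""
    n = len(sample)
    mean = sum(sample) / n
    sigma_hat = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    half_width = math.sqrt(3) * sigma_hat
    return mean - half_width, mean + half_width

def uniform_ml(sample):
    """Maximum-likelihood estimates (a_hat, b_hat) for U(a, b)."""
    return min(sample), max(sample)

sample = [2.0, 3.5, 4.0, 6.5, 9.0]
a_mm, b_mm = uniform_mm(sample)
a_ml, b_ml = uniform_ml(sample)
print("MM:", a_mm, b_mm)
print("ML:", a_ml, b_ml)
```

By construction the MM estimates reproduce the sample mean and the second sample central moment, whereas the ML estimates never extend beyond the observed range.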
Method of Least Squares (LS)
First principles:
Assume a sample $\mathbf{x}$ where the random variable $X_i$ can be written
$$X_i = m_i(\theta) + \varepsilon_i$$
where $m_i(\theta)$ is the mean value (function) involving $\theta$ and $\varepsilon_i$ is a random variable with zero mean and constant variance $\sigma^2$.
The least-squares estimator of $\theta$ is the value of $\theta$ that minimizes
$$\sum_{i=1}^{n} \left( x_i - m_i(\theta) \right)^2$$
i.e.
$$\hat{\theta}_{LS} = \arg\min_{\theta} \sum_{i=1}^{n} \left( x_i - m_i(\theta) \right)^2$$
A more general approach
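Assume the sample can be written $(\mathbf{x}, \mathbf{z})$ where $x_i$ represents the random variable of interest (endogenous variable) and $z_i$ represents either an auxiliary random variable (exogenous) or a given constant for sample point $i$:
$$X_i = m(\theta; z_i) + \varepsilon_i$$
where $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma_i^2$ and $Cov(\varepsilon_i, \varepsilon_j) = c_{ij}$, with $\sigma_i^2$, $c_{ij}$ possibly functions of $z_i$, $z_j$.
The least squares estimator of $\theta$ is then
$$\hat{\theta}_{LS} = \arg\min_{\theta}\ \boldsymbol{\varepsilon}^T \mathbf{W}^{-1} \boldsymbol{\varepsilon}$$
where $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)^T$ and $\mathbf{W}$ is the variance-covariance matrix of $\boldsymbol{\varepsilon}$.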
Special cases
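The ordinary linear regression model:
$$X_i = \beta_0 + \beta_1 z_{1,i} + \cdots + \beta_p z_{p,i} + \varepsilon_i$$
with $\mathbf{W} = \sigma^2 \mathbf{I}_n$ and $\mathbf{Z}$ considered to be a constant matrix
$$\hat{\boldsymbol{\beta}}_{LS} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \left( x_i - \beta_0 - \beta_1 z_{1,i} - \cdots - \beta_p z_{p,i} \right)^2 = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \varepsilon_i^2 = \left( \mathbf{Z}^T \mathbf{Z} \right)^{-1} \mathbf{Z}^T \mathbf{x}$$
The heteroscedastic regression model:
$$X_i = \beta_0 + \beta_1 z_{1,i} + \cdots + \beta_p z_{p,i} + \varepsilon_i, \quad i = 1, \ldots, n$$
with $\mathbf{W} \neq \sigma^2 \mathbf{I}_n$
$$\Rightarrow \ \hat{\boldsymbol{\beta}}_{LS} = \left( \mathbf{Z}^T \mathbf{W}^{-1} \mathbf{Z} \right)^{-1} \mathbf{Z}^T \mathbf{W}^{-1} \mathbf{x}$$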
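Both special cases can be illustrated with one small routine. The sketch assumes a single regressor ($p = 1$) and a diagonal $\mathbf{W}$ holding the error variances, so that the normal equations $\mathbf{Z}^T \mathbf{W}^{-1} \mathbf{Z} \boldsymbol{\beta} = \mathbf{Z}^T \mathbf{W}^{-1} \mathbf{x}$ reduce to a 2x2 system solved in closed form. The function name and the data are hypothetical.

```python
def weighted_least_squares(z, x, variances=None):
    """Fit x_i = b0 + b1 * z_i + eps_i with W = diag(variances).
    With variances=None all weights are equal, i.e. ordinary least squares."""
    n = len(x)
    w = [1.0] * n if variances is None else [1.0 / v for v in variances]
    # Entries of Z^T W^{-1} Z and Z^T W^{-1} x for Z = [1, z]
    s_w = sum(w)
    s_z = sum(wi * zi for wi, zi in zip(w, z))
    s_zz = sum(wi * zi * zi for wi, zi in zip(w, z))
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_zx = sum(wi * zi * xi for wi, zi, xi in zip(w, z, x))
    det = s_w * s_zz - s_z ** 2
    b0 = (s_zz * s_x - s_z * s_zx) / det
    b1 = (s_w * s_zx - s_z * s_x) / det
    return b0, b1

z = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 3.0, 5.0, 7.0, 9.0]                 # exactly x = 1 + 2 z
ols = weighted_least_squares(z, x)
wls = weighted_least_squares(z, x, variances=[1.0, 2.0, 4.0, 2.0, 1.0])
print("OLS:", ols)
print("WLS:", wls)
```

With noise-free data both fits recover the same line; with noisy data the weighted version downweights observations with large error variance.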
The first-order auto-regressive model:
$$X_t = \theta X_{t-1} + \varepsilon_t$$
Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{z} = (*, x_1, \ldots, x_{n-1})$, i.e. for the first sample point (first time-point) $z$ is not available.
$$X_t = \theta z_t + \varepsilon_t, \quad t = 2, \ldots, n$$
$$\mathbf{W} = \sigma^2 \mathbf{I}_n$$
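The conditional least-squares estimator of $\theta$ (given $x_1$) is
$$\hat{\theta}_{CLS} = \arg\min_{\theta} \sum_{i=2}^{n} \varepsilon_i^2 = \arg\min_{\theta} \sum_{i=2}^{n} \left( x_i - \theta z_i \right)^2 = \arg\min_{\theta}\ \boldsymbol{\varepsilon}^T \mathbf{W}^{-1} \boldsymbol{\varepsilon}$$
where
$$\mathbf{W}^{-1} = \begin{pmatrix} 0 & \mathbf{0}^T \\ \mathbf{0} & \mathbf{I}_{n-1} \end{pmatrix}$$
and $\mathbf{0}$ is the $(n-1)$-dimensional vector of zeros.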
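For the first-order autoregressive model without intercept, minimizing $\sum_{t=2}^{n} (x_t - \theta x_{t-1})^2$ has the closed-form solution $\hat{\theta}_{CLS} = \sum_{t=2}^{n} x_t x_{t-1} / \sum_{t=2}^{n} x_{t-1}^2$. A minimal sketch, with a made-up noise-free series and a hypothetical function name:

```python
def ar1_conditional_ls(series):
    """Conditional least-squares estimate of theta in X_t = theta * X_{t-1} + eps_t,
    conditioning on the first observation."""
    z = series[:-1]   # lagged values x_1, ..., x_{n-1}
    x = series[1:]    # responses   x_2, ..., x_n
    return sum(xt * zt for xt, zt in zip(x, z)) / sum(zt * zt for zt in z)

series = [8.0, 4.0, 2.0, 1.0, 0.5]   # generated with theta = 0.5 and no noise
theta_cls = ar1_conditional_ls(series)
print("theta_CLS:", theta_cls)
```

The first observation enters only as a regressor, never as a response, which is what "conditional on $x_1$" amounts to here.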