On Particle Gibbs Sampling
Sumeetpal S. Singh, Cambridge University Engineering Department
joint work: N. Chopin (CREST-ENSAE)
MCMSki, Chamonix, 7 January 2014
Page 1 of 17
State-space Model
Running example (Yu & Meng 2011):
X_0 ∼ N(µ, σ²),  X_{t+1} − µ = ρ(X_t − µ) + σ ε_t,  ε_t ∼ i.i.d. N(0, 1)
Y_t | X_t = x_t ∼ Poisson(e^{x_t})
In general:
X_t | X_{t−1} = x_{t−1} ∼ m_θ(x_{t−1}, x_t) dx_t
Y_t | X_t = x_t ∼ g_θ(x_t, y_t) dy_t
Page 2 of 17
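The running example can be simulated directly; a minimal NumPy sketch (function name and parameter values are ours, chosen for illustration only):

```python
import numpy as np

def simulate_ssm(T, mu=0.0, rho=0.9, sigma=0.5, seed=0):
    """Simulate the running example: AR(1) latent state, Poisson(e^{x_t}) observations."""
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1)
    x[0] = rng.normal(mu, sigma)                 # X_0 ~ N(mu, sigma^2)
    for t in range(T):
        # X_{t+1} - mu = rho (X_t - mu) + sigma eps_t,  eps_t ~ N(0, 1)
        x[t + 1] = mu + rho * (x[t] - mu) + sigma * rng.normal()
    y = rng.poisson(np.exp(x))                   # Y_t | X_t = x_t ~ Poisson(e^{x_t})
    return x, y

x, y = simulate_ssm(399)                         # a series of length 400, as in the experiments
```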
Inference Objective
The posterior: p(θ, x_{0:T} | y_{0:T}),  θ = (µ, ρ, σ)
The Gibbs sampler: (θ, x_{0:T}) → (θ′, x′_{0:T})
σ′ | (x_{0:T}, µ, ρ) ∼ Gamma(···)
  ⋮
µ′ | (x_{0:T}, σ′, ρ′) ∼ Normal(···)
x′_{0:T} | (σ′, µ′, ρ′)
One at a time: x_i | (x′_{0:i−1}, x_{i+1:T}, σ′, µ′, ρ′)
Page 3 of 17
Particle Gibbs kernel (Andrieu, Doucet and Holenstein, 2010):
Replace Gibbs step for x_{0:T} with
X′_{0:T} | (θ′, x_{0:T}) ∼ P_{θ′,N}(x_{0:T}, dx′_{0:T})
Invariant measure p(x_{0:T} | θ′, y_{0:T})
The kernel is a randomly chosen output of a particle filter with N particles, one fixed to x_{0:T}
Meant to emulate the Gibbs step (change the whole trajectory)
Typically:
x′_{0:T} ≠ x_{0:T} but x′_i = x_i for small i.
Increasing N fixes this.
Page 4 of 17
Running Particle Gibbs
Sampling p(θ, x_{0:399} | y_{0:399})
[Figure: two panels of update rate vs. t, for multinomial, residual, systematic resampling, and multinomial + BS]
Statistic: proportion of indices i = 0, …, 399 with x′_i ≠ x_i
Page 5 of 17
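The update-rate statistic above can be computed directly from the stored Gibbs output; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def update_rate(trajectories):
    """Fraction of Gibbs sweeps in which coordinate x_i changed value.

    trajectories: array of shape (n_sweeps, T+1), one sampled x_{0:T} per sweep.
    """
    X = np.asarray(trajectories)
    changed = X[1:] != X[:-1]        # x'_i != x_i between successive sweeps
    return changed.mean(axis=0)      # per-coordinate update rate in [0, 1]
```

A rate near 1 at index i means the sampler refreshes x_i almost every sweep; rates near 0 at small i are exactly the path-degeneracy symptom the slides describe.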
Running Particle Gibbs
Sampling p(θ, x_{0:400} | y_{0:400})
[Figure: ACF of X_0 (left) and X_399 (right), for multinomial, residual, systematic resampling, and multinomial + BS]
Statistic: autocorrelation of X_0 and X_399 (200 particles)
Page 6 of 17
Particle filter
Input: {x^i_{0:t}}^N_{i=1} ≈ p(x_{0:t} | θ, y_{0:t})
Output: {x^i_{0:t+1}}^N_{i=1} ≈ p(x_{0:t+1} | θ, y_{0:t+1})
Append: (x^i_{0:t}, X^i_{t+1}) where X^i_{t+1} ∼ m_θ(x^i_t, dx_{t+1})
Weight: x^i_{0:t+1} gets weight w^i_{t+1} = g_θ(x^i_{t+1}, y_{t+1})
Output: the approximation of p(x_{0:t+1} | θ, y_{0:t+1}) is N independent samples from
∑_{i=1}^N w^i_{t+1} δ_{x^i_{0:t+1}}(dx′_{0:t+1})
Page 7 of 17
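The Append / Weight / Resample recursion above can be sketched for the running example; a minimal bootstrap-filter step in NumPy (function name is ours; the weights use the Poisson(e^x) log-density up to a constant, and resampling is multinomial):

```python
import numpy as np

def pf_step(trajs, y_next, mu, rho, sigma, rng):
    """One bootstrap-filter step for the AR(1)/Poisson running example.

    trajs: (N, t+1) array of trajectories x^i_{0:t}.
    Returns (N, t+2) resampled trajectories x^i_{0:t+1}.
    """
    N = trajs.shape[0]
    # Append: X^i_{t+1} ~ m_theta(x^i_t, .), the AR(1) transition
    x_new = mu + rho * (trajs[:, -1] - mu) + sigma * rng.normal(size=N)
    trajs = np.column_stack([trajs, x_new])
    # Weight: w^i_{t+1} = g_theta(x^i_{t+1}, y_{t+1}), here log Poisson(e^x) up to a constant
    logw = y_next * x_new - np.exp(x_new)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample: N independent draws from sum_i w^i_{t+1} delta_{x^i_{0:t+1}}
    return trajs[rng.choice(N, size=N, p=w)]
```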
Sampling from P_{θ,N}(x*_{0:T}, dx_{0:T})
Idea: N-particle system, but force x^1_{0:T} = x*_{0:T}
Intermediate step t
Input: {x^i_{0:t}}^N_{i=1} with x^1_{0:t} = x*_{0:t}
Output: {x^i_{0:t+1}}^N_{i=1} with x^1_{0:t+1} = x*_{0:t+1}
Append particles i = 2, …, N: (x^i_{0:t}, X^i_{t+1}) where X^i_{t+1} ∼ m_θ(x^i_t, dx_{t+1})
Weight: x^i_{0:t+1} gets weight w^i_{t+1} = g_θ(x^i_{t+1}, y_{t+1})
Output: x^1_{0:t+1} = x*_{0:t+1} and N − 1 independent samples from
w*_{t+1} δ_{x*_{0:t+1}}(dx′_{0:t+1}) + ∑_{i=2}^N w^i_{t+1} δ_{x^i_{0:t+1}}(dx′_{0:t+1})
Page 8 of 17
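The conditional step differs from the plain filter only in pinning slot 1; a minimal sketch for the running example (function name is ours; only slots 2, …, N are resampled, so the reference path always survives):

```python
import numpy as np

def cpf_step(trajs, x_star_next, y_next, mu, rho, sigma, rng):
    """One conditional-PF step: slot 1 is pinned to the reference path x*.

    trajs: (N, t+1) trajectories with trajs[0] = x*_{0:t}.
    """
    N = trajs.shape[0]
    # Append i = 2..N from m_theta; force x^1_{t+1} = x*_{t+1}
    x_new = mu + rho * (trajs[:, -1] - mu) + sigma * rng.normal(size=N)
    x_new[0] = x_star_next
    trajs = np.column_stack([trajs, x_new])
    # Weight every particle, including the frozen one
    logw = y_next * x_new - np.exp(x_new)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample only slots 2..N; slot 1 keeps the reference trajectory
    idx = np.concatenate(([0], rng.choice(N, size=N - 1, p=w)))
    return trajs[idx]
```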
cPF example
[Figure: cPF particle trajectories over t = 0, …, 10]
Page 9 of 17
Uniform Ergodicity
Assumption
g_θ(x_t, y_t) ≤ G_{t,θ} (density upper bounded)
Predicted density lower bounded:
∫ m_θ(dx_0) g_θ(x_0, y_0) ≥ 1/G_{0,θ},   ∫ m_θ(x_{t−1}, dx_t) g_θ(x_t, y_t) ≥ 1/G_{t,θ}
For any θ and ε > 0, for N large enough,
|(P_{N,θ} φ)(x_{0:T}) − (P_{N,θ} φ)(x̌_{0:T})| ≤ ε
for all x_{0:T}, x̌_{0:T}, and φ : X^{T+1} → [−1, 1]
⇒ Large N gives samples arbitrarily close to the target p(x_{0:T} | θ, y_{0:T})
⇒ Uniform ergodicity
Page 10 of 17
Proof Idea
Use coupling: define (X*_{0:T}, X̌*_{0:T}) such that
Law(X*_{0:T}) = P_{N,θ}(x_{0:T}, dx*_{0:T}),   Law(X̌*_{0:T}) = P_{N,θ}(x̌_{0:T}, dx̌*_{0:T})
If
P(X*_{0:T} ≠ X̌*_{0:T}) ≤ ε
then
P_{N,θ}(x_{0:T}, A) − P_{N,θ}(x̌_{0:T}, A) = E{(I_A(X*_{0:T}) − I_A(X̌*_{0:T})) I{X*_{0:T} ≠ X̌*_{0:T}}} ≤ P(X*_{0:T} ≠ X̌*_{0:T}) ≤ ε
Page 11 of 17
Coupling cPFs
Aim: couple the outputs of P_{N,θ}(x_{0:T}, dx*_{0:T}) and P_{N,θ}(x̌_{0:T}, dx̌*_{0:T})
Suppose the cPF outputs {x^i_{0:t}}^N_{i=1} and {x̌^i_{0:t}}^N_{i=1} satisfy
x^i_{0:t} = x̌^i_{0:t} for i ∈ C_t
where C_0 = {2, …, N}
The same holds after the append move: (x^i_{0:t}, x^i_{t+1}) = (x̌^i_{0:t}, x̌^i_{t+1}) for i ∈ C_t
Resampling move: select particles in C_t for survival if possible
Coupling probability determined by the law of C_T
Page 12 of 17
Backward Sampling
N. Whiteley (RSS discussion of PMCMC) suggested an extra backward step for the cPF that tries to modify (recursively, backward in time) the ancestry of the selected trajectory.
There is a forward “version” by Lindsten and Schön (2012)
Page 13 of 17
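The backward step can be sketched for the running example: having stored the particle positions and filter weights, the selected trajectory's ancestry is redrawn backward in time, choosing particle i at time t with probability proportional to w^i_t m_θ(x^i_t, x_{t+1}). A minimal NumPy version (function name is ours):

```python
import numpy as np

def backward_sample(states, logw, mu, rho, sigma, rng):
    """Draw one trajectory by resampling its ancestry backward in time.

    states, logw: (T+1, N) particle positions and log filter weights.
    At time t the ancestor is chosen with probability
    proportional to w^i_t * m_theta(x^i_t, x_{t+1}).
    """
    T, N = states.shape[0] - 1, states.shape[1]
    path = np.empty(T + 1)
    wT = np.exp(logw[T] - logw[T].max())
    wT /= wT.sum()
    j = rng.choice(N, p=wT)                       # final state from the last weights
    path[T] = states[T, j]
    for t in range(T - 1, -1, -1):
        # Gaussian AR(1) transition log-density m_theta(x^i_t, path[t+1]), up to a constant
        mean = mu + rho * (states[t] - mu)
        lw = logw[t] - 0.5 * ((path[t + 1] - mean) / sigma) ** 2
        w = np.exp(lw - lw.max())
        w /= w.sum()
        j = rng.choice(N, p=w)
        path[t] = states[t, j]
    return path
```

Because every time index can switch ancestor, the returned path need not share any segment with the forward filter's surviving genealogy, which is what restores the update rate at small t.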
Idea of Backward Sampling
[Figure: backward sampling illustrated on cPF trajectories over t = 0, …, 10]
Page 14 of 17
Running Backward Sampling
[Figure: update rate vs. t (left) and ACF of X_0 (right), for multinomial, residual, systematic resampling, and multinomial + BS]
Left-panel statistic: proportion of indices i = 0, …, 399 with x′_i ≠ x_i
Page 15 of 17
Backward Sampling
Success depends on the state transition law m_θ(x_t, dx_{t+1})
The cPF kernel P_{θ,N} with a BS step dominates the no-BS version in asymptotic efficiency
i.e. gives rise to a CLT with a smaller asymptotic variance (Idea: self-adjoint operator + lag-1 domination; see Tierney (1998), Mira & Geyer (1999))
The cPF kernel is geometrically ergodic ⇒ the cPF kernel with BS is geometrically ergodic
(Idea: both kernels are positive operators)
Page 16 of 17
Final Remarks
Established uniform ergodicity of cPF using coupling
Dependence on N and T not given, although in practice N should scale linearly with T
– now proved by (Andrieu, Lee, Vihola) and (Douc, Lindsten, Moulines)
Whiteley’s BS is better: it improves asymptotic efficiency and inherits geometric ergodicity
Page 17 of 17