Introduction to Particle Filters for Data Assimilation

Mike Dowd
Dept of Mathematics & Statistics (and Dept of Oceanography)
Dalhousie University, Halifax, Canada

STATMOS Summer School in Data Assimilation, Boulder, Colorado, May 18-25, 2015
Problem Statement

• Noisy time series observations on the system state: $y_{1:T} = (y_1, \ldots, y_T)$
• A discrete-time stochastic dynamic model describing the time evolution of the system state, e.g. a bivariate AR(1) process: $x_t = \theta x_{t-1} + n_t$
• We want to estimate the system state (and sometimes parameters)
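As a toy illustration of this setup (not from the slides), an R sketch simulating a bivariate AR(1) state and noisy observations of its first component; the coefficient matrix A and the noise levels are assumed values:

library(mvtnorm)  # for rmvnorm (multivariate normal draws)
set.seed(1)
T <- 200
A <- matrix(c(0.9, 0.1, -0.1, 0.9), 2, 2)  # assumed AR(1) coefficient matrix (plays the role of theta)
Q <- diag(0.1, 2)                          # assumed process noise covariance
x <- matrix(0, T, 2)
for (t in 2:T) x[t, ] <- as.vector(A %*% x[t - 1, ]) + as.vector(rmvnorm(1, sigma = Q))
y <- x[, 1] + rnorm(T, sd = 0.7)           # noisy time series observations y_{1:T}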
Noisy Observations of System State

[Figure: observations of the system state plotted against time (state vs. time, t = 0 to 200).]
Stochastic Dynamics governing System State

[Figure: 10 realizations of the system state plotted against time.]
Estimation of System State by Particle Filters

[Figure: the particle filter estimate of the state plotted against time.]
Target: Evolution of the Probability Distribution of the State through Time

[Figure: the probability distribution of the state evolving through time (state vs. time).]
Data Assimilation: A Statistical Framework

State Space Model, for $t = 1, \ldots, T$:

$$x_t = d(x_{t-1}, \theta, e_t), \qquad y_t = h(x_t, \phi, \nu_t)$$

or, equivalently,

$$x_t \sim p(x_t, \theta \mid x_{t-1}), \qquad y_t \mid x_t \sim p(y_t, \phi \mid x_t)$$

• The general goal is to make inferences about the state $x_t$ and sometimes the static parameters* $\theta$ and $\phi$**.
• We use explicit priors on the process noise and observation errors (i.e. parametric distributions for $e_t$ and $\nu_t$). Assume the $y_t$ are conditionally independent given $x_t$, and that $e_t$ and $\nu_t$ are independent.
• For DA, we want to do space-time problems: space is incorporated in $x_t$.
→ Solutions involve sequential Monte Carlo methods for nonlinear and non-Gaussian state space models.

* Dynamic parameters are part of the state.
** We drop $\phi$ hereafter, and defer $\theta$.

Hierarchical Bayesian Model: A General Framework for DA

$$\underbrace{p(x_{1:T}, \theta \mid y_{1:T})}_{\text{target (posterior)}} \propto \underbrace{p(y_{1:T} \mid x_{1:T}, \theta)}_{\text{measurement}} \cdot \underbrace{p(x_{1:T} \mid \theta)}_{\text{dynamics}} \cdot \underbrace{p(\theta)}_{\text{prior}}$$

Special Cases in DA
• Parameter estimation (no model error)
• Filtering solution for the state
• Smoothing solution for the state
• Joint state and parameter estimation
• Inference on dynamic model structure
Problem Classes

We want to estimate the system state $x_t$ at time t, given the observations up to time s, $y_{1:s} = (y_1, y_2, \ldots, y_s)$:

• s = t → Filtering (nowcast)
• s > t → Prediction (forecast)
• s < t → Smoothing (hindcast)

[Figure: timelines showing the estimation time relative to the data available for (A) Filtering (Nowcast), (B) Forecasting, and (C) Smoothing (Hindcast).]
Sequential Estimation

Single-stage transition of the system from time t-1 to time t:

$$\underbrace{p(x_{t-1}, \theta \mid y_{1:t-1})}_{\text{nowcast at time } t-1} \;\xrightarrow[\text{prediction}]{\;x_t = d(x_{t-1},\, \theta,\, e_t)\;}\; \underbrace{p(x_t, \theta \mid y_{1:t-1})}_{\text{forecast}} \;\xrightarrow[\text{measurement update}]{\;y_t\;}\; \underbrace{p(x_t, \theta \mid y_{1:t})}_{\text{nowcast at time } t}$$

- We'll drop the parameters θ ... for now.
Probabilistic Solution for Filtering

A single-stage transition of the system from time t-1 to t involves:

Prediction:
$$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1}) \cdot p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$

Measurement update:
$$p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t) \cdot p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})}$$

→ General Filtering Solution: given $p(x_0)$ and $y_{1:T} = (y_1, y_2, \ldots, y_T)$, do for $t = 1, \ldots, T$:

$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1}) \cdot p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$
Sampling Based Estimation

Let $\{x_{t|t}^{(i)}, w_t^{(i)}\},\ i = 1, \ldots, n$ be a random sample* drawn from the target distribution (the filter density) $p(x_t \mid y_{1:t})$, where the $x_{t|t}^{(i)}$ are the support points and the $w_t^{(i)}$ are the normalized weights.

The filter density can then be approximated as

$$p(x_t \mid y_{1:t}) \approx \sum_{i=1}^{n} w_t^{(i)}\, \delta(x_{t|t} - x_{t|t}^{(i)})$$

* Referred to as an ensemble, made up of a collection of particles (which are the sample members).
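As a small numerical illustration (an assumption-laden sketch, not from the slides), this weighted-ensemble representation lets posterior moments be computed as weighted sums over the support points:

set.seed(2)
n  <- 250
xi <- rnorm(n)                            # support points drawn from a proposal
w  <- dnorm(xi, mean = 1)                 # weights from a hypothetical likelihood centered at 1
w  <- w / sum(w)                          # normalize
post_mean <- sum(w * xi)                  # E[x_t | y_{1:t}] approximated as sum_i w_i x_i
post_var  <- sum(w * (xi - post_mean)^2)  # Var[x_t | y_{1:t}] approximated the same way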
[Figure: sampling-based estimation examples, showing probability vs. x for (a) an unweighted ensemble, n = 25; (b) a weighted ensemble, n = 25; (c) an unweighted ensemble, n = 250; (d) a weighted ensemble, n = 250.]
Sequential Importance Sampling (SIS)

At time t-1 we have:
• $\{x_{t-1|t-1}^{(i)}, w_{t-1}^{(i)}\}$, draws from $p(x_{t-1} \mid y_{1:t-1})$
• $y_t$, an observation

If a candidate $x_{t|t}^{(i)}$ is drawn from a proposal $q(x_{t|t})$, then

$$w_t^{(i)} \propto \frac{p(x_{t|t}^{(i)} \mid y_t)}{q(x_{t|t}^{(i)} \mid y_t)} \qquad \text{or} \qquad w_t^{(i)} \propto w_{t-1}^{(i)}\, \frac{p(y_t \mid x_{t|t}^{(i)}) \cdot p(x_{t|t}^{(i)} \mid x_{t-1|t-1}^{(i)})}{q(x_{t|t}^{(i)})}$$

A draw from $p(x_t \mid y_{1:t})$ is then available as $\{x_{t|t}^{(i)}, w_t^{(i)}\}$ (see Arulampalam et al. 2002).
SIS Remarks

• A common simplification is to let the proposal be the prior (the predictive density), $q(x_{t|t}) = p(x_t \mid x_{t-1})$, leading to the weight update (see the sketch below):
$$w_t^{(i)} \propto w_{t-1}^{(i)}\, p(y_t \mid x_{t|t-1}^{(i)})$$
• The main practical problem with SIS is "weight collapse", wherein after a few time steps all the weight becomes concentrated on a few particles. How to fix?
• SIS solves the filtering problem for the nonlinear, non-Gaussian state space model via an algorithm for sequential "online" estimation of the system state, relying on a proposal and a weight update.
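A minimal sketch of one SIS step under the bootstrap proposal, for a scalar AR(1) model with Gaussian observation error (all names and values here are assumptions for illustration):

set.seed(3)
np <- 500
x  <- rnorm(np)                          # particles approximating p(x_{t-1} | y_{1:t-1})
w  <- rep(1 / np, np)                    # equal incoming weights
yt <- 1.2                                # a scalar observation at time t
x  <- 0.9 * x + rnorm(np, sd = 0.3)      # draw from the proposal q = p(x_t | x_{t-1})
w  <- w * dnorm(yt, mean = x, sd = 0.5)  # weight update: w_t proportional to w_{t-1} p(y_t | x_t)
w  <- w / sum(w)                         # renormalize; note there is no resampling in plain SIS
ess <- 1 / sum(w^2)                      # effective sample size: ess << np signals weight collapse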
Sequential Importance Resampling (SIR)

• The major breakthrough in sequential Monte Carlo methods (Gordon et al. 1993) was to incorporate a resampling step to obviate weight collapse.
• Specifically, for each time t, $\{x_{t|t-1}^{(i)}, w_t^{(i)}\}$ is a weighted sample from $p(x_t \mid y_{1:t-1})$.
• We can resample (with replacement) the $x_{t|t-1}^{(i)}$ with probability proportional to $w_t^{(i)}$ (i.e. a weighted bootstrap), as sketched below.
• This yields a modified sample $\{x_{t|t}^{(i)}\}$, wherein particles were both killed and replicated, and the weights are now equal, i.e. $w_t^{(i)} = 1/n\ \forall i$.

Remaining issue: sample impoverishment, where some particles are replicated often in the unweighted sample. Less severe than weight collapse (but of course related).
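The resampling step itself is just a weighted bootstrap; a tiny self-contained sketch (values hypothetical) showing how particles are killed and replicated:

set.seed(6)
np <- 10
x  <- rnorm(np)                   # predicted particles x_{t|t-1}^{(i)}
w  <- runif(np); w <- w / sum(w)  # normalized importance weights
m  <- sample(1:np, size = np, prob = w, replace = TRUE)
xnew <- x[m]                      # equally weighted sample {x_{t|t}^{(i)}}
table(m)                          # indices that are absent were killed; repeated ones were replicated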
Basic Particle Filter Algorithm

Given: (1) dynamical model, (2) observations, (3) initial conditions

for t = 1 to T
  (a) Prediction: $\{x_{t-1|t-1}^{(i)}\} \to \{x_{t|t-1}^{(i)}\}$ as $x_{t|t-1}^{(i)} = d(x_{t-1|t-1}^{(i)}, e_t^{(i)})$ for $i = 1, \ldots, n$
  (b) Measurement: $\{x_{t|t-1}^{(i)}\} \to \{x_{t|t}^{(i)}\}$ as
      • $w_t^{(i)} \propto p(y_t \mid x_{t|t-1}^{(i)})$ for $i = 1, \ldots, n$
      • resample with replacement $\{x_{t|t-1}^{(i)}\}$ using the weights $w_t^{(i)}$ → yields $\{x_{t|t}^{(i)}\}$
end (for t)
Particle Filter Schematic

[Figure: particle filter schematic.]

Interpretation: Particle History

[Figure: particle histories (state vs. time).]
R-code for a Basic Particle Filter

library(mvtnorm)  # provides dmvnorm for the Gaussian likelihood

for (t in 2:T) {
  xf <- t(apply(xa, 1, dynamicmodel))           # prediction step from t-1 to t
  yobs <- matrix(y[t, ], 1, ncol(y))            # extract measurement at time t
  w <- apply(matrix(xf[, 1], np, 1), 1,         # assign weights: likelihood of yobs
             dmvnorm, x = yobs, sigma = R)      #   given each predicted particle
  m <- sample(1:np, size = np, prob = w, replace = TRUE)  # weighted resampling
  xa <- xf[m, ]                                 # new analysis ensemble
}
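For completeness, a minimal setup sketch under which the loop above runs; the model, noise levels, and ensemble size are assumptions, and y is stored as a T × 1 matrix to match the indexing in the loop:

library(mvtnorm)
set.seed(1)
T  <- 200; np <- 500
A  <- matrix(c(0.9, 0.1, -0.1, 0.9), 2, 2)  # assumed bivariate AR(1) dynamics
Q  <- diag(0.1, 2)                          # process noise covariance
R  <- matrix(0.5, 1, 1)                     # measurement error covariance (component 1 observed)
dynamicmodel <- function(x) as.vector(A %*% x) + as.vector(rmvnorm(1, sigma = Q))
xtrue <- matrix(0, T, 2)
for (t in 2:T) xtrue[t, ] <- dynamicmodel(xtrue[t - 1, ])
y  <- matrix(xtrue[, 1] + rnorm(T, sd = sqrt(R[1, 1])), T, 1)  # observations
xa <- rmvnorm(np, sigma = Q)                # initial analysis ensemble (np x 2)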
Element 1. Prediction Step

$\{x_{t-1|t-1}^{(i)}\}$, a draw from the filter density at time t-1, $p(x_{t-1} \mid y_{1:t-1})$
  → via $x_{t|t-1}^{(i)} = d(x_{t-1|t-1}^{(i)}, e_t^{(i)})$ for $i = 1, \ldots, n$ →
$\{x_{t|t-1}^{(i)}\}$, a draw from the predictive density at time t, $p(x_t \mid y_{1:t-1})$

Role: acts as the proposal distribution for the particle filter.

Remarks:
- computationally expensive to generate large samples for big GFD models
- hard to effectively populate a large-dimensional state space with particles
Element 2. Measurement Update

$\{x_{t|t-1}^{(i)}\}$, a draw from the predictive density at time t, $p(x_t \mid y_{1:t-1})$
  → via the observation $y_t$ →
$\{x_{t|t}^{(i)}\}$, a draw from the filter density at time t, $p(x_t \mid y_{1:t})$

Uses Bayes' formula: $p(x_{t|t} \mid y_{1:t}) \propto p(y_t \mid x_{t|t-1}) \cdot p(x_{t|t-1})$

Approaches:
• PF/SIR: weighted resample of $\{x_{t|t-1}^{(i)}, w_t^{(i)}\} \to \{x_{t|t}^{(i)}\}$
• Metropolis-Hastings MCMC
• Ensemble Kalman filter (approximate)
• plus many others (auxiliary, unscented, ...) that modify proposal distributions
Sequential Metropolis-Hastings

Construct an (independence) chain to yield a sample from the target filter distribution $p(x_t \mid y_{1:t})$.

for k = 1 to K
  1. Generate a candidate from the predictive density: draw $x_{t|t-1}^{*}$ from $p(x_t \mid y_{1:t-1})$
  2. Evaluate the acceptance probability
     $$\alpha = \min\left(1,\ \frac{p(y_t \mid x_{t|t-1}^{*})}{p(y_t \mid x_{t|t}^{(k)})}\right)$$
  3. Let $x_{t|t}^{*}$ enter the posterior sample $\{x_{t|t}^{(i)}\}$ with probability $\alpha$; otherwise retain $x_{t|t}^{(k)}$
end (for k) → yields $\{x_{t|t}^{(i)}\}$, a sample from $p(x_t \mid y_{1:t})$

• flexible, configurable and efficient (+ a parallel with SIR)
• can alleviate sample impoverishment (resample-move)
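A minimal sketch of this update for a scalar state with a Gaussian likelihood; the predictive draws, observation, and noise level are assumptions for illustration:

set.seed(4)
np <- 500
yt <- 1.2                                         # observation at time t (hypothetical)
xf <- rnorm(np, mean = 0.5, sd = 1)               # draws from the predictive density p(x_t | y_{1:t-1})
lik <- function(x) dnorm(yt, mean = x, sd = 0.5)  # p(y_t | x)
xa <- numeric(np)
xa[1] <- xf[1]
for (k in 2:np) {
  xstar <- sample(xf, 1)                          # candidate drawn from the predictive density
  alpha <- min(1, lik(xstar) / lik(xa[k - 1]))    # acceptance probability (likelihood ratio)
  xa[k] <- if (runif(1) < alpha) xstar else xa[k - 1]
}
# xa is then (approximately) a sample from the filter density p(x_t | y_{1:t})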
Limitations

Theory: convergence to the target marginal distribution holds under weak assumptions (e.g. Kunsch 2005).

Practical problem: standard particle filtering (SIR), when used in DA, suffers from sample impoverishment (recall path degeneracy). Typically, the prediction ensemble falls far from the observed state (due to small ensemble size and a high-dimensional state space) → the likelihood is poorly represented and estimation is compromised.

What modifications are made to improve particle filters?
Some Fixes

• Add "jitter": overdisperse the sample of the predictive density (inflate the process error).
• Make a Gaussian approximation, or a transformation/anamorphosis: can then use the Kalman filter updating equation.
• Approximate the likelihood function: change its functional form, or inflate or alter the measurement error.
• Error subspace: confine stochasticity to the parameters only. Dimension reduction.
• Use fixed-lag smoothing or batch processing: incorporate observations from multiple times into the observation update.
• Clever proposal distributions and look-ahead filters: move beyond using the "prior" (predictive density) as the proposal.
What about Parameter Estimation?

1. State Augmentation: append the parameters to the state,

$$\tilde{x}_t = \begin{pmatrix} x_t \\ \theta_t \end{pmatrix}$$

Can use the particle filter machinery to solve (Kitagawa 1998). Also, see iterated filtering (Ionides 2006).
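A brief sketch of the augmented prediction step for a hypothetical scalar AR(1) model; the small jitter on θ is the artificial parameter evolution commonly used to keep the parameter particles diverse:

set.seed(7)
np <- 500
z  <- cbind(x = rnorm(np), theta = runif(np, 0.5, 1.0))  # augmented particles (x_t, theta_t)
augmented_step <- function(z) {
  theta <- unname(z["theta"]) + rnorm(1, sd = 0.01)      # jitter so theta can evolve
  x     <- theta * unname(z["x"]) + rnorm(1, sd = 0.3)   # dynamics use the particle's own theta
  c(x = x, theta = theta)
}
z <- t(apply(z, 1, augmented_step))  # one prediction step; the measurement update proceeds as before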
2. Likelihood Methods: use sample-based likelihoods,

$$[y_{1:T} \mid \theta] = L(\theta \mid y_{1:T}) = \prod_{t=1}^{T} \int [y_t \mid x_t, \theta]\, [x_t \mid y_{1:t-1}, \theta]\, dx_t \;\approx\; \prod_{t=1}^{T} \frac{1}{n} \sum_{i=1}^{n} p(y_t \mid x_{t|t-1}^{(i)})$$
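As a sketch of how this approximation is accumulated inside a bootstrap filter (hypothetical scalar model; all names are assumed):

set.seed(5)
pf_loglik <- function(theta, y, np = 500) {
  x  <- rnorm(np)                                  # initial ensemble
  ll <- 0
  for (t in seq_along(y)) {
    x  <- theta * x + rnorm(np, sd = 0.3)          # prediction: draw from p(x_t | x_{t-1}, theta)
    w  <- dnorm(y[t], mean = x, sd = 0.5)          # p(y_t | x_{t|t-1}^{(i)})
    ll <- ll + log(mean(w))                        # adds log of (1/n) sum_i p(y_t | x_{t|t-1}^{(i)})
    x  <- sample(x, np, prob = w, replace = TRUE)  # SIR resampling
  }
  ll
}
# e.g. sapply(seq(0.5, 1.0, by = 0.05), pf_loglik, y = y) profiles the likelihood over theta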
Summary

1. The state space model is the statistical framework for considering dynamic models and measurements.
2. Importance sampling provides the theoretical basis for particle filters.
3. Sequential importance resampling (SIR) yields a straightforward and workable algorithm for the filtering problem.
4. For realistic problems in data assimilation, particle filters must be modified, either via approximations or via clever use of proposal distributions.
Some Key References

Jazwinski AH. 1970. Stochastic Processes and Filtering Theory. Academic Press: New York, 376pp.
Gordon NJ, Salmond DJ, Smith AFM. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F 140(2):107-113.
Kitagawa G. 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics 5:1-25.
Kitagawa G. 1998. A self-organizing state space model. Journal of the American Statistical Association 93(443):1203-1215.
Arulampalam MS, Maskell S, Gordon N, Clapp T. 2002. A tutorial on particle filters for online nonlinear non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50(2):174-188.
Kunsch HR. 2005. Recursive Monte Carlo filters: algorithms and theoretical analysis. The Annals of Statistics 33(5):1983-2021.
Godsill SJ, Doucet A, West M. 2004. Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association 99(465):156-168.
Ionides EL, Breto C, King AA. 2006. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences 103:18438-18443.