Lecture on Parameter Estimation for Stochastic Differential Equations
Erik Lindström
FMS161/MASM18 Financial Statistics
Recap

We are interested in the parameters θ in the stochastic integral equation

X(t) = X(0) + \int_0^t \mu_\theta(s, X(s)) \, ds + \int_0^t \sigma_\theta(s, X(s)) \, dW(s)    (1)

Why?
- Model validation
- Risk management
- Advanced hedging (Greeks, 9.2.2, and quadratic hedging, 9.2.2.1 (P/Q))
Some asymptotics

Consider the arithmetic Brownian motion

dX(t) = \mu \, dt + \sigma \, dW(t)    (2)

The drift is estimated by computing the mean and compensating for the sampling interval \delta = t_{n+1} - t_n:

\hat{\mu} = \frac{1}{\delta N} \sum_{n=0}^{N-1} X(t_{n+1}) - X(t_n).    (3)

Expanding this expression reveals that the MLE is given by

\hat{\mu} = \frac{X(t_N) - X(t_0)}{t_N - t_0} = \mu + \sigma \frac{W(t_N) - W(t_0)}{t_N - t_0}.    (4)

The MLE for the diffusion (\sigma) parameter is given by

\hat{\sigma}^2 = \frac{1}{\delta (N-1)} \sum_{n=0}^{N-1} \left( X(t_{n+1}) - X(t_n) - \hat{\mu}\delta \right)^2 \xrightarrow{d} \sigma^2 \, \frac{\chi^2(N-1)}{N-1}    (5)
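As a sanity check, the estimators (3)-(5) can be run on simulated data. The sketch below (variable names are my own) draws exact Gaussian increments of the arithmetic Brownian motion and applies both formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameters of dX = mu dt + sigma dW, sampled at interval delta
mu, sigma, delta, N = 0.5, 0.2, 0.01, 100_000

# Exact increments: X(t_{n+1}) - X(t_n) ~ N(mu*delta, sigma^2*delta)
dX = mu * delta + sigma * np.sqrt(delta) * rng.standard_normal(N)

# Drift MLE, eq. (3); the sum telescopes to (X(t_N) - X(t_0)) / (t_N - t_0), eq. (4)
mu_hat = dX.sum() / (delta * N)

# Diffusion MLE, eq. (5)
sigma2_hat = ((dX - mu_hat * delta) ** 2).sum() / (delta * (N - 1))

print(mu_hat, sigma2_hat)
```

Note the asymmetry that (4) makes explicit: the drift estimate depends only on the endpoints, so its accuracy is governed by the total observation window t_N - t_0, while the diffusion estimate (5) improves with the number of observations N.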
A simple method

Many data sets are sampled at high frequency, making the bias due to discretizing the SDE with one of the schemes in Chapter 12 acceptable. The simplest discretization, the explicit Euler method, would for the stochastic differential equation

dX(t) = \mu(t, X(t)) \, dt + \sigma(t, X(t)) \, dW(t)    (6)

correspond to the Discretized Maximum Likelihood (DML) estimator given by

\hat{\theta}^{DML} = \arg\max_{\theta \in \Theta} \sum_{n=1}^{N-1} \log \phi\left( X(t_{n+1}), X(t_n) + \mu(t_n, X(t_n))\Delta, \Sigma(t_n, X(t_n))\Delta \right)    (7)

where \phi(x, m, P) is the density of a multivariate normal distribution with argument x, mean m and covariance P, and

\Sigma(t, X(t)) = \sigma(t, X(t)) \sigma(t, X(t))^T.    (8)
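A minimal sketch of the DML estimator (7), applied to an Ornstein-Uhlenbeck model dX = -aX dt + s dW (the model choice, starting values, and names are illustrative, not from the lecture):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Simulate an Ornstein-Uhlenbeck path dX = -a*X dt + s dW with the exact scheme
a_true, s_true, dt, N = 1.0, 0.5, 0.05, 5000
x = np.empty(N + 1)
x[0] = 0.0
e = np.exp(-a_true * dt)
sd = s_true * np.sqrt((1 - e**2) / (2 * a_true))
for n in range(N):
    x[n + 1] = x[n] * e + sd * rng.standard_normal()

def neg_dml(params):
    a, s = params
    if a <= 0 or s <= 0:
        return np.inf
    # Euler one-step approximation, eq. (7): mean x_n + mu(x_n)*dt, variance s^2*dt
    m = x[:-1] - a * x[:-1] * dt
    return -norm.logpdf(x[1:], loc=m, scale=np.sqrt(s**2 * dt)).sum()

res = minimize(neg_dml, x0=[0.5, 0.3], method="Nelder-Mead")
print(res.x)
```

The recovered parameters are close to, but not exactly, the truth: the Euler transition density is only an approximation, which is the discretization bias behind the consistency warning below.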
Consistency

- The DMLE is generally NOT consistent.
- Approximate ML estimators (13.5) are, provided enough computational resources are allocated:
  - Simulation based estimators
  - Fokker-Planck based estimators
  - Series expansions.
- GMM-type estimators (13.6) are consistent if the moments are correctly specified (which is a non-trivial problem!)
Simulation based estimators

- Discretely observed SDEs are Markov processes.
- Then it follows that

  p_\theta(x_t | x_s) = E_\theta\left[ p_\theta(x_t | x_\tau) \mid \mathcal{F}(s) \right], \quad t > \tau > s    (9)

  This is the Pedersen algorithm.
- Improved by Durham-Gallant (2002) and Lindström (2012).
- Works very well for multivariate models!
- ...and is easily (...) extended to Lévy driven SDEs.
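The identity (9) suggests a Monte Carlo estimate of the transition density: simulate paths from x_s up to the intermediate time τ, then average the one-step density from X(τ) to x_t. A simplified sketch of this idea (function name and tuning constants are my own) is:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def pedersen_density(x_s, x_t, dt, mu, sigma, K=8, M=5000):
    """Monte Carlo estimate of p(x_t | x_s) via eq. (9): Euler-simulate M paths
    to tau = dt*(K-1)/K, then average the last one-step Gaussian density."""
    h = dt / K
    x = np.full(M, x_s, dtype=float)
    for _ in range(K - 1):                       # Euler steps up to tau
        x += mu(x) * h + sigma(x) * np.sqrt(h) * rng.standard_normal(M)
    # one-step Euler transition density from X(tau) to x_t, averaged over paths
    return norm.pdf(x_t, loc=x + mu(x) * h, scale=sigma(x) * np.sqrt(h)).mean()

# Check on arithmetic Brownian motion, where the exact density is Gaussian
mu_, sig_ = 0.1, 0.3
est = pedersen_density(0.0, 0.05, 1.0,
                       lambda x: mu_ + 0 * x, lambda x: sig_ + 0 * x)
exact = norm.pdf(0.05, loc=mu_, scale=sig_)
print(est, exact)
```

For the arithmetic Brownian motion the Euler scheme is exact, so the estimate is unbiased and only Monte Carlo noise remains; for general models the inner Euler steps introduce a bias that vanishes as K grows.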
Some key points

- A naive implementation only provides a pointwise estimate; use CRNs or importance sampling.
- Variance reduction helps (antithetic variates, control variates).
- The near-optimal importance sampler is a bridge process, as it reduces variance AND improves the asymptotics.
- There is a version that is completely bias free, albeit somewhat restrictive in terms of the class of feasible models.
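The antithetic-variates point can be illustrated on a toy expectation (the example is mine, not from the lecture): pairing each Gaussian draw with its negation and averaging the pair estimates the same quantity with visibly smaller variance.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 100_000
z = rng.standard_normal(M)

# Plain Monte Carlo estimate of E[exp(W(1))] = exp(1/2)
plain = np.exp(z)

# Antithetic variates: average each draw with its mirrored counterpart -z
anti = 0.5 * (np.exp(z) + np.exp(-z))

print(plain.mean(), plain.var())
print(anti.mean(), anti.var())   # same target, noticeably smaller variance
```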
Fokker-Planck

Consider the expectation

E[h(x(t)) | \mathcal{F}(0)] = \int h(x(t)) \, p(x(t) | x(0)) \, dx(t)    (10)

and then

\frac{\partial}{\partial t} E[h(x(t)) | \mathcal{F}(0)]    (11)

There are two possible ways to compute this: directly, and using the Itô formula. Equating these yields

\frac{\partial p}{\partial t}(x(t) | x(0)) = \mathcal{A}^\star p(x(t) | x(0))    (12)

where

\mathcal{A}^\star p(x(t)) = -\frac{\partial}{\partial x(t)} \left( \mu(\cdot) \, p(x(t)) \right) + \frac{1}{2} \frac{\partial^2}{\partial x(t)^2} \left( \sigma^2(\cdot) \, p(x(t)) \right).    (13)
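Equations (12)-(13) can be solved numerically. A rough explicit finite-difference sketch for constant μ and σ, where the exact transition density is Gaussian so the result can be checked (grid sizes and the smoothed initial condition are my own choices, not from the lecture):

```python
import numpy as np

mu, sigma, T = 0.2, 0.3, 0.5
L, nx = 4.0, 401                          # x-grid on [-L, L]
x = np.linspace(-L, L, nx)
dx = x[1] - x[0]
dt = 0.2 * dx**2 / sigma**2               # small step keeps the explicit scheme stable

# Initial condition: narrow Gaussian standing in for the point mass at x(0) = 0
v0 = 0.005
p = np.exp(-x**2 / (2 * v0)) / np.sqrt(2 * np.pi * v0)

t = 0.0
while t < T:
    dpdx = (np.roll(p, -1) - np.roll(p, 1)) / (2 * dx)       # d/dx p (mu constant)
    d2p = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2   # d^2/dx^2 p
    p = p + dt * (-mu * dpdx + 0.5 * sigma**2 * d2p)         # step eq. (12)-(13)
    p[0] = p[-1] = 0.0                    # boundaries sit far from the probability mass
    t += dt

# For constant mu, sigma the exact density is N(mu*T, sigma^2*T + v0)
vT = sigma**2 * T + v0
exact = np.exp(-(x - mu * T)**2 / (2 * vT)) / np.sqrt(2 * np.pi * vT)
err = np.abs(p - exact).max()
print(err)
```

In practice implicit or Padé-based time stepping (as in the comparison figure below) is preferred, since the explicit scheme forces dt to shrink quadratically with dx.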
Example of the Fokker-Planck equation

From (Lindström, 2007).

Figure: Fokker-Planck equation (the transition density p(s, x_s; t, x_t) as a surface over the time grid) computed for the CKLS process.
Comments on the PDE approach

Generally better than the Monte Carlo method in low dimensional problems.

Figure: Comparing Monte Carlo (Durham-Gallant, Poulsen) against 2nd order (Padé(1,1)) and 4th order (Padé(2,2)) numerical approximations of the Fokker-Planck equation; MAE plotted against computing time.
Discussion

- Fokker-Planck is the preferred method if the state space is non-trivial (see Pedersen et al., 2011).
- Successfully used in 1-d and 2-d problems,
- but the "curse of dimensionality" will eventually make the method infeasible.
Series expansion

- The solution to the Fokker-Planck equation when

  dX(t) = \mu \, dt + \sigma \, dW(t)    (14)

  is p(x(t) | x(0)) = N(x(t); x(0) + \mu t, \sigma^2 t).
- Hermite polynomials are the orthogonal polynomial basis when using a Gaussian as weight function.
- This is used in the 'series expansion approach', see e.g. (Aït-Sahalia, 2002, 2008).
Key ideas

Transform from X → Y → Z where Z is approximately standard Gaussian. We assume that

dX(t) = \mu(X(t)) \, dt + \sigma(X(t)) \, dW(t)    (15)

First step (the Lamperti transform):

Y(t) = \int^{X(t)} \frac{du}{\sigma(u)}    (16)

It then follows that

dY(t) = \mu_Y(Y(t)) \, dt + dW(t)    (17)

Second step: transform

Z(t_k) = \frac{Y(t_k) - Y(t_{k-1})}{\sqrt{t_k - t_{k-1}}}.    (18)
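The transform (16) can be checked symbolically. For the CIR-type diffusion term σ(u) = σ√u (my choice of example, not from the lecture) the transform works out to Y = 2√X/σ, and by Itô's formula the transformed diffusion coefficient σ(x)·Y'(x) is indeed 1, as (17) requires:

```python
import sympy as sp

x, u, sigma = sp.symbols("x u sigma", positive=True)

# CIR-type diffusion term sigma(u) = sigma*sqrt(u); transform (16)
Y = sp.integrate(1 / (sigma * sp.sqrt(u)), u).subs(u, x)
print(Y)                                   # 2*sqrt(x)/sigma

# By Ito's formula the transformed diffusion coefficient is sigma(x) * Y'(x)
new_diff = sp.simplify(sigma * sp.sqrt(x) * sp.diff(Y, x))
print(new_diff)                            # 1
```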
Expansion

A Hermite expansion for the density p_Z at order J is given by

p_Z^J(z | y(0), t_k - t_{k-1}) = \phi(z) \sum_{j=0}^{J} \eta_j(t_k - t_{k-1}, y_0) H_j(z)    (19)

where

H_j(z) = e^{z^2/2} \frac{d^j}{dz^j} e^{-z^2/2}.    (20)

The coefficients are computed by projecting the density onto the basis functions H_j(z) (recall Hilbert space theory):

\eta_j(t, y_0) = \frac{1}{j!} \int H_j(z) \, p_Z^J(z | y(0), t_k - t_{k-1}) \, dz    (21)
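Definition (20) can be verified directly (it agrees with the usual probabilists' Hermite polynomials up to a factor (-1)^j). The polynomials are orthogonal under the Gaussian weight φ(z) with squared norm j!, which is exactly what makes the projection (21) valid:

```python
import sympy as sp

z = sp.symbols("z", real=True)
phi = sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi)   # standard Gaussian density

def H(j):
    # Eq. (20): H_j(z) = e^{z^2/2} d^j/dz^j e^{-z^2/2}
    return sp.simplify(sp.exp(z**2 / 2) * sp.diff(sp.exp(-z**2 / 2), z, j))

def inner(i, j):
    # L^2(phi) inner product used in the projection (21)
    return sp.integrate(phi * H(i) * H(j), (z, -sp.oo, sp.oo))

print(H(1), H(2))
print(inner(1, 2), inner(2, 2))   # 0 (orthogonal) and 2 = 2!
```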
Practical concerns

- The series expansion can be extremely accurate.
- The standard approach is to compute η_j by Taylor expansion up to order (t_k - t_{k-1})^K = (Δt)^K.
- There are some restrictions ('so-called reducible diffusions') when using the method for multivariate diffusions.
Other alternatives - GMM/EF

What about non-likelihood methods?

- The model is governed by some p-dimensional parameter.
- Suppose some set of features are important, h_l(x), l = 1, ..., q ≥ p.
- Compute

  f(x(t); \theta) = \begin{pmatrix} h_1(x(t)) - E_\theta[h_1(X_t)] \\ \vdots \\ h_q(x(t)) - E_\theta[h_q(X_t)] \end{pmatrix}    (22)

- and form

  J_N(\theta) = \left( \frac{1}{N} \sum_{n=1}^{N} f(x(n); \theta) \right)^T W \left( \frac{1}{N} \sum_{n=1}^{N} f(x(n); \theta) \right)    (23)
GMM

The Generalized Method of Moments (GMM) estimator is then given by

\hat{\theta} = \arg\min J_N(\theta)    (24)

It can be shown that

\sqrt{N} \left( \hat{\theta}_N - \theta_0 \right) \to N(0, \Sigma)    (25)

where

\Sigma = \left( \Gamma_N^T \Omega_N^{-1} \Gamma_N \right)^{-1}    (26)

and \Gamma_N and \Omega_N are estimates of

\Gamma = E\left[ \frac{\partial f(x, \theta)}{\partial \theta^T} \right], \quad \Omega = \mathrm{Var}\left[ f(x, \theta) \right].    (27)
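A minimal sketch of (22)-(24) for the arithmetic Brownian motion, using the first two moments of the increments as features h_l and W = I (an exactly identified toy case; the setup and names are mine, not from the lecture):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Observed increments of arithmetic Brownian motion: dX ~ N(mu*delta, sigma^2*delta)
mu0, sig0, delta, N = 0.5, 0.2, 0.1, 20_000
dx = mu0 * delta + sig0 * np.sqrt(delta) * rng.standard_normal(N)

W = np.eye(2)                      # identity weight matrix

def J(params):
    mu, sig = params
    # Moment functions, eq. (22): first two moments of the increments
    f = np.stack([dx - mu * delta,
                  dx**2 - (sig**2 * delta + (mu * delta) ** 2)])
    fbar = f.mean(axis=1)
    return fbar @ W @ fbar         # objective, eq. (23)

res = minimize(J, x0=[0.0, 0.5], method="Nelder-Mead")   # eq. (24)
print(res.x)
```

With q = p the choice of W does not matter and J_N can be driven to (numerically) zero; with q > p the optimal weight is the two-step or iterated estimate of Ω^{-1}, which is where (26)-(27) enter.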