28
λ

Generalized Cross Validation - KTHszepessy/inversfor/marten.pdfGeneralized Cross Validation (GCV) The Generalized Cross Validation (GCV) De nition Let A ( ) be the in uence matrix

  • Upload
    vankhue

  • View
    218

  • Download
    1

Embed Size (px)

Citation preview

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Generalized Cross Validation

Mårten Marcus

26 november 2009

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

References

Chapter 4 of Spline models for Observational Data (1990) -

Grace Wahba

Optimal Estimation of Contour Properties by Cross-Validated

Regularization (1989) - Behzad Shahraray, David Anderson

Smoothing Noisy Data with Spline Function (1979) - Peter

Craven, Grace Wahba

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Conditional Expectations

From probability theory, we have for X ∈ L2Ω,F ,P

E[X ] = argminθ∈R

E[(X − θ)2

](1)

The least squares estimate therefore gives discretized estimate of θand a natural norm to choose when searching for data disturbed by

white noise

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Ill-posedness

Consider the model yi = g(ti ) + εi , i = 1, 2, . . . , n, ti ∈ [0, 1] where

g ∈W(m)2

= f |f ′, f ′′, . . . , f (m−1)abs.cont., f (m) ∈ L2[0, 1] andεi ∼WN(0, σ2) where σ is unknown.

The least squares estimate gives

mingi∈W

(m)2

n∑i=1

(yi − gi )2 = 0 ∀yini=1 (2)

The minimum does not depend on data and is clearly ill-posed.

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Regularization

We may choose to regularize, or smooth, the data as

gn,λ = argmingi∈W

(m)2

n∑i=1

(yi − g(ti ))2 + λ

∫1

0

(g (m)(t))2dt λ ≥ 0

(3)

which has a unique solution.

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Properties of gn,λ

The reason for the term smoothing spline is the following

For λ = 0, gn,λ can be seen as an interpolating spline

For λ =∞, gn,λ is a single polynomial of degree m − 1

(optimal in the least squares sense)

For 0 < λ <∞, it can be shown that gn,λ is composed by

polynomials of degree at most 2m − 1 on the intevals

[ti , ti+1], i ∈ 1, 2, . . . , n − 1 such that the function and its

derivatives up to and including the 2m − 2 derivative are

continuous at the knots, and f (k)(t1) = f (k)(tn) = 0 for

k = m,m + 1, . . . , 2m − 2, i.e. natural conditions in the end

points.

From which we can see that there is a strong dependence between

the quality of the result and a good choice of λ.

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Examples

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Examples

Example

Example of impact of dierent λ on gn,λ and g ′n,λ for

yi = g(ti ) + εi where g(t) = sin( πt180

), t ∈ [0, 360]

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Dening the Optimal λ

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Dening the Optimal λ

True Mean Square Error R(λ) and Optimal λ

Using the notation above we dene the true mean square error,

R(λ), as

R(λ) :=1

n

n∑i=1

(gn,λ(ti )− gti )2 (4)

The optimal λ is then dened as

λ∗ = argminλ∈R+

R(λ) (5)

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Cross Validation

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Cross Validation

Intuition

Given our sample of n measurements, we leave one measurement

out and predict that data point by the use of the n − 1 remaining

measurements.

The idea is now that the best modell for the measurements is the

one that best predicts each measurement as a function of the

others.

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Cross Validation

The Ordinary Cross-Validation (OCV)

Denition

Let gkn,λ, k ∈ 1, 2, . . . , n be dened as

g[k]n,λ = argmin

gi∈W(m)2

n∑i=1i 6=k

(yi − g(ti ))2 + λ

∫1

0

(g (m)(t))2dt λ ≥ 0 (6)

Then we dene the OCV mean square error as

V0(λ) =1

n

n∑i=1

(g[i ]n,λ(ti )− yi )

2 (7)

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Cross Validation

The Ordinary Cross-Validation (OCV)

Denition

Let V0(λ) be dened as in the previous denition, the Ordinary

Cross-Validation Estimate of λ is

λ0 = argminλ∈R+

V0(λ) (8)

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Cross Validation

Why we need more

The OCV estimate have proven good accuracy for certain periodic

functions, but in general it lacks a feature shown by the following:

Consider the function g(t) ∈W(m)2

, g(t) = g(t + 1) sampled

equidistantly e.g. tj = jn, j ∈ 1, 2, . . . , n adding some

uniform noise.

In this case all data are treated symmetrically. On the other hand,

dropping the conditions of periodicity and equidistant sampling, the

points interact dierently, giving rise to the need of weighting the

samples dierently.

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Generalized Cross Validation (GCV)

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Generalized Cross Validation (GCV)

Rewriting V0

There exists an n × n matrix A(λ), the inuence matrix, with the

property gn,λ(t1)gn,λ(t2)

...

gn,λ(tn)

= A(λ)y (9)

Such that V0(λ) can be rewritten as

V0(λ) =1

n

n∑k=1

∑ni=1

(akjyj − yk)2

(1− akk)2(10)

Where akj , k , j ∈ 1, 2, . . . , n is element k , j of A(λ)

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Generalized Cross Validation (GCV)

The Generalized Cross Validation (GCV)

Denition

Let A(λ) be the inuence matrix dened above, then the GCV

function is dened as

V (λ) =1

n||(I− A(λ))y ||2[1

ntr(I− A(λ))

]2 (11)

We say that the Generalized Cross-Validation Estimate of λ is

λ = argminλ∈R+

V (λ) (12)

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Generalized Cross Validation (GCV)

Similarity to OCV

By rewriting V (λ) we get

V (λ) =1

n

n∑i=1

(g[i ]n,λ(ti )− yi )

2wk(λ) (13)

where wk(λ) are given by

wk(λ) =

(1− akk(λ)

1

ntr(I− A(λ))

)2

(14)

wk(λ) have been shown to adjust V0(λ) for periodicity and

non-equidistant samples.

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Generalized Cross Validation (GCV)

Comparison

Comparison of the true mean square error with the GCV function

for the example λ on gn,λ and g ′n,λ for yi = g(ti ) + εi whereg(t) = sin( πt

180), t ∈ [0, 360]

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Convergence Result

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Convergence Result

Convergence Result

Theorem

Let g(·) ∈W(m)2

and let tini=1satisfy

∫ ti0w(u)du = i

n, where

w(u) is a strictly positive continuous weight function. Then there

exists sequences λn∞n=1and λ∗n∞n=1

of minimas of E[V (λ)] andE[R(λ)] such that the expectation ineciency I ∗,

I ∗ =E[V (λn)]

E[R(λ∗n)]→ 1, as n→∞ (15)

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Dening the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion

Mårten Marcus Generalized Cross Validation

Innehåll Generals What Regularization Parameter λ is Optimal? Generalized Cross-Validation Discussion

An implicit assumption that the true function g(t) is smooth

GCV was applied eectively to problems such as:

Finding the right order of splines in regression

Regularized solution of the Fredholm integral equations of the

rst kind

It provides an estimate of the regularization parameter from

general assumptions on the data.

Mårten Marcus Generalized Cross Validation