Prediction / Co(variance – relation)

Ömer Nezih Gerek



Description: afyon_kış_okulu


Page 1: Ömer Nezi̇h Gerek Presentation1

Prediction / Co(variance – relation)

Ömer Nezih Gerek

Page 2: Ömer Nezi̇h Gerek Presentation1

Data / Information

Data is useless in raw form. Since you measured it, it must carry some information!

Statistical signatures:

◦ 1. Model based: system, prediction, etc.

◦ 2. (Non)parametric: histogram, mean, etc.

Page 3: Ömer Nezi̇h Gerek Presentation1

Non-parametric statistics

Histogram (a fair pdf estimate)

Page 4: Ömer Nezi̇h Gerek Presentation1

(Histogram shapes: symmetric unimodal, skewed right, bi-modal, multi-modal, skewed left, symmetric.)

$N(i) = \operatorname{count}\{X(k) = i\}, \quad i = 0,\ldots,\mathrm{max\_val}, \quad k = 1,\ldots,\mathrm{data\_size}$
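As a quick illustration, here is a minimal Python sketch of the counting formula above, assuming hypothetical integer-valued data in the range 0…max_val:

```python
import numpy as np

# Hypothetical integer-valued data (values in 0..9), just for illustration.
x = np.random.default_rng(0).integers(0, 10, size=1000)

# N(i) = count{ X(k) = i }: one bin per value.
max_val = x.max()
N = np.array([(x == i).sum() for i in range(max_val + 1)])

# Dividing by the data size turns the counts into a fair pdf estimate.
pdf_est = N / x.size
print(pdf_est.sum())  # 1.0: the estimated probabilities sum to one
```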

Page 5: Ömer Nezi̇h Gerek Presentation1

Things that we extract from the pdf

Technically, "everything".

All moments:

◦ from which we can derive the mean, variance, etc.

But not the correlation characteristics of a time series! The pdf is a static description; it doesn't care about inter-symbol relations…

$M_x(t) = E\{e^{tx}\}$

$\mathrm{mom}_n = \left.\frac{\partial^n}{\partial t^n} M_x(t)\right|_{t=0}$

Page 6: Ömer Nezi̇h Gerek Presentation1

Sample statistics:

$\mu$ : mean (average): $\mu = E\{x[n]\}$

$\sigma^2$ : variance: $\sigma^2 = E\{(x[n]-\mu)^2\}$

Page 7: Ömer Nezi̇h Gerek Presentation1

What is this "correlation" thing?

It is:

$R_x(\tau) = E\{x(t)\,x(t-\tau)\}$

But this is an abstract definition. Let's estimate it from data, e.g. for $\tau = 2$:

$R_x(2) = \frac{1}{N}\left\{x(2)x(0) + x(3)x(1) + x(4)x(2) + \ldots + x(N+1)x(N-1) + x(N+2)x(N)\right\}$
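A small Python sketch of this estimator (the function name and the test signal are illustrative, not from the slides):

```python
import numpy as np

def autocorr_est(x, tau):
    """Estimate R_x(tau) = E{x(t) x(t - tau)} by averaging lagged products."""
    x = np.asarray(x, dtype=float)
    if tau == 0:
        return np.mean(x * x)
    return np.mean(x[tau:] * x[:-tau])

# A slowly varying toy series: neighbouring samples are strongly correlated,
# so the estimates decrease with the lag.
x = np.sin(0.1 * np.arange(500))
print([round(autocorr_est(x, t), 3) for t in (0, 1, 2)])
```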

Page 8: Ömer Nezi̇h Gerek Presentation1

Correlation somewhat depends on the mean. Let's subtract it out:

$C_x(\tau) = E\{[x(t)-\mu][x(t-\tau)-\mu]\} = E\{x(t)x(t-\tau)\} - E\{x(t)\mu\} - E\{x(t-\tau)\mu\} + \mu^2 = R_x(\tau) - \mu^2$

and call the result "COVARIANCE".

Page 9: Ömer Nezi̇h Gerek Presentation1

$R_x(n)$ : autocorrelation

$R_x(n) = E\{x[m]\,x[m+n]\}$

$C_x(n)$ : autocovariance

$C_x(n) = E\{(x[m]-\mu)(x[m+n]-\mu)\} = E\{x[m]\,x[m+n]\} - \mu^2$

Page 10: Ömer Nezi̇h Gerek Presentation1

Properties

Correlation and covariance carry:

◦ time relation (single time difference), and

◦ information regarding moments up to 2nd order: mean and variance.

Naturally, covariance decreases with distance…

Page 11: Ömer Nezi̇h Gerek Presentation1

Example calculation:

Data: {2, 3, 1}. Correlation is convolution with the time-reversed sequence, so multiply, long-multiplication style, by the reversed sequence (the 1/N normalization is dropped for simplicity):

         2  3  1
    ×    1  3  2
    -----------------
         4  6  2
      6  9  3
   2  3  1
    -----------------
   2  9 14  9  2

The symmetric result gives R(0) = 14 (center), R(1) = 9, R(2) = 2.

m = 2, so C(k) = R(k) - m² gives C(0) = 10, C(1) = 5, C(2) = -2.
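The same numbers can be checked with numpy; np.correlate returns the full unnormalized correlation sequence, mirroring the slide's convention of dropping 1/N:

```python
import numpy as np

x = np.array([2.0, 3.0, 1.0])
R_full = np.correlate(x, x, mode="full")  # [R(-2), R(-1), R(0), R(1), R(2)]
print(R_full)                             # [ 2.  9. 14.  9.  2.]

m = x.mean()                              # m = 2
R = R_full[2:]                            # R(0), R(1), R(2) = 14, 9, 2
print(R - m**2)                           # C(k) = R(k) - m^2 -> [10.  5. -2.]
```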

Page 12: Ömer Nezi̇h Gerek Presentation1

Covariance elements

$C(0) = \sigma^2 = E\{(x(t)-\mu)^2\}$

$S_{xx}(f) = \mathrm{FT}\{C(\tau)\}$ is the "spectral density"!

◦ shows how much "power" the random signal has at each "frequency".

◦ remember "equalizers", "VU-meters", "radio stations"…

Page 13: Ömer Nezi̇h Gerek Presentation1

Remember the Fourier Transform

From time ($x(t)$) to frequency ($X(f)$):

$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$

When the time function is $R_x(\tau)$, its transform is $S_x(f)$… and the integral of the power spectral density over any band is a power, which is always non-negative:

$\int_{f_1}^{f_2} S_x(f)\, df \ge 0, \qquad S_x(f) \ge 0$
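A hedged numerical check: the periodogram $|X(f)|^2/N$ (the FT of the biased autocorrelation estimate) is one simple estimate of $S_x(f)$, and it is never negative. The test signal below is an arbitrary stand-in:

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(1024)

# Periodogram |X(f)|^2 / N: the FT of the biased autocorrelation estimate.
X = np.fft.fft(x)
S = np.abs(X) ** 2 / x.size
print(S.min() >= 0)                 # True: a PSD is nonnegative everywhere

# Parseval: the average of S(f) over frequency equals the signal power.
print(S.mean(), np.mean(x ** 2))
```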

Page 14: Ömer Nezi̇h Gerek Presentation1

Our sources of info: $S_x(f)$ & $R_x(\tau)$

We will extract significant information from the autocorrelation and the power spectral density:

◦ The best linear prediction (what comes next in the series?)

◦ The best linear model of the "process" that produces a certain outcome

◦ The power of the process at different frequencies.

Page 15: Ömer Nezi̇h Gerek Presentation1

What happens to a random signal when it passes through a linear system?

Do you remember what happens to an ordinary signal that passes through a linear system?

$Y(f) = X(f) \cdot H(f)$

where $H(f)$ is the frequency response.

Page 16: Ömer Nezi̇h Gerek Presentation1

So, what happens to a random signal when it passes through a linear system?

Now we don't have the signal $x(t)$; we only have statistical descriptors, $R_x(\tau)$ or $S_x(f)$:

$S_y(f) = S_x(f) \cdot |H(f)|^2$

where $H(f)$ is still the frequency response. Notice that $S_y(f)$ is still positive.

Page 17: Ömer Nezi̇h Gerek Presentation1

$S_y(f) = S_x(f) \cdot |H(f)|^2$ implies:

Power at output: $\int S_y(f)\, df = \int S_x(f) \cdot |H(f)|^2\, df = R_y(0)$

Mean at output: $\mu_y = \mu_x \cdot H(0)$

Variance at output: $\sigma_y^2 = R_y(0) - \mu_y^2$
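These three output relations are easy to verify numerically. Below is a sketch with a hypothetical 3-tap FIR filter and white noise input (so $S_x(f)$ is flat and $\sigma_y^2 = \sigma_x^2 \sum h^2$):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_x = 1.0
x = mu_x + rng.standard_normal(200_000)  # white input: mean 1, variance 1

h = np.array([0.5, 0.3, 0.2])            # hypothetical FIR impulse response
y = np.convolve(x, h, mode="valid")      # output of the linear system

print(y.mean(), mu_x * h.sum())          # mu_y = mu_x * H(0), with H(0) = sum(h)
print(np.mean(y ** 2))                   # R_y(0): total output power
print(y.var(), (h ** 2).sum())           # white input: sigma_y^2 = sigma_x^2 * sum(h^2)
```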

Page 18: Ömer Nezi̇h Gerek Presentation1

So much math to do: Prediction

…x[1], x[2], …, x[n-2], x[n-1]: what comes next?

A linear predictor "filters" incoming samples to predict x[n]:

1, 2, 1, 2, 1, 2, 1, 2, x[n] = ?

x[n] = 0·x[n-1] + 1·x[n-2]

◦ So our filter is: $\{0z^{-1} + 1z^{-2}\}$

Page 19: Ömer Nezi̇h Gerek Presentation1

Series examples:

1, 2, 3, 4, 5, x[n] = ?

x[n] = 2·x[n-1] - x[n-2]

◦ So, our filter is: $\{2z^{-1} - 1z^{-2}\}$

1, 1, 2, 4, 7, 13, 24, x[n] = ?

x[n] = x[n-1] + x[n-2] + x[n-3]

◦ So, our filter is: $\{1z^{-1} + 1z^{-2} + 1z^{-3}\}$

But these series are too deterministic. Besides, how long should our filter be?
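Both recursions are one line each; a quick Python check (the lists are just the series above):

```python
x = [1, 2, 3, 4, 5]
# Filter {2z^-1 - 1z^-2}: x[n] = 2*x[n-1] - x[n-2]
print(2 * x[-1] - x[-2])      # 6: the next term

t = [1, 1, 2, 4, 7, 13, 24]
# Filter {1z^-1 + 1z^-2 + 1z^-3}: x[n] = x[n-1] + x[n-2] + x[n-3]
print(t[-1] + t[-2] + t[-3])  # 44: the next term
```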

Page 20: Ömer Nezi̇h Gerek Presentation1

If this prediction, $\hat{x}[n]$, is close to $x[n]$,

then the error, $d[n] = x[n] - \hat{x}[n]$, will be small!

Page 21: Ömer Nezi̇h Gerek Presentation1

Ideal predictor: $\hat{x}[n] = E\{x[n] \mid x[n-1], x[n-2], \ldots\}$ (the conditional mean)

Linear predictor: $\hat{x}[n] = \sum_j h_j \cdot x[n-j]$

Page 22: Ömer Nezi̇h Gerek Presentation1

Stochastic series:

First-order prediction: $\hat{x}[n] = h_1 \cdot x[n-1]$

Question: what is the optimum $h_1$?

Answer: the value that minimizes the error $d[n] = x[n] - \hat{x}[n]$.

Page 23: Ömer Nezi̇h Gerek Presentation1

Minimization of $d[n] = x[n] - \hat{x}[n]$

Equivalently: $d[n] = x[n] - h_1 \cdot x[n-1]$

which has the error power:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1 \cdot x[n-1])^2\}$

Page 24: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

Expanding:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\, x[n-1])^2\}$

$= E\{x^2[n] - 2h_1\, x[n]\, x[n-1] + h_1^2\, x^2[n-1]\}$

$= E\{x^2[n]\} - 2h_1\, E\{x[n]\, x[n-1]\} + h_1^2\, E\{x^2[n-1]\}$

Page 25: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\sigma_d^2 = E\{x^2[n]\} - 2h_1\, E\{x[n]\, x[n-1]\} + h_1^2\, E\{x^2[n-1]\}$

$= \sigma_x^2 - 2h_1\, R_x(1) + h_1^2\, \sigma_x^2$

$= \left[1 + h_1^2 - 2h_1 \rho_1\right]\sigma_x^2$

Page 26: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

We have:

$\sigma_d^2 = \left[1 + h_1^2 - 2h_1 \rho_1\right]\sigma_x^2$

which can be minimized (with respect to $h_1$) by taking its derivative w.r.t. $h_1$ and equating it to zero:

$\frac{\partial}{\partial h_1}\sigma_d^2 = 0$

Page 27: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\frac{\partial}{\partial h_1}\sigma_d^2 = 0$

$\frac{\partial}{\partial h_1}\left[1 + h_1^2 - 2h_1 \rho_1\right]\sigma_x^2 = 0$

$2h_1 - 2\rho_1 = 0 \;\Rightarrow\; h_1 = \rho_1$

The filter coefficient is the same as the correlation coefficient!

$\hat{x}[n] = \rho_1 \cdot x[n-1]$
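A sketch of this result on a synthetic first-order (AR(1)) process; the 0.8 coefficient and the sample size are arbitrary choices. The estimated $\rho_1$ lands near 0.8, and the achieved error-variance ratio matches $1-\rho_1^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
# Zero-mean AR(1) process x[n] = 0.8*x[n-1] + w[n] (coefficient chosen arbitrarily).
x = np.zeros(50_000)
w = rng.standard_normal(x.size)
for n in range(1, x.size):
    x[n] = 0.8 * x[n - 1] + w[n]

rho1 = np.mean(x[1:] * x[:-1]) / np.mean(x ** 2)  # rho_1 = R_x(1) / sigma_x^2
d = x[1:] - rho1 * x[:-1]                         # error of x_hat[n] = rho_1 * x[n-1]
print(rho1)                                       # ~0.8, so h_1 = rho_1
print(d.var() / x.var(), 1 - rho1 ** 2)           # both ~ 1 - rho_1^2
```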

Page 28: Ömer Nezi̇h Gerek Presentation1

Optimum 1st order prediction

See that the best prediction coefficient depends on $R_x(\tau)$:

$\rho_1 = \dfrac{R_x(1)}{\sigma_x^2}$ : the first correlation coefficient

Is this true for "longer" prediction filters? Let's take a look at the 2nd-order prediction filter…

Page 29: Ömer Nezi̇h Gerek Presentation1

2nd order prediction

$\hat{x}[n] = h_1 \cdot x[n-1] + h_2 \cdot x[n-2]$

Question: what are the optimum $h_1$ and $h_2$?

Answer: the values that minimize $d[n] = x[n] - \hat{x}[n]$.

Page 30: Ömer Nezi̇h Gerek Presentation1

Minimization of $d[n] = x[n] - \hat{x}[n]$

Equivalently: $d[n] = x[n] - h_1 \cdot x[n-1] - h_2 \cdot x[n-2]$

which has the error power:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\, x[n-1] - h_2\, x[n-2])^2\}$

Page 31: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

Expanding:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\, x[n-1] - h_2\, x[n-2])^2\}$

$= E\{x^2[n] + h_1^2\, x^2[n-1] + h_2^2\, x^2[n-2] - 2h_1\, x[n] x[n-1] - 2h_2\, x[n] x[n-2] + 2h_1 h_2\, x[n-1] x[n-2]\}$

Page 32: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

Distributing the expectation:

$\sigma_d^2 = E\{x^2[n]\} + h_1^2\, E\{x^2[n-1]\} + h_2^2\, E\{x^2[n-2]\} - 2h_1\, E\{x[n] x[n-1]\} - 2h_2\, E\{x[n] x[n-2]\} + 2h_1 h_2\, E\{x[n-1] x[n-2]\}$

Page 33: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\sigma_d^2 = \sigma_x^2 + h_1^2 \sigma_x^2 + h_2^2 \sigma_x^2 - 2h_1 R_x(1) - 2h_2 R_x(2) + 2h_1 h_2 R_x(1)$

$= \sigma_x^2\left[1 + h_1^2 + h_2^2 - 2h_1\rho_1 - 2h_2\rho_2 + 2h_1 h_2 \rho_1\right]$

Page 34: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

We now have:

$\sigma_d^2 = \sigma_x^2\left[1 + h_1^2 + h_2^2 - 2h_1\rho_1 - 2h_2\rho_2 + 2h_1 h_2 \rho_1\right]$

which can be minimized (with respect to $h_1$ and $h_2$) by taking its derivatives w.r.t. $h_1$, $h_2$ and equating them to zero:

$\frac{\partial}{\partial h_1}\sigma_d^2 = \frac{\partial}{\partial h_2}\sigma_d^2 = 0$

Page 35: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\dfrac{\partial}{\partial h_1}\sigma_d^2 = 0 \;\Rightarrow\; h_1 = \dfrac{\rho_1(1-\rho_2)}{1-\rho_1^2}$

$\dfrac{\partial}{\partial h_2}\sigma_d^2 = 0 \;\Rightarrow\; h_2 = \dfrac{\rho_2 - \rho_1^2}{1-\rho_1^2}$

The filter coefficients have somewhat changed…
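The closed forms agree with simply solving the 2×2 normal equations $\mathbf{R}\mathbf{h} = \mathbf{r}$; the rho values below are made up for the check:

```python
import numpy as np

rho1, rho2 = 0.9, 0.7            # hypothetical correlation coefficients

# Closed-form solution derived above
h1 = rho1 * (1 - rho2) / (1 - rho1 ** 2)
h2 = (rho2 - rho1 ** 2) / (1 - rho1 ** 2)
print(h1, h2)                    # 1.421... -0.578...

# The same answer from the 2x2 normal equations R h = r
R = np.array([[1.0, rho1], [rho1, 1.0]])
r = np.array([rho1, rho2])
print(np.linalg.solve(R, r))     # matches the closed form
```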

Page 36: Ömer Nezi̇h Gerek Presentation1

Comparison of 1st and 2nd orders

$\sigma_{d,\min,1}^2 = \left[1 - \rho_1^2\right]\sigma_x^2$

$\sigma_{d,\min,2}^2 = \left[1 - \rho_1^2 - \dfrac{(\rho_1^2 - \rho_2)^2}{1-\rho_1^2}\right]\sigma_x^2$

$1 \ge \rho_1^2 \ge \rho_2 \;\Rightarrow\; \sigma_{d,\min,1}^2 \ge \sigma_{d,\min,2}^2$

Page 37: Ömer Nezi̇h Gerek Presentation1

By increasing the prediction window size, the prediction error "definitely" decreases (it can never increase)!

Note that 2nd and 1st order are the same if:

$\sigma_{d,\min,2}^2 = \left[1 - \rho_1^2 - \dfrac{(\rho_1^2 - \rho_2)^2}{1-\rho_1^2}\right]\sigma_x^2 = \left[1 - \rho_1^2\right]\sigma_x^2$

$\Rightarrow \rho_2 = \rho_1^2$

Page 38: Ömer Nezi̇h Gerek Presentation1

A first-order system property: $\rho_k = \rho_1^k$, so in particular $\rho_2 = \rho_1^2$.

Page 39: Ömer Nezi̇h Gerek Presentation1

Nth order prediction:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n]-\hat{x}[n])^2\} = E\left\{\left(x[n] - \sum_{j=1}^{N} h_j\, x[n-j]\right)^2\right\}$

Page 40: Ömer Nezi̇h Gerek Presentation1

$\sigma_d^2 = E\left\{\left(x[n] - \sum_{j=1}^{N} h_j\, x[n-j]\right)^2\right\}$

$\dfrac{\partial \sigma_d^2}{\partial h_j} = E\left\{2\left(x[n]-\hat{x}[n]\right)\dfrac{\partial}{\partial h_j}\left(-\hat{x}[n]\right)\right\}, \qquad \dfrac{\partial}{\partial h_j}\left(-\hat{x}[n]\right) = -x[n-j]$

Setting each derivative to zero:

$E\{(x[n]-\hat{x}[n])\, x[n-j]\} = E\{d[n]\, x[n-j]\} = 0$

Important observation: $d[n] \perp x[n-j]$

Page 41: Ömer Nezi̇h Gerek Presentation1

Nth order prediction

$E\left\{\left(x[n] - \sum_{i=1}^{N} h_i\, x[n-i]\right) x[n-j]\right\} = 0, \quad \forall j$

$R_x(j) - \sum_{i=1}^{N} h_i\, R_x(j-i) = 0, \quad \forall j$

Page 42: Ömer Nezi̇h Gerek Presentation1

Nth order prediction

In short:

$\mathbf{r}_x = \mathbf{R}_x \cdot \mathbf{h}_{opt}$

or:

$\mathbf{h}_{opt} = \mathbf{R}_x^{-1} \cdot \mathbf{r}_x$

with:

$\sigma_{d,\min}^2 = \sigma_x^2 - \mathbf{r}_x^T \mathbf{R}_x^{-1} \mathbf{r}_x$

Good correlations reduce the error faster!
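In Python this is a few lines; the helper below is a sketch (the function name is ours, and a zero-mean process is assumed so that $\sigma_x^2 = R(0)$), building the Toeplitz autocorrelation matrix and solving the normal equations:

```python
import numpy as np

def wiener_predictor(R, N):
    """Optimal N-tap linear predictor from autocorrelation values
    R = [R(0), R(1), ..., R(N)]; returns (h_opt, minimum error power)."""
    Rx = np.array([[R[abs(i - j)] for j in range(N)] for i in range(N)])
    rx = np.asarray(R[1:N + 1], dtype=float)
    h = np.linalg.solve(Rx, rx)   # h_opt = Rx^-1 rx (no explicit inverse needed)
    err = R[0] - rx @ h           # sigma_d,min^2 = sigma_x^2 - r^T R^-1 r
    return h, err

# Usage with the numbers of the example on the next slide:
print(wiener_predictor([1.0, 0.9, 0.81], 2))  # (array([0.9, 0. ]), 0.19...)
```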

Page 43: Ömer Nezi̇h Gerek Presentation1

Remarks

$\mathbf{h}_{opt} = \mathbf{R}_x^{-1} \cdot \mathbf{r}_x$ is not a cheap operation.

There must be a good estimate of the order N.

The result is only the best "linear" predictor.

Multidimensional extensions exist.

Page 44: Ömer Nezi̇h Gerek Presentation1

Graphically…

Page 45: Ömer Nezi̇h Gerek Presentation1

Example

$R(0) = 1,\quad R(1) = 0.9,\quad R(2) = 0.81$

What is the optimum 2-tap prediction filter? Using $\mathbf{h}_{opt} = \mathbf{R}^{-1} \cdot \mathbf{r}$:

$\mathbf{h}_{opt} = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0.9 \\ 0.81 \end{bmatrix} = \begin{bmatrix} 5.263 & -4.737 \\ -4.737 & 5.263 \end{bmatrix} \begin{bmatrix} 0.9 \\ 0.81 \end{bmatrix} = \begin{bmatrix} 0.9 \\ 0 \end{bmatrix}$
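Verifying the matrix arithmetic with numpy:

```python
import numpy as np

R = np.array([[1.0, 0.9], [0.9, 1.0]])
r = np.array([0.9, 0.81])

print(np.linalg.inv(R))        # [[ 5.263 -4.737]
                               #  [-4.737  5.263]]
print(np.linalg.solve(R, r))   # [0.9 0. ]: the optimal 2-tap filter
```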

Page 46: Ömer Nezi̇h Gerek Presentation1

Example (cont.)

$R(0) = 1,\quad R(1) = 0.9,\quad R(2) = 0.81$

Why is h = [0.9, 0]?

Between adjacent samples ($x[n]$ and $x[n-1]$, or $x[n-1]$ and $x[n-2]$): $R = 0.9$.

Across two steps ($x[n]$ and $x[n-2]$): $R = 0.9 \cdot 0.9 = 0.81$.

Because the process is 1st order: once $x[n-1]$ is used, $x[n-2]$ brings no new information.

Page 47: Ömer Nezi̇h Gerek Presentation1

2D neighbourhood on a sample grid:

. . . . . . . . .
. . . x(n-1,m-1) x(n-1,m)
. . . x(n,m-1) x(n,m)

R(0) = the mean-square value; R(1) = average over horizontal neighbour pairs; R(2) = average over vertical neighbour pairs; R(3) = average over diagonal neighbour pairs.

$\mathbf{h}_{opt} = \mathbf{R}^{-1} \cdot \mathbf{r}$ : the same formula!...

"Template" and filter size may vary.

Page 48: Ömer Nezi̇h Gerek Presentation1

Reasons for multidimensional prediction beyond image processing?

Consider solar radiation: we have a relation to the "last hour", but:

◦ Don't we also have a relation to "yesterday, same hour"?

◦ What about "last year, same hour"?

◦ What about "yesterday's wind speed"?

◦ What about "last week's electricity consumption"?

Do we need an extremely long prediction order, N?

Page 49: Ömer Nezi̇h Gerek Presentation1

The trick is to put "related" terms near each other

Page 50: Ömer Nezi̇h Gerek Presentation1

instead of

Page 51: Ömer Nezi̇h Gerek Presentation1

Then we can use 2D prediction with similar correlation definitions…

Page 52: Ömer Nezi̇h Gerek Presentation1

… and achieve low prediction error

Page 53: Ömer Nezi̇h Gerek Presentation1

Putting related items near each other is good for other methods, too…

Page 54: Ömer Nezi̇h Gerek Presentation1

Other methods may include:

◦ Nonlinear prediction

◦ Neural networks

◦ Transformation (Fourier, wavelet, etc.)

◦ Adaptive methods

A case for solar radiation prediction:

Page 55: Ömer Nezi̇h Gerek Presentation1

Correlation may include an auxiliary signal, yielding a "hidden" process:

◦ Hidden Markov Model

We observe the pressure; we predict the wind speed!

Page 56: Ömer Nezi̇h Gerek Presentation1

The result is "wind measurement" using a barometer. The results are accurate enough for RES sizing!

Page 57: Ömer Nezi̇h Gerek Presentation1

Some examples with nonlinear prediction (thanks to graduate students)

(Figure: measured vs. predicted hourly wind speed, hours 7860–8000, for İzmir and Antalya. Axes: Saat = hour, Rüzgar Hızı (m/s) = wind speed; legend: Ölçülen = measured, Tahmin = predicted.)

Page 58: Ömer Nezi̇h Gerek Presentation1

… and their distributions (İzmir)

(Figure: distribution percentages (Dağılım Yüzdeleri, 0–20%) over wind-speed bins (Rüzgar Hızı Durum Aralığı, 1–15 m/s), for the real and the predicted series.)

Page 59: Ömer Nezi̇h Gerek Presentation1

Motto:

Know your math.

… or keep a signal processing (SP) person around!