Prediction / Co(variance – relation)

Ömer Nezih Gerek



Description: afyon_kış_okulu


Page 1: Ömer Nezi̇h Gerek Presentation1

Prediction / Co(variance – relation)

Ömer Nezih Gerek

Page 2: Ömer Nezi̇h Gerek Presentation1

Data / Information

Data is useless in raw form. Since you measured it, it must carry some information!

Statistical signatures:

◦ 1. Model based: system, prediction, etc.

◦ 2. (Non)parametric: histogram, mean, etc.

Page 3: Ömer Nezi̇h Gerek Presentation1

Non-parametric statistics

Histogram (a fair pdf estimate)

Page 4: Ömer Nezi̇h Gerek Presentation1

(Histogram shapes: symmetric unimodal, skewed right, bi-modal, multi-modal, skewed left, symmetric.)

$N(i) = \operatorname{count}\{X(k) = i\}, \quad i = 0,\ldots,\mathrm{max\_val}, \quad k = 1,\ldots,\mathrm{data\_size}$
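As a quick illustration, here is a minimal Python sketch of the counting formula above, assuming hypothetical integer-valued data in the range 0…max_val:

```python
import numpy as np

# Hypothetical integer-valued data (values in 0..9), just for illustration.
x = np.random.default_rng(0).integers(0, 10, size=1000)

# N(i) = count{ X(k) = i }: one bin per value.
max_val = x.max()
N = np.array([(x == i).sum() for i in range(max_val + 1)])

# Dividing by the data size turns the counts into a fair pdf estimate.
pdf_est = N / x.size
print(pdf_est.sum())  # 1.0: the estimated probabilities sum to one
```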

Page 5: Ömer Nezi̇h Gerek Presentation1

Things that we extract from the pdf

Technically, "everything".

All moments:

◦ from which we can derive the mean, variance, etc.

But not the correlation characteristics of a time series! The pdf is a static description; it doesn't care about inter-symbol relations…

$M_x(t) = E\{e^{tx}\}$

$\mathrm{mom}_n = \left.\frac{\partial^n}{\partial t^n} M_x(t)\right|_{t=0}$

Page 6: Ömer Nezi̇h Gerek Presentation1

Sample statistics:

$\mu$ : mean (average): $\mu = E\{x[n]\}$

$\sigma^2$ : variance: $\sigma^2 = E\{(x[n]-\mu)^2\}$

Page 7: Ömer Nezi̇h Gerek Presentation1

What is this "correlation" thing?

It is:

$R_x(\tau) = E\{x(t)\,x(t-\tau)\}$

But this is an abstract definition. Let's estimate it from data, e.g. for $\tau = 2$:

$R_x(2) = \frac{1}{N}\left\{x(2)x(0) + x(3)x(1) + x(4)x(2) + \ldots + x(N+1)x(N-1) + x(N+2)x(N)\right\}$
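A small Python sketch of this estimator (the function name and the test signal are illustrative, not from the slides):

```python
import numpy as np

def autocorr_est(x, tau):
    """Estimate R_x(tau) = E{x(t) x(t - tau)} by averaging lagged products."""
    x = np.asarray(x, dtype=float)
    if tau == 0:
        return np.mean(x * x)
    return np.mean(x[tau:] * x[:-tau])

# A slowly varying toy series: neighbouring samples are strongly correlated,
# so the estimates decrease with the lag.
x = np.sin(0.1 * np.arange(500))
print([round(autocorr_est(x, t), 3) for t in (0, 1, 2)])
```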

Page 8: Ömer Nezi̇h Gerek Presentation1

Correlation somewhat depends on the mean. Let's subtract it out:

$C_x(\tau) = E\{[x(t)-\mu][x(t-\tau)-\mu]\} = E\{x(t)x(t-\tau)\} - E\{x(t)\mu\} - E\{x(t-\tau)\mu\} + \mu^2 = R_x(\tau) - \mu^2$

and call the result "COVARIANCE".

Page 9: Ömer Nezi̇h Gerek Presentation1

$R_x(n)$ : autocorrelation

$R_x(n) = E\{x[m]\,x[m+n]\}$

$C_x(n)$ : autocovariance

$C_x(n) = E\{(x[m]-\mu)(x[m+n]-\mu)\} = E\{x[m]\,x[m+n]\} - \mu^2$

Page 10: Ömer Nezi̇h Gerek Presentation1

Properties

Correlation and covariance carry:

◦ time relation (single time difference), and

◦ information regarding moments up to 2nd order: mean and variance.

Naturally, covariance decreases with distance…

Page 11: Ömer Nezi̇h Gerek Presentation1

Example calculation:

Data: {2, 3, 1}. Correlation is convolution with the time-reversed sequence, so multiply, long-multiplication style, by the reversed sequence (the 1/N normalization is dropped for simplicity):

         2  3  1
    ×    1  3  2
    -----------------
         4  6  2
      6  9  3
   2  3  1
    -----------------
   2  9 14  9  2

The symmetric result gives R(0) = 14 (center), R(1) = 9, R(2) = 2.

m = 2, so C(k) = R(k) - m² gives C(0) = 10, C(1) = 5, C(2) = -2.
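The same numbers can be checked with numpy; np.correlate returns the full unnormalized correlation sequence, mirroring the slide's convention of dropping 1/N:

```python
import numpy as np

x = np.array([2.0, 3.0, 1.0])
R_full = np.correlate(x, x, mode="full")  # [R(-2), R(-1), R(0), R(1), R(2)]
print(R_full)                             # [ 2.  9. 14.  9.  2.]

m = x.mean()                              # m = 2
R = R_full[2:]                            # R(0), R(1), R(2) = 14, 9, 2
print(R - m**2)                           # C(k) = R(k) - m^2 -> [10.  5. -2.]
```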

Page 12: Ömer Nezi̇h Gerek Presentation1

Covariance elements

$C(0) = \sigma^2 = E\{(x(t)-\mu)^2\}$

$S_{xx}(f) = \mathrm{FT}\{C(\tau)\}$ is the "spectral density"!

◦ shows how much "power" the random signal has at each "frequency".

◦ remember "equalizers", "VU-meters", "radio stations"…

Page 13: Ömer Nezi̇h Gerek Presentation1

Remember the Fourier Transform

From time ($x(t)$) to frequency ($X(f)$):

$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$

When the time function is $R_x(\tau)$, its transform is $S_x(f)$… and the integral of the power spectral density over any band is a power, which is always non-negative:

$\int_{f_1}^{f_2} S_x(f)\, df \ge 0, \qquad S_x(f) \ge 0$
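A hedged numerical check: the periodogram $|X(f)|^2/N$ (the FT of the biased autocorrelation estimate) is one simple estimate of $S_x(f)$, and it is never negative. The test signal below is an arbitrary stand-in:

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(1024)

# Periodogram |X(f)|^2 / N: the FT of the biased autocorrelation estimate.
X = np.fft.fft(x)
S = np.abs(X) ** 2 / x.size
print(S.min() >= 0)                 # True: a PSD is nonnegative everywhere

# Parseval: the average of S(f) over frequency equals the signal power.
print(S.mean(), np.mean(x ** 2))
```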

Page 14: Ömer Nezi̇h Gerek Presentation1

Our sources of info: $S_x(f)$ & $R_x(\tau)$

We will extract significant information from the autocorrelation and the power spectral density:

◦ The best linear prediction (what comes next in the series?)

◦ The best linear model of the "process" that produces a certain outcome

◦ The power of the process at different frequencies.

Page 15: Ömer Nezi̇h Gerek Presentation1

What happens to a random signal when it passes through a linear system?

Do you remember what happens to an ordinary signal that passes through a linear system?

$Y(f) = X(f) \cdot H(f)$

where $H(f)$ is the frequency response.

Page 16: Ömer Nezi̇h Gerek Presentation1

So, what happens to a random signal when it passes through a linear system?

Now we don't have the signal $x(t)$; we only have statistical descriptors, $R_x(\tau)$ or $S_x(f)$:

$S_y(f) = S_x(f) \cdot |H(f)|^2$

where $H(f)$ is still the frequency response. Notice that $S_y(f)$ is still positive.

Page 17: Ömer Nezi̇h Gerek Presentation1

$S_y(f) = S_x(f) \cdot |H(f)|^2$ implies:

Power at output: $\int S_y(f)\, df = \int S_x(f) \cdot |H(f)|^2\, df = R_y(0)$

Mean at output: $\mu_y = \mu_x \cdot H(0)$

Variance at output: $\sigma_y^2 = R_y(0) - \mu_y^2$
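These three output relations are easy to verify numerically. Below is a sketch with a hypothetical 3-tap FIR filter and white noise input (so $S_x(f)$ is flat and $\sigma_y^2 = \sigma_x^2 \sum h^2$):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_x = 1.0
x = mu_x + rng.standard_normal(200_000)  # white input: mean 1, variance 1

h = np.array([0.5, 0.3, 0.2])            # hypothetical FIR impulse response
y = np.convolve(x, h, mode="valid")      # output of the linear system

print(y.mean(), mu_x * h.sum())          # mu_y = mu_x * H(0), with H(0) = sum(h)
print(np.mean(y ** 2))                   # R_y(0): total output power
print(y.var(), (h ** 2).sum())           # white input: sigma_y^2 = sigma_x^2 * sum(h^2)
```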

Page 18: Ömer Nezi̇h Gerek Presentation1

So much math to do: Prediction

…x[1], x[2], …, x[n-2], x[n-1]: what comes next?

A linear predictor "filters" incoming samples to predict x[n]:

1, 2, 1, 2, 1, 2, 1, 2, x[n] = ?

x[n] = 0·x[n-1] + 1·x[n-2]

◦ So our filter is: $\{0z^{-1} + 1z^{-2}\}$

Page 19: Ömer Nezi̇h Gerek Presentation1

Series examples:

1, 2, 3, 4, 5, x[n] = ?

x[n] = 2·x[n-1] - x[n-2]

◦ So, our filter is: $\{2z^{-1} - 1z^{-2}\}$

1, 1, 2, 4, 7, 13, 24, x[n] = ?

x[n] = x[n-1] + x[n-2] + x[n-3]

◦ So, our filter is: $\{1z^{-1} + 1z^{-2} + 1z^{-3}\}$

But these series are too deterministic. Besides, how long should our filter be?
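Both recursions are one line each; a quick Python check (the lists are just the series above):

```python
x = [1, 2, 3, 4, 5]
# Filter {2z^-1 - 1z^-2}: x[n] = 2*x[n-1] - x[n-2]
print(2 * x[-1] - x[-2])      # 6: the next term

t = [1, 1, 2, 4, 7, 13, 24]
# Filter {1z^-1 + 1z^-2 + 1z^-3}: x[n] = x[n-1] + x[n-2] + x[n-3]
print(t[-1] + t[-2] + t[-3])  # 44: the next term
```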

Page 20: Ömer Nezi̇h Gerek Presentation1

If this prediction, $\hat{x}[n]$, is close to $x[n]$,

then the error, $d[n] = x[n] - \hat{x}[n]$, will be small!

Page 21: Ömer Nezi̇h Gerek Presentation1

Ideal predictor: $\hat{x}[n] = E\{x[n] \mid x[n-1], x[n-2], \ldots\}$ (the conditional mean)

Linear predictor: $\hat{x}[n] = \sum_j h_j \cdot x[n-j]$

Page 22: Ömer Nezi̇h Gerek Presentation1

Stochastic series:

First-order prediction: $\hat{x}[n] = h_1 \cdot x[n-1]$

Question: what is the optimum $h_1$?

Answer: the value that minimizes the error $d[n] = x[n] - \hat{x}[n]$.

Page 23: Ömer Nezi̇h Gerek Presentation1

Minimization of $d[n] = x[n] - \hat{x}[n]$

Equivalently: $d[n] = x[n] - h_1 \cdot x[n-1]$

which has the error power:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1 \cdot x[n-1])^2\}$

Page 24: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

Expanding:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\, x[n-1])^2\}$

$= E\{x^2[n] - 2h_1\, x[n]\, x[n-1] + h_1^2\, x^2[n-1]\}$

$= E\{x^2[n]\} - 2h_1\, E\{x[n]\, x[n-1]\} + h_1^2\, E\{x^2[n-1]\}$

Page 25: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\sigma_d^2 = E\{x^2[n]\} - 2h_1\, E\{x[n]\, x[n-1]\} + h_1^2\, E\{x^2[n-1]\}$

$= \sigma_x^2 - 2h_1\, R_x(1) + h_1^2\, \sigma_x^2$

$= \left[1 + h_1^2 - 2h_1 \rho_1\right]\sigma_x^2$

Page 26: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

We have:

$\sigma_d^2 = \left[1 + h_1^2 - 2h_1 \rho_1\right]\sigma_x^2$

which can be minimized (with respect to $h_1$) by taking its derivative w.r.t. $h_1$ and equating it to zero:

$\frac{\partial}{\partial h_1}\sigma_d^2 = 0$

Page 27: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\frac{\partial}{\partial h_1}\sigma_d^2 = 0$

$\frac{\partial}{\partial h_1}\left[1 + h_1^2 - 2h_1 \rho_1\right]\sigma_x^2 = 0$

$2h_1 - 2\rho_1 = 0 \;\Rightarrow\; h_1 = \rho_1$

The filter coefficient is the same as the correlation coefficient!

$\hat{x}[n] = \rho_1 \cdot x[n-1]$
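A sketch of this result on a synthetic first-order (AR(1)) process; the 0.8 coefficient and the sample size are arbitrary choices. The estimated $\rho_1$ lands near 0.8, and the achieved error-variance ratio matches $1-\rho_1^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
# Zero-mean AR(1) process x[n] = 0.8*x[n-1] + w[n] (coefficient chosen arbitrarily).
x = np.zeros(50_000)
w = rng.standard_normal(x.size)
for n in range(1, x.size):
    x[n] = 0.8 * x[n - 1] + w[n]

rho1 = np.mean(x[1:] * x[:-1]) / np.mean(x ** 2)  # rho_1 = R_x(1) / sigma_x^2
d = x[1:] - rho1 * x[:-1]                         # error of x_hat[n] = rho_1 * x[n-1]
print(rho1)                                       # ~0.8, so h_1 = rho_1
print(d.var() / x.var(), 1 - rho1 ** 2)           # both ~ 1 - rho_1^2
```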

Page 28: Ömer Nezi̇h Gerek Presentation1

Optimum 1st order prediction

See that the best prediction coefficient depends on $R_x(\tau)$:

$\rho_1 = \dfrac{R_x(1)}{\sigma_x^2}$ : the first correlation coefficient

Is this true for "longer" prediction filters? Let's take a look at the 2nd-order prediction filter…

Page 29: Ömer Nezi̇h Gerek Presentation1

2nd order prediction

$\hat{x}[n] = h_1 \cdot x[n-1] + h_2 \cdot x[n-2]$

Question: what are the optimum $h_1$ and $h_2$?

Answer: the values that minimize $d[n] = x[n] - \hat{x}[n]$.

Page 30: Ömer Nezi̇h Gerek Presentation1

Minimization of $d[n] = x[n] - \hat{x}[n]$

Equivalently: $d[n] = x[n] - h_1 \cdot x[n-1] - h_2 \cdot x[n-2]$

which has the error power:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\, x[n-1] - h_2\, x[n-2])^2\}$

Page 31: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

Expanding:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\, x[n-1] - h_2\, x[n-2])^2\}$

$= E\{x^2[n] + h_1^2\, x^2[n-1] + h_2^2\, x^2[n-2] - 2h_1\, x[n] x[n-1] - 2h_2\, x[n] x[n-2] + 2h_1 h_2\, x[n-1] x[n-2]\}$

Page 32: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

Distributing the expectation:

$\sigma_d^2 = E\{x^2[n]\} + h_1^2\, E\{x^2[n-1]\} + h_2^2\, E\{x^2[n-2]\} - 2h_1\, E\{x[n] x[n-1]\} - 2h_2\, E\{x[n] x[n-2]\} + 2h_1 h_2\, E\{x[n-1] x[n-2]\}$

Page 33: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\sigma_d^2 = \sigma_x^2 + h_1^2 \sigma_x^2 + h_2^2 \sigma_x^2 - 2h_1 R_x(1) - 2h_2 R_x(2) + 2h_1 h_2 R_x(1)$

$= \sigma_x^2\left[1 + h_1^2 + h_2^2 - 2h_1\rho_1 - 2h_2\rho_2 + 2h_1 h_2 \rho_1\right]$

Page 34: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

We now have:

$\sigma_d^2 = \sigma_x^2\left[1 + h_1^2 + h_2^2 - 2h_1\rho_1 - 2h_2\rho_2 + 2h_1 h_2 \rho_1\right]$

which can be minimized (with respect to $h_1$ and $h_2$) by taking its derivatives w.r.t. $h_1$, $h_2$ and equating them to zero:

$\frac{\partial}{\partial h_1}\sigma_d^2 = \frac{\partial}{\partial h_2}\sigma_d^2 = 0$

Page 35: Ömer Nezi̇h Gerek Presentation1

Minimization (cont.)

$\dfrac{\partial}{\partial h_1}\sigma_d^2 = 0 \;\Rightarrow\; h_1 = \dfrac{\rho_1(1-\rho_2)}{1-\rho_1^2}$

$\dfrac{\partial}{\partial h_2}\sigma_d^2 = 0 \;\Rightarrow\; h_2 = \dfrac{\rho_2 - \rho_1^2}{1-\rho_1^2}$

The filter coefficients have somewhat changed…
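The closed forms agree with simply solving the 2×2 normal equations $\mathbf{R}\mathbf{h} = \mathbf{r}$; the rho values below are made up for the check:

```python
import numpy as np

rho1, rho2 = 0.9, 0.7            # hypothetical correlation coefficients

# Closed-form solution derived above
h1 = rho1 * (1 - rho2) / (1 - rho1 ** 2)
h2 = (rho2 - rho1 ** 2) / (1 - rho1 ** 2)
print(h1, h2)                    # 1.421... -0.578...

# The same answer from the 2x2 normal equations R h = r
R = np.array([[1.0, rho1], [rho1, 1.0]])
r = np.array([rho1, rho2])
print(np.linalg.solve(R, r))     # matches the closed form
```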

Page 36: Ömer Nezi̇h Gerek Presentation1

Comparison of 1st and 2nd orders

$\sigma_{d,\min,1}^2 = \left[1 - \rho_1^2\right]\sigma_x^2$

$\sigma_{d,\min,2}^2 = \left[1 - \rho_1^2 - \dfrac{(\rho_1^2 - \rho_2)^2}{1-\rho_1^2}\right]\sigma_x^2$

$1 \ge \rho_1^2 \ge \rho_2 \;\Rightarrow\; \sigma_{d,\min,1}^2 \ge \sigma_{d,\min,2}^2$

Page 37: Ömer Nezi̇h Gerek Presentation1

By increasing the prediction window size, the prediction error "definitely" decreases (it can never increase)!

Note that 2nd and 1st order are the same if:

$\sigma_{d,\min,2}^2 = \left[1 - \rho_1^2 - \dfrac{(\rho_1^2 - \rho_2)^2}{1-\rho_1^2}\right]\sigma_x^2 = \left[1 - \rho_1^2\right]\sigma_x^2$

$\Rightarrow \rho_2 = \rho_1^2$

Page 38: Ömer Nezi̇h Gerek Presentation1

A first-order system property: $\rho_k = \rho_1^k$, so in particular $\rho_2 = \rho_1^2$.

Page 39: Ömer Nezi̇h Gerek Presentation1

Nth order prediction:

$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n]-\hat{x}[n])^2\} = E\left\{\left(x[n] - \sum_{j=1}^{N} h_j\, x[n-j]\right)^2\right\}$

Page 40: Ömer Nezi̇h Gerek Presentation1

$\sigma_d^2 = E\left\{\left(x[n] - \sum_{j=1}^{N} h_j\, x[n-j]\right)^2\right\}$

$\dfrac{\partial \sigma_d^2}{\partial h_j} = E\left\{2\left(x[n]-\hat{x}[n]\right)\dfrac{\partial}{\partial h_j}\left(-\hat{x}[n]\right)\right\}, \qquad \dfrac{\partial}{\partial h_j}\left(-\hat{x}[n]\right) = -x[n-j]$

Setting each derivative to zero:

$E\{(x[n]-\hat{x}[n])\, x[n-j]\} = E\{d[n]\, x[n-j]\} = 0$

Important observation: $d[n] \perp x[n-j]$

Page 41: Ömer Nezi̇h Gerek Presentation1

Nth order prediction

$E\left\{\left(x[n] - \sum_{i=1}^{N} h_i\, x[n-i]\right) x[n-j]\right\} = 0, \quad \forall j$

$R_x(j) - \sum_{i=1}^{N} h_i\, R_x(j-i) = 0, \quad \forall j$

Page 42: Ömer Nezi̇h Gerek Presentation1

Nth order prediction

In short:

$\mathbf{r}_x = \mathbf{R}_x \cdot \mathbf{h}_{opt}$

or:

$\mathbf{h}_{opt} = \mathbf{R}_x^{-1} \cdot \mathbf{r}_x$

with:

$\sigma_{d,\min}^2 = \sigma_x^2 - \mathbf{r}_x^T \mathbf{R}_x^{-1} \mathbf{r}_x$

Good correlations reduce the error faster!
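In Python this is a few lines; the helper below is a sketch (the function name is ours, and a zero-mean process is assumed so that $\sigma_x^2 = R(0)$), building the Toeplitz autocorrelation matrix and solving the normal equations:

```python
import numpy as np

def wiener_predictor(R, N):
    """Optimal N-tap linear predictor from autocorrelation values
    R = [R(0), R(1), ..., R(N)]; returns (h_opt, minimum error power)."""
    Rx = np.array([[R[abs(i - j)] for j in range(N)] for i in range(N)])
    rx = np.asarray(R[1:N + 1], dtype=float)
    h = np.linalg.solve(Rx, rx)   # h_opt = Rx^-1 rx (no explicit inverse needed)
    err = R[0] - rx @ h           # sigma_d,min^2 = sigma_x^2 - r^T R^-1 r
    return h, err

# Usage with the numbers of the example on the next slide:
print(wiener_predictor([1.0, 0.9, 0.81], 2))  # (array([0.9, 0. ]), 0.19...)
```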

Page 43: Ömer Nezi̇h Gerek Presentation1

Remarks

$\mathbf{h}_{opt} = \mathbf{R}_x^{-1} \cdot \mathbf{r}_x$ is not a cheap operation.

There must be a good estimate of the order N.

The result is only the best "linear" predictor.

Multidimensional extensions exist.

Page 44: Ömer Nezi̇h Gerek Presentation1

Graphically…

Page 45: Ömer Nezi̇h Gerek Presentation1

Example

$R(0) = 1,\quad R(1) = 0.9,\quad R(2) = 0.81$

What is the optimum 2-tap prediction filter? Using $\mathbf{h}_{opt} = \mathbf{R}^{-1} \cdot \mathbf{r}$:

$\mathbf{h}_{opt} = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0.9 \\ 0.81 \end{bmatrix} = \begin{bmatrix} 5.263 & -4.737 \\ -4.737 & 5.263 \end{bmatrix} \begin{bmatrix} 0.9 \\ 0.81 \end{bmatrix} = \begin{bmatrix} 0.9 \\ 0 \end{bmatrix}$
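Verifying the matrix arithmetic with numpy:

```python
import numpy as np

R = np.array([[1.0, 0.9], [0.9, 1.0]])
r = np.array([0.9, 0.81])

print(np.linalg.inv(R))        # [[ 5.263 -4.737]
                               #  [-4.737  5.263]]
print(np.linalg.solve(R, r))   # [0.9 0. ]: the optimal 2-tap filter
```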

Page 46: Ömer Nezi̇h Gerek Presentation1

Example (cont.)

$R(0) = 1,\quad R(1) = 0.9,\quad R(2) = 0.81$

Why is h = [0.9, 0]?

Between adjacent samples ($x[n]$ and $x[n-1]$, or $x[n-1]$ and $x[n-2]$): $R = 0.9$.

Across two steps ($x[n]$ and $x[n-2]$): $R = 0.9 \cdot 0.9 = 0.81$.

Because the process is 1st order: once $x[n-1]$ is used, $x[n-2]$ brings no new information.

Page 47: Ömer Nezi̇h Gerek Presentation1

2D neighbourhood on a sample grid:

. . . . . . . . .
. . . x(n-1,m-1) x(n-1,m)
. . . x(n,m-1) x(n,m)

R(0) = the mean-square value; R(1) = average over horizontal neighbour pairs; R(2) = average over vertical neighbour pairs; R(3) = average over diagonal neighbour pairs.

$\mathbf{h}_{opt} = \mathbf{R}^{-1} \cdot \mathbf{r}$ : the same formula!...

"Template" and filter size may vary.

Page 48: Ömer Nezi̇h Gerek Presentation1

Reasons for multidimensional prediction beyond image processing?

Consider solar radiation: we have a relation to the "last hour", but:

◦ Don't we also have a relation to "yesterday, same hour"?

◦ What about "last year, same hour"?

◦ What about "yesterday's wind speed"?

◦ What about "last week's electricity consumption"?

Do we need an extremely long prediction order, N?

Page 49: Ömer Nezi̇h Gerek Presentation1

The trick is to put "related" terms near each other

Page 50: Ömer Nezi̇h Gerek Presentation1

instead of

Page 51: Ömer Nezi̇h Gerek Presentation1

Then we can use 2D prediction with similar correlation definitions…

Page 52: Ömer Nezi̇h Gerek Presentation1

… and achieve low prediction error

Page 53: Ömer Nezi̇h Gerek Presentation1

Putting related items near each other is good for other methods, too…

Page 54: Ömer Nezi̇h Gerek Presentation1

Other methods may include:

◦ Nonlinear prediction

◦ Neural networks

◦ Transformation (Fourier, wavelet, etc.)

◦ Adaptive methods

A case for solar radiation prediction:

Page 55: Ömer Nezi̇h Gerek Presentation1

Correlation may include an auxiliary signal, yielding a "hidden" process:

◦ Hidden Markov Model

We observe the pressure; we predict the wind speed!

Page 56: Ömer Nezi̇h Gerek Presentation1

The result is "wind measurement" using a barometer. The results are accurate enough for RES sizing!

Page 57: Ömer Nezi̇h Gerek Presentation1

Some examples with nonlinear prediction (thanks to graduate students)

(Figure: measured vs. predicted hourly wind speed, hours 7860–8000, for İzmir and Antalya. Axes: Saat = hour, Rüzgar Hızı (m/s) = wind speed; legend: Ölçülen = measured, Tahmin = predicted.)

Page 58: Ömer Nezi̇h Gerek Presentation1

… and their distributions (İzmir)

(Figure: distribution percentages (Dağılım Yüzdeleri, 0–20%) over wind-speed bins (Rüzgar Hızı Durum Aralığı, 1–15 m/s), for the real and the predicted series.)

Page 59: Ömer Nezi̇h Gerek Presentation1

Motto:

Know your math.

… or keep a signal processing (SP) person around!