
© 2011 by Taejeong Kim

Random process

We would like to extend random vectors to infinite dimensions. That is, we would like to mathematically describe an infinite number of random variables simultaneously, eg, infinite trials of tossing a die.

Or, we would like to mathematically describe a function or signal that is random or not specific, eg, thermal noise.

• random process Xt(ω), t ∈ I :

1. random sequence, random function, or random signal:
   Xt : Ω → the set of all sequences or functions

2. indexed family of an infinite number of random variables:
   Xt : I → the set of all random variables defined on Ω

3. Xt : Ω × I → IR

[Figure: a sample space Ω = {1, · · · , 6} (faces of a die) maps each outcome ω to a sample path Xt(ω); fixing t = t0 collects the values Xt0(1), Xt0(2), Xt0(3), · · · into the random variable Xt0.]


For a fixed t, Xt(ω) is a random variable.

For a fixed ω, Xt(ω) is a deterministic function of t, which is called a sample path or sample function.

What is random here? How many Ωs can there be? What is the result of carrying out the random experiment? What is measurable in these mappings?

example:

surface temperature of a space shuttle

thermal noise of a semiconductor device

total number of customers visiting a store up to time t

sequence of iid Bernoulli random variables: Bernoulli process

types of random processes

1. discrete-time: t = · · · , −1, 0, 1, 2, · · ·

2. continuous-time: t ∈ IR

3. discrete-valued: for a fixed t, Xt is a discrete random variable.

4. continuous-valued: for a fixed t, Xt is a continuous random variable.

discrete-time, discrete-valued: Bernoulli process

discrete-time, continuous-valued: iid Gaussian process

continuous-time, discrete-valued: Poisson process

continuous-time, continuous-valued: see the sample-path figure at the beginning

example: Xt = A cos 2πt, A ∼ unif(−1, 1); continuous-time, continuous-valued

X0 = A, X1/8 = (√2/2)A, X1/4 = 0, X1/2 = −A

example: Xt = cos 2π(t + Θ), Θ ∼ unif(0, 1); continuous-time, continuous-valued

X0 = cos 2πΘ, X1/8 = cos(2πΘ + π/4),
X1/4 = cos(2πΘ + π/2), X1/2 = cos(2πΘ + π)
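A small numerical sketch (Python with NumPy; illustrative, not from the slides): one draw of the random experiment fixes the amplitude A, and with it the entire sample path of Xt = A cos 2πt; the listed values follow from that single draw.

```python
import numpy as np

# Illustrative sketch: one draw of A fixes the whole sample path.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0)                 # A ~ unif(-1, 1)

def X(t):
    return A * np.cos(2 * np.pi * t)       # deterministic once A is drawn

x0, x18, x14, x12 = X(0.0), X(1 / 8), X(1 / 4), X(1 / 2)
# X0 = A, X_{1/8} = (sqrt(2)/2) A, X_{1/4} = 0, X_{1/2} = -A
```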

example: Xn iid N(0, 1); discrete-time, continuous-valued

fXiXjXk(xi, xj, xk) = (1/(2π)^{3/2}) e^{−(xi² + xj² + xk²)/2}

A random process Xt is completely characterized if any of the following is known.

1. P((Xt1, · · · , Xtk) ∈ B) for any B, k, and t1, · · · , tk
2. pXt1···Xtk(x1, · · · , xk) for any k and t1, · · · , tk
3. fXt1···Xtk(x1, · · · , xk) for any k and t1, · · · , tk
4. FXt1···Xtk(x1, · · · , xk) for any k and t1, · · · , tk
5. ϕXt1···Xtk(u1, · · · , uk) for any k and t1, · · · , tk

Note that given a random process, only “finite-dimensional” probabilities or probability functions can be specified. Thus, for a continuous-time Xt,

P(|Xt| ≤ 1, ∀ t ∈ {t1, t2, · · · , tk}) is defined.

P(|Xt| ≤ 1, ∀ t ∈ [0, 1]) is not defined.


Conversely, a family of finite-dimensional cdfs determines a random process if the family is “consistent”.

A family of finite-dimensional cdfs Ft1···tk(a1, · · · , ak) is said to be consistent if the following hold. [Kolmogorov]

1. The cdfs are invariant under index permutation.

2. The cdfs satisfy the dimension reduction rule.

Condition 1 means, for example,

Ft1t2 t3··· tk(a1, a2, a3, · · · , ak) = Ft2 t1t3··· tk(a2, a1, a3, · · · , ak)

Condition 2 means, for example,

Ft1t2(a1, a2) = Ft1t2 t3··· tk(a1, a2,∞, · · · ,∞)

• Two random processes Xt and Yt are defined on the same sample space Ω as a natural extension of a random process.

Two random processes Xt and Yt are completely characterized if any of the following is known.

1. P((Xt1, · · · , Xtk, Ys1, · · · , Ysl) ∈ B)
2. pXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
3. fXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
4. FXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
5. ϕXt1···XtkYs1···Ysl(u1, · · · , uk, v1, · · · , vl)

The idea extends to multiple random processes.

independent processes Xt and Yt:

(Xt1, · · · , Xtk) and (Ys1, · · · , Ysl) are independent for any k, l, t1, · · · , tk, and s1, · · · , sl.

Equivalently,

1. P((Xt1, · · · , Xtk) ∈ BX, (Ys1, · · · , Ysl) ∈ BY)
   = P((Xt1, · · · , Xtk) ∈ BX) P((Ys1, · · · , Ysl) ∈ BY)

2. pXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
   = pXt1···Xtk(x1, · · · , xk) pYs1···Ysl(y1, · · · , yl)

3. fXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
   = fXt1···Xtk(x1, · · · , xk) fYs1···Ysl(y1, · · · , yl)

4. FXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
   = FXt1···Xtk(x1, · · · , xk) FYs1···Ysl(y1, · · · , yl)

5. ϕXt1···XtkYs1···Ysl(u1, · · · , uk, v1, · · · , vl)
   = ϕXt1···Xtk(u1, · · · , uk) ϕYs1···Ysl(v1, · · · , vl)

The idea extends to multiple random processes.

example: independent random processes

Xt = A cos 2πt, Yt = B cos 2πt, where A and B are independent random variables.

Xt = cos 2π(t + Θ), Yt = cos 2π(t + Ψ), where Θ, Ψ ∼ unif(0, 1) are independent random variables.

Xn, Yn: iid N(0, 1)

fXiXjYk(xi, xj, yk) = (1/(2π)^{3/2}) e^{−(xi² + xj² + yk²)/2}

Moment

mean function:

mX(t) := EXt = ∑x x pXt(x) (discrete-valued), ∫ x fXt(x) dx (continuous-valued)

auto-correlation function, acf:

RX(t, s) := EXtXs = ∑u ∑v uv pXtXs(u, v) (discrete-valued), ∫∫ uv fXtXs(u, v) du dv (continuous-valued)

X = (Xt1, · · · , Xtk) ⇒

RX = [ RX(t1, t1) · · · RX(t1, tk)
       ⋮                ⋮
       RX(tk, t1) · · · RX(tk, tk) ],

where RX(ti, ti) = EXti².
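These definitions can be checked empirically (Python/NumPy; an illustrative sketch, not from the slides) on the simplest discrete-valued case, an iid Bernoulli(p) process, for which mX(t) = p, RX(t, t) = p, and RX(t, s) = p² for t ≠ s.

```python
import numpy as np

# Illustrative estimate of mX(t) and the acf matrix for an iid Bernoulli(p)
# process, averaged over many sample paths.
rng = np.random.default_rng(1)
p, n_paths, n_times = 0.3, 200_000, 4
X = (rng.random((n_paths, n_times)) < p).astype(float)  # rows = sample paths

m_hat = X.mean(axis=0)            # estimate of mX(t) at n_times time points
R_hat = (X.T @ X) / n_paths       # estimate of the acf matrix RX(ti, tj)
```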

auto-covariance function, acvf:

CX(t, s) := E(Xt − mX(t))(Xs − mX(s)) = RX(t, s) − mX(t)mX(s)

X = (Xt1, · · · , Xtk) ⇒

CX = [ CX(t1, t1) · · · CX(t1, tk)
       ⋮                ⋮
       CX(tk, t1) · · · CX(tk, tk) ],

where CX(ti, ti) = var(Xti).

cross-correlation function, ccf:

RXY(t, s) := EXtYs = ∑u ∑v uv pXtYs(u, v) (jointly discrete), ∫∫ uv fXtYs(u, v) du dv (jointly continuous)

cross-covariance function, ccvf:

CXY(t, s) := E(Xt − mX(t))(Ys − mY(s)) = RXY(t, s) − mX(t)mY(s)

Note that these functions are discrete-time functions for discrete-time random processes (t and s are integers) and continuous-time functions for continuous-time random processes (t and s are real numbers).

uncorrelated processes Xt and Yt: EXtYs = EXtEYs for any t and s.

RXY(t, s) = mX(t)mY(s)

CXY(t, s) = 0

independent ⇒ uncorrelated; uncorrelated ⇏ independent (except jointly Gaussian processes)

covariance matrix of (Xt1, · · · , Xtk, Ys1, · · · , Ysl) = [ C1 O; O C2 ]

orthogonal processes Xt and Yt: EXtYs = 0 for any t and s.

RXY(t, s) = 0

correlation matrix of (Xt1, · · · , Xtk, Ys1, · · · , Ysl) = [ R1 O; O R2 ]

example: Form two Bernoulli processes, taking values 0 and 1, by independent Bernoulli trials. Then they are independent, uncorrelated, but not orthogonal.

If you instead similarly form modified Bernoulli processes, taking values ±1, they additionally become orthogonal.
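The Bernoulli example can be checked numerically (Python/NumPy; illustrative sketch): with values {0, 1}, EXtYs = 1/4 ≠ 0 although CXY = 0; remapped to ±1, the cross-correlation itself vanishes.

```python
import numpy as np

# Two independent Bernoulli(1/2) processes: uncorrelated either way,
# but orthogonal only after remapping {0,1} -> {+1,-1}.
rng = np.random.default_rng(2)
n = 500_000
X01 = (rng.random(n) < 0.5).astype(float)
Y01 = (rng.random(n) < 0.5).astype(float)

cross_01 = np.mean(X01 * Y01)                  # estimates EXtYs = 1/4
cov_01 = cross_01 - X01.mean() * Y01.mean()    # estimates CXY = 0

Xpm, Ypm = 2 * X01 - 1, 2 * Y01 - 1            # values +/-1
cross_pm = np.mean(Xpm * Ypm)                  # estimates EXtYs = 0
```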

example: Ω = {ω1, ω2}, P(ω1) = P(ω2) = 1/2

Xt(ω1) = cos t, Xt(ω2) = sin t

Yt(ω1) = 1, Yt(ω2) = −1

[sample paths Xt(ω1), Xt(ω2), Yt(ω1), Yt(ω2) omitted]

pmf varies in time ⇒ computing expectation is easier on Ω.

mX(t) = EXt = (1/2) cos t + (1/2) sin t = (√2/2) cos(t − π/4)

mY(t) = EYt = (1/2)·1 + (1/2)·(−1) = 0

RX(t, s) = EXtXs = (1/2) cos t cos s + (1/2) sin t sin s = (1/2) cos(t − s)

RY(t, s) = EYtYs = (1/2)·1·1 + (1/2)·(−1)·(−1) = 1

RXY(t, s) = EXtYs = (1/2) cos t − (1/2) sin t = (√2/2) cos(t + π/4)

Xt and Yt are dependent, correlated, and not orthogonal.

example: Ω = {ω1, ω2, ω3, ω4}, P(ωi) = 1/4, i = 1, 2, 3, 4

Xt(ω1) = Xt(ω2) = cos t, Xt(ω3) = Xt(ω4) = −cos t

Yt(ω1) = Yt(ω3) = sin t, Yt(ω2) = Yt(ω4) = −sin t

[sample paths omitted]

mX(t) = mY(t) = 0

RX(t, s) = EXtXs = 2·(1/4) cos t cos s + 2·(1/4)(−cos t)(−cos s) = cos t cos s

RXY(t, s) = EXtYs
= (1/4) cos t sin s + (1/4) cos t (−sin s) + (1/4)(−cos t) sin s + (1/4)(−cos t)(−sin s)
= 0

Xt and Yt are independent, uncorrelated, and orthogonal.

Equivalently, Xt = U cos t, Yt = V sin t, where U and V are independent Ber(1/2) with values +1 and −1.

example: Xt = cos 2π(t + Θ), Yt = sin 2π(t + Θ), Θ ∼ unif(0, 1)

[sample paths for Θ = 0, 1/16, 13/16 omitted]

mX(t) = EXt = ∫ x fXt(x) dx = ∫₀¹ cos 2π(t + θ) dθ = 0

mY(t) = EYt = ∫ y fYt(y) dy = ∫₀¹ sin 2π(t + θ) dθ = 0

RX(t, s) = E cos 2π(t + Θ) cos 2π(s + Θ)
= E (1/2)[cos 2π(t − s) + cos 2π(t + s + 2Θ)]
= (1/2) cos 2π(t − s) + (1/2) ∫₀¹ cos 2π(t + s + 2θ) dθ
= (1/2) cos 2π(t − s)
= CX(t, s)

RXY(t, s) = E cos 2π(t + Θ) sin 2π(s + Θ)
= E (1/2)[−sin 2π(t − s) + sin 2π(t + s + 2Θ)]
= −(1/2) sin 2π(t − s) + (1/2) ∫₀¹ sin 2π(t + s + 2θ) dθ
= −(1/2) sin 2π(t − s)
= CXY(t, s)

Xt and Yt are dependent, correlated, and not orthogonal.
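The derivation can be confirmed by Monte Carlo (Python/NumPy; illustrative sketch): averaging over draws of Θ should reproduce RX(t, s) = (1/2) cos 2π(t − s) and RXY(t, s) = −(1/2) sin 2π(t − s).

```python
import numpy as np

# Monte Carlo check for Xt = cos 2π(t+Θ), Yt = sin 2π(t+Θ), Θ ~ unif(0,1).
rng = np.random.default_rng(3)
theta = rng.random(1_000_000)
t, s = 0.3, 0.1

X_t = np.cos(2 * np.pi * (t + theta))
X_s = np.cos(2 * np.pi * (s + theta))
Y_s = np.sin(2 * np.pi * (s + theta))

RX_hat = np.mean(X_t * X_s)      # ~ (1/2) cos 2π(t-s)
RXY_hat = np.mean(X_t * Y_s)     # ~ -(1/2) sin 2π(t-s)
```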

In this last example the mean function, acf, acvf, ccf, and ccvf are all shift-invariant, ie,

mX(t) = mX(t + τ ),

RX(t, s) = RX(t + τ, s + τ ),

CX(t, s) = CX(t + τ, s + τ ),

RY (t, s) = RY (t + τ, s + τ ),

CY (t, s) = CY (t + τ, s + τ ),

RXY (t, s) = RXY (t + τ, s + τ ),

CXY (t, s) = CXY (t + τ, s + τ ).

Shift invariance is generally called stationarity.

Stationarity

(strict-sense) stationary, sss, Xt:

P((Xt1+τ, · · · , Xtk+τ) ∈ B) = P((Xt1, · · · , Xtk) ∈ B)

Equivalently, shift-invariance of the jpmf, jpdf, jcdf, or jchf also defines sss. That is,

pXt1+τ···Xtk+τ(x1, · · · , xk) = pXt1···Xtk(x1, · · · , xk)

fXt1+τ···Xtk+τ(x1, · · · , xk) = fXt1···Xtk(x1, · · · , xk)

FXt1+τ···Xtk+τ(x1, · · · , xk) = FXt1···Xtk(x1, · · · , xk)

ϕXt1+τ···Xtk+τ(u1, · · · , uk) = ϕXt1···Xtk(u1, · · · , uk).

(Xt1, · · · , Xtk) and (Xt1+τ, · · · , Xtk+τ) are identically distributed.

If Xt is sss,

mX(t + τ) = ∑x x pXt+τ(x) = ∑x x pXt(x) = mX(t) (discrete-valued)
mX(t + τ) = ∫ x fXt+τ(x) dx = ∫ x fXt(x) dx = mX(t) (continuous-valued)

RX(t + τ, s + τ) = RX(t, s)

CX(t + τ, s + τ) = CX(t, s)

Eg(Xt1+τ, · · · , Xtk+τ) = Eg(Xt1, · · · , Xtk)

example: cos 2π(t + Θ) with Θ ∼ unif(0, 1), the Bernoulli rp, and any iid rp are stationary, but A cos 2πt with random A is not.

wide-sense stationary, wss, Xt:

1. mX(t + τ) = mX(t)

2. RX(t + τ, s + τ) = RX(t, s)

Only the first and second moments are shift invariant.

If Xt is wss,

mX(t) = mX

RX(t, s) = RX(t − s), or RX(τ) = EXt+τXt

CX(t, s) = CX(t − s), or CX(τ) = RX(τ) − mX²

sss ⇒ wss; wss ⇏ sss (except Gaussian processes)

example: U = ±1 equiprobable; V = −√2 with prob 1/3, √2/2 with prob 2/3; U and V are independent.

EU = EV = 0, EU² = EV² = 1

Xt = U for odd t, V for even t ⇒ Xt is wss but not sss.
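A numerical sketch of this example (Python/NumPy; illustrative): the first and second moments agree at odd and even times, yet the marginal distributions differ, eg, P(Xt = −√2) is 0 at odd t and 1/3 at even t.

```python
import numpy as np

# U = ±1 equiprobable; V = -sqrt(2) w.p. 1/3, sqrt(2)/2 w.p. 2/3; independent.
rng = np.random.default_rng(4)
n = 300_000
U = rng.choice([-1.0, 1.0], size=n)
V = rng.choice([-np.sqrt(2), np.sqrt(2) / 2], size=n, p=[1 / 3, 2 / 3])

mean_odd, mean_even = U.mean(), V.mean()          # both ~ 0
pow_odd, pow_even = (U ** 2).mean(), (V ** 2).mean()  # both ~ 1

# distributions differ with the parity of t:
p_even = np.mean(np.isclose(V, -np.sqrt(2)))      # ~ 1/3, vs 0 at odd t
```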

example: random telegraph process

P(Xt = 1) = P(Xt = −1) = 1/2

The probability that k traversals occur in a time interval of length τ is ((λτ)^k / k!) e^{−λτ}.

[sample paths for λ = 1 and λ = 1/2 and the corresponding acf sketches omitted]

mX(t) = 1·P(Xt = 1) + (−1)·P(Xt = −1) = 0

EXt+τXt (for τ ≥ 0)

= 1·1·P(Xt+τ = 1, Xt = 1) + 1·(−1)·P(Xt+τ = 1, Xt = −1)
+ (−1)·1·P(Xt+τ = −1, Xt = 1) + (−1)·(−1)·P(Xt+τ = −1, Xt = −1)

= P(Xt+τ = 1 | Xt = 1)·(1/2) − P(Xt+τ = 1 | Xt = −1)·(1/2)
− P(Xt+τ = −1 | Xt = 1)·(1/2) + P(Xt+τ = −1 | Xt = −1)·(1/2)

= P(even traversals in [t, t + τ]) − P(odd traversals in [t, t + τ])

= ∑_{k even} ((λτ)^k / k!) e^{−λτ} − ∑_{k odd} ((λτ)^k / k!) e^{−λτ}

= (1/2) ∑_{k=0}^{∞} (((λτ)^k + (−λτ)^k) / k!) e^{−λτ} − (1/2) ∑_{k=0}^{∞} (((λτ)^k − (−λτ)^k) / k!) e^{−λτ}

= (1/2)(e^{λτ} + e^{−λτ}) e^{−λτ} − (1/2)(e^{λτ} − e^{−λτ}) e^{−λτ} = e^{−2λτ}

⇒ RX(τ) = e^{−2λ|τ|}

⇒ Xt is wss.
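The key step above, EXt+τXt = E(−1)^N with N ∼ Poisson(λτ), can be checked by simulation (Python/NumPy; illustrative sketch): X flips sign once per traversal, so the product Xt+τXt is (−1)^N.

```python
import numpy as np

# Monte Carlo check of the telegraph acf: E(-1)^N = e^{-2λτ}, N ~ Poisson(λτ).
rng = np.random.default_rng(5)
lam, tau, n = 1.0, 0.7, 1_000_000

X_t = rng.choice([-1.0, 1.0], size=n)    # P(Xt = ±1) = 1/2
N = rng.poisson(lam * tau, size=n)       # traversal counts in [t, t+τ]
X_ttau = X_t * (-1.0) ** N               # sign flips once per traversal

R_hat = np.mean(X_ttau * X_t)            # ~ e^{-2λτ}
```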

(strict-sense) cyclostationary Xt with period T:

(Xt1, · · · , Xtk) and (Xt1+mT, · · · , Xtk+mT) are identically distributed for any integer m.

When a stationary process is processed in blocks of size T, eg, by the DFT, the result is cyclostationary with period T.

(strict-sense) asymptotically stationary Xt:

(XT+t1, · · · , XT+tk) and (XT+t1+τ, · · · , XT+tk+τ) become identically distributed as T → ∞.

example: homogeneous Markov chain

Cyclostationarity and asymptotic stationarity are also defined in the wide sense.

jointly (strict-sense) stationary, jsss, Xt and Yt:

P((Xt1+τ, · · · , Xtk+τ, Ys1+τ, · · · , Ysl+τ) ∈ B) = P((Xt1, · · · , Xtk, Ys1, · · · , Ysl) ∈ B)

Equivalently, shift-invariance of the jpmf, jpdf, jcdf, or jchf also defines jsss. That is,

pXt1+τ···Xtk+τYs1+τ···Ysl+τ(x1, · · · , xk, y1, · · · , yl) = pXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)

fXt1+τ···Xtk+τYs1+τ···Ysl+τ(x1, · · · , xk, y1, · · · , yl) = fXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)

FXt1+τ···Xtk+τYs1+τ···Ysl+τ(x1, · · · , xk, y1, · · · , yl) = FXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)

ϕXt1+τ···Xtk+τYs1+τ···Ysl+τ(u1, · · · , uk, v1, · · · , vl) = ϕXt1···XtkYs1···Ysl(u1, · · · , uk, v1, · · · , vl).

Also equivalently, (Xt1, · · · , Xtk, Ys1, · · · , Ysl) and (Xt1+τ, · · · , Xtk+τ, Ys1+τ, · · · , Ysl+τ) are identically distributed.

If Xt and Yt are jsss,

each is sss.

RXY(t + τ, s + τ) = RXY(t, s)

CXY(t + τ, s + τ) = CXY(t, s)

Eg(Xt1+τ, · · · , Xtk+τ, Ys1+τ, · · · , Ysl+τ) = Eg(Xt1, · · · , Xtk, Ys1, · · · , Ysl)

example: Consider two iid Bernoulli processes Xt and Yt, independent of each other, and let Zt = Xt for odd t and Zt = Yt for even t.

Then all three processes are individually stationary, but Xt and Zt are not jointly stationary; neither are Yt and Zt.

jointly wide-sense stationary, jwss, Xt and Yt:

1. Xt and Yt are each wss.

2. RXY(t + τ, s + τ) = RXY(t, s)

If Xt and Yt are jwss,

RXY(t, s) = RXY(t − s), or RXY(τ) = EXt+τYt

CXY(t, s) = CXY(t − s), or CXY(τ) = RXY(τ) − mXmY

jsss ⇒ jwss; jwss ⇏ jsss (except jointly Gaussian processes)

Joint cyclostationarity and joint asymptotic stationarity, both in the strict sense and in the wide sense, are also defined.

example: Xt = cos 2π(t + Θ), Yt = sin 2π(t + Θ), Θ ∼ unif(0, 1)

Xt and Yt are jsss, and therefore jwss. RX(τ)? RXY(τ)?

[sample paths for Θ = 0, 1/16, 13/16 omitted]

Also, two independent Bernoulli processes are jsss, and therefore jwss. What are the acf and ccf?

Properties of acf

properties of RX(τ) of a wss process:

1. second moment, total average power: RX(0) = EXt²

2. even symmetry: RX(τ) = RX(−τ)

3. |RX(τ)| ≤ RX(0) = EXt²

proof: Schwarz inequality.

4. sample path behavior: P(|Xt+τ − Xt| ≥ ε) ≤ (2/ε²)(RX(0) − RX(τ))

proof: Square the arguments and use the Markov inequality.

example: the random telegraph process above, with RX(τ) = e^{−2λ|τ|}. [sample paths and acf sketches for λ = 1 and λ = 1/2 omitted]

5. RX(T) = RX(0) for some T > 0 ⇒ RX(τ) is periodic with period T.

proof: [E(Xt+τ+T − Xt+τ)Xt]² ≤ E(Xt+τ+T − Xt+τ)² EXt²
= E(Xt+τ+T² + Xt+τ² − 2Xt+τ+TXt+τ) EXt²

⇒ [RX(τ + T) − RX(τ)]² ≤ 2RX(0)[RX(0) − RX(T)]

example: Xt = cos 2π(t + Θ), Θ ∼ unif(0, 1)

mX(t) = 0, RX(τ) = (1/2) cos 2πτ = CX(τ)

6. For a continuous-time process, RX(τ) continuous at τ = 0 ⇒ RX(τ) is continuous everywhere.

proof: Let T → 0 in the proof of 5.

7. RX(τ) is the inverse FT of a non-negative even function (the psd, see below).

example: Are these acfs? [nine candidate acf sketches omitted]

Power spectral density

The power spectral density SX(f) is the function describing how the power of Xt is distributed over the frequency axis.

⇒ the power in [f, f + ∆f) = SX(f)∆f

power spectral density, psd, SX(f) for wss Xt:

SX(f) := ∑_{τ=−∞}^{∞} RX(τ) e^{−j2πfτ} (discrete-time)
SX(f) := ∫_{−∞}^{∞} RX(τ) e^{−j2πfτ} dτ (continuous-time)

Why does the Fourier transform of the acf come to be the psd in the above sense? → Wiener-Khinchin theorem

Fourier inversion:

RX(τ) = ∫_{−1/2}^{1/2} SX(f) e^{j2πfτ} df (discrete-time)
RX(τ) = ∫_{−∞}^{∞} SX(f) e^{j2πfτ} df (continuous-time)

real symmetric RX(τ) ⇒ real symmetric SX(f)
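The continuous-time definition can be exercised numerically (Python/NumPy; illustrative sketch) on the random telegraph acf RX(τ) = e^{−2λ|τ|}, whose transform works out to λ/(λ² + π²f²).

```python
import numpy as np

# Riemann-sum approximation of SX(f) = ∫ RX(τ) e^{-j2πfτ} dτ for the
# telegraph acf e^{-2λ|τ|}; analytic answer: λ / (λ² + π² f²).
lam = 1.0
tau = np.linspace(-40.0, 40.0, 400_001)   # wide grid: tails are negligible
dt = tau[1] - tau[0]
R = np.exp(-2.0 * lam * np.abs(tau))

def S(f):
    return float((R * np.exp(-2j * np.pi * f * tau)).sum().real * dt)

S0, S1 = S(0.0), S(1.0)   # analytic: S(0) = 1/λ, S(1) = λ/(λ² + π²)
```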

non-negative definite RX(τ) ⇒ SX(f) ≥ 0

Any real, nonnegative, even function can be a psd, and the inverse FT of any such function can be an acf.

non-negative definite function R(t, s): ∀(t1, · · · , tk), the k×k matrix with (i, j)th element R(ti, tj) is non-negative definite. Also called positive semidefinite.

positive definite function R(t, s): the R(ti, tj) form a positive definite matrix.

average power of Xt:

lim_{T→∞} (1/(2T+1)) ∑_{t=−T}^{T} Xt² (discrete-time)

lim_{T→∞} (1/(2T)) ∫_{−T}^{T} Xt² dt (continuous-time)

This is a random variable. → Expectation is needed.

expected average power:

PX := E lim_{T→∞} (1/(2T+1)) ∑_{t=−T}^{T} Xt² (discrete-time)
PX := E lim_{T→∞} (1/(2T)) ∫_{−T}^{T} Xt² dt (continuous-time)

= lim_{T→∞} (1/(2T+1)) ∑_{t=−T}^{T} EXt² (discrete-time)
= lim_{T→∞} (1/(2T)) ∫_{−T}^{T} EXt² dt (continuous-time)

= EXt² = RX(0) = ∫_{−1/2}^{1/2} SX(f) df (discrete-time), ∫_{−∞}^{∞} SX(f) df (continuous-time)

for wss Xt.

The last equality is due to the Wiener-Khinchin theorem.

This shows that the area under the graph of SX(f) represents the total expected average power of Xt, but it is not enough to justify its being the spectral “density”. → linear time-invariant system, narrow band-pass filter

Wiener-Khinchin theorem

We discuss continuous-time processes. The discussion goes in parallel for discrete-time processes.

deterministic finite-energy signal xt:

xf = ∫_{−∞}^{∞} xt e^{−j2πft} dt and xt = ∫_{−∞}^{∞} xf e^{j2πft} df

time acf rx(τ) of xt: rx(τ) = ∫_{−∞}^{∞} xt+τ xt dt

energy spectral density: |xf|² = ∫_{−∞}^{∞} rx(τ) e^{−j2πfτ} dτ

rx(τ) = ∫ |xf|² e^{j2πfτ} df [inverse FT]

rx(0) = ∫ |xf|² df = ∫ xt² dt: energy of xt, Parseval's relation

Let h(t) be an ideal band-pass filter with pass band [f0 − ∆f, f0 + ∆f]. [block diagram xt → h(t) → yt and sketches of |xf|², H(f), |yf|² omitted]

ry(0) = ∫ yt² dt
= ∫ |yf|² df
= ∫ |H(f)|² |xf|² df
= ∫_{f0−∆f}^{f0+∆f} |xf|² df + ∫_{−f0−∆f}^{−f0+∆f} |xf|² df

This justifies that |xf|² is the energy spectral “density”: the energy in a band is the integral of |xf|² over the band.

Random processes have infinite energy.

⇒ XTt := Xt for |t| ≤ T, 0 else: truncated for finite energy

XTf = ∫_{−∞}^{∞} XTt e^{−j2πft} dt

energy spectral density of XTt: |XTf|²

energy of XTt: ∫_{−∞}^{∞} (XTt)² dt = ∫_{−T}^{T} (XTt)² dt = ∫_{−T}^{T} Xt² dt = ∫_{−∞}^{∞} |XTf|² df

average power of XTt: (1/(2T)) ∫_{−T}^{T} Xt² dt = (1/(2T)) ∫_{−∞}^{∞} |XTf|² df = ∫_{−∞}^{∞} (1/(2T)) |XTf|² df

periodogram, an approximation to the psd: (1/(2T)) |XTf|²

But this is a random variable for each f, ie, a random process with the parameter f.
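A discrete-time sketch of this point (Python/NumPy; illustrative): a single periodogram is random at each f, but averaging periodograms of many independent realizations approaches the psd, which for iid noise with variance σ² is flat at σ².

```python
import numpy as np

# Average periodograms |FFT|²/N of independent white-noise realizations.
rng = np.random.default_rng(6)
sigma, N, n_seg = 1.5, 256, 4000
x = rng.normal(0.0, sigma, size=(n_seg, N))

pgrams = np.abs(np.fft.fft(x, axis=1)) ** 2 / N  # one periodogram per row
S_hat = pgrams.mean(axis=0)                      # average over realizations
# S_hat is nearly flat at sigma**2, while each single row fluctuates wildly
```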

expected average power of XTt:

(1/(2T)) ∫_{−T}^{T} EXt² dt = ∫_{−∞}^{∞} (1/(2T)) E|XTf|² df

PX = lim_{T→∞} (1/(2T)) ∫_{−T}^{T} EXt² dt
= lim_{T→∞} ∫_{−∞}^{∞} (1/(2T)) E|XTf|² df
= ∫_{−∞}^{∞} (lim_{T→∞} (1/(2T)) E|XTf|²) df

• Wiener-Khinchin theorem: lim_{T→∞} (1/(2T)) E|XTf|² = SX(f)

proof: (1/(2T)) E|XTf|² = (1/(2T)) E|∫ XTt e^{−j2πft} dt|²

= (1/(2T)) E(∫_{−T}^{T} Xt e^{−j2πft} dt)(∫_{−T}^{T} Xs e^{−j2πfs} ds)*

= (1/(2T)) ∫_{−T}^{T} ∫_{−T}^{T} EXtXs e^{−j2πf(t−s)} dt ds

= (1/(2T)) ∫_{−T}^{T} ∫_{−T}^{T} RX(t − s) e^{−j2πf(t−s)} dt ds

change of variables (t, s) → (τ, s), τ = t − s [the square integration region becomes a parallelogram; sketch omitted]

= (1/(2T)) ∫_{−2T}^{0} ∫_{−T−τ}^{T} RX(τ) e^{−j2πfτ} ds dτ + (1/(2T)) ∫_{0}^{2T} ∫_{−T}^{T−τ} RX(τ) e^{−j2πfτ} ds dτ

= ∫_{−2T}^{2T} ((2T − |τ|)/(2T)) RX(τ) e^{−j2πfτ} dτ

lim_{T→∞} (1/(2T)) E|XTf|² = lim_{T→∞} ∫_{−2T}^{2T} ((2T − |τ|)/(2T)) RX(τ) e^{−j2πfτ} dτ

= ∫_{−∞}^{∞} RX(τ) e^{−j2πfτ} dτ

The last equality holds if ∫_{−∞}^{∞} |RX(τ)| dτ < ∞, owing to the dominated convergence theorem.

[sketch of the triangular weight (2T − |τ|)/(2T) on [−2T, 2T] omitted]

Cross power spectral density

cross power spectral density, cpsd, SXY(f) for jwss Xt and Yt:

SXY(f) := ∑_{τ=−∞}^{∞} RXY(τ) e^{−j2πfτ} (discrete-time)
SXY(f) := ∫_{−∞}^{∞} RXY(τ) e^{−j2πfτ} dτ (continuous-time)

Fourier inversion:

RXY(τ) = ∫_{−1/2}^{1/2} SXY(f) e^{j2πfτ} df (discrete-time)
RXY(τ) = ∫_{−∞}^{∞} SXY(f) e^{j2πfτ} df (continuous-time)

The properties of the psd do not hold.

It indicates the power spectral density of the component that contributes to “linear” inter-dependence between Xt and Yt.

Why have we limited our discussion of the psd and cpsd to wss and jwss processes?

Linear time-invariant system

linear time-invariant system with impulse response h(t): Xt → h(t) → Yt

Yt = ∑k h(t − k)Xk = ∑k h(k)Xt−k (discrete-time)
Yt = ∫_{−∞}^{∞} h(t − τ)Xτ dτ = ∫_{−∞}^{∞} h(τ)Xt−τ dτ (continuous-time)

We discuss continuous-time cases.

moments:

mY(t) = ∫ h(t − τ) mX(τ) dτ = ∫ h(τ) mX(t − τ) dτ

RY(t, s) = E(∫ h(u)Xt−u du)(∫ h(v)Xs−v dv)
= ∫∫ h(u)h(v) EXt−uXs−v du dv
= ∫∫ h(u)h(v) RX(t − u, s − v) du dv

RXY(t, s) = EXt ∫ h(u)Xs−u du = ∫ h(u) EXtXs−u du = ∫ h(u) RX(t, s − u) du

If Xt is wss,

mY(t) = ∫ h(τ) mX dτ ⇒ mY = mX ∫ h(τ) dτ

RY(t, s) = ∫∫ h(u)h(v) RX(t − s − u + v) du dv

⇒ RY(τ) = ∫∫ h(u)h(v) RX(τ − u + v) du dv = RX(τ) ∗ h(τ) ∗ h(−τ) = RX(τ) ∗ rh(τ)

Yt is also wss.

SY(f) = SX(f)|H(f)|²
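The discrete-time analogue of RY(τ) = RX(τ) ∗ rh(τ) can be checked numerically (Python/NumPy; illustrative sketch, with a hypothetical FIR filter h): for white noise input, RY(τ) = σ² rh(τ), so in particular EYt² = σ² ∑k h(k)².

```python
import numpy as np

# White noise (flat SX(f) = σ²) through an FIR filter h:
# RY(τ) = σ² rh(τ), rh(τ) = Σk h(k)h(k+τ).
rng = np.random.default_rng(7)
sigma = 2.0
h = np.array([0.5, 0.3, -0.2, 0.1])      # illustrative filter taps
x = rng.normal(0.0, sigma, size=2_000_000)
y = np.convolve(x, h, mode="valid")

power_hat = np.mean(y ** 2)              # ~ σ² Σ h(k)²   (RY(0))
ry1_hat = np.mean(y[1:] * y[:-1])        # ~ σ² Σ h(k)h(k+1)   (RY(1))
power_theory = sigma ** 2 * np.sum(h ** 2)
ry1_theory = sigma ** 2 * np.sum(h[:-1] * h[1:])
```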

Let h(t) be an ideal band-pass filter with pass band [f0 − ∆f, f0 + ∆f]. [block diagram Xt → h(t) → Yt and sketches of SX(f), H(f), SY(f) omitted]

RY(0) = EYt²
= ∫ SY(f) df
= ∫ |H(f)|² SX(f) df
= ∫_{f0−∆f}^{f0+∆f} SX(f) df + ∫_{−f0−∆f}^{−f0+∆f} SX(f) df

This justifies that SX(f) is the power spectral “density”: the power in a band is the integral of SX(f) over the band.

RXY(t, s) = ∫ h(u) RX(t − s + u) du

⇒ RXY(τ) = ∫ h(u) RX(τ + u) du = RX(τ) ∗ h(−τ)

Xt and Yt are jwss.

RYX(τ) = RX(τ) ∗ h(τ)

SXY(f) = SX(f) H*(f)

SYX(f) = SX(f) H(f)

Convergence of random sequences

four convergence types of X1(ω), X2(ω), X3(ω), · · ·:

1. mean square convergence

2. convergence in probability, convergence in measure

3. convergence in distribution

4. convergence with probability one, almost sure convergence

almost sure ⇒ in prob ⇒ in dist; ms ⇒ in prob

cf. convergence of a sequence of functions: point-wise, uniform

Types of convergence of random sequences define types of continuity of random processes in continuous-time cases.

mean square (ms) convergence of Xn to X, Xn −ms→ X: lim_{n→∞} E|Xn − X|² = 0

mean square (ms) continuity of Xt at t: lim_{s→t} E|Xs − Xt|² = 0

ms continuity of Xt: ms continuity of Xt at all t

Xt is ms continuous at u ⇔ RX(t, s) is continuous at (u, u).

proof: “⇐”: Assume RX(t, s) is continuous at (u, u).

⇒ E(Xt − Xu)² = RX(t, t) − 2RX(t, u) + RX(u, u)
= (RX(t, t) − RX(u, u)) − 2(RX(t, u) − RX(u, u)) → 0 as t → u.

“⇒”: Assume Xt is ms continuous.

⇒ |RX(t, s) − RX(u, u)|
≤ |RX(t, s) − RX(u, s)| + |RX(u, s) − RX(u, u)|
= |E(Xt − Xu)Xs| + |EXu(Xs − Xu)|
≤ √(E(Xt − Xu)² EXs²) + √(EXu² E(Xs − Xu)²) [Schwarz ineq]

Since EXt² < ∞ for ms continuous Xt (see exercise 13-18, Gubner), the proof is complete.

The proof also shows that Xt is ms continuous if and only if RX(t, s) is continuous, ie, everywhere.

wss Xt is ms continuous at u
⇔ RX(τ) is continuous at 0
⇔ RX(τ) is continuous, ie, everywhere
⇔ wss Xt is ms continuous, ie, everywhere.

example: random telegraph process

convergence of Xn in probability to X, Xn −pr→ X: ∀ε > 0, lim_{n→∞} P(|Xn − X| ≥ ε) = 0

example (WLLN): Xi iid, Mn = (1/n) ∑_{i=1}^{n} Xi −pr→ EXi

ms convergence ⇒ convergence in probability

proof: By the Markov inequality,

P(|Xn − X| ≥ ε) = P(|Xn − X|² ≥ ε²) ≤ E|Xn − X|²/ε²

Xn −pr→ X and Yn −pr→ Y ⇒ for any continuous function g, g(Xn, Yn) −pr→ g(X, Y).

convergence of Xn in distribution to X, Xn −dist→ X: lim_{n→∞} FXn(x) = FX(x) for all continuity points of FX(x).

Convergence in distribution can be defined in terms of the pmf, pdf, or chf.

example (central limit theorem): For iid Xn with mean m and variance σ²,

(1/√n) ∑_{i=1}^{n} (Xi − m)/σ −dist→ X, where X ∼ N(0, 1).

convergence in prob ⇒ convergence in dist

proof: FXn(x) = P(Xn ≤ x, X ≤ x + ε) + P(Xn ≤ x, X > x + ε)

≤ FX(x + ε) + P(Xn − X < −ε)

≤ FX(x + ε) + P(|Xn − X| ≥ ε)

⇒ lim sup_{n→∞} FXn(x) ≤ FX(x + ε)

[Xn ≤ x and X > x + ε together imply Xn − X < −ε]

FX(x − ε) = P(X ≤ x − ε, Xn ≤ x) + P(X ≤ x − ε, Xn > x)

≤ FXn(x) + P(X − Xn < −ε)

≤ FXn(x) + P(|Xn − X| ≥ ε)

⇒ FX(x − ε) ≤ lim inf_{n→∞} FXn(x)

[X ≤ x − ε and Xn > x together imply X − Xn < −ε]

At each continuity point of FX(x), by letting ε → 0,

FX(x) ≤ lim inf_{n→∞} FXn(x) ≤ lim sup_{n→∞} FXn(x) ≤ FX(x)

convergence in dist ⇔ for any bounded continuous g, lim_{n→∞} Eg(Xn) = Eg(X).
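The central limit theorem above can be illustrated numerically (Python/NumPy; an illustrative sketch): standardized sums of iid unif(0, 1) variables (m = 1/2, σ² = 1/12) have an empirical cdf close to the standard normal one.

```python
import numpy as np

# Standardized sums of iid unif(0,1); compare the empirical cdf with Φ.
rng = np.random.default_rng(8)
n, trials = 100, 50_000
m, sig = 0.5, np.sqrt(1.0 / 12.0)
Z = (rng.random((trials, n)).sum(axis=1) - n * m) / (sig * np.sqrt(n))

p0 = np.mean(Z <= 0.0)   # ~ Φ(0) = 0.5
p1 = np.mean(Z <= 1.0)   # ~ Φ(1) ≈ 0.8413
```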

almost sure convergence, Xn −a.s.→ X: P({ω ∈ Ω : lim_{n→∞} Xn(ω) ≠ X(ω)}) = 0

almost sure convergence ⇒ convergence in prob

proof: (lim_{n→∞} xn = x ⇔ ∀ε > 0, ∃N > 0 such that n ≥ N ⇒ |xn − x| < ε)

⇒ {Xn → X} = ∩_{ε>0} (∪_{N=1}^{∞} (∩_{n=N}^{∞} {|Xn − X| < ε})) = ∩_{ε>0} (lim inf_{n→∞} {|Xn − X| < ε})

⇒ {Xn ↛ X} = ∪_{ε>0} (∩_{N=1}^{∞} (∪_{n=N}^{∞} {|Xn − X| ≥ ε})) = ∪_{ε>0} (lim sup_{n→∞} {|Xn − X| ≥ ε}) ⊇ ∩_{ε>0} (lim sup_{n→∞} {|Xn − X| ≥ ε})

Therefore Xn −a.s.→ X implies that ∀ε > 0, P(lim_{N→∞} (∪_{n=N}^{∞} {|Xn − X| ≥ ε})) = 0.

∀ε > 0,

0 = P(lim_{N→∞} (∪_{n=N}^{∞} {|Xn − X| ≥ ε}))
= lim_{N→∞} P(∪_{n=N}^{∞} {|Xn − X| ≥ ε}) [continuity of P]
≥ lim_{n→∞} P(|Xn − X| ≥ ε) ≥ 0

example:

strong law of large numbers (SLLN): For iid Xn with finite mean m, the sample mean Mn := (1/n) ∑_{i=1}^{n} Xi −a.s.→ m

weak law of large numbers (WLLN): For iid Xn with finite mean m, the sample mean Mn := (1/n) ∑_{i=1}^{n} Xi −pr→ m
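A one-realization sketch of the laws of large numbers (Python/NumPy; illustrative): along a single sequence of iid exponential(1) variables (m = 1), the running sample mean Mn settles down to m.

```python
import numpy as np

# Running sample mean of one iid exponential(1) sequence.
rng = np.random.default_rng(9)
X = rng.exponential(1.0, size=1_000_000)
Mn = np.cumsum(X) / np.arange(1, X.size + 1)   # Mn for n = 1, ..., 10^6

err_final = abs(Mn[-1] - 1.0)                  # small for large n
```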

example: a.s. convergence ⇏ ms convergence

Let Ω = [0, 1] and assume the uniform probability allocation f(ω).

X1(ω) = 1 on [0, 1]; X2(ω) = 2 on [0, 1/2], 0 else; X3(ω) = 4 on [0, 1/4], 0 else; X4(ω) = 8 on [0, 1/8], 0 else; · · ·

[staircase sketches of X1, X2, X3, X4 omitted]

Xn −a.s.→ 0, but E|Xn − X|² ↛ 0, ie, Xn does not converge to 0 in mean square.

example: ms convergence ⇏ a.s. convergence

Let Ω = [0, 1] and assume the uniform probability allocation f(ω).

X1(ω) = 1 on [0, 1];
X2(ω) = 1 on [0, 1/2], X3(ω) = 1 on [1/2, 1];
X4(ω) = 1 on [0, 1/4], X5(ω) = 1 on [1/4, 1/2], X6(ω) = 1 on [1/2, 3/4], X7(ω) = 1 on [3/4, 1];
X8(ω) = 1 on [0, 1/8], X9(ω) = 1 on [1/8, 1/4], X10(ω) = 1 on [1/4, 3/8], X11(ω) = 1 on [3/8, 1/2], X12(ω) = 1 on [1/2, 5/8], · · ·
(each Xn is 0 elsewhere)

[indicator sketches of X1, · · · , X21 omitted]

Xn −ms→ 0, but Xn does not converge to 0 almost surely.
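The sliding-interval construction can be enumerated directly (Python; an illustrative sketch, not from the slides): E Xn² equals the interval width, which tends to 0, while every ω is covered once per block of intervals and hence has Xn(ω) = 1 infinitely often.

```python
# Blocks j = 1, 2, ... consist of the 2^j indicator intervals of width 2^-j.
intervals = []
for j in range(1, 11):
    w = 2.0 ** -j
    intervals += [(i * w, (i + 1) * w) for i in range(2 ** j)]

second_moments = [b - a for a, b in intervals]   # E Xn² = P(interval) = width

omega = 0.3
hits = sum(1 for a, b in intervals if a <= omega <= b)   # once per block
```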

Convergence theorems for expectation

monotone convergence theorem:

0 ≤ Xn ↑ X almost surely (a.s.) ⇒ EXn ↑ EX

proof: Choose simple functions Xkn such that 0 ≤ Xkn ↑ Xk as n → ∞.

Let Yn = max_{k≤n} Xkn. The Yn are simple and non-decreasing.

[array: row k is Xk1, Xk2, Xk3, Xk4, · · · → Xk; the column maxima Y1, Y2, Y3, Y4, · · · increase to X]

For k ≤ n, Xkn ≤ Yn ≤ max_{k≤n} Xk = Xn (1)

For k ≤ n, Xkn ≤ Yn ≤ Xn (1)

⇒ Xk ≤ lim Yn ≤ X [n → ∞]

⇒ X ≤ lim Yn ≤ X [k → ∞]

⇒ lim Yn = X

Since the Yn are simple and non-decreasing,

lim EYn = EX. (2) [def of Lebesgue integral of X]

(1) ⇒ EXkn ≤ EYn ≤ EXn

⇒ lim_n EXkn ≤ lim EYn ≤ lim EXn

Since the Xkn are simple and non-decreasing,

⇒ EXk ≤ lim EYn ≤ lim EXn [def of Lebesgue integral of Xk]

⇒ lim_k EXk ≤ lim EYn ≤ lim EXn [k → ∞]

⇒ lim EYn = lim EXn. Compare this and (2).

example: The monotone convergence theorem does not apply (not monotonic).

[the staircase sequence X1 = 1, X2 = 2 on [0, 1/2], X3 = 4 on [0, 1/4], X4 = 8 on [0, 1/8], · · · from the a.s.-but-not-ms example above]

For the uniform probability allocation, Xn → X = 0 a.s.

But lim EXn = lim 1 = 1 ≠ EX = 0.

Fatou-Lebesgue theorem:

For some U and V such that EU and EV exist,

U ≤ Xn ⇒ E lim inf Xn ≤ lim inf EXn;

Xn ≤ V ⇒ lim sup EXn ≤ E lim sup Xn.

proof: Let Xn be non-negative. (This case is called Fatou's lemma; U = 0.)

Let Yn = inf_{k≥n} Xk.

⇒ Yn ≤ Xn, EYn ≤ EXn, lim inf EYn ≤ lim inf EXn (1)

⇒ Yn ↑ lim inf Xn [lim inf Xn = sup_{n≥0}(inf_{k≥n} Xk) = lim_{n→∞}(inf_{k≥n} Xk)]

⇒ EYn ↑ E lim inf Xn [monotone convergence theorem]

⇒ lim inf EYn = lim EYn = E lim inf Xn ≤ lim inf EXn [(1)]

For general Xn, we apply this to Xn − U.

The proof for lim sup and Xn ≤ 0 is similar, and for general Xn, we apply the same to V − Xn.

dominated convergence theorem:

If |Xn| ≤ U, EU exists, and Xn → X a.s., then EXn → EX.

proof: Xn −a.s.→ X

⇒ E lim sup Xn = E lim Xn = E lim inf Xn

⇒ lim sup EXn ≤ E lim sup Xn [F-L theorem]
= E lim Xn
= E lim inf Xn ≤ lim inf EXn [F-L theorem]

⇒ lim sup EXn = lim inf EXn = lim EXn = E lim Xn [lim inf EXn ≤ lim sup EXn]

example: The dominated convergence theorem does not apply.

[the same staircase sequence X1 = 1, X2 = 2 on [0, 1/2], X3 = 4 on [0, 1/4], X4 = 8 on [0, 1/8], · · · as above]

For the uniform probability allocation, Xn → X = 0 a.s.

But lim EXn = lim 1 = 1 ≠ EX = 0.

If we let U = max(Xi, i = 1, 2, · · ·), we have |Xn| ≤ U, but EU = ∞.

These theorems apply not only to expectation but also to other integrations. See the proof of the Wiener-Khinchin theorem.