
© 2011 by Taejeong Kim

Random process

We would like to extend random vectors to infinite dimensions. That is, we would like to mathematically describe an infinite number of random variables simultaneously, eg, infinite trials of tossing a die.

Or, we would like to mathematically describe a function or signal that is random or not specific, eg, thermal noise.

• random process Xt(ω), t ∈ I :

1. random sequence, random function, or random signal:
   Xt : Ω → the set of all sequences or functions

2. indexed family of an infinite number of random variables:
   Xt : I → the set of all random variables defined on Ω

3. Xt : Ω × I → IR

[Figure: a sample space Ω = {1, · · · , 6} (faces of a die) maps each outcome ω to a sample path Xt(ω); fixing t = t0 collects the values Xt0(1), Xt0(2), Xt0(3), · · · into the random variable Xt0.]


For a fixed t, Xt(ω) is a random variable.

For a fixed ω, Xt(ω) is a deterministic function of t, which is called a sample path or sample function.

What is random here? How many Ωs can there be? What is the result of carrying out the random experiment? What is measurable in these mappings?

example:

surface temperature of a space shuttle

thermal noise of a semiconductor device

total number of customers visiting a store up to time t

sequence of iid Bernoulli random variables: Bernoulli process

types of random processes

1. discrete-time: t = · · · , −1, 0, 1, 2, · · ·

2. continuous-time: t ∈ IR

3. discrete-valued: for a fixed t, Xt is a discrete random variable.

4. continuous-valued: for a fixed t, Xt is a continuous random variable.

discrete-time, discrete-valued: Bernoulli process

discrete-time, continuous-valued: iid Gaussian process

continuous-time, discrete-valued: Poisson process

continuous-time, continuous-valued: see the sample-path figure at the beginning

example: Xt = A cos 2πt, A ∼ unif(−1, 1); continuous-time, continuous-valued

X0 = A, X1/8 = (√2/2)A, X1/4 = 0, X1/2 = −A

example: Xt = cos 2π(t + Θ), Θ ∼ unif(0, 1); continuous-time, continuous-valued

X0 = cos 2πΘ, X1/8 = cos(2πΘ + π/4),
X1/4 = cos(2πΘ + π/2), X1/2 = cos(2πΘ + π)
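A small numerical sketch (Python with NumPy; illustrative, not from the slides): one draw of the random experiment fixes the amplitude A, and with it the entire sample path of Xt = A cos 2πt; the listed values follow from that single draw.

```python
import numpy as np

# Illustrative sketch: one draw of A fixes the whole sample path.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0)                 # A ~ unif(-1, 1)

def X(t):
    return A * np.cos(2 * np.pi * t)       # deterministic once A is drawn

x0, x18, x14, x12 = X(0.0), X(1 / 8), X(1 / 4), X(1 / 2)
# X0 = A, X_{1/8} = (sqrt(2)/2) A, X_{1/4} = 0, X_{1/2} = -A
```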

example: Xn iid N(0, 1); discrete-time, continuous-valued

fXiXjXk(xi, xj, xk) = (1/(2π)^{3/2}) e^{−(xi² + xj² + xk²)/2}

A random process Xt is completely characterized if any of the following is known.

1. P((Xt1, · · · , Xtk) ∈ B) for any B, k, and t1, · · · , tk
2. pXt1···Xtk(x1, · · · , xk) for any k and t1, · · · , tk
3. fXt1···Xtk(x1, · · · , xk) for any k and t1, · · · , tk
4. FXt1···Xtk(x1, · · · , xk) for any k and t1, · · · , tk
5. ϕXt1···Xtk(u1, · · · , uk) for any k and t1, · · · , tk

Note that given a random process, only “finite-dimensional” probabilities or probability functions can be specified. Thus, for a continuous-time Xt,

P(|Xt| ≤ 1, ∀ t ∈ {t1, t2, · · · , tk}) is defined.

P(|Xt| ≤ 1, ∀ t ∈ [0, 1]) is not defined.


Conversely, a family of finite-dimensional cdfs determines a random process if the family is “consistent”.

A family of finite-dimensional cdfs Ft1···tk(a1, · · · , ak) is said to be consistent if the following hold. [Kolmogorov]

1. The cdfs are invariant under index permutation.

2. The cdfs satisfy the dimension reduction rule.

Condition 1 means, for example,

Ft1t2 t3··· tk(a1, a2, a3, · · · , ak) = Ft2 t1t3··· tk(a2, a1, a3, · · · , ak)

Condition 2 means, for example,

Ft1t2(a1, a2) = Ft1t2 t3··· tk(a1, a2,∞, · · · ,∞)

• Two random processes Xt and Yt are defined on the same sample space Ω as a natural extension of a random process.

Two random processes Xt and Yt are completely characterized if any of the following is known.

1. P((Xt1, · · · , Xtk, Ys1, · · · , Ysl) ∈ B)
2. pXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
3. fXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
4. FXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
5. ϕXt1···XtkYs1···Ysl(u1, · · · , uk, v1, · · · , vl)

The idea extends to multiple random processes.

independent processes Xt and Yt:

(Xt1, · · · , Xtk) and (Ys1, · · · , Ysl) are independent for any k, l, t1, · · · , tk, and s1, · · · , sl.

Equivalently,

1. P((Xt1, · · · , Xtk) ∈ BX, (Ys1, · · · , Ysl) ∈ BY)
   = P((Xt1, · · · , Xtk) ∈ BX) P((Ys1, · · · , Ysl) ∈ BY)

2. pXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
   = pXt1···Xtk(x1, · · · , xk) pYs1···Ysl(y1, · · · , yl)

3. fXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
   = fXt1···Xtk(x1, · · · , xk) fYs1···Ysl(y1, · · · , yl)

4. FXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)
   = FXt1···Xtk(x1, · · · , xk) FYs1···Ysl(y1, · · · , yl)

5. ϕXt1···XtkYs1···Ysl(u1, · · · , uk, v1, · · · , vl)
   = ϕXt1···Xtk(u1, · · · , uk) ϕYs1···Ysl(v1, · · · , vl)

The idea extends to multiple random processes.

example: independent random processes

Xt = A cos 2πt, Yt = B cos 2πt, where A and B are independent random variables.

Xt = cos 2π(t + Θ), Yt = cos 2π(t + Ψ), where Θ, Ψ ∼ unif(0, 1) are independent random variables.

Xn, Yn: iid N(0, 1)

fXiXjYk(xi, xj, yk) = (1/(2π)^{3/2}) e^{−(xi² + xj² + yk²)/2}

Moment

mean function:

mX(t) := EXt = ∑x x pXt(x) (discrete-valued), ∫ x fXt(x) dx (continuous-valued)

auto-correlation function, acf:

RX(t, s) := EXtXs = ∑u ∑v uv pXtXs(u, v) (discrete-valued), ∫∫ uv fXtXs(u, v) du dv (continuous-valued)

X = (Xt1, · · · , Xtk) ⇒

RX = [ RX(t1, t1) · · · RX(t1, tk)
       ⋮                ⋮
       RX(tk, t1) · · · RX(tk, tk) ],

where RX(ti, ti) = EXti².
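These definitions can be checked empirically (Python/NumPy; an illustrative sketch, not from the slides) on the simplest discrete-valued case, an iid Bernoulli(p) process, for which mX(t) = p, RX(t, t) = p, and RX(t, s) = p² for t ≠ s.

```python
import numpy as np

# Illustrative estimate of mX(t) and the acf matrix for an iid Bernoulli(p)
# process, averaged over many sample paths.
rng = np.random.default_rng(1)
p, n_paths, n_times = 0.3, 200_000, 4
X = (rng.random((n_paths, n_times)) < p).astype(float)  # rows = sample paths

m_hat = X.mean(axis=0)            # estimate of mX(t) at n_times time points
R_hat = (X.T @ X) / n_paths       # estimate of the acf matrix RX(ti, tj)
```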

auto-covariance function, acvf:

CX(t, s) := E(Xt − mX(t))(Xs − mX(s)) = RX(t, s) − mX(t)mX(s)

X = (Xt1, · · · , Xtk) ⇒

CX = [ CX(t1, t1) · · · CX(t1, tk)
       ⋮                ⋮
       CX(tk, t1) · · · CX(tk, tk) ],

where CX(ti, ti) = var(Xti).

cross-correlation function, ccf:

RXY(t, s) := EXtYs = ∑u ∑v uv pXtYs(u, v) (jointly discrete), ∫∫ uv fXtYs(u, v) du dv (jointly continuous)

cross-covariance function, ccvf:

CXY(t, s) := E(Xt − mX(t))(Ys − mY(s)) = RXY(t, s) − mX(t)mY(s)

Note that these functions are discrete-time functions for discrete-time random processes (t and s are integers) and continuous-time functions for continuous-time random processes (t and s are real numbers).

uncorrelated processes Xt and Yt: EXtYs = EXtEYs for any t and s.

RXY(t, s) = mX(t)mY(s)

CXY(t, s) = 0

independent ⇒ uncorrelated; uncorrelated ⇏ independent (except jointly Gaussian processes)

covariance matrix of (Xt1, · · · , Xtk, Ys1, · · · , Ysl) = [ C1 O; O C2 ]

orthogonal processes Xt and Yt: EXtYs = 0 for any t and s.

RXY(t, s) = 0

correlation matrix of (Xt1, · · · , Xtk, Ys1, · · · , Ysl) = [ R1 O; O R2 ]

example: Form two Bernoulli processes, taking values 0 and 1, by independent Bernoulli trials. Then they are independent, uncorrelated, but not orthogonal.

If you instead similarly form modified Bernoulli processes, taking values ±1, they additionally become orthogonal.
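The Bernoulli example can be checked numerically (Python/NumPy; illustrative sketch): with values {0, 1}, EXtYs = 1/4 ≠ 0 although CXY = 0; remapped to ±1, the cross-correlation itself vanishes.

```python
import numpy as np

# Two independent Bernoulli(1/2) processes: uncorrelated either way,
# but orthogonal only after remapping {0,1} -> {+1,-1}.
rng = np.random.default_rng(2)
n = 500_000
X01 = (rng.random(n) < 0.5).astype(float)
Y01 = (rng.random(n) < 0.5).astype(float)

cross_01 = np.mean(X01 * Y01)                  # estimates EXtYs = 1/4
cov_01 = cross_01 - X01.mean() * Y01.mean()    # estimates CXY = 0

Xpm, Ypm = 2 * X01 - 1, 2 * Y01 - 1            # values +/-1
cross_pm = np.mean(Xpm * Ypm)                  # estimates EXtYs = 0
```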

example: Ω = {ω1, ω2}, P(ω1) = P(ω2) = 1/2

Xt(ω1) = cos t, Xt(ω2) = sin t

Yt(ω1) = 1, Yt(ω2) = −1

[sample paths Xt(ω1), Xt(ω2), Yt(ω1), Yt(ω2) omitted]

pmf varies in time ⇒ computing expectation is easier on Ω.

mX(t) = EXt = (1/2) cos t + (1/2) sin t = (√2/2) cos(t − π/4)

mY(t) = EYt = (1/2)·1 + (1/2)·(−1) = 0

RX(t, s) = EXtXs = (1/2) cos t cos s + (1/2) sin t sin s = (1/2) cos(t − s)

RY(t, s) = EYtYs = (1/2)·1·1 + (1/2)·(−1)·(−1) = 1

RXY(t, s) = EXtYs = (1/2) cos t − (1/2) sin t = (√2/2) cos(t + π/4)

Xt and Yt are dependent, correlated, and not orthogonal.

example: Ω = {ω1, ω2, ω3, ω4}, P(ωi) = 1/4, i = 1, 2, 3, 4

Xt(ω1) = Xt(ω2) = cos t, Xt(ω3) = Xt(ω4) = −cos t

Yt(ω1) = Yt(ω3) = sin t, Yt(ω2) = Yt(ω4) = −sin t

[sample paths omitted]

mX(t) = mY(t) = 0

RX(t, s) = EXtXs = 2·(1/4) cos t cos s + 2·(1/4)(−cos t)(−cos s) = cos t cos s

RXY(t, s) = EXtYs
= (1/4) cos t sin s + (1/4) cos t (−sin s) + (1/4)(−cos t) sin s + (1/4)(−cos t)(−sin s)
= 0

Xt and Yt are independent, uncorrelated, and orthogonal.

Equivalently, Xt = U cos t, Yt = V sin t, where U and V are independent Ber(1/2) with values +1 and −1.

example: Xt = cos 2π(t + Θ), Yt = sin 2π(t + Θ), Θ ∼ unif(0, 1)

[sample paths for Θ = 0, 1/16, 13/16 omitted]

mX(t) = EXt = ∫ x fXt(x) dx = ∫₀¹ cos 2π(t + θ) dθ = 0

mY(t) = EYt = ∫ y fYt(y) dy = ∫₀¹ sin 2π(t + θ) dθ = 0

RX(t, s) = E cos 2π(t + Θ) cos 2π(s + Θ)
= E (1/2)[cos 2π(t − s) + cos 2π(t + s + 2Θ)]
= (1/2) cos 2π(t − s) + (1/2) ∫₀¹ cos 2π(t + s + 2θ) dθ
= (1/2) cos 2π(t − s)
= CX(t, s)

RXY(t, s) = E cos 2π(t + Θ) sin 2π(s + Θ)
= E (1/2)[−sin 2π(t − s) + sin 2π(t + s + 2Θ)]
= −(1/2) sin 2π(t − s) + (1/2) ∫₀¹ sin 2π(t + s + 2θ) dθ
= −(1/2) sin 2π(t − s)
= CXY(t, s)

Xt and Yt are dependent, correlated, and not orthogonal.
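The derivation can be confirmed by Monte Carlo (Python/NumPy; illustrative sketch): averaging over draws of Θ should reproduce RX(t, s) = (1/2) cos 2π(t − s) and RXY(t, s) = −(1/2) sin 2π(t − s).

```python
import numpy as np

# Monte Carlo check for Xt = cos 2π(t+Θ), Yt = sin 2π(t+Θ), Θ ~ unif(0,1).
rng = np.random.default_rng(3)
theta = rng.random(1_000_000)
t, s = 0.3, 0.1

X_t = np.cos(2 * np.pi * (t + theta))
X_s = np.cos(2 * np.pi * (s + theta))
Y_s = np.sin(2 * np.pi * (s + theta))

RX_hat = np.mean(X_t * X_s)      # ~ (1/2) cos 2π(t-s)
RXY_hat = np.mean(X_t * Y_s)     # ~ -(1/2) sin 2π(t-s)
```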

In this last example the mean function, acf, acvf, ccf, and ccvf are all shift-invariant, ie,

mX(t) = mX(t + τ ),

RX(t, s) = RX(t + τ, s + τ ),

CX(t, s) = CX(t + τ, s + τ ),

RY (t, s) = RY (t + τ, s + τ ),

CY (t, s) = CY (t + τ, s + τ ),

RXY (t, s) = RXY (t + τ, s + τ ),

CXY (t, s) = CXY (t + τ, s + τ ).

Shift invariance is generally called stationarity.

Stationarity

(strict-sense) stationary, sss, Xt:

P((Xt1+τ, · · · , Xtk+τ) ∈ B) = P((Xt1, · · · , Xtk) ∈ B)

Equivalently, shift-invariance of the jpmf, jpdf, jcdf, or jchf also defines sss. That is,

pXt1+τ···Xtk+τ(x1, · · · , xk) = pXt1···Xtk(x1, · · · , xk)

fXt1+τ···Xtk+τ(x1, · · · , xk) = fXt1···Xtk(x1, · · · , xk)

FXt1+τ···Xtk+τ(x1, · · · , xk) = FXt1···Xtk(x1, · · · , xk)

ϕXt1+τ···Xtk+τ(u1, · · · , uk) = ϕXt1···Xtk(u1, · · · , uk).

(Xt1, · · · , Xtk) and (Xt1+τ, · · · , Xtk+τ) are identically distributed.

If Xt is sss,

mX(t + τ) = ∑x x pXt+τ(x) = ∑x x pXt(x) = mX(t) (discrete-valued)
mX(t + τ) = ∫ x fXt+τ(x) dx = ∫ x fXt(x) dx = mX(t) (continuous-valued)

RX(t + τ, s + τ) = RX(t, s)

CX(t + τ, s + τ) = CX(t, s)

Eg(Xt1+τ, · · · , Xtk+τ) = Eg(Xt1, · · · , Xtk)

example: cos 2π(t + Θ) with Θ ∼ unif(0, 1), the Bernoulli rp, and any iid rp are stationary, but A cos 2πt with random A is not.

wide-sense stationary, wss, Xt:

1. mX(t + τ) = mX(t)

2. RX(t + τ, s + τ) = RX(t, s)

Only the first and second moments are shift invariant.

If Xt is wss,

mX(t) = mX

RX(t, s) = RX(t − s), or RX(τ) = EXt+τXt

CX(t, s) = CX(t − s), or CX(τ) = RX(τ) − mX²

sss ⇒ wss; wss ⇏ sss (except Gaussian processes)

example: U = ±1 equiprobable; V = −√2 with prob 1/3, √2/2 with prob 2/3; U and V are independent.

EU = EV = 0, EU² = EV² = 1

Xt = U for odd t, V for even t ⇒ Xt is wss but not sss.
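A numerical sketch of this example (Python/NumPy; illustrative): the first and second moments agree at odd and even times, yet the marginal distributions differ, eg, P(Xt = −√2) is 0 at odd t and 1/3 at even t.

```python
import numpy as np

# U = ±1 equiprobable; V = -sqrt(2) w.p. 1/3, sqrt(2)/2 w.p. 2/3; independent.
rng = np.random.default_rng(4)
n = 300_000
U = rng.choice([-1.0, 1.0], size=n)
V = rng.choice([-np.sqrt(2), np.sqrt(2) / 2], size=n, p=[1 / 3, 2 / 3])

mean_odd, mean_even = U.mean(), V.mean()          # both ~ 0
pow_odd, pow_even = (U ** 2).mean(), (V ** 2).mean()  # both ~ 1

# distributions differ with the parity of t:
p_even = np.mean(np.isclose(V, -np.sqrt(2)))      # ~ 1/3, vs 0 at odd t
```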

example: random telegraph process

P(Xt = 1) = P(Xt = −1) = 1/2

The probability that k traversals occur in a time interval of length τ is ((λτ)^k / k!) e^{−λτ}.

[sample paths for λ = 1 and λ = 1/2 and the corresponding acf sketches omitted]

mX(t) = 1·P(Xt = 1) + (−1)·P(Xt = −1) = 0

EXt+τXt (for τ ≥ 0)

= 1·1·P(Xt+τ = 1, Xt = 1) + 1·(−1)·P(Xt+τ = 1, Xt = −1)
+ (−1)·1·P(Xt+τ = −1, Xt = 1) + (−1)·(−1)·P(Xt+τ = −1, Xt = −1)

= P(Xt+τ = 1 | Xt = 1)·(1/2) − P(Xt+τ = 1 | Xt = −1)·(1/2)
− P(Xt+τ = −1 | Xt = 1)·(1/2) + P(Xt+τ = −1 | Xt = −1)·(1/2)

= P(even traversals in [t, t + τ]) − P(odd traversals in [t, t + τ])

= ∑_{k even} ((λτ)^k / k!) e^{−λτ} − ∑_{k odd} ((λτ)^k / k!) e^{−λτ}

= (1/2) ∑_{k=0}^{∞} (((λτ)^k + (−λτ)^k) / k!) e^{−λτ} − (1/2) ∑_{k=0}^{∞} (((λτ)^k − (−λτ)^k) / k!) e^{−λτ}

= (1/2)(e^{λτ} + e^{−λτ}) e^{−λτ} − (1/2)(e^{λτ} − e^{−λτ}) e^{−λτ} = e^{−2λτ}

⇒ RX(τ) = e^{−2λ|τ|}

⇒ Xt is wss.
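The key step above, EXt+τXt = E(−1)^N with N ∼ Poisson(λτ), can be checked by simulation (Python/NumPy; illustrative sketch): X flips sign once per traversal, so the product Xt+τXt is (−1)^N.

```python
import numpy as np

# Monte Carlo check of the telegraph acf: E(-1)^N = e^{-2λτ}, N ~ Poisson(λτ).
rng = np.random.default_rng(5)
lam, tau, n = 1.0, 0.7, 1_000_000

X_t = rng.choice([-1.0, 1.0], size=n)    # P(Xt = ±1) = 1/2
N = rng.poisson(lam * tau, size=n)       # traversal counts in [t, t+τ]
X_ttau = X_t * (-1.0) ** N               # sign flips once per traversal

R_hat = np.mean(X_ttau * X_t)            # ~ e^{-2λτ}
```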

(strict-sense) cyclostationary Xt with period T:

(Xt1, · · · , Xtk) and (Xt1+mT, · · · , Xtk+mT) are identically distributed for any integer m.

When a stationary process is processed in blocks of size T, eg, by the DFT, the result is cyclostationary with period T.

(strict-sense) asymptotically stationary Xt:

(XT+t1, · · · , XT+tk) and (XT+t1+τ, · · · , XT+tk+τ) become identically distributed as T → ∞.

example: homogeneous Markov chain

Cyclostationarity and asymptotic stationarity are also defined in the wide sense.

jointly (strict-sense) stationary, jsss, Xt and Yt:

P((Xt1+τ, · · · , Xtk+τ, Ys1+τ, · · · , Ysl+τ) ∈ B) = P((Xt1, · · · , Xtk, Ys1, · · · , Ysl) ∈ B)

Equivalently, shift-invariance of the jpmf, jpdf, jcdf, or jchf also defines jsss. That is,

pXt1+τ···Xtk+τYs1+τ···Ysl+τ(x1, · · · , xk, y1, · · · , yl) = pXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)

fXt1+τ···Xtk+τYs1+τ···Ysl+τ(x1, · · · , xk, y1, · · · , yl) = fXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)

FXt1+τ···Xtk+τYs1+τ···Ysl+τ(x1, · · · , xk, y1, · · · , yl) = FXt1···XtkYs1···Ysl(x1, · · · , xk, y1, · · · , yl)

ϕXt1+τ···Xtk+τYs1+τ···Ysl+τ(u1, · · · , uk, v1, · · · , vl) = ϕXt1···XtkYs1···Ysl(u1, · · · , uk, v1, · · · , vl).

Also equivalently, (Xt1, · · · , Xtk, Ys1, · · · , Ysl) and (Xt1+τ, · · · , Xtk+τ, Ys1+τ, · · · , Ysl+τ) are identically distributed.

If Xt and Yt are jsss,

each is sss.

RXY(t + τ, s + τ) = RXY(t, s)

CXY(t + τ, s + τ) = CXY(t, s)

Eg(Xt1+τ, · · · , Xtk+τ, Ys1+τ, · · · , Ysl+τ) = Eg(Xt1, · · · , Xtk, Ys1, · · · , Ysl)

example: Consider two iid Bernoulli processes Xt and Yt, independent of each other, and let Zt = Xt for odd t and Zt = Yt for even t.

Then all three processes are individually stationary, but Xt and Zt are not jointly stationary; neither are Yt and Zt.

jointly wide-sense stationary, jwss, Xt and Yt:

1. Xt and Yt are each wss.

2. RXY(t + τ, s + τ) = RXY(t, s)

If Xt and Yt are jwss,

RXY(t, s) = RXY(t − s), or RXY(τ) = EXt+τYt

CXY(t, s) = CXY(t − s), or CXY(τ) = RXY(τ) − mXmY

jsss ⇒ jwss; jwss ⇏ jsss (except jointly Gaussian processes)

Joint cyclostationarity and joint asymptotic stationarity, both in the strict sense and in the wide sense, are also defined.

example: Xt = cos 2π(t + Θ), Yt = sin 2π(t + Θ), Θ ∼ unif(0, 1)

Xt and Yt are jsss, and therefore jwss. RX(τ)? RXY(τ)?

[sample paths for Θ = 0, 1/16, 13/16 omitted]

Also, two independent Bernoulli processes are jsss, and therefore jwss. What are the acf and ccf?

Properties of acf

properties of RX(τ) of a wss process:

1. second moment, total average power: RX(0) = EXt²

2. even symmetry: RX(τ) = RX(−τ)

3. |RX(τ)| ≤ RX(0) = EXt²

proof: Schwarz inequality.

4. sample path behavior: P(|Xt+τ − Xt| ≥ ε) ≤ (2/ε²)(RX(0) − RX(τ))

proof: Square the arguments and use the Markov inequality.

example: the random telegraph process above, with RX(τ) = e^{−2λ|τ|}. [sample paths and acf sketches for λ = 1 and λ = 1/2 omitted]

5. RX(T) = RX(0) for some T > 0 ⇒ RX(τ) is periodic with period T.

proof: [E(Xt+τ+T − Xt+τ)Xt]² ≤ E(Xt+τ+T − Xt+τ)² EXt²
= E(Xt+τ+T² + Xt+τ² − 2Xt+τ+TXt+τ) EXt²

⇒ [RX(τ + T) − RX(τ)]² ≤ 2RX(0)[RX(0) − RX(T)]

example: Xt = cos 2π(t + Θ), Θ ∼ unif(0, 1)

mX(t) = 0, RX(τ) = (1/2) cos 2πτ = CX(τ)

6. For a continuous-time process, RX(τ) continuous at τ = 0 ⇒ RX(τ) is continuous everywhere.

proof: Let T → 0 in the proof of 5.

7. RX(τ) is the inverse FT of a non-negative even function (the psd, see below).

example: Are these acfs? [nine candidate acf sketches omitted]

Power spectral density

The power spectral density SX(f) is the function describing how the power of Xt is distributed over the frequency axis.

⇒ the power in [f, f + ∆f) = SX(f)∆f

power spectral density, psd, SX(f) for wss Xt:

SX(f) := ∑_{τ=−∞}^{∞} RX(τ) e^{−j2πfτ} (discrete-time)
SX(f) := ∫_{−∞}^{∞} RX(τ) e^{−j2πfτ} dτ (continuous-time)

Why does the Fourier transform of the acf come to be the psd in the above sense? → Wiener-Khinchin theorem

Fourier inversion:

RX(τ) = ∫_{−1/2}^{1/2} SX(f) e^{j2πfτ} df (discrete-time)
RX(τ) = ∫_{−∞}^{∞} SX(f) e^{j2πfτ} df (continuous-time)

real symmetric RX(τ) ⇒ real symmetric SX(f)
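The continuous-time definition can be exercised numerically (Python/NumPy; illustrative sketch) on the random telegraph acf RX(τ) = e^{−2λ|τ|}, whose transform works out to λ/(λ² + π²f²).

```python
import numpy as np

# Riemann-sum approximation of SX(f) = ∫ RX(τ) e^{-j2πfτ} dτ for the
# telegraph acf e^{-2λ|τ|}; analytic answer: λ / (λ² + π² f²).
lam = 1.0
tau = np.linspace(-40.0, 40.0, 400_001)   # wide grid: tails are negligible
dt = tau[1] - tau[0]
R = np.exp(-2.0 * lam * np.abs(tau))

def S(f):
    return float((R * np.exp(-2j * np.pi * f * tau)).sum().real * dt)

S0, S1 = S(0.0), S(1.0)   # analytic: S(0) = 1/λ, S(1) = λ/(λ² + π²)
```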

non-negative definite RX(τ) ⇒ SX(f) ≥ 0

Any real, nonnegative, even function can be a psd, and the inverse FT of any such function can be an acf.

non-negative definite function R(t, s): ∀(t1, · · · , tk), the k×k matrix with (i, j)th element R(ti, tj) is non-negative definite. Also called positive semidefinite.

positive definite function R(t, s): the R(ti, tj) form a positive definite matrix.

average power of Xt:

lim_{T→∞} (1/(2T+1)) ∑_{t=−T}^{T} Xt² (discrete-time)

lim_{T→∞} (1/(2T)) ∫_{−T}^{T} Xt² dt (continuous-time)

This is a random variable. → Expectation is needed.

expected average power:

PX := E lim_{T→∞} (1/(2T+1)) ∑_{t=−T}^{T} Xt² (discrete-time)
PX := E lim_{T→∞} (1/(2T)) ∫_{−T}^{T} Xt² dt (continuous-time)

= lim_{T→∞} (1/(2T+1)) ∑_{t=−T}^{T} EXt² (discrete-time)
= lim_{T→∞} (1/(2T)) ∫_{−T}^{T} EXt² dt (continuous-time)

= EXt² = RX(0) = ∫_{−1/2}^{1/2} SX(f) df (discrete-time), ∫_{−∞}^{∞} SX(f) df (continuous-time)

for wss Xt.

The last equality is due to the Wiener-Khinchin theorem.

This shows that the area under the graph of SX(f) represents the total expected average power of Xt, but it is not enough to justify its being the spectral “density”. → linear time-invariant system, narrow band-pass filter

Wiener-Khinchin theorem

We discuss continuous-time processes. The discussion goes in parallel for discrete-time processes.

deterministic finite-energy signal xt:

xf = ∫_{−∞}^{∞} xt e^{−j2πft} dt and xt = ∫_{−∞}^{∞} xf e^{j2πft} df

time acf rx(τ) of xt: rx(τ) = ∫_{−∞}^{∞} xt+τ xt dt

energy spectral density: |xf|² = ∫_{−∞}^{∞} rx(τ) e^{−j2πfτ} dτ

rx(τ) = ∫ |xf|² e^{j2πfτ} df [inverse FT]

rx(0) = ∫ |xf|² df = ∫ xt² dt: energy of xt, Parseval's relation

Let h(t) be an ideal band-pass filter with pass band [f0 − ∆f, f0 + ∆f]. [block diagram xt → h(t) → yt and sketches of |xf|², H(f), |yf|² omitted]

ry(0) = ∫ yt² dt
= ∫ |yf|² df
= ∫ |H(f)|² |xf|² df
= ∫_{f0−∆f}^{f0+∆f} |xf|² df + ∫_{−f0−∆f}^{−f0+∆f} |xf|² df

This justifies that |xf|² is the energy spectral “density”: the energy in a band is the integral of |xf|² over the band.

Random processes have infinite energy.

⇒ XTt := Xt for |t| ≤ T, 0 else: truncated for finite energy

XTf = ∫_{−∞}^{∞} XTt e^{−j2πft} dt

energy spectral density of XTt: |XTf|²

energy of XTt: ∫_{−∞}^{∞} (XTt)² dt = ∫_{−T}^{T} (XTt)² dt = ∫_{−T}^{T} Xt² dt = ∫_{−∞}^{∞} |XTf|² df

average power of XTt: (1/(2T)) ∫_{−T}^{T} Xt² dt = (1/(2T)) ∫_{−∞}^{∞} |XTf|² df = ∫_{−∞}^{∞} (1/(2T)) |XTf|² df

periodogram, an approximation to the psd: (1/(2T)) |XTf|²

But this is a random variable for each f, ie, a random process with the parameter f.
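A discrete-time sketch of this point (Python/NumPy; illustrative): a single periodogram is random at each f, but averaging periodograms of many independent realizations approaches the psd, which for iid noise with variance σ² is flat at σ².

```python
import numpy as np

# Average periodograms |FFT|²/N of independent white-noise realizations.
rng = np.random.default_rng(6)
sigma, N, n_seg = 1.5, 256, 4000
x = rng.normal(0.0, sigma, size=(n_seg, N))

pgrams = np.abs(np.fft.fft(x, axis=1)) ** 2 / N  # one periodogram per row
S_hat = pgrams.mean(axis=0)                      # average over realizations
# S_hat is nearly flat at sigma**2, while each single row fluctuates wildly
```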

expected average power of XTt:

(1/(2T)) ∫_{−T}^{T} EXt² dt = ∫_{−∞}^{∞} (1/(2T)) E|XTf|² df

PX = lim_{T→∞} (1/(2T)) ∫_{−T}^{T} EXt² dt
= lim_{T→∞} ∫_{−∞}^{∞} (1/(2T)) E|XTf|² df
= ∫_{−∞}^{∞} (lim_{T→∞} (1/(2T)) E|XTf|²) df

• Wiener-Khinchin theorem: lim_{T→∞} (1/(2T)) E|XTf|² = SX(f)

proof: (1/(2T)) E|XTf|² = (1/(2T)) E|∫ XTt e^{−j2πft} dt|²

= (1/(2T)) E(∫_{−T}^{T} Xt e^{−j2πft} dt)(∫_{−T}^{T} Xs e^{−j2πfs} ds)*

= (1/(2T)) ∫_{−T}^{T} ∫_{−T}^{T} EXtXs e^{−j2πf(t−s)} dt ds

= (1/(2T)) ∫_{−T}^{T} ∫_{−T}^{T} RX(t − s) e^{−j2πf(t−s)} dt ds

change of variables (t, s) → (τ, s), τ = t − s [the square integration region becomes a parallelogram; sketch omitted]

= (1/(2T)) ∫_{−2T}^{0} ∫_{−T−τ}^{T} RX(τ) e^{−j2πfτ} ds dτ + (1/(2T)) ∫_{0}^{2T} ∫_{−T}^{T−τ} RX(τ) e^{−j2πfτ} ds dτ

= ∫_{−2T}^{2T} ((2T − |τ|)/(2T)) RX(τ) e^{−j2πfτ} dτ

lim_{T→∞} (1/(2T)) E|XTf|² = lim_{T→∞} ∫_{−2T}^{2T} ((2T − |τ|)/(2T)) RX(τ) e^{−j2πfτ} dτ

= ∫_{−∞}^{∞} RX(τ) e^{−j2πfτ} dτ

The last equality holds if ∫_{−∞}^{∞} |RX(τ)| dτ < ∞, owing to the dominated convergence theorem.

[sketch of the triangular weight (2T − |τ|)/(2T) on [−2T, 2T] omitted]

Cross power spectral density

cross power spectral density, cpsd, SXY(f) for jwss Xt and Yt:

SXY(f) := ∑_{τ=−∞}^{∞} RXY(τ) e^{−j2πfτ} (discrete-time)
SXY(f) := ∫_{−∞}^{∞} RXY(τ) e^{−j2πfτ} dτ (continuous-time)

Fourier inversion:

RXY(τ) = ∫_{−1/2}^{1/2} SXY(f) e^{j2πfτ} df (discrete-time)
RXY(τ) = ∫_{−∞}^{∞} SXY(f) e^{j2πfτ} df (continuous-time)

The properties of the psd do not hold.

It indicates the power spectral density of the component that contributes to “linear” inter-dependence between Xt and Yt.

Why have we limited our discussion of the psd and cpsd to wss and jwss processes?

Linear time-invariant system

linear time-invariant system with impulse response h(t): Xt → h(t) → Yt

Yt = ∑k h(t − k)Xk = ∑k h(k)Xt−k (discrete-time)
Yt = ∫_{−∞}^{∞} h(t − τ)Xτ dτ = ∫_{−∞}^{∞} h(τ)Xt−τ dτ (continuous-time)

We discuss continuous-time cases.

moments:

mY(t) = ∫ h(t − τ) mX(τ) dτ = ∫ h(τ) mX(t − τ) dτ

RY(t, s) = E(∫ h(u)Xt−u du)(∫ h(v)Xs−v dv)
= ∫∫ h(u)h(v) EXt−uXs−v du dv
= ∫∫ h(u)h(v) RX(t − u, s − v) du dv

RXY(t, s) = EXt ∫ h(u)Xs−u du = ∫ h(u) EXtXs−u du = ∫ h(u) RX(t, s − u) du

If Xt is wss,

mY(t) = ∫ h(τ) mX dτ ⇒ mY = mX ∫ h(τ) dτ

RY(t, s) = ∫∫ h(u)h(v) RX(t − s − u + v) du dv

⇒ RY(τ) = ∫∫ h(u)h(v) RX(τ − u + v) du dv = RX(τ) ∗ h(τ) ∗ h(−τ) = RX(τ) ∗ rh(τ)

Yt is also wss.

SY(f) = SX(f)|H(f)|²
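The discrete-time analogue of RY(τ) = RX(τ) ∗ rh(τ) can be checked numerically (Python/NumPy; illustrative sketch, with a hypothetical FIR filter h): for white noise input, RY(τ) = σ² rh(τ), so in particular EYt² = σ² ∑k h(k)².

```python
import numpy as np

# White noise (flat SX(f) = σ²) through an FIR filter h:
# RY(τ) = σ² rh(τ), rh(τ) = Σk h(k)h(k+τ).
rng = np.random.default_rng(7)
sigma = 2.0
h = np.array([0.5, 0.3, -0.2, 0.1])      # illustrative filter taps
x = rng.normal(0.0, sigma, size=2_000_000)
y = np.convolve(x, h, mode="valid")

power_hat = np.mean(y ** 2)              # ~ σ² Σ h(k)²   (RY(0))
ry1_hat = np.mean(y[1:] * y[:-1])        # ~ σ² Σ h(k)h(k+1)   (RY(1))
power_theory = sigma ** 2 * np.sum(h ** 2)
ry1_theory = sigma ** 2 * np.sum(h[:-1] * h[1:])
```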

Let h(t) be an ideal band-pass filter with pass band [f0 − ∆f, f0 + ∆f]. [block diagram Xt → h(t) → Yt and sketches of SX(f), H(f), SY(f) omitted]

RY(0) = EYt²
= ∫ SY(f) df
= ∫ |H(f)|² SX(f) df
= ∫_{f0−∆f}^{f0+∆f} SX(f) df + ∫_{−f0−∆f}^{−f0+∆f} SX(f) df

This justifies that SX(f) is the power spectral “density”: the power in a band is the integral of SX(f) over the band.

RXY(t, s) = ∫ h(u) RX(t − s + u) du

⇒ RXY(τ) = ∫ h(u) RX(τ + u) du = RX(τ) ∗ h(−τ)

Xt and Yt are jwss.

RYX(τ) = RX(τ) ∗ h(τ)

SXY(f) = SX(f) H*(f)

SYX(f) = SX(f) H(f)

Convergence of random sequences

four convergence types of X1(ω), X2(ω), X3(ω), · · ·:

1. mean square convergence

2. convergence in probability, convergence in measure

3. convergence in distribution

4. convergence with probability one, almost sure convergence

almost sure ⇒ in prob ⇒ in dist; ms ⇒ in prob

cf. convergence of a sequence of functions: point-wise, uniform

Types of convergence of random sequences define types of continuity of random processes in continuous-time cases.

mean square (ms) convergence of Xn to X, Xn −ms→ X: lim_{n→∞} E|Xn − X|² = 0

mean square (ms) continuity of Xt at t: lim_{s→t} E|Xs − Xt|² = 0

ms continuity of Xt: ms continuity of Xt at all t

Xt is ms continuous at u ⇔ RX(t, s) is continuous at (u, u).

proof: “⇐”: Assume RX(t, s) is continuous at (u, u).

⇒ E(Xt − Xu)² = RX(t, t) − 2RX(t, u) + RX(u, u)
= (RX(t, t) − RX(u, u)) − 2(RX(t, u) − RX(u, u)) → 0 as t → u.

“⇒”: Assume Xt is ms continuous.

⇒ |RX(t, s) − RX(u, u)|
≤ |RX(t, s) − RX(u, s)| + |RX(u, s) − RX(u, u)|
= |E(Xt − Xu)Xs| + |EXu(Xs − Xu)|
≤ √(E(Xt − Xu)² EXs²) + √(EXu² E(Xs − Xu)²) [Schwarz ineq]

Since EXt² < ∞ for ms continuous Xt (see exercise 13-18, Gubner), the proof is complete.

The proof also shows that Xt is ms continuous if and only if RX(t, s) is continuous, ie, everywhere.

wss Xt is ms continuous at u
⇔ RX(τ) is continuous at 0
⇔ RX(τ) is continuous, ie, everywhere
⇔ wss Xt is ms continuous, ie, everywhere.

example: random telegraph process

convergence of Xn in probability to X, Xn −pr→ X: ∀ε > 0, lim_{n→∞} P(|Xn − X| ≥ ε) = 0

example (WLLN): Xi iid, Mn = (1/n) ∑_{i=1}^{n} Xi −pr→ EXi

ms convergence ⇒ convergence in probability

proof: By the Markov inequality,

P(|Xn − X| ≥ ε) = P(|Xn − X|² ≥ ε²) ≤ E|Xn − X|²/ε²

Xn −pr→ X and Yn −pr→ Y ⇒ for any continuous function g, g(Xn, Yn) −pr→ g(X, Y).

convergence of Xn in distribution to X, Xn −dist→ X: lim_{n→∞} FXn(x) = FX(x) for all continuity points of FX(x).

Convergence in distribution can be defined in terms of the pmf, pdf, or chf.

example (central limit theorem): For iid Xn with mean m and variance σ²,

(1/√n) ∑_{i=1}^{n} (Xi − m)/σ −dist→ X, where X ∼ N(0, 1).

convergence in prob ⇒ convergence in dist

proof: FXn(x) = P(Xn ≤ x, X ≤ x + ε) + P(Xn ≤ x, X > x + ε)

≤ FX(x + ε) + P(Xn − X < −ε)

≤ FX(x + ε) + P(|Xn − X| ≥ ε)

⇒ lim sup_{n→∞} FXn(x) ≤ FX(x + ε)

[Xn ≤ x and X > x + ε together imply Xn − X < −ε]

FX(x − ε) = P(X ≤ x − ε, Xn ≤ x) + P(X ≤ x − ε, Xn > x)

≤ FXn(x) + P(X − Xn < −ε)

≤ FXn(x) + P(|Xn − X| ≥ ε)

⇒ FX(x − ε) ≤ lim inf_{n→∞} FXn(x)

[X ≤ x − ε and Xn > x together imply X − Xn < −ε]

At each continuity point of FX(x), by letting ε → 0,

FX(x) ≤ lim inf_{n→∞} FXn(x) ≤ lim sup_{n→∞} FXn(x) ≤ FX(x)

convergence in dist ⇔ for any bounded continuous g, lim_{n→∞} Eg(Xn) = Eg(X).
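The central limit theorem above can be illustrated numerically (Python/NumPy; an illustrative sketch): standardized sums of iid unif(0, 1) variables (m = 1/2, σ² = 1/12) have an empirical cdf close to the standard normal one.

```python
import numpy as np

# Standardized sums of iid unif(0,1); compare the empirical cdf with Φ.
rng = np.random.default_rng(8)
n, trials = 100, 50_000
m, sig = 0.5, np.sqrt(1.0 / 12.0)
Z = (rng.random((trials, n)).sum(axis=1) - n * m) / (sig * np.sqrt(n))

p0 = np.mean(Z <= 0.0)   # ~ Φ(0) = 0.5
p1 = np.mean(Z <= 1.0)   # ~ Φ(1) ≈ 0.8413
```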

almost sure convergence, Xn −a.s.→ X: P({ω ∈ Ω : lim_{n→∞} Xn(ω) ≠ X(ω)}) = 0

almost sure convergence ⇒ convergence in prob

proof: (lim_{n→∞} xn = x ⇔ ∀ε > 0, ∃N > 0 such that n ≥ N ⇒ |xn − x| < ε)

⇒ {Xn → X} = ∩_{ε>0} (∪_{N=1}^{∞} (∩_{n=N}^{∞} {|Xn − X| < ε})) = ∩_{ε>0} (lim inf_{n→∞} {|Xn − X| < ε})

⇒ {Xn ↛ X} = ∪_{ε>0} (∩_{N=1}^{∞} (∪_{n=N}^{∞} {|Xn − X| ≥ ε})) = ∪_{ε>0} (lim sup_{n→∞} {|Xn − X| ≥ ε}) ⊇ ∩_{ε>0} (lim sup_{n→∞} {|Xn − X| ≥ ε})

Therefore Xn −a.s.→ X implies that ∀ε > 0, P(lim_{N→∞} (∪_{n=N}^{∞} {|Xn − X| ≥ ε})) = 0.

∀ε > 0,

0 = P(lim_{N→∞} (∪_{n=N}^{∞} {|Xn − X| ≥ ε}))
= lim_{N→∞} P(∪_{n=N}^{∞} {|Xn − X| ≥ ε}) [continuity of P]
≥ lim_{n→∞} P(|Xn − X| ≥ ε) ≥ 0

example:

strong law of large numbers (SLLN): For iid Xn with finite mean m, the sample mean Mn := (1/n) ∑_{i=1}^{n} Xi −a.s.→ m

weak law of large numbers (WLLN): For iid Xn with finite mean m, the sample mean Mn := (1/n) ∑_{i=1}^{n} Xi −pr→ m
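A one-realization sketch of the laws of large numbers (Python/NumPy; illustrative): along a single sequence of iid exponential(1) variables (m = 1), the running sample mean Mn settles down to m.

```python
import numpy as np

# Running sample mean of one iid exponential(1) sequence.
rng = np.random.default_rng(9)
X = rng.exponential(1.0, size=1_000_000)
Mn = np.cumsum(X) / np.arange(1, X.size + 1)   # Mn for n = 1, ..., 10^6

err_final = abs(Mn[-1] - 1.0)                  # small for large n
```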

example: a.s. convergence ⇏ ms convergence

Let Ω = [0, 1] and assume the uniform probability allocation f(ω).

X1(ω) = 1 on [0, 1]; X2(ω) = 2 on [0, 1/2], 0 else; X3(ω) = 4 on [0, 1/4], 0 else; X4(ω) = 8 on [0, 1/8], 0 else; · · ·

[staircase sketches of X1, X2, X3, X4 omitted]

Xn −a.s.→ 0, but E|Xn − X|² ↛ 0, ie, Xn does not converge to 0 in mean square.

example: ms convergence ⇏ a.s. convergence

Let Ω = [0, 1] and assume the uniform probability allocation f(ω).

X1(ω) = 1 on [0, 1];
X2(ω) = 1 on [0, 1/2], X3(ω) = 1 on [1/2, 1];
X4(ω) = 1 on [0, 1/4], X5(ω) = 1 on [1/4, 1/2], X6(ω) = 1 on [1/2, 3/4], X7(ω) = 1 on [3/4, 1];
X8(ω) = 1 on [0, 1/8], X9(ω) = 1 on [1/8, 1/4], X10(ω) = 1 on [1/4, 3/8], X11(ω) = 1 on [3/8, 1/2], X12(ω) = 1 on [1/2, 5/8], · · ·
(each Xn is 0 elsewhere)

[indicator sketches of X1, · · · , X21 omitted]

Xn −ms→ 0, but Xn does not converge to 0 almost surely.
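The sliding-interval construction can be enumerated directly (Python; an illustrative sketch, not from the slides): E Xn² equals the interval width, which tends to 0, while every ω is covered once per block of intervals and hence has Xn(ω) = 1 infinitely often.

```python
# Blocks j = 1, 2, ... consist of the 2^j indicator intervals of width 2^-j.
intervals = []
for j in range(1, 11):
    w = 2.0 ** -j
    intervals += [(i * w, (i + 1) * w) for i in range(2 ** j)]

second_moments = [b - a for a, b in intervals]   # E Xn² = P(interval) = width

omega = 0.3
hits = sum(1 for a, b in intervals if a <= omega <= b)   # once per block
```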

Convergence theorems for expectation

monotone convergence theorem:

0 ≤ Xn ↑ X almost surely (a.s.) ⇒ EXn ↑ EX

proof: Choose simple functions Xkn such that 0 ≤ Xkn ↑ Xk as n → ∞.

Let Yn = max_{k≤n} Xkn. The Yn are simple and non-decreasing.

[array: row k is Xk1, Xk2, Xk3, Xk4, · · · → Xk; the column maxima Y1, Y2, Y3, Y4, · · · increase to X]

For k ≤ n, Xkn ≤ Yn ≤ max_{k≤n} Xk = Xn (1)

For k ≤ n, Xkn ≤ Yn ≤ Xn (1)

⇒ Xk ≤ lim Yn ≤ X [n → ∞]

⇒ X ≤ lim Yn ≤ X [k → ∞]

⇒ lim Yn = X

Since the Yn are simple and non-decreasing,

lim EYn = EX. (2) [def of Lebesgue integral of X]

(1) ⇒ EXkn ≤ EYn ≤ EXn

⇒ lim_n EXkn ≤ lim EYn ≤ lim EXn

Since the Xkn are simple and non-decreasing,

⇒ EXk ≤ lim EYn ≤ lim EXn [def of Lebesgue integral of Xk]

⇒ lim_k EXk ≤ lim EYn ≤ lim EXn [k → ∞]

⇒ lim EYn = lim EXn. Compare this and (2).

example: The monotone convergence theorem does not apply (not monotonic).

[the staircase sequence X1 = 1, X2 = 2 on [0, 1/2], X3 = 4 on [0, 1/4], X4 = 8 on [0, 1/8], · · · from the a.s.-but-not-ms example above]

For the uniform probability allocation, Xn → X = 0 a.s.

But lim EXn = lim 1 = 1 ≠ EX = 0.

Fatou-Lebesgue theorem:

For some U and V such that EU and EV exist,

U ≤ Xn ⇒ E lim inf Xn ≤ lim inf EXn;

Xn ≤ V ⇒ lim sup EXn ≤ E lim sup Xn.

proof: Let Xn be non-negative. (This case is called Fatou's lemma; U = 0.)

Let Yn = inf_{k≥n} Xk.

⇒ Yn ≤ Xn, EYn ≤ EXn, lim inf EYn ≤ lim inf EXn (1)

⇒ Yn ↑ lim inf Xn [lim inf Xn = sup_{n≥0}(inf_{k≥n} Xk) = lim_{n→∞}(inf_{k≥n} Xk)]

⇒ EYn ↑ E lim inf Xn [monotone convergence theorem]

⇒ lim inf EYn = lim EYn = E lim inf Xn ≤ lim inf EXn [(1)]

For general Xn, we apply this to Xn − U.

The proof for lim sup and Xn ≤ 0 is similar, and for general Xn, we apply the same to V − Xn.

dominated convergence theorem:

If |Xn| ≤ U, EU exists, and Xn → X a.s., then EXn → EX.

proof: Xn −a.s.→ X

⇒ E lim sup Xn = E lim Xn = E lim inf Xn

⇒ lim sup EXn ≤ E lim sup Xn [F-L theorem]
= E lim Xn
= E lim inf Xn ≤ lim inf EXn [F-L theorem]

⇒ lim sup EXn = lim inf EXn = lim EXn = E lim Xn [lim inf EXn ≤ lim sup EXn]

example: The dominated convergence theorem does not apply.

[the same staircase sequence X1 = 1, X2 = 2 on [0, 1/2], X3 = 4 on [0, 1/4], X4 = 8 on [0, 1/8], · · · as above]

For the uniform probability allocation, Xn → X = 0 a.s.

But lim EXn = lim 1 = 1 ≠ EX = 0.

If we let U = max(Xi, i = 1, 2, · · ·), we have |Xn| ≤ U, but EU = ∞.

These theorems apply not only to expectation but also to other integrations. See the proof of the Wiener-Khinchin theorem.