Fatou's Lemma

8/11/2019 Fatou's Lemma


Chapter 1

1. Probability, Indicator function and expectation.

The present chapter contains a brief review of some of the basic notions in probability theory and statistics. The reader is advised to skim according to taste and background.

Basic Set Theory.

Let $\Omega$ be an abstract set representing the sample space of a random experiment. The power set of $\Omega$, denoted $\mathcal{P}(\Omega)$, is the set of all subsets of $\Omega$. Elements of $\Omega$ are outcomes and its subsets are events. Therefore

$$\mathcal{P}(\Omega) = \{A : A \subset \Omega\}.$$

For $A, B \in \mathcal{P}(\Omega)$ define

$$A \cup B = \{x : x \in A \text{ or } x \in B\}, \quad A \cap B = \{x : x \in A \text{ and } x \in B\}, \quad \bar{A} = A^c = \{x : x \notin A\}.$$

Also define

A∆B = (A ∪ B) − (A ∩ B).

In terms of events, $A \cup B$ occurs if and only if at least one of the two events $A$ and $B$ occurs. Also $A \cap B$ occurs if both $A$ and $B$ occur. The empty set is denoted by $\Phi$.

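A summary of basic properties.

(1) $A \subset A$, $\Phi \subset A$.

(2) $A \subset B$ and $B \subset A$ implies $A = B$.

(3) $A \subset C$ and $B \subset C$ implies $A \cup B \subset C$ and $A \cap B \subset C$.

(4) $A \subset B$ if and only if $B^c \subset A^c$.

(5) $(A^c)^c = A$, $\Phi^c = \Omega$, $\Omega^c = \Phi$.

(6) $A \cup B = B \cup A$, $A \cap B = B \cap A$, $A \cup A = A$, $A \cap \Omega = A$, $A \cup A^c = \Omega$, $A \cap A^c = \Phi$.

(7) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

(8) $(A \cup B)^c = A^c \cap B^c$, $(A \cap B)^c = A^c \cup B^c$.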

Example. We have

$$\bigcup_{n=1}^{\infty} \Big[0, \frac{n}{n+1}\Big) = [0, 1), \qquad \bigcap_{n=1}^{\infty} \Big(0, \frac{1}{n}\Big) = \Phi.$$

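Example. Prove $A \Delta B = A^c \Delta B^c$.

Proof. Using $E - F = E \cap F^c$ and De Morgan's laws,

$$A^c \Delta B^c = (A^c \cup B^c) - (A^c \cap B^c) = (A \cap B)^c \cap \big((A \cup B)^c\big)^c = (A \cap B)^c \cap (A \cup B) = (A \cup B) - (A \cap B) = A \Delta B.$$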
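As a quick sanity check (not a proof), the identity can be verified by brute force on a small finite sample space. The sketch below assumes a hypothetical $\Omega = \{0, 1, 2, 3\}$ and uses Python's built-in `frozenset` operations; `subsets`, `sym_diff` and `complement` are illustrative helpers.

```python
from itertools import combinations

# Hypothetical finite sample space used only for this check.
omega = frozenset(range(4))

def subsets(s):
    """Return every subset of the finite set s as a frozenset."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def sym_diff(a, b):
    """A Δ B = (A ∪ B) − (A ∩ B)."""
    return (a | b) - (a & b)

def complement(a):
    """A^c relative to omega."""
    return omega - a

all_hold = all(sym_diff(a, b) == sym_diff(complement(a), complement(b))
               for a in subsets(omega) for b in subsets(omega))
print(all_hold)  # True
```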

Definition 1.1. The indicator function of a set $A$ is defined by

$$I(x \in A) = I_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \in A^c. \end{cases}$$

Properties.

(1) $I_{A \cup B} = \max(I_A, I_B)$, $I_{A \cap B} = I_A \cdot I_B$, $I_{A \Delta B} = I_A + I_B \pmod 2$.

(2) $A \subset B$ if and only if $I_A \le I_B$.

(3) $I_{\cup_i A_i} \le \sum_i I_{A_i}$.

Example. Show that

(1) $I_{\cup_{i=1}^{\infty} A_i} = 1 - \prod_{i=1}^{\infty} (1 - I_{A_i})$, and

(2) $I_{A \Delta B} = (I_A - I_B)^2$.
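Both identities of the example hold pointwise, so they can be spot-checked outcome by outcome. This minimal sketch assumes a hypothetical six-point sample space and three hypothetical events. `indicator` is an illustrative helper, and Python's `^` on sets is the symmetric difference.

```python
from itertools import combinations

# Hypothetical sample space and events, chosen only for illustration.
omega = range(6)
events = [{0, 1}, {1, 2, 3}, {5}]

def indicator(s, w):
    """I_s(w): 1 if w is in s, else 0."""
    return 1 if w in s else 0

union = set().union(*events)

# (1) I_{∪A_i}(w) = 1 - prod_i (1 - I_{A_i}(w)) at every outcome w.
ok_union = True
for w in omega:
    prod = 1
    for a in events:
        prod *= 1 - indicator(a, w)
    ok_union = ok_union and indicator(union, w) == 1 - prod

# (2) I_{AΔB}(w) = (I_A(w) - I_B(w))^2; `a ^ b` is the set symmetric difference.
ok_symdiff = all(
    indicator(a ^ b, w) == (indicator(a, w) - indicator(b, w)) ** 2
    for a, b in combinations(events, 2) for w in omega
)
print(ok_union, ok_symdiff)  # True True
```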




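Definition 1.2. Let $\{A_n\}$ be a sequence of events. Then

$$\liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m \quad \text{and} \quad \limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m.$$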

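Lemma 1.1. We have

$$\limsup A_n = \Big\{\omega : \sum_{i=1}^{\infty} I_{A_i}(\omega) = \infty\Big\} \quad \text{and} \quad \liminf A_n = \Big\{\omega : \sum_{i=1}^{\infty} I_{A_i^c}(\omega) < \infty\Big\}.$$

Note. For this reason we sometimes write $\limsup A_n = \{A_n \text{ i.o.}\}$ (infinitely often). If $\liminf A_n = \limsup A_n$ then we write

$$\lim A_n = \liminf A_n = \limsup A_n.$$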

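Proof. (1) If $\omega \in \limsup A_n$ then $\omega \in \bigcup_{m=n}^{\infty} A_m$ for every integer $n$. Therefore for any integer $n$ there exists an integer $k_n \ge n$ such that $\omega \in A_{k_n}$, and the $k_n$ can be chosen strictly increasing. Hence

$$\sum_{i=1}^{\infty} I_{A_i}(\omega) \ge \sum_{n=1}^{\infty} I_{A_{k_n}}(\omega) = \infty.$$

Conversely, if $\sum_{i=1}^{\infty} I_{A_i}(\omega) = \infty$ then for any integer $n$,

$$\sum_{i=n}^{\infty} I_{A_i}(\omega) = \infty.$$

This implies that $\omega \in \bigcup_{j=n}^{\infty} A_j$ for every integer $n$. Therefore

$$\omega \in \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = \limsup A_n.$$

For (2), notice that

$$\omega \in \liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m$$

implies that there exists an integer $n_0$ such that $\omega \in \bigcap_{k=n_0}^{\infty} A_k$. Therefore

$$\sum_{n=1}^{\infty} I_{A_n^c}(\omega) = \sum_{n=1}^{n_0-1} I_{A_n^c}(\omega) \le n_0 - 1 < \infty.$$

Conversely, if this sum is finite then $\omega \in A_n^c$ for only finitely many $n$. So $\omega \in \bigcap_{m=n_0}^{\infty} A_m$ for some $n_0$, that is, $\omega \in \liminf A_n$.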

Remark. The proof above can be simplified by noticing that

$$(\limsup A_n)^c = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m^c = \liminf A_n^c \quad \text{and} \quad (\liminf A_n)^c = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m^c = \limsup A_n^c.$$

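Lemma 1.2. (1) If $A_n \subset A_{n+1}$ for every integer $n$, then

$$\lim A_n = \bigcup_{n=1}^{\infty} A_n.$$

(2) If $A_{n+1} \subset A_n$ for every integer $n$, then

$$\lim A_n = \bigcap_{n=1}^{\infty} A_n.$$

Proof. We prove (1); (2) follows similarly. Note that in this case

$$\bigcup_{m=n}^{\infty} A_m = \bigcup_{m=1}^{\infty} A_m$$

for all integers $n$. Therefore

$$\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = \bigcup_{n=1}^{\infty} A_n.$$

On the other hand

$$\bigcap_{m=n}^{\infty} A_m = A_n,$$

which implies

$$\bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m = \bigcup_{n=1}^{\infty} A_n.$$

Therefore

$$\limsup A_n = \liminf A_n = \bigcup_{n=1}^{\infty} A_n = \lim A_n.$$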

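Also notice that

$$\lim_{n \to \infty} \bigcup_{m=n}^{\infty} A_m = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m, \qquad \lim_{n \to \infty} \bigcap_{m=n}^{\infty} A_m = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m.$$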

Example. We have

lim[0, 1 − 1/n] = lim[0, 1 − 1/n) = [0, 1)

and

lim[0, 1 + 1/n] = lim[0, 1 + 1/n) = [0, 1]

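Note. Let $B, C \subset \Omega$ and define the sequence

$$A_n = \begin{cases} B, & \text{if } n \text{ is odd}, \\ C, & \text{if } n \text{ is even}. \end{cases}$$

We have

$$\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = B \cup C \quad \text{and} \quad \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m = B \cap C.$$

Hence $\liminf A_n = B \cap C$ and $\limsup A_n = B \cup C$. So $\lim A_n$ exists if and only if $B \cap C = B \cup C$, that is, if and only if $B = C$.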

Definition 1.3. A field (algebra) is a class of subsets of $\Omega$ (called events) that contains $\Omega$ and is closed under finite unions, finite intersections and complements.

In other words, a family $\mathcal{A}$ of subsets of $\Omega$ is a field if

(1) $\Omega \in \mathcal{A}$,

(2) if $A \in \mathcal{A}$ then $A^c \in \mathcal{A}$,

(3) if $A, B \in \mathcal{A}$ then $A \cup B \in \mathcal{A}$.

Remarks. If $A, B \in \mathcal{A}$ then $A \cap B \in \mathcal{A}$. This is true because

$$(A^c \cup B^c)^c = A \cap B.$$



Definition (σ-field). A σ-field (σ-algebra) is a field that is closed under countable unions. (Observe that this implies that a σ-field is also closed under countable intersections.)

Example. Here are four σ-fields.

(1) The power set $\mathcal{P}(\Omega)$.

(2) The trivial family $\mathcal{F} = \{\Omega, \Phi\}$ (check this).

(3) The family of subsets of $\Omega$ which are either countable or have countable complements.

(4) On $\Omega = \mathbb{R}$, the smallest σ-field $\mathcal{B}$ containing all open sets. This σ-field is called the Borel σ-field.
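On a finite $\Omega$, countable unions reduce to finite ones, so a field is automatically a σ-field. The axioms of Definition 1.3 can then be checked mechanically. The sketch below assumes a hypothetical $\Omega = \{1, 2, 3, 4\}$. `is_field` and `generated_by_partition` are illustrative helpers, not library functions. The sketch confirms example (2) and the family of all unions of blocks of a partition, and it rejects a family that is missing a complement.

```python
from itertools import combinations

def is_field(omega, family):
    """Check Definition 1.3 for a family of subsets of a finite omega:
    contains omega, closed under complements and pairwise unions."""
    whole = frozenset(omega)
    fam = {frozenset(a) for a in family}
    if whole not in fam:
        return False
    if any((whole - a) not in fam for a in fam):
        return False
    return all((a | b) in fam for a in fam for b in fam)

def generated_by_partition(blocks):
    """All unions of blocks of a partition (including the empty union)."""
    out = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            out.add(frozenset().union(*combo))
    return out

omega = {1, 2, 3, 4}
trivial = [set(), omega]                                   # example (2)
partition = [frozenset({1}), frozenset({2, 3}), frozenset({4})]

print(is_field(omega, trivial))                            # True
print(is_field(omega, generated_by_partition(partition)))  # True
print(is_field(omega, [set(), {1}, omega]))                # False: {2, 3, 4} is missing
```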

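Definition 1.4. (Probability measure). Let $\Omega$ be a sample space and $\mathcal{F}$ be a σ-field on $\Omega$. A probability measure is a set function $P : \mathcal{F} \to [0, \infty)$ such that

(i) $P(\Omega) = 1$,

(ii) if $A_1, A_2, \ldots$ are in $\mathcal{F}$ and are pairwise disjoint then

$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i).$$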

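Basic properties. We have

(i) Since $\Omega = \Omega \cup \Phi \cup \Phi \cup \cdots$ is a disjoint union, $1 = P(\Omega) = P(\Omega) + \sum_{i=1}^{\infty} P(\Phi)$, so $P(\Phi) = 0$. Consequently (ii) of Definition 1.4 also holds for finitely many disjoint events.

(ii) Since $(A - B) \cup (A \cap B) = A$ and $(A - B) \cap (A \cap B) = \Phi$ we have

$$P(A - B) = P(A) - P(A \cap B).$$

(iii) Similarly note that $(A - B) \cup B = A \cup B$ and $(A - B) \cap B = \Phi$. Together with (ii), this implies

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

(iv) If $A \subset B$ then $A \cup (B - A) = B$ with $A \cap (B - A) = \Phi$. Therefore $P(A) + P(B - A) = P(B)$. Since $P(B - A) \ge 0$, we get $P(A) \le P(B)$.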



Approach based on expectation.

Definition 1.5. Let $X : \Omega \to \mathbb{R}$ and let $E$ be an operator on such random variables with the following properties:

(1) if $X \ge 0$ then $E(X) \ge 0$,

(2) if $c \in \mathbb{R}$ is a constant then $E(cX) = cE(X)$,

(3) $E(X_1 + X_2) = E(X_1) + E(X_2)$,

(4) $E(1) = 1$,

(5) if $X_n(\omega)$ is monotonically increasing and $X_n(\omega) \uparrow X(\omega)$ then

$$\lim_{n \to \infty} E(X_n) = E(X).$$

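Example. Let

$$E(X) = \lim_{D \to \infty} \frac{1}{2D} \int_{-D}^{D} X(\omega)\, d\omega.$$

Check whether $E$ satisfies the definition of expectation.

Solution. It is easy to check (1)-(4). Axiom (5) fails. For example, take $\Omega = \mathbb{R}$ and $X_n(\omega) = I_{[-n, n]}(\omega)$. We have

$$\lim_{D \to \infty} \frac{1}{2D} \int_{-D}^{D} X_n(\omega)\, d\omega = \lim_{D \to \infty} \frac{2n}{2D} = 0$$

for every $n$, while $X_n(\omega) \uparrow 1$ and $E(1) = 1$. So the operator in this example is not a proper form of expectation.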
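The failure of axiom (5) can be seen numerically. For $X_n = I_{[-n,n]}$, the integral in the example is the length of $[-n, n] \cap [-D, D]$, which is $2\min(n, D)$. So the averaged value is $\min(n, D)/D$. The helper name `averaged_integral` is hypothetical, and the sketch only evaluates that closed form for growing $D$.

```python
def averaged_integral(n, D):
    """(1/(2D)) * integral over [-D, D] of I_[-n, n](w) dw, in closed form:
    the integral is the length of [-n, n] ∩ [-D, D], i.e. 2 * min(n, D)."""
    return 2 * min(n, D) / (2 * D)

# For each fixed n the values shrink to 0 as D grows, so E(X_n) = 0,
# whereas the limit X_n ↑ 1 has E(1) = 1.
for n in (1, 10, 100):
    print(n, [averaged_integral(n, D) for D in (10**3, 10**6, 10**9)])
```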

Definition 1.6. For any event $A$ define

$$P(A) = E(I_A(\omega)).$$

For simplicity we sometimes drop $\omega$ and write $P(A) = E(I_A)$. It is easy to verify the axioms of a probability measure given in Definition 1.4.

Properties. It is easy to conclude that

(1) $E\Big(\sum_{i=1}^{n} c_i X_i\Big) = \sum_{i=1}^{n} c_i E(X_i)$, and

(2) if $X \le Y \le Z$ then $E(X) \le E(Y) \le E(Z)$.

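(3) If $\{A_i\}$ is a sequence of events then

$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) \le \sum_{i=1}^{\infty} P(A_i).$$

To see this notice that

$$I_{\cup_{i=1}^{\infty} A_i} \le \sum_{i=1}^{\infty} I_{A_i}$$

and take expectations. Axiom (5), applied to the increasing partial sums, justifies taking the expectation term by term on the right-hand side.

(4) (Inclusion-exclusion)

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$

For a proof, expand the product in

$$I_{\cup_{i=1}^{n} A_i} = 1 - \prod_{i=1}^{n} (1 - I_{A_i})$$

and take expectations, using $I_{A_i} I_{A_j} = I_{A_i \cap A_j}$ and so on.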
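The inclusion-exclusion formula in (4) can be checked exactly with rational arithmetic. The sketch below assumes the uniform probability on a hypothetical 12-point sample space and four hypothetical events. `P` and `inclusion_exclusion` are illustrative helpers built on the standard `fractions` module. The same objects can also be used to compare with the union bound in (3).

```python
from fractions import Fraction
from itertools import combinations

# Uniform probability on a hypothetical finite sample space.
omega = set(range(12))

def P(event):
    return Fraction(len(event), len(omega))

A = [{0, 1, 2, 3}, {2, 3, 4, 5, 6}, {0, 6, 7}, {3, 8}]  # hypothetical events

def inclusion_exclusion(events):
    """Sum over k of (-1)^(k+1) times the sum of P(intersection of k events)."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(events, k):
            total += sign * P(set.intersection(*combo))
    return total

lhs = P(set.union(*A))
rhs = inclusion_exclusion(A)
print(lhs, rhs)  # 3/4 3/4
```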

Theorem 1.1. (Fatou’s Lemma). If $\{A_n\}$ is a sequence of events, then

(1) $P(\liminf A_n) \le \liminf P(A_n) \le \limsup P(A_n) \le P(\limsup A_n)$.

(2) If $\lim A_n = A$ then $\lim P(A_n) = P(A)$.

We first prove the following lemma.

Lemma 1.3. If An ⊂ An+1 for any positive integer n then lim P (An) = P (A). Similarly,

if An+1 ⊂ An for any positive integer n then lim P (An) = P (A) where in both cases

A = lim An.

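Proof. Define

$$B_1 = A_1, \qquad B_i = A_i - A_{i-1}, \quad i = 2, 3, \ldots.$$

Since $A_1 \subset A_2 \subset A_3 \subset \cdots$, the $B_i$ are pairwise disjoint and

$$\bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} A_i = A_n, \qquad \bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i = A.$$

Therefore

$$P(A) = P\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} P(B_i) = \lim_{n \to \infty} \sum_{i=1}^{n} P(B_i) = \lim_{n \to \infty} P\Big(\bigcup_{i=1}^{n} B_i\Big) = \lim_{n \to \infty} P(A_n).$$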

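For the second case notice that $A_n^c \subset A_{n+1}^c$ for every positive integer $n$, and $\bigcup_{n=1}^{\infty} A_n^c = \big(\bigcap_{n=1}^{\infty} A_n\big)^c = A^c$. Therefore from the first part of this lemma we have

$$\lim_{n \to \infty} P(A_n^c) = \lim_{n \to \infty} (1 - P(A_n)) = P(A^c) = 1 - P(A).$$

Therefore

$$\lim_{n \to \infty} P(A_n) = P(A).$$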

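Now we are ready to prove Theorem 1.1. The sets $\bigcap_{i=n}^{\infty} A_i$ increase with $n$, so by the first part of Lemma 1.3 we can write

$$P(\liminf A_n) = P\Big(\lim_{n \to \infty} \bigcap_{i=n}^{\infty} A_i\Big) = \lim_{n \to \infty} P\Big(\bigcap_{i=n}^{\infty} A_i\Big) \le \liminf P(A_n),$$

since $\bigcap_{i=n}^{\infty} A_i \subset A_n$. Likewise, the sets $\bigcup_{i=n}^{\infty} A_i$ decrease with $n$, so by the second part of Lemma 1.3

$$P(\limsup A_n) = P\Big(\lim_{n \to \infty} \bigcup_{i=n}^{\infty} A_i\Big) = \lim_{n \to \infty} P\Big(\bigcup_{i=n}^{\infty} A_i\Big) \ge \limsup P(A_n),$$

since $A_n \subset \bigcup_{i=n}^{\infty} A_i$. The middle inequality $\liminf P(A_n) \le \limsup P(A_n)$ holds for any real sequence.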

(2) If $A = \limsup A_n = \liminf A_n$ we have

$$P(A) = P(\liminf A_n) \le \liminf P(A_n) \le \limsup P(A_n) \le P(\limsup A_n) = P(A).$$

Conclude that $\lim_{n \to \infty} P(A_n) = P(A)$.
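A small exact example makes the chain of inequalities in Theorem 1.1 concrete. The sketch assumes a hypothetical uniform $\Omega = \{0, \ldots, 9\}$ and uses the alternating sequence from the Note after Lemma 1.2 ($A_n = B$ for odd $n$, $C$ for even $n$). For this sequence, $\liminf A_n = B \cap C$ and $\limsup A_n = B \cup C$. The sets $B$ and $C$ are arbitrary choices.

```python
from fractions import Fraction

# Hypothetical uniform sample space and the two alternating sets B, C.
omega = set(range(10))
B = set(range(0, 6))   # {0, ..., 5}
C = set(range(3, 10))  # {3, ..., 9}

def P(event):
    return Fraction(len(event), len(omega))

p_liminf_set = P(B & C)        # P(lim inf A_n)
liminf_p = min(P(B), P(C))     # lim inf of the sequence P(A_n)
limsup_p = max(P(B), P(C))     # lim sup of the sequence P(A_n)
p_limsup_set = P(B | C)        # P(lim sup A_n)

print(p_liminf_set, liminf_p, limsup_p, p_limsup_set)  # 3/10 3/5 7/10 1
```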

Definition 1.6. (Independence) Events $A$ and $B$ are independent if

$$P(A \cap B) = P(A)P(B).$$

Properties.

(1) If $A$ and $B$ are independent then $A^c$ and $B$ are independent. To see this notice that

$$P(A^c \cap B) = P(B - A) = P(B) - P(A \cap B) = P(B) - P(A)P(B) = P(B)(1 - P(A)) = P(B)P(A^c).$$

(2) If $A, B, C$ are (mutually) independent then $A$ and $B \cup C$ are independent. Similarly $A$ and $B \cap C$ are independent.

(3) An event $A$ is independent of itself if and only if $P(A) = P(A)^2$, that is, $P(A) = 0$ or $P(A) = 1$ (for example $A = \Phi$ or $A = \Omega$).

(4) Any event $A$ is independent of $\Omega$.

Theorem 1.2 (Borel-Cantelli). Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\{E_i\}$ be a sequence of events.

(1) If $\sum_{i=1}^{\infty} P(E_i) < \infty$ then $P(\limsup E_n) = 0$.

(2) If $\{E_i\}$ is a sequence of independent events then $P(\limsup E_n) = 0$ or $1$ according as the series $\sum_{i=1}^{\infty} P(E_i)$ converges or diverges.

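Proof. Set $E = \limsup E_n$. We have $E = \bigcap_{n=1}^{\infty} F_n$ where $F_n = \bigcup_{m=n}^{\infty} E_m$. For every positive integer $n$,

$$0 \le P(F_n) \le \sum_{m=n}^{\infty} P(E_m).$$

Since $\sum_{i=1}^{\infty} P(E_i) < \infty$, its tails satisfy $\lim_{n \to \infty} \sum_{m=n}^{\infty} P(E_m) = 0$, hence $\lim_{n \to \infty} P(F_n) = 0$. Since $F_n \downarrow E$, from Lemma 1.3 we can write

$$0 = \lim_{n \to \infty} P(F_n) = P\Big(\lim_{n \to \infty} F_n\Big) = P(\limsup E_n),$$

thus (1) follows. For (2), suppose $E_1, E_2, \ldots$ are independent. From part (1) we know that if $\sum_{m=1}^{\infty} P(E_m) < \infty$ then $P(\limsup E_n) = 0$. It remains to show that if

$$\sum_{m=1}^{\infty} P(E_m) = \infty$$

then $P(\limsup E_n) = 1$. As in part (1) let $E = \limsup E_n$. We get

$$E^c = \liminf E_n^c = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} E_m^c.$$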



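The events $E_n^c$ are also independent, and $1 - x \le e^{-x}$ for all real $x$. Hence for every $N \ge n$ we have

$$P\Big(\bigcap_{m=n}^{\infty} E_m^c\Big) \le P\Big(\bigcap_{m=n}^{N} E_m^c\Big) = \prod_{m=n}^{N} (1 - P(E_m)) \le \exp\Big(-\sum_{m=n}^{N} P(E_m)\Big).$$

As $N \to \infty$ the series diverges, so the right-hand side tends to $0$. Thus $P\big(\bigcap_{m=n}^{\infty} E_m^c\big) = 0$ for every $n$, and

$$P(E^c) = P\Big(\bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} E_m^c\Big) \le \sum_{n=1}^{\infty} P\Big(\bigcap_{m=n}^{\infty} E_m^c\Big) = 0.$$

Therefore $P(E^c) = 0$ and $P(E) = 1$.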
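The key step in part (2) is the bound $\prod_{m=n}^{N}(1 - P(E_m)) \le \exp(-\sum_{m=n}^{N} P(E_m))$. The sketch below evaluates both sides for the hypothetical choice $P(E_m) = 1/m$, which gives a divergent series. In that case the product telescopes to $(n-1)/N$ and visibly tends to $0$ as $N \to \infty$.

```python
import math
from fractions import Fraction

def tail_product(p, n, N):
    """prod_{m=n}^{N} (1 - p(m)), i.e. P(E_n^c ∩ ... ∩ E_N^c) for independent E_m."""
    out = Fraction(1)
    for m in range(n, N + 1):
        out *= 1 - p(m)
    return out

def p(m):
    """Hypothetical choice P(E_m) = 1/m, whose series diverges."""
    return Fraction(1, m)

n = 5
for N in (10, 100, 1000):
    prod = tail_product(p, n, N)                  # telescopes to (n - 1)/N
    bound = math.exp(-sum(1 / m for m in range(n, N + 1)))
    print(N, prod, float(prod) <= bound)
```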

Corollary 1.1. If $\{E_i\}$ is a sequence of independent events then

$$P(\limsup E_n) = 0 \quad \text{if and only if} \quad \sum_{i=1}^{\infty} P(E_i) < \infty.$$

Proof. If $\sum_{i=1}^{\infty} P(E_i) < \infty$, then part (1) of Theorem 1.2 gives $P(\limsup E_n) = 0$. Conversely, let $P(\limsup E_n) = 0$. If $\sum_{i=1}^{\infty} P(E_i) = \infty$, then part (2) of Theorem 1.2 gives $P(\limsup E_n) = 1$, a contradiction.

Remark. Notice that independence is required in Corollary 1.1 and cannot be dropped. To see this let $(\Omega = [0, 1], \mathcal{B}, P)$ be a probability space with

$$P(A) = \int_A dx$$

for a Borel set $A$. It is easy to show that $P$ is a probability measure on $[0, 1]$. Now define $E_n = (0, 1/n)$ and notice that $E_n \downarrow \Phi$. Therefore

$$P(\limsup E_n) = P(\lim E_n) = 0.$$

Since $P(E_n) = 1/n$ we have $\sum_{n=1}^{\infty} 1/n = \infty$. This does not violate Corollary 1.1 because the $E_i$ are not independent. For example, take $E_2 = (0, 1/2)$ and $E_3 = (0, 1/3)$. We have

$$P(E_2) = 1/2, \quad P(E_3) = 1/3, \quad P(E_2 \cap E_3) = P(E_3) = 1/3 \ne 1/6 = P(E_2)P(E_3).$$



Example. In repeated tossing of a fair coin, let $E_n$ denote the event that a head turns up on both the $n$th and the $(n+1)$st toss. Then $\limsup E_n$ is the event that two successive heads appear infinitely often. The events $\{E_{2n}\}$ form an independent sequence, since they involve disjoint pairs of tosses. Moreover

$$\sum_{n=1}^{\infty} P(E_{2n}) = \sum_{n=1}^{\infty} \frac{1}{4} = \infty,$$

so part (2) of Theorem 1.2 gives $P(\limsup E_{2n}) = 1$. Since $\limsup E_{2n} \subset \limsup E_n$, this implies $P(\limsup E_n) = 1$.



Chapter 2, Some Inequalities in Probability

In this chapter we review some important inequalities. These inequalities will be used

later in different sections of this note.

Definition 1.7. Suppose $E(X) = \mu$ and $\sigma^2 = V(X) = E((X - \mu)^2)$. Define

$$X \overset{m.s.}{=} \mu \quad \text{if and only if} \quad E((X - \mu)^2) = 0.$$

With some knowledge of real analysis one can show that equality in mean square is equivalent to equality with probability 1 (almost surely).

Theorem 2.1. (Markov inequality). Let $X$ be a nonnegative random variable and $a$ be a positive constant. Then

$$P(X \ge a) \le \frac{E(X)}{a}.$$

Proof. Note that

$$I(X \ge a) \le \frac{X}{a}$$

(use Figure 1). Now take expected values of both sides.

Remark. Similarly we can write

$$I(|X - b| \ge a) \le \frac{(X - b)^2}{a^2}$$

for all $a > 0$ and $b \in \mathbb{R}$ (see Figure 2). Therefore

$$P(|X - b| \ge a) \le \frac{E((X - b)^2)}{a^2}.$$

Take $b = \mu$ and $a = k\sigma$, where $\mu$ and $\sigma^2$ are the mean and variance of $X$, to get Chebyshev's inequality

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$

Figure 1: Graph of $I(x > a)$ and $x/a$ when $a = 3$.

Figure 2: Graph of $I(|x - b| > a)$ and $(x - b)^2/a^2$ when $a = 3$ and $b = 2$.

Example. This example shows that Chebyshev's inequality cannot be improved. Let

$$P(X = -1) = P(X = 1) = \frac{1}{8}, \qquad P(X = 0) = \frac{6}{8}.$$

It is easy to see that $E(X) = 0$ and $\sigma^2 = 1/4$, so $\sigma = 1/2$. Let $k = 2$ and calculate

$$P(|X - \mu| \ge k\sigma) = P(|X| \ge 1) = \frac{1}{4} = \frac{1}{k^2}.$$
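The arithmetic of this example can be reproduced exactly with the standard `fractions` module. The sketch below encodes the three-point distribution above and compares the tail probability with the Chebyshev bound $1/k^2$ for $k = 2$.

```python
from fractions import Fraction

# The three-point distribution of the example.
pmf = {-1: Fraction(1, 8), 0: Fraction(6, 8), 1: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
k = 2
sigma = Fraction(1, 2)          # sqrt(var), exact here because var = 1/4
assert sigma ** 2 == var

tail = sum(p for x, p in pmf.items() if abs(x - mean) >= k * sigma)
print(mean, var, tail, Fraction(1, k ** 2))  # 0 1/4 1/4 1/4
```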

Cauchy-Schwarz inequality. Let $\mathbf{X}' = (X_1, \ldots, X_p)$ be a random vector. Define the matrix $U = E(\mathbf{X}\mathbf{X}')$. Notice that for any $p \times 1$ vector $c$ we have

$$c'Uc = E((c'\mathbf{X})^2) \ge 0.$$

Therefore the matrix $U$ is nonnegative definite. Take $p = 2$ to see that

$$0 \le \det(U) = E(X_1^2)E(X_2^2) - (E(X_1 X_2))^2.$$

Equality holds if and only if $c'\mathbf{X} \overset{m.s.}{=} 0$ for some $c \ne 0$. Some generalizations are given next.

Example. If $P(X \ge 0) = 1$ and $E(X) = \mu > 0$ then

$$P(X \ge 2\mu) \le 0.5.$$

Proof. Use Markov's inequality:

$$P(X \ge 2\mu) \le \frac{E(X)}{2\mu} = 0.5.$$

Example. Let $X$ be a random variable whose moment generating function $M(t) = E(\exp(tX))$ exists (is finite) for $t \in (-h, h)$, $h > 0$. Applying Markov's inequality to $\exp(tX)$, it is easy to show that

$$P(X \ge a) \le \exp(-at)M(t), \quad 0 < t < h,$$

and

$$P(X \le a) \le \exp(-at)M(t), \quad -h < t < 0.$$
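The bound $P(X \ge a) \le \exp(-at)M(t)$ is usually minimized over $t$. The sketch below assumes a hypothetical $X \sim \text{Binomial}(10, 1/2)$, whose moment generating function $((1 + e^t)/2)^{10}$ is finite for all $t$. It minimizes the bound over a grid of $t > 0$ for $a = 8$. The grid search is only an illustration and is not taken from the notes.

```python
import math

# Hypothetical X ~ Binomial(n=10, 1/2), with M(t) = ((1 + e^t)/2)^n for all real t.
n = 10

def mgf(t):
    return ((1 + math.exp(t)) / 2) ** n

a = 8
exact = sum(math.comb(n, k) for k in range(a, n + 1)) / 2 ** n   # P(X >= a)
grid = [i / 1000 for i in range(1, 5001)]                          # t in (0, 5]
bound = min(math.exp(-a * t) * mgf(t) for t in grid)
print(round(exact, 4), round(bound, 4))
```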

Hölder, Lyapunov and Minkowski inequalities.



Lemma 2.1. If $\alpha \ge 0$, $\beta \ge 0$ and

$$\frac{1}{p} + \frac{1}{q} = 1, \quad p > 1, \ q > 1,$$

then

$$0 \le \alpha\beta \le \frac{\alpha^p}{p} + \frac{\beta^q}{q}.$$

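Proof. If $\alpha\beta = 0$ the inequality holds trivially. Therefore let $\alpha > 0$, $\beta > 0$. For $t > 0$ define

$$\phi(t) = \frac{t^p}{p} + \frac{t^{-q}}{q}$$

and differentiate to get

$$\phi'(t) = t^{p-1} - t^{-q-1}.$$

We have $\phi'(1) = 0$, $\phi'(t) < 0$ when $t \in (0, 1)$ and $\phi'(t) > 0$ when $t > 1$. Therefore $t = 1$ minimizes $\phi(t)$ on $(0, \infty)$, i.e.

$$\phi(t) = \frac{t^p}{p} + \frac{t^{-q}}{q} \ge \phi(1) = \frac{1}{p} + \frac{1}{q} = 1, \quad t \in (0, \infty).$$

Set $t = \alpha^{1/q}/\beta^{1/p}$ to get

$$\frac{\alpha^{p/q}}{p\beta} + \frac{\beta^{q/p}}{q\alpha} \ge 1.$$

Multiply both sides by $\alpha\beta$ and use the fact that

$$p/q + 1 = p \quad \text{and} \quad q/p + 1 = q$$

to get

$$\alpha\beta \le \frac{\alpha^{p/q+1}}{p} + \frac{\beta^{q/p+1}}{q} = \frac{\alpha^p}{p} + \frac{\beta^q}{q}.$$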

Theorem 2.2 (Hölder's inequality). Let $X$ and $Y$ be two random variables and

$$\frac{1}{p} + \frac{1}{q} = 1, \quad p > 1, \ q > 1.$$

We have

$$E(|XY|) \le (E(|X|^p))^{1/p} (E(|Y|^q))^{1/q}.$$



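Proof. In the case $E(|X|^p)E(|Y|^q) = 0$ the result follows easily: then $X \overset{a.s.}{=} 0$ or $Y \overset{a.s.}{=} 0$, so $XY \overset{a.s.}{=} 0$. Otherwise, in Lemma 2.1 take

$$\alpha = \frac{|X|}{(E(|X|^p))^{1/p}}, \qquad \beta = \frac{|Y|}{(E(|Y|^q))^{1/q}}$$

to get

$$\frac{|XY|}{(E(|X|^p))^{1/p} (E(|Y|^q))^{1/q}} \le \frac{|X|^p}{p\,E(|X|^p)} + \frac{|Y|^q}{q\,E(|Y|^q)}.$$

Taking expected values, the right-hand side becomes $1/p + 1/q = 1$, which gives the result.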

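Theorem 2.3 (Minkowski's inequality). For $p \ge 1$ we have

$$(E(|X + Y|^p))^{1/p} \le (E(|X|^p))^{1/p} + (E(|Y|^p))^{1/p}.$$

Proof. Since $|X + Y| \le |X| + |Y|$ the case $p = 1$ is obvious. So let $p > 1$ and assume $E(|X|^p)$ and $E(|Y|^p)$ are finite (otherwise there is nothing to prove). Then $E(|X + Y|^p) < \infty$ because $|X + Y|^p \le 2^{p-1}(|X|^p + |Y|^p)$. Choose $q > 1$ such that

$$\frac{1}{p} + \frac{1}{q} = 1.$$

Use Hölder's inequality to write

$$E(|X + Y|^p) = E\big(|X + Y|\,|X + Y|^{p-1}\big) \le E\big(|X|\,|X + Y|^{p-1}\big) + E\big(|Y|\,|X + Y|^{p-1}\big)$$

$$\le (E(|X|^p))^{1/p} \big(E(|X + Y|^{(p-1)q})\big)^{1/q} + (E(|Y|^p))^{1/p} \big(E(|X + Y|^{(p-1)q})\big)^{1/q}.$$

Since $(p - 1)q = p$, divide both sides by $(E(|X + Y|^p))^{1/q} \ne 0$ and use $1 - 1/q = 1/p$ to obtain the result (the case $(E(|X + Y|^p))^{1/q} = 0$ is trivial).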
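Hölder's and Minkowski's inequalities can be spot-checked on any discrete distribution. The sketch below assumes a hypothetical joint distribution of $(X, Y)$ with four equally likely outcomes; `E` and `norm` are illustrative helpers. The case $p = 2$ is the Cauchy-Schwarz inequality.

```python
# Hypothetical joint distribution of (X, Y): four equally likely outcomes.
outcomes = [(1.0, 2.0), (-2.0, 0.5), (3.0, -1.0), (0.5, 4.0)]
prob = 1 / len(outcomes)

def E(f):
    """Expectation of f(X, Y) under the discrete distribution above."""
    return sum(prob * f(x, y) for x, y in outcomes)

def norm(f, p):
    """(E|f(X, Y)|^p)^(1/p)."""
    return E(lambda x, y: abs(f(x, y)) ** p) ** (1 / p)

results = []
for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1
    holder = E(lambda x, y: abs(x * y)) <= norm(lambda x, y: x, p) * norm(lambda x, y: y, q)
    minkowski = norm(lambda x, y: x + y, p) <= norm(lambda x, y: x, p) + norm(lambda x, y: y, p)
    results.append((p, holder, minkowski))
print(results)
```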

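Remarks.

1. The Cauchy-Schwarz inequality $E(|XY|) \le (E(X^2))^{1/2}(E(Y^2))^{1/2}$ is the special case $p = q = 2$ of Hölder's inequality.

2. Let $p \ge 1$ and consider the space of random variables defined on $(\Omega, \mathcal{F}, P)$ with $E(|X|^p) < \infty$. On this space the distance function

$$d(X, Y) = (E(|X - Y|^p))^{1/p}$$

is a metric (after identifying random variables that are equal almost surely); the triangle inequality is Minkowski's inequality. This space equipped with the distance $d$ is called the $L^p$ space.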

Convexity and Jensen's inequality. Convexity is an important topic in mathematics. For simplicity we only consider smooth convex functions. In most of our cases, to check convexity we only need to check that $g''(x) \ge 0$. Equivalently, a smooth function $g$ is convex if its tangent line at every point $\mu$ lies below the curve $g(x)$, that is,

$$g(x) \ge g(\mu) + g'(\mu)(x - \mu) \quad \text{for all } x.$$

If $X$ is a random variable with $\mu = E(X)$ and $g$ is a convex function then (Jensen's inequality)

$$E(g(X)) \ge g(E(X)).$$

Indeed, replacing $x$ by $X$ in the tangent-line inequality and taking expectations gives $E(g(X)) \ge g(\mu) + g'(\mu)(E(X) - \mu) = g(\mu)$.
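The tangent-line argument can be checked numerically with the convex function $g(x) = x + \exp(x)$ of Figure 3 (here $g''(x) = \exp(x) > 0$). The discrete distribution of $X$ below is hypothetical, and the check is an illustration rather than a proof.

```python
import math

# Hypothetical discrete X and the convex g(x) = x + exp(x).
values = [-1.0, 0.0, 2.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]

def g(x):
    return x + math.exp(x)

def g_prime(x):
    return 1 + math.exp(x)

mu = sum(p * x for p, x in zip(probs, values))             # E(X) = 1.1
lhs = sum(p * g(x) for p, x in zip(probs, values))          # E(g(X))
rhs = g(mu)                                                 # g(E(X))
# The tangent line at mu lies below g at every support point.
tangent_ok = all(g(x) >= g(mu) + g_prime(mu) * (x - mu) for x in values)
print(lhs >= rhs, tangent_ok)  # True True
```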

Theorem 2.4. (Lyapunov inequality). If $r > s > 0$ then

$$(E(|X|^r))^{1/r} \ge (E(|X|^s))^{1/s}.$$

Proof. The function $g(x) = |x|^u$ is convex on $\mathbb{R}$ when $u \ge 1$. Take $u = r/s > 1$ and apply $E(g(Y)) \ge g(E(Y))$ to $Y = |X|^s$ to write

$$E(|X|^r) = E\big((|X|^s)^{r/s}\big) \ge (E(|X|^s))^{r/s}.$$

Raising both sides to the power $1/r$ gives the result.

Figure 3: The convex function $f(x) = x + \exp(x)$.

The law of least squares. Let $X$ and $\mathbf{Y}' = (Y_1, \ldots, Y_m)$ be a random variable and a random vector, respectively. To predict $X$ by a linear function $a'\mathbf{Y}$ of $\mathbf{Y}$, where $a$ ($m \times 1$) $\in \mathbb{R}^m$, we need to minimize

$$D(a) = E(X - a'\mathbf{Y})^2 = E(X^2) - 2a'E(X\mathbf{Y}) + a'E(\mathbf{Y}\mathbf{Y}')a.$$

Define

$$U_{YX} = E(X\mathbf{Y}) = (E(XY_1), \ldots, E(XY_m))', \qquad U_{YY} = E(\mathbf{Y}\mathbf{Y}').$$

Notice that the matrix $U_{YY}$ is nonnegative definite, so $D(a)$ is a convex function of $a$. Differentiate $D(a)$ and solve for $a$ to find the optimal value. We get

$$\frac{\partial D(a)}{\partial a} = -2E(X\mathbf{Y}) + 2E(\mathbf{Y}\mathbf{Y}')a = 0.$$

This gives $a = U_{YY}^{-1} U_{YX}$ (if $U_{YY}$ is nonsingular). If we use the predictor $\hat{X} = a_0 + a'\mathbf{Y}$, we impose $E(X) = a_0 + a'E(\mathbf{Y})$ to get

$$\hat{X} - E(X) = a'(\mathbf{Y} - E(\mathbf{Y}))$$

and $a = V_{YY}^{-1} V_{YX}$. Here $V_{YY}$ and $V_{YX}$ are the centered versions of $U_{YY}$ and $U_{YX}$ (that is, the covariance matrices).
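The normal equations $U_{YY} a = U_{YX}$ can be solved exactly for a small example. The sketch below assumes a hypothetical joint distribution of $(X, Y_1, Y_2)$ with three equally likely rows and solves the $2 \times 2$ system by Cramer's rule. It then compares $D(a)$ at the solution with nearby points.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y1, Y2): three equally likely rows.
rows = [(Fraction(1), Fraction(1), Fraction(0)),
        (Fraction(2), Fraction(1), Fraction(1)),
        (Fraction(4), Fraction(0), Fraction(1))]
w = Fraction(1, len(rows))

def E(f):
    """Expectation of f(X, Y1, Y2) under the distribution above."""
    return sum(w * f(*r) for r in rows)

# Entries of U_YY = E(Y Y') and U_YX = E(X Y) for m = 2.
u11 = E(lambda x, y1, y2: y1 * y1)
u12 = E(lambda x, y1, y2: y1 * y2)
u22 = E(lambda x, y1, y2: y2 * y2)
v1 = E(lambda x, y1, y2: x * y1)
v2 = E(lambda x, y1, y2: x * y2)

# Solve U_YY a = U_YX by Cramer's rule (U_YY is nonsingular here).
det = u11 * u22 - u12 * u12
a1 = (v1 * u22 - u12 * v2) / det
a2 = (u11 * v2 - u12 * v1) / det

def D(b1, b2):
    """Mean squared prediction error E(X - b1*Y1 - b2*Y2)^2."""
    return E(lambda x, y1, y2: (x - b1 * y1 - b2 * y2) ** 2)

print(a1, a2, D(a1, a2))  # 0 3 1
```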




Chapter 3

Induced probability measure, conditional probability, probability density functions and cumulative distribution functions

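Induced probability measure. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X$ a random variable and $E$ a Borel set. Define $P_X(E) = P(X(\omega) \in E) = P(X^{-1}(E))$. We can check that

(i) $P_X(\mathbb{R}) = 1$,

(ii) for pairwise disjoint Borel sets $E_1, E_2, \ldots$, $P_X\big(\bigcup_{i=1}^{\infty} E_i\big) = P\big(X \in \bigcup_{i=1}^{\infty} E_i\big) = \sum_{i=1}^{\infty} P_X(E_i)$.

Note that $X^{-1}\big(\bigcup_{i=1}^{\infty} E_i\big) = \bigcup_{i=1}^{\infty} X^{-1}(E_i)$, and the sets $X^{-1}(E_i)$ are pairwise disjoint.

Definition 3.1. If $X$ is a random variable, the cumulative distribution function (c.d.f.) of $X$ is defined by

$$F_X(x) = P(X(\omega) \le x) = P(X \le x).$$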

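Theorem 3.1. The distribution function $F$ of a random variable $X$ is a nondecreasing, right continuous function on $\mathbb{R}$ such that

$$F(-\infty) = 0, \qquad F(+\infty) = 1.$$

Proof. Let $h > 0$ and $x \in \mathbb{R}$. We have

$$F(x + h) - F(x) = P(x < X \le x + h) \ge 0.$$

Therefore $F$ is nondecreasing. For right continuity, take

$$E_n = (-\infty, x + h_n], \qquad h_n \downarrow 0.$$

By Lemma 1.3 applied to the induced measure $P_X$, we have

$$E_n \downarrow \bigcap_{n=1}^{\infty} E_n = (-\infty, x] \ \Rightarrow\ P(X \in E_n) = F(x + h_n) \downarrow P(X \le x) = F(x).$$

Finally we have

$$F(N) - F(-N) = P(-N < X \le N)$$

and

$$E_N = (-N, N] \uparrow \mathbb{R}.$$

Therefore

$$F(N) - F(-N) \uparrow 1.$$

Since $0 \le F \le 1$, this forces $F(N) \to 1$ and $F(-N) \to 0$, which proves $F(+\infty) = 1$ and $F(-\infty) = 0$.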

Remark. A distribution function $F$ is continuous at $x \in \mathbb{R}$ if and only if $P(X = x) = 0$. To see this notice that

$$P(X = x) = F(x) - F(x-) \quad (\text{the jump at } x).$$

Since $F$ is right continuous, it is continuous at $x$ if and only if $F(x-) = \lim_{t \to x^-} F(t) = F(x)$, that is, if and only if $P(X = x) = 0$.

Definition 3.2. A random variable X is of continuous type if F (x) = P (X ≤ x) is a

continuous function.

Example. Define

$$F(x) = \tfrac{1}{2} I_{[0,1)}(x) + \tfrac{2x}{3} I_{[1,3/2]}(x) + I_{(3/2,\infty)}(x).$$

It is clear that $F$ is nondecreasing, $F(+\infty) = 1$, $F(-\infty) = 0$ and $F$ is right continuous. Therefore $F$ is a distribution function. The c.d.f. $F$ is not of continuous type: it jumps at $x = 0$ and at $x = 1$. We have

$$P(X = 0) = 0.5, \quad P(X = 3/2) = 0, \quad \text{and} \quad P(0 \le X \le 1) = F(1) - F(0-) = 2/3.$$
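Point masses and interval probabilities follow from $F$ through $P(X = x) = F(x) - F(x-)$ and $P(a \le X \le b) = F(b) - F(a-)$. The sketch below codes the c.d.f. of this example and its left limit by hand with exact fractions. `F` and `F_left` are illustrative names.

```python
from fractions import Fraction

def F(x):
    """c.d.f. of the example: (1/2)I[0,1) + (2x/3)I[1,3/2] + I(3/2,inf)."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(1, 2)
    if x <= Fraction(3, 2):
        return Fraction(2, 3) * x
    return Fraction(1)

def F_left(x):
    """Left limit F(x-) of the same function, written out piece by piece."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return Fraction(1, 2)
    if x <= Fraction(3, 2):
        return Fraction(2, 3) * x
    return Fraction(1)

# P(X = 0), P(X = 3/2) and P(0 <= X <= 1) = F(1) - F(0-).
print(F(0) - F_left(0), F(Fraction(3, 2)) - F_left(Fraction(3, 2)), F(1) - F_left(0))  # 1/2 0 2/3
```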

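Lemma 3.1. The set of discontinuity points of a c.d.f. is at most countable.

Proof. Define $p(x) = F(x) - F(x-)$, the size of the jump at $x$; $p(x) > 0$ exactly at the discontinuity points. Let $D$ be the set of all discontinuity points of $F$. For a positive integer $n$ let

$$D_n = \Big\{x : \frac{1}{n+1} < p(x) \le \frac{1}{n}\Big\}.$$

We have $D = \bigcup_{n=1}^{\infty} D_n$. The jumps of $F$ add up to at most $F(+\infty) - F(-\infty) = 1$, and each jump in $D_n$ exceeds $1/(n+1)$. So the number of elements in $D_n$ cannot exceed $n$. Thus each $D_n$ is finite, and this proves that $D$ is at most countable.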

Lemma 3.2. Let $X$ be a random variable with c.d.f. $F$ and $p(x) = F(x) - F(x-)$. Let $D = \{x_1, x_2, \ldots\}$ be the set of discontinuities of $F$, i.e. the points where $p(x) > 0$. Define the step function

$$G(x) = \sum_{n=1}^{\infty} p(x_n) I(x_n \le x).$$

Then $H(x) = F(x) - G(x)$ is nondecreasing and continuous on $\mathbb{R}$.

Proof. Clearly $H$ is right continuous, since both $F$ and $G$ are. We next show that $H$ is also left continuous. Note that if $x' < x$,

$$H(x) - H(x') = F(x) - F(x') - [G(x) - G(x')].$$

As $x' \uparrow x$, $F(x) - F(x')$ converges to the size of the jump of $F$ at $x$, and $G(x) - G(x')$ converges to the size of the jump of $G$ at $x$. In both cases the jump is $p(x)$, which shows that $H(x) - H(x') \to 0$ as $x' \uparrow x$. Next we show that $H$ is nondecreasing. We first prove that

$$\sum_{x' < x_n \le x} p(x_n) \le F(x) - F(x').$$

Suppose that $F$ has only a finite number (say $N$) of discontinuities in the interval $(x', x]$. We may assume that

$$x' < x_1 < x_2 < \cdots < x_N \le x.$$

By the monotonicity of $F$ it follows that

$$F(x') \le F(x_1-) < F(x_1) \le F(x_2-) < F(x_2) \le \cdots \le F(x_N-) < F(x_N) \le F(x).$$

Since $p(x_n) = F(x_n) - F(x_n-)$ we have

$$\sum_{n=1}^{N} p(x_n) = F(x_N) - F(x_1-) + \sum_{j=1}^{N-1} [F(x_j) - F(x_{j+1}-)] \le F(x_N) - F(x_1-) \le F(x) - F(x').$$

If there are countably infinitely many discontinuities in $(x', x]$, the bound holds for every finite subcollection, hence also for the whole series. Therefore

$$H(x) - H(x') = F(x) - F(x') - \sum_{n=1}^{\infty} p(x_n) I(x' < x_n \le x) \ge 0.$$

Note. $H(-\infty) = 0$ and $H(+\infty) = 1 - \alpha$, where $\alpha = G(+\infty) = \sum_{n} p(x_n)$ is the total size of the jumps.

Theorem 3.2. Let $F$ be a c.d.f. Then $F$ admits the decomposition

$$F(x) = \alpha F_d(x) + (1 - \alpha) F_c(x), \qquad \alpha \in [0, 1],$$

where both $F_d$ and $F_c$ are cumulative distribution functions, $F_d$ is a step function and $F_c$ is continuous.

Proof. Use the notation of Lemma 3.2. If $\alpha \in (0, 1)$, set $F_d(x) = G(x)/\alpha$ and $F_c(x) = H(x)/(1 - \alpha)$. If $\alpha = 0$ then $F = H$ and we take $F_c = F$; if $\alpha = 1$ then $F = G$ and we take $F_d = F$.

Example. Let

$$F(x) = \tfrac{1}{2} I_{[0,1)}(x) + \tfrac{2x}{3} I_{[1,3/2)}(x) + I_{[3/2,\infty)}(x).$$

Find $F_c$ and $F_d$.

Solution. The jump points are $J = \{0, 1\}$ with $p(0) = 1/2$ and $p(1) = 2/3 - 1/2 = 1/6$. Then

$$G(x) = \sum_{x_n \le x} p(x_n) = \tfrac{1}{2} I_{[0,1)}(x) + \tfrac{2}{3} I(x \ge 1)$$

and

$$H(x) = F(x) - G(x) = \tfrac{2}{3}(x - 1) I_{[1,3/2)}(x) + \tfrac{1}{3} I_{[3/2,\infty)}(x).$$

We also get $1 - \alpha = H(+\infty) = 1/3$, so $\alpha = 2/3$. Therefore $F_d(x) = 3G(x)/2$ and $F_c(x) = 3H(x)$.
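The construction of Lemma 3.2 and Theorem 3.2 can be followed step by step for this example. The sketch below uses exact arithmetic via `fractions`, and all function names are hypothetical. It builds $G$ from the jumps, sets $H = F - G$, $F_d = G/\alpha$ and $F_c = H/(1 - \alpha)$, and checks that $H$ has no jumps.

```python
from fractions import Fraction

def F(x):
    """c.d.f. of the example: (1/2)I[0,1) + (2x/3)I[1,3/2) + I[3/2,inf)."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(1, 2)
    if x < Fraction(3, 2):
        return Fraction(2, 3) * x
    return Fraction(1)

jumps = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 6)}   # p(x_n)
alpha = sum(jumps.values())                                          # G(+inf)

def G(x):
    """Step part: sum of the jumps p(x_n) over x_n <= x."""
    return sum((p for xn, p in jumps.items() if xn <= Fraction(x)), Fraction(0))

def H(x):
    """Continuous part F - G (Lemma 3.2)."""
    return F(x) - G(x)

def F_d(x):
    return G(x) / alpha

def F_c(x):
    return H(x) / (1 - alpha)

eps = Fraction(1, 10**9)
print(alpha, F_c(Fraction(5, 4)), F_c(10))       # 2/3 1/2 1
print(all(H(x) == H(x - eps) for x in jumps))    # True: H has no jump at 0 or 1
```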

Note.

(1) A random variable is called of discrete type if $\alpha = 1$, that is,

$$P(X \le x) = F(x) = F_d(x),$$

and there is no continuous part.

(2) For a discrete random variable $X$,

$$P(X = x) = f(x) = F(x) - F(x-) \ge 0$$

is called the p.d.f. (p.m.f.) of $X$.

(3) For a discrete random variable $X$ we have $\sum_x f(x) = 1$.

Example. Let $X$ be a random variable with continuous c.d.f. $F$. Assume $F$ is strictly increasing where $0 < F < 1$, so that $F^{-1}$ is its ordinary inverse there.

(i) Find the distribution of $U = F(X)$.

(ii) Show that if $U \sim U[0, 1]$ then $F^{-1}(U) \overset{d}{=} X$.

(iii) Use (i) and (ii) to show that $X = -\ln U$ has an exponential distribution.

Solution. For $u \in (0, 1)$ we have

$$G(u) = P(F(X) \le u) = P(X \le F^{-1}(u)) = F(F^{-1}(u)) = u.$$

Therefore $g(u) = G'(u) = 1$, $u \in (0, 1)$, i.e. $U \sim U[0, 1]$.

For (ii),

$$P(F^{-1}(U) \le x) = P(U \le F(x)) = \int_0^{F(x)} du = F(x).$$

For (iii) note that

$$F(x) = P(-\ln U \le x) = P(U \ge \exp(-x)) = 1 - \exp(-x)$$

and $F'(x) = \exp(-x)$, $x > 0$.
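Part (ii) is the basis of inverse-transform sampling. The minimal sketch below assumes Python's `random` module as the source of $U[0,1]$ draws. For the exponential c.d.f. $F(x) = 1 - e^{-x}$ we have $F^{-1}(u) = -\ln(1 - u)$, and since $1 - U$ is again uniform this matches $-\ln U$ in (iii). The seed and sample size are arbitrary choices.

```python
import math
import random

def sample_exponential(rng):
    """Return F^{-1}(U) for F(x) = 1 - exp(-x), i.e. -ln(1 - U).
    rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is finite."""
    u = rng.random()
    return -math.log(1.0 - u)

rng = random.Random(2019)                 # fixed, arbitrary seed
xs = [sample_exponential(rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)                  # should be close to E(X) = 1
frac_below_1 = sum(x <= 1.0 for x in xs) / len(xs)   # close to F(1) = 1 - e^{-1}
print(round(mean, 2), round(frac_below_1, 2), round(1 - math.exp(-1), 2))
```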

Definition 3.3. Let $P(X \le x) = F(x)$ be continuous. If there exists a nonnegative function $f$ such that

$$P(X \le x) = \int_{-\infty}^{x} f(t)\, dt$$

then $f$ is called the probability density function (p.d.f.) of the random variable $X$.

Remark. Notice that if Definition 3.3 holds then

$$\int_{-\infty}^{\infty} f(t)\, dt = 1,$$

and if $f$ is continuous at $x$ then $F'(x) = f(x)$.

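2. Conditional probability measure. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $B \in \mathcal{F}$ such that $P(B) > 0$. Define the conditional probability measure given $B$ as

$$Q_B(A) = P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Theorem 3.2. On $(\Omega, \mathcal{F})$, the set function $Q_B$ is a probability measure.

Proof. Let $B \in \mathcal{F}$ with $P(B) > 0$ and let $\{A_i\}$ be a sequence of disjoint events. Then

(1) $Q_B(\Phi) = P(B \cap \Phi)/P(B) = 0$,

(2) $Q_B\big(\bigcup_{i=1}^{\infty} A_i\big) = P(B \cap (A_1 \cup A_2 \cup \cdots))/P(B) = \sum_{i=1}^{\infty} P(B \cap A_i)/P(B) = \sum_{i=1}^{\infty} Q_B(A_i)$, since the sets $B \cap A_i$ are disjoint,

(3) $Q_B(\Omega) = P(\Omega \cap B)/P(B) = P(B)/P(B) = 1$,

(4) since $P(A \cap B) \le P(B)$ we have $Q_B(A) \in [0, 1]$ for any $A \in \mathcal{F}$.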

Remarks.

(1) $P(A^c \mid B) = 1 - P(A \mid B)$.

(2) $P(A \mid \Omega) = P(A)$.

(3) If $A$ and $B$ are independent and $P(A)P(B) > 0$ then

$$P(A \mid B) = P(A) \quad \text{and} \quad P(B \mid A) = P(B).$$

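Theorem 2.1. If $\{A_i\}$ is a sequence of disjoint events with $P(A_i) > 0$ such that

$$\bigcup_{i=1}^{\infty} A_i = \Omega,$$

and $P(B) > 0$, then

$$P(B) = \sum_{i=1}^{\infty} P(B \mid A_i) P(A_i)$$

and

$$P(A_i \mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{\infty} P(B \mid A_j) P(A_j)}.$$

Proof. We have

$$P(B) = P(B \cap \Omega) = P(B \cap (A_1 \cup A_2 \cup \cdots)) = \sum_{i=1}^{\infty} P(B \cap A_i) = \sum_{i=1}^{\infty} P(B \mid A_i) P(A_i).$$

Similarly

$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B \mid A_i) P(A_i)}{P(B)} = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{\infty} P(B \mid A_j) P(A_j)}.$$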

Example 1. Roll a die, then flip a coin as many times as the number on the die. Let $X$ be the number of heads. (1) Find the distribution of the random variable $X$. (2) If we observe 3 heads, what is the probability that the die shows 4?

Solution. (1) Let $Y$ be the number on the die. Then for $k = 0, 1, \ldots, 6$,

$$P(X = k) = \sum_{i=1}^{6} P(X = k \mid Y = i) P(Y = i) = \frac{1}{6} \sum_{i=1}^{6} P(X = k \mid Y = i) = \frac{1}{6} \sum_{i=\max(k,1)}^{6} \binom{i}{k} \Big(\frac{1}{2}\Big)^i.$$

For (2) use

$$P(Y = 4 \mid X = 3) = \frac{P(X = 3 \mid Y = 4) P(Y = 4)}{P(X = 3)} = \frac{P(X = 3 \mid Y = 4) P(Y = 4)}{\frac{1}{6} \sum_{i=3}^{6} \binom{i}{3} \big(\frac{1}{2}\big)^i}.$$

We have $P(Y = 4) = 1/6$, $P(X = 3 \mid Y = 4) = \binom{4}{3}(1/2)^4 = 1/4$ and

$$\sum_{i=3}^{6} \binom{i}{3} \Big(\frac{1}{2}\Big)^i = 1,$$

so we get $P(Y = 4 \mid X = 3) = 0.25$.
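The same computation can be done exactly in a few lines. The sketch below uses the standard `fractions` module and `math.comb`. `p_x_given_y` and `p_x` are illustrative names for $P(X = k \mid Y = i)$ and $P(X = k)$.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)

def p_x_given_y(k, i):
    """P(X = k | Y = i): k heads in i tosses of a fair coin."""
    return comb(i, k) * half ** i if 0 <= k <= i else Fraction(0)

def p_x(k):
    """P(X = k) = (1/6) * sum_{i=1}^{6} P(X = k | Y = i)."""
    return Fraction(1, 6) * sum(p_x_given_y(k, i) for i in range(1, 7))

pmf = {k: p_x(k) for k in range(7)}
posterior = p_x_given_y(3, 4) * Fraction(1, 6) / p_x(3)   # P(Y = 4 | X = 3)
print(sum(pmf.values()), p_x(3), posterior)  # 1 1/6 1/4
```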

Example. An urn contains $m + n$ chips, of which $m$ are white and the rest are black. A chip is drawn at random and, without observing its color, another chip is drawn at random. What is the probability that the first chip is white? What is the probability that the second chip is white?

Solution. Let $E_1$ be the event that the first chip is white and $E_2$ the event that the first chip is black. Also let $A$ be the event that the second chip is white. We have

$$P(E_1) = \frac{m}{m + n}, \qquad P(A \mid E_1) = \frac{m - 1}{m + n - 1}, \qquad P(A \mid E_2) = \frac{m}{m + n - 1}.$$

We have

$$P(A) = P(A \mid E_1) P(E_1) + P(A \mid E_2) P(E_2) = \frac{m - 1}{m + n - 1} \cdot \frac{m}{m + n} + \frac{m}{m + n - 1} \cdot \frac{n}{m + n} = \frac{m}{m + n}.$$

So both the first and the second chip are white with probability $m/(m + n)$.
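The total-probability computation can be checked exactly for a few hypothetical values of $m$ and $n$ (with $m + n \ge 2$). In the sketch, `p_second_white` is an illustrative helper mirroring the formula for $P(A)$.

```python
from fractions import Fraction

def p_second_white(m, n):
    """Total probability: P(A) = P(A|E1)P(E1) + P(A|E2)P(E2)."""
    total = m + n
    p_e1 = Fraction(m, total)              # first chip white
    p_e2 = Fraction(n, total)              # first chip black
    p_a_e1 = Fraction(m - 1, total - 1)    # second white given first white
    p_a_e2 = Fraction(m, total - 1)        # second white given first black
    return p_a_e1 * p_e1 + p_a_e2 * p_e2

for m, n in [(1, 1), (3, 5), (10, 2)]:
    print(m, n, p_second_white(m, n), Fraction(m, m + n))
```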




Chapter 4

Expectation, moments, moment generating function, characteristic function, functions of a random variable.

Let $X$ be a random variable with c.d.f. $F$ such that

$$\int_{-\infty}^{\infty} |U(x)|\, dF(x) < \infty.$$

Define

$$E(U(X)) = \int_{-\infty}^{\infty} U(x)\, dF(x).$$

We use the following notation:

- mean: $E(X)$;
- $k$th moment: $\mu_k = E(X^k)$;
- variance: $\sigma^2 = E[(X - E(X))^2] = E(X^2) - (E(X))^2$;
- moment generating function: $M(t) = E(\exp(tX))$;
- characteristic function: $\Phi(t) = E(\exp(itX))$, where $i = \sqrt{-1}$.

If $X$ is of discrete type with values in $\{0, 1, 2, \ldots\}$, then $g(s) = E(s^X) = \sum_k s^k P(X = k)$ is called the generating function.

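Example 1. Let $X$ be a random variable with

$$P(X \in A) = \int_A \frac{dx}{\pi(1 + x^2)}, \quad A \subset \mathbb{R} \text{ Borel}.$$

Show that

(i) $f(x) = \dfrac{1}{\pi(1 + x^2)}$, $x \in \mathbb{R}$, is a p.d.f.

(ii) $E(X^k)$ does not exist if $k \ge 1$.

Proof. We have $f(x) \ge 0$ and

$$\int_{-\infty}^{\infty} \frac{dx}{\pi(1 + x^2)} = \frac{1}{\pi}\big[\tan^{-1} x\big]_{-\infty}^{\infty} = 1,$$

so (i) follows immediately.

For (ii), note that $1 + x^2 \le 2x^2$ when $x \ge 1$. Hence

$$E(|X|^k) = 2\int_0^{\infty} \frac{x^k\, dx}{\pi(1 + x^2)} \ge \frac{2}{\pi} \int_1^{\infty} \frac{x^k}{1 + x^2}\, dx \ge \frac{1}{\pi} \int_1^{\infty} x^{k-2}\, dx = \infty$$

if $k \ge 1$. In particular $E(|X|) = \infty$.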



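Example 2. Let $Z$ be a random variable with p.d.f. (check that $f$ is a p.d.f.)

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp(-z^2/2), \quad z \in \mathbb{R}.$$

Find

(i) the c.d.f. of $X = \sigma Z + \mu$, for $\sigma > 0$ and $\mu \in \mathbb{R}$,

(ii) the moment generating function,

(iii) $E(Z)$ and $Var(Z)$.

Solution. Note that $f(z) \ge 0$. Let

$$I = \int_{-\infty}^{\infty} \exp(-x^2/2)\, dx$$

and notice that

$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(-(x^2 + y^2)/2)\, dx\, dy.$$

Now take $x = r\cos\theta$ and $y = r\sin\theta$ (polar coordinates) to get

$$I^2 = \int_0^{2\pi} \int_0^{\infty} \exp(-r^2/2)\, r\, dr\, d\theta = 2\pi.$$

Therefore $I = \sqrt{2\pi}$. A graph of $f(x)$ is given below.

We can calculate

$$F(x) = P(\sigma Z + \mu \le x) = P\Big(Z \le \frac{x - \mu}{\sigma}\Big) = \int_{-\infty}^{(x-\mu)/\sigma} f(z)\, dz.$$

We get

$$\frac{dF(x)}{dx} = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(-\frac{(x - \mu)^2}{2\sigma^2}\Big)$$

for $x \in \mathbb{R}$. (This distribution is denoted by $N(\mu, \sigma^2)$.)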

Example 3. Let $X$ be a random variable with a Poisson distribution with mean $\lambda$. Find the generating function.

Solution. We have

$$g(s) = E(s^X) = \sum_{k=0}^{\infty} s^k \frac{\exp(-\lambda)\lambda^k}{k!} = \exp(\lambda(s - 1)).$$

Multivariate normal distribution. Let $Z_1, \ldots, Z_p$ be an i.i.d. sequence of observations from a standard normal distribution. We have

$$f(z_1, \ldots, z_p) = \frac{1}{(2\pi)^{p/2}} \exp(-z'z/2)$$

where $z' = (z_1, \ldots, z_p)$. Let $\Sigma$ ($p \times p$) be a positive definite matrix and $x = \Sigma^{1/2}z + m$. We have $dx = |\Sigma^{1/2}|\, dz$. Therefore the p.d.f. of the random vector $X$ is

$$g(x) = \frac{1}{|2\pi\Sigma|^{1/2}} \exp\Big(-\frac{1}{2}(x - m)'\Sigma^{-1}(x - m)\Big).$$

A random vector with the p.d.f. above is said to have a $p$-variate normal distribution with mean $m$ and variance-covariance matrix $\Sigma$, denoted by $N_p(m, \Sigma)$.

Notice that the result above implies that if $x \sim N_p(m, \Sigma)$ then

$$\Sigma^{-1/2}(x - m) \sim N_p(0, I).$$

Elementary properties of characteristic functions. The moment generating function of a random variable may not exist. For example, the moment generating function of the Cauchy distribution does not exist, since for every $\theta \ne 0$

$$\int_{-\infty}^{\infty} \frac{\exp(\theta x)}{\pi(1 + x^2)}\, dx = \infty.$$

Therefore we define the characteristic function, which always exists (Theorem 4.1).

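Theorem 4.1. If $X$ is a random variable then

(i) $\Phi(\theta) = E(\exp(i\theta X))$ always exists, $\Phi(0) = 1$ and $|\Phi(\theta)| \le 1$.

(ii) $\overline{\Phi(\theta)} = \Phi(-\theta)$.

(iii) If $X$ is symmetric ($X \overset{d}{=} -X$) then $\Phi(\theta) \in \mathbb{R}$.

(iv) $\Phi_{aX+b}(\theta) = E(\exp(i\theta(aX + b))) = \exp(i\theta b)\Phi_X(a\theta)$.

(v) The characteristic function of any random variable $X$ is uniformly continuous.

Proof. Write $\exp(i\theta X) = \cos\theta X + i\sin\theta X = U + iV$. Since $U$ and $V$ are bounded, the expectation exists, and

$$|E(U + iV)|^2 = (E(U))^2 + (E(V))^2 \le E(U^2) + E(V^2) = E(U^2 + V^2) = 1.$$

Therefore $|E(\exp(i\theta X))| \le 1$. For (ii), notice that

$$\overline{\Phi(\theta)} = E(\cos\theta X) - iE(\sin\theta X) = E(\cos(-\theta X)) + iE(\sin(-\theta X)) = \Phi(-\theta).$$

For (iii) notice that $X \overset{d}{=} -X$ gives $\Phi(\theta) = \Phi(-\theta) = \overline{\Phi(\theta)}$, so $\Phi(\theta)$ is real. For (iv),

$$E(\exp(i\theta(aX + b))) = \exp(i\theta b)E(\exp(i\theta aX)) = \exp(i\theta b)\Phi_X(a\theta).$$

To prove uniform continuity notice that

$$|\Phi(t + h) - \Phi(t)| = |E(\exp(itX)(\exp(ihX) - 1))| \le E(|\exp(itX)|\,|\exp(ihX) - 1|) = E(|\exp(ihX) - 1|),$$

and the right-hand side does not depend on $t$. Since $|\exp(ihX) - 1| \le 2$ and $\exp(ihX) - 1 \to 0$ as $h \to 0$, the dominated convergence theorem gives

$$\lim_{h \to 0} E(|\exp(ihX) - 1|) = 0.$$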
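Properties (i), (ii) and (iv) can be verified numerically for any discrete distribution. The sketch below assumes a hypothetical three-point distribution and uses Python's `cmath` module. `phi` is an illustrative helper that computes the characteristic function of $aX + b$.

```python
import cmath

# Hypothetical discrete X; any finite distribution would do.
values = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

def phi(theta, a=1.0, b=0.0):
    """Characteristic function of aX + b: E(exp(i * theta * (aX + b)))."""
    return sum(p * cmath.exp(1j * theta * (a * x + b)) for p, x in zip(probs, values))

thetas = [k / 10 for k in range(-30, 31)]
a, b = 2.0, -1.0
checks = {
    "Phi(0) = 1": abs(phi(0.0) - 1) < 1e-12,
    "|Phi| <= 1": all(abs(phi(t)) <= 1 + 1e-12 for t in thetas),
    "conj(Phi(t)) = Phi(-t)": all(abs(phi(t).conjugate() - phi(-t)) < 1e-12 for t in thetas),
    "Phi_aX+b(t) = exp(itb) Phi_X(at)": all(
        abs(phi(t, a, b) - cmath.exp(1j * t * b) * phi(a * t)) < 1e-12 for t in thetas),
}
print(checks)
```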

Theorem 4.2. Let $X$ be a random variable with c.d.f. $F$ such that $E(X)$ exists. Then

$$E(X) = \int_0^{\infty} (1 - F(x))\, dx - \int_0^{\infty} F(-x)\, dx.$$



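Proof. First assume $P(X \ge 0) = 1$. Interchanging the order of integration (the integrand is nonnegative), we have

$$E(X) = \int_0^{\infty} x\, dF(x) = \int_0^{\infty} \int_0^{x} dy\, dF(x) = \int_0^{\infty} \int_{(y,\infty)} dF(x)\, dy = \int_0^{\infty} (1 - F(y))\, dy.$$

In general $X = X^+ - X^-$ where

$$X^+ = \max(0, X) = (X + |X|)/2, \qquad X^- = \max(0, -X) = (-X + |X|)/2.$$

Therefore $E(X) = E(X^+) - E(X^-)$. Since $X^+, X^- \ge 0$, the first part gives

$$E(X) = \int_0^{\infty} P(\max(0, X) > x)\, dx - \int_0^{\infty} P(\max(0, -X) > x)\, dx.$$

For $x > 0$ we have $\max(0, X) > x$ if and only if $X > x$, and $\max(0, -X) > x$ if and only if $X < -x$. Hence

$$P(X^+ > x) = P(X > x) = 1 - F(x)$$

and

$$P(X^- > x) = P(X < -x) = F((-x)-).$$

Since $F((-x)-) = F(-x)$ except at the at most countably many discontinuity points of $F$ (Lemma 3.1), the last integral equals $\int_0^{\infty} F(-x)\, dx$, which proves the theorem.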

Remark 1. From Theorem 4.2 we can conclude that if $X \ge 0$ is an integer-valued random variable then

$$E(X) = \sum_{n=0}^{\infty} P(X > n).$$

An easy and direct proof is to take expectations of both sides of

$$X = \sum_{k=0}^{\infty} I(X > k).$$
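For a concrete check of Remark 1, take a fair die (a hypothetical example). The sketch computes $E(X)$ directly from the p.m.f. and through the tail sum $\sum_{n \ge 0} P(X > n)$, using exact fractions.

```python
from fractions import Fraction

# Hypothetical example: a fair die, X in {1, ..., 6}.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def tail(n):
    """P(X > n)."""
    return sum((p for k, p in pmf.items() if k > n), Fraction(0))

direct = sum(k * p for k, p in pmf.items())       # E(X) from the p.m.f.
tail_sum = sum(tail(n) for n in range(0, 6))      # P(X > n) = 0 for n >= 6
print(direct, tail_sum)  # 7/2 7/2
```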

Remark 2. For $X \ge 0$ we can also derive the formula

$$E(X^r) = \int_0^{\infty} r x^{r-1} P(X > x)\, dx, \quad r > 0,$$

by using the fact that

$$E(X^r) = \int_0^{\infty} P(X^r > x)\, dx = \int_0^{\infty} P(X > x^{1/r})\, dx.$$

Use the change of variable $x = u^r$ to get the result.

Distribution of functions of a random variable. Let $X$ be a random variable with c.d.f. $F$. Our goal is to find the c.d.f. (or p.d.f.) of $Y = U(X)$.

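Example 4. Let $X \sim U[0, 1]$. Find the p.d.f. of

(i) $Y = a + (b - a)X$, where $a < b$,

(ii) $W = \tan(\pi(2X - 1)/2)$.

Solution. For $t \in [a, b]$,

$$F(t) = P(a + (b - a)X \le t) = P\big(X \le (t - a)/(b - a)\big) = \frac{t - a}{b - a}.$$

This gives $F'(t) = f(t) = \dfrac{1}{b - a}$, $t \in (a, b)$. This means that $Y \sim U(a, b)$.

For (ii), let $w \in \mathbb{R}$. Since $\tan$ is increasing on $(-\pi/2, \pi/2)$,

$$G(w) = P(\tan(\pi(2X - 1)/2) \le w) = P\Big(X \le \frac{1}{\pi}\tan^{-1} w + \frac{1}{2}\Big) = \frac{1}{\pi}\tan^{-1} w + \frac{1}{2}.$$

Therefore

$$G'(w) = g(w) = \frac{1}{\pi(1 + w^2)}, \quad w \in \mathbb{R},$$

which shows that $W$ has a Cauchy distribution. In the following examples the formula

$$\frac{\Gamma(\alpha)}{\beta^{\alpha}} = \int_0^{\infty} x^{\alpha-1} \exp(-\beta x)\, dx, \quad \alpha > 0, \ \beta > 0,$$

is useful.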

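Example 5 (Beta distribution). Let $X$ and $Y$ be two independent random variables from the gamma distributions

$$X \sim f_1(x) = \frac{1}{\Gamma(\alpha_1)} \exp(-x) x^{\alpha_1 - 1}, \quad Y \sim f_2(y) = \frac{1}{\Gamma(\alpha_2)} \exp(-y) y^{\alpha_2 - 1}, \quad x, y > 0.$$

Find the p.d.f. of the random variable

$$U = \frac{X}{X + Y}.$$

Solution. For $0 < u < 1$ we have

$$G(u) = P\Big(\frac{X}{X + Y} \le u\Big) = P\Big(X \le \frac{Yu}{1 - u}\Big) = \frac{1}{\Gamma(\alpha_1)\Gamma(\alpha_2)} \int_0^{\infty} \int_0^{yu/(1-u)} \exp(-(x + y)) x^{\alpha_1 - 1} y^{\alpha_2 - 1}\, dx\, dy.$$

Differentiating under the integral sign, we get

$$g(u) = G'(u) = \frac{u^{\alpha_1 - 1}(1 - u)^{-(\alpha_1 + 1)}}{\Gamma(\alpha_1)\Gamma(\alpha_2)} \int_0^{\infty} \exp(-y/(1 - u)) y^{\alpha_1 + \alpha_2 - 1}\, dy = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} u^{\alpha_1 - 1}(1 - u)^{\alpha_2 - 1}, \quad 0 < u < 1.$$

The resulting distribution is called the $\text{beta}(\alpha_1, \alpha_2)$ distribution. It is useful to notice that

$$\frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1 + \alpha_2)} = \int_0^1 u^{\alpha_1 - 1}(1 - u)^{\alpha_2 - 1}\, du.$$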
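The beta-integral identity can be checked numerically with `math.gamma`. The sketch below approximates the integral by the midpoint rule for a few hypothetical parameter pairs. The step count is an arbitrary choice.

```python
import math

def beta_integral(a1, a2, steps=100_000):
    """Midpoint-rule approximation of the integral of u^(a1-1) (1-u)^(a2-1) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (a1 - 1) * (1 - u) ** (a2 - 1)
    return h * total

pairs = [(2.0, 3.0), (1.5, 2.5), (4.0, 1.0)]   # hypothetical parameter choices
for a1, a2 in pairs:
    closed_form = math.gamma(a1) * math.gamma(a2) / math.gamma(a1 + a2)
    print(a1, a2, round(beta_integral(a1, a2), 6), round(closed_form, 6))
```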

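Example 6 (t distribution). Let $W \sim N(0, 1)$ and $V \sim \chi^2(r)$ such that $W$ and $V$ are independent. Find the p.d.f. of

$$T = \frac{W}{\sqrt{V/r}} \sim t(r).$$

Solution. The $\chi^2(r)$ density is $v^{r/2-1}\exp(-v/2)/(\Gamma(r/2)2^{r/2})$, $v > 0$. Using it, we have

$$G(t) = P(T \le t) = P\big(W \le t\sqrt{V/r}\big) = \int_0^{\infty} \int_{-\infty}^{t\sqrt{v/r}} \frac{1}{\sqrt{2\pi}} \exp(-w^2/2)\, dw\, \frac{1}{\Gamma(r/2)2^{r/2}} \exp(-v/2) v^{(r/2)-1}\, dv.$$

Therefore

$$g(t) = G'(t) = \int_0^{\infty} \frac{\sqrt{v}}{\sqrt{2\pi r}\,\Gamma(r/2)2^{r/2}} \exp(-t^2 v/(2r)) \exp(-v/2) v^{(r/2)-1}\, dv = \frac{\Gamma((r+1)/2)}{\Gamma(r/2)\sqrt{\pi r}} \Big(1 + \frac{t^2}{r}\Big)^{-(r+1)/2}, \quad t \in \mathbb{R},$$

is the density of the $t(r)$ distribution.

Figure 4: t distribution with 7 degrees of freedom.

Notice that as $r \to \infty$

$$g(t) \to \frac{1}{\sqrt{2\pi}} \exp(-t^2/2), \quad t \in \mathbb{R},$$

which is the standard normal p.d.f.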
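Two claims about $t(r)$ can be checked numerically: $g$ integrates to one, and $g$ approaches the standard normal density. The sketch evaluates the density via `math.lgamma`, which avoids overflow for large $r$. It integrates the density by the midpoint rule for $r = 7$ as in Figure 4, and compares it with the normal density for the hypothetical choice $r = 10000$. The truncation interval $[-100, 100]$ and the step count are assumptions of the sketch.

```python
import math

def t_pdf(t, r):
    """Density of t(r): Γ((r+1)/2) / (Γ(r/2) sqrt(pi r)) * (1 + t^2/r)^(-(r+1)/2)."""
    log_c = math.lgamma((r + 1) / 2) - math.lgamma(r / 2) - 0.5 * math.log(math.pi * r)
    return math.exp(log_c - (r + 1) / 2 * math.log1p(t * t / r))

def normal_pdf(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# Total mass of t(7) over [-L, L] by the midpoint rule (tails beyond L are negligible).
L, steps = 100.0, 200_000
h = 2 * L / steps
mass = h * sum(t_pdf(-L + (i + 0.5) * h, 7) for i in range(steps))

# Largest gap to the normal density on [-4, 4] for a large r.
gap = max(abs(t_pdf(k / 10, 10_000) - normal_pdf(k / 10)) for k in range(-40, 41))
print(round(mass, 4), gap < 1e-4)
```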

Example 7 (f distribution). Let $U_i \sim \chi^2(r_i)$, $i = 1, 2$, be independent. Find the p.d.f. of
$$F = \frac{U_1/r_1}{U_2/r_2} \sim f(r_1, r_2).$$

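Solution. Write $U = U_1$ and $V = U_2$. We have
$$G(w) = P(F \le w) = P(U \le r_1Vw/r_2) = \int_0^\infty\int_0^{vwr_1/r_2}\frac{1}{2^{(r_1 + r_2)/2}\Gamma(r_1/2)\Gamma(r_2/2)}\exp(-(u + v)/2)\,u^{r_1/2 - 1}v^{r_2/2 - 1}\,du\,dv.$$
Differentiating with respect to $w$,
$$g(w) = G'(w) = \int_0^\infty\frac{1}{2^{(r_1 + r_2)/2}\Gamma(r_1/2)\Gamma(r_2/2)}\exp\left(-\frac{v}{2}\left(1 + \frac{wr_1}{r_2}\right)\right)v^{(r_1 + r_2)/2 - 1}\left(\frac{r_1}{r_2}\right)^{r_1/2}w^{(r_1/2) - 1}\,dv$$
$$= \frac{\Gamma((r_1 + r_2)/2)\,(r_1/r_2)^{r_1/2}\,w^{r_1/2 - 1}}{\Gamma(r_1/2)\Gamma(r_2/2)\left(1 + wr_1/r_2\right)^{(r_1 + r_2)/2}}, \quad w > 0.$$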
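As with the $t$ density, the $f(r_1, r_2)$ density above can be checked by numerical integration and simulation. This is a sketch under stated assumptions: standard library only, $r_1 = 16$ and $r_2 = 10$ (the values of Figure 5), a midpoint rule truncated at $w = 200$, and the known fact that the $f(r_1, r_2)$ mean is $r_2/(r_2 - 2)$ for $r_2 > 2$.

```python
import math
import random

def f_pdf(w, r1, r2):
    # density derived in Example 7; zero for w <= 0
    if w <= 0:
        return 0.0
    log_c = (math.lgamma((r1 + r2) / 2) - math.lgamma(r1 / 2) - math.lgamma(r2 / 2)
             + (r1 / 2) * math.log(r1 / r2))
    return math.exp(log_c + (r1 / 2 - 1) * math.log(w)
                    - (r1 + r2) / 2 * math.log1p(w * r1 / r2))

def f_sample(r1, r2, rng):
    # F = (U1 / r1) / (U2 / r2) with U_i ~ chi^2(r_i) = Gamma(shape r_i / 2, scale 2)
    u1 = rng.gammavariate(r1 / 2, 2.0)
    u2 = rng.gammavariate(r2 / 2, 2.0)
    return (u1 / r1) / (u2 / r2)

r1, r2 = 16, 10
steps, upper = 200_000, 200.0
h = upper / steps
mass = h * sum(f_pdf((k + 0.5) * h, r1, r2) for k in range(steps))   # midpoint rule
rng = random.Random(11)
sample = [f_sample(r1, r2, rng) for _ in range(50_000)]
print(mass, sum(sample) / len(sample), r2 / (r2 - 2))
```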

Example 8. Let $X$ and $Y$ be two independent random variables from $U[0, 1]$.

(i) Find the joint p.d.f. of $U = X + Y$ and $V = X - Y$.

(ii) Find the marginal p.d.f.'s of $U$ and $V$.

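Solution. Let $h(x, y) = (x + y, x - y)$, so that $(U, V) = h(X, Y)$. We have
$$P((U, V) \in A \times B) = \int_A\int_B g(u, v)\,dv\,du,$$
where $g(u, v)$ is the p.d.f. of $(U, V)$. On the other hand,
$$P((U, V) \in A \times B) = P((X, Y) \in h^{-1}(A \times B)) = \iint_{h^{-1}(A \times B)} f(x, y)\,dx\,dy.$$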

[Figure 5: Graph of the f distribution with 16 and 10 degrees of freedom]



Therefore the problem reduces to a change of variables in a multiple integral. With $J = \partial(x, y)/\partial(u, v)$ the Jacobian matrix of the inverse transformation,
$$g(u, v) = f(x(u, v), y(u, v))\,|\det(J)|.$$
Each point $(X, Y)$ corresponds to one and only one point $(U, V)$ (a one-to-one transformation). From the definitions of $U$ and $V$ we get $X = (U + V)/2$, $Y = (U - V)/2$ and $|\det(J)| = 1/2$. Therefore
$$g(u, v) = f\left(\frac{u + v}{2}, \frac{u - v}{2}\right)|\det(J)| = \frac{1}{2}I(|v| < u, |v| < 2 - u, 0 < u < 2, -1 < v < 1).$$

Therefore
$$g_1(u) = \int_{-u}^{u}\frac{dv}{2} = u, \quad 0 < u < 1,$$
and
$$g_1(u) = \int_{u - 2}^{2 - u}\frac{dv}{2} = 2 - u, \quad 1 < u < 2.$$
Similarly, when $|v| < 1$,
$$g_2(v) = \int_{|v|}^{2 - |v|}\frac{du}{2} = 1 - |v|.$$
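The two triangular marginals of Example 8 are easy to confirm by simulation. A minimal sketch, assuming only the standard library (seed and sample size arbitrary): integrating $g_1$ gives the c.d.f. of $U$, and integrating $g_2$ gives $P(|V| \le c) = 2c - c^2$ for $0 \le c \le 1$.

```python
import random

def g1_cdf(u):
    # c.d.f. of U = X + Y from g1(u) = u on (0, 1) and 2 - u on (1, 2)
    if u <= 0:
        return 0.0
    if u <= 1:
        return u * u / 2
    if u <= 2:
        return 1 - (2 - u) ** 2 / 2
    return 1.0

def prob_abs_v_le(c):
    # P(|V| <= c) from g2(v) = 1 - |v|, valid for 0 <= c <= 1
    return 2 * c - c * c

rng = random.Random(3)
pairs = [(rng.random(), rng.random()) for _ in range(100_000)]
u_vals = [x + y for x, y in pairs]   # U = X + Y
v_vals = [x - y for x, y in pairs]   # V = X - Y
for u in (0.5, 1.0, 1.5):
    print(u, sum(s <= u for s in u_vals) / len(u_vals), g1_cdf(u))
print(sum(abs(v) <= 0.5 for v in v_vals) / len(v_vals), prob_abs_v_le(0.5))
```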

Example 9 (sample generation from the normal distribution). Let $U_1$ and $U_2$ be two independent random variables from $U[0, 1]$. Define
$$X_1 = \cos(2\pi U_1)\sqrt{-2\log U_2}, \quad X_2 = \sin(2\pi U_1)\sqrt{-2\log U_2}.$$
Find the joint p.d.f. and the marginals of $X_1$ and $X_2$.

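Solution. Solving for $U_1$ and $U_2$ gives
$$U_1 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1), \quad U_2 = \exp(-(x_1^2 + x_2^2)/2)$$
(with the angle $\tan^{-1}(x_2/x_1)$ taken in $[0, 2\pi)$ according to the quadrant of $(x_1, x_2)$), and
$$|\det(J)| = \frac{1}{2\pi}\exp(-(x_1^2 + x_2^2)/2).$$
Therefore
$$f(x_1, x_2) = \frac{1}{2\pi}\exp(-(x_1^2 + x_2^2)/2), \quad (x_1, x_2) \in \mathbb{R}^2,$$
which is the joint p.d.f. of two independent standard normal random variables. Since it factors as $\frac{1}{\sqrt{2\pi}}e^{-x_1^2/2}\cdot\frac{1}{\sqrt{2\pi}}e^{-x_2^2/2}$, the marginals are $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 1)$.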
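Example 9 is the Box-Muller method for generating normal samples. The sketch below implements it with the standard library only; the seed and sample size are arbitrary, and drawing $U_2$ as `1 - rng.random()` (which lies in $(0, 1]$) is an implementation choice that keeps $\log U_2$ finite. The printed sample mean, variance, covariance and $P(X_1 \le 1)$ should be near $0$, $1$, $0$ and $\Phi(1) \approx 0.8413$.

```python
import math
import random

def box_muller(rng):
    # U1, U2 independent U(0, 1); 1 - rng.random() lies in (0, 1], so log(u2) is finite
    u1 = rng.random()
    u2 = 1.0 - rng.random()
    radius = math.sqrt(-2.0 * math.log(u2))
    return radius * math.cos(2 * math.pi * u1), radius * math.sin(2 * math.pi * u1)

rng = random.Random(5)
pairs = [box_muller(rng) for _ in range(100_000)]
n = len(pairs)
x1 = [p[0] for p in pairs]
x2 = [p[1] for p in pairs]
mean1, mean2 = sum(x1) / n, sum(x2) / n
var1 = sum((a - mean1) ** 2 for a in x1) / n
cov12 = sum((a - mean1) * (b - mean2) for a, b in pairs) / n
p_le_1 = sum(a <= 1.0 for a in x1) / n
print(mean1, var1, cov12, p_le_1)
```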

Example 10. Let $X$ and $Y$ be two independent random variables such that $X \sim \exp(\lambda)$ and $Y \sim \exp(\mu)$. Find the distribution of $U = X + Y$.

Solution. Define $V = X$ and $U = X + Y$, so that $X = V$ and $Y = U - V$. We can easily check that
$$dx\,dy = |J|\,du\,dv = du\,dv, \quad |J| = 1.$$
Since for $x > 0$, $y > 0$ the joint p.d.f. of $(X, Y)$ is
$$f(x, y) = \lambda\mu\exp(-(\lambda x + \mu y)),$$
the joint p.d.f. of $(U, V)$ is
$$g(u, v) = f(v, u - v) = \lambda\mu\exp(-(\lambda v + \mu(u - v))), \quad 0 < v < u < \infty.$$
Therefore, if $\lambda \ne \mu$,
$$g_1(u) = \int_0^u\lambda\mu\exp(-(\lambda v + \mu(u - v)))\,dv = \lambda\mu\exp(-\mu u)\int_0^u\exp(-(\lambda - \mu)v)\,dv = \frac{\lambda\mu}{\lambda - \mu}\exp(-\mu u)\left[1 - \exp(-(\lambda - \mu)u)\right], \quad u > 0.$$
If $\lambda = \mu$, then
$$g_1(u) = \lambda^2u\exp(-\lambda u), \quad u > 0.$$
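Integrating $g_1$ from Example 10 gives the c.d.f. $1 - (\lambda e^{-\mu u} - \mu e^{-\lambda u})/(\lambda - \mu)$ when $\lambda \ne \mu$ and $1 - e^{-\lambda u}(1 + \lambda u)$ when $\lambda = \mu$. The sketch below (standard library only; the rates, evaluation points and seed are arbitrary choices) compares these with simulated sums of exponentials, using `random.expovariate`, whose argument is the rate.

```python
import math
import random

def hypoexp_cdf(u, lam, mu):
    # c.d.f. of U = X + Y obtained by integrating g1 from Example 10
    if u <= 0:
        return 0.0
    if lam == mu:
        return 1 - math.exp(-lam * u) * (1 + lam * u)
    return 1 - (lam * math.exp(-mu * u) - mu * math.exp(-lam * u)) / (lam - mu)

def empirical_cdf_of_sum(lam, mu, u, reps, rng):
    # fraction of simulated X + Y (X ~ exp(lam), Y ~ exp(mu)) that are <= u
    hits = sum(rng.expovariate(lam) + rng.expovariate(mu) <= u for _ in range(reps))
    return hits / reps

rng = random.Random(8)
for lam, mu in ((2.0, 0.5), (1.0, 1.0)):
    for u in (0.5, 1.0, 3.0):
        print(lam, mu, u, empirical_cdf_of_sum(lam, mu, u, 50_000, rng), hypoexp_cdf(u, lam, mu))
```

The equal-rate formula is the limit of the unequal-rate one as $\lambda \to \mu$, which the second check in the test confirms numerically.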

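Example 11. Let $X \sim \text{Poisson}(\lambda)$ and $Y \sim \text{Poisson}(\mu)$ be two independent random variables. Find the joint distribution and the marginals of
$$U = X + Y, \quad V = X.$$

Solution. We have
$$f(x, y) = \frac{\exp(-(\lambda + \mu))\lambda^x\mu^y}{x!\,y!}, \quad x, y = 0, 1, 2, \ldots.$$
We get $X = V$, $Y = U - V$ and
$$g(u, v) = f(v, u - v) = \exp(-(\lambda + \mu))\frac{\lambda^v\mu^{u - v}}{v!(u - v)!}, \quad u = v, v + 1, v + 2, \ldots, \ v = 0, 1, 2, \ldots.$$
Therefore
$$g_1(u) = \sum_{v=0}^u\exp(-(\lambda + \mu))\frac{\lambda^v\mu^{u - v}}{v!(u - v)!} = \frac{\exp(-(\lambda + \mu))}{u!}\sum_{v=0}^u\binom{u}{v}\lambda^v\mu^{u - v} = \frac{\exp(-(\lambda + \mu))}{u!}(\lambda + \mu)^u, \quad u = 0, 1, 2, \ldots,$$
so $U \sim \text{Poisson}(\lambda + \mu)$, while the marginal of $V = X$ is $\text{Poisson}(\lambda)$.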
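The binomial-theorem step in Example 11 can be verified exactly term by term: the discrete convolution $\sum_{v=0}^u P(X = v)P(Y = u - v)$ should equal the $\text{Poisson}(\lambda + \mu)$ p.m.f. at $u$. A small sketch with the standard library (the rates $\lambda = 1.5$, $\mu = 2.5$ are arbitrary):

```python
import math

def poisson_pmf(k, rate):
    return math.exp(-rate) * rate ** k / math.factorial(k)

def sum_pmf_by_convolution(u, lam, mu):
    # g1(u) = sum_{v=0}^{u} P(X = v) P(Y = u - v), as in Example 11
    return sum(poisson_pmf(v, lam) * poisson_pmf(u - v, mu) for v in range(u + 1))

lam, mu = 1.5, 2.5
for u in range(8):
    print(u, sum_pmf_by_convolution(u, lam, mu), poisson_pmf(u, lam + mu))
```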




Chapter 5

Convergence of random variables.

Let $X_n$ be a sequence of random variables. We say

(i) $X_n \xrightarrow{P} X$ if $\lim_{n\to\infty}P(|X_n - X| > \epsilon) = 0$ for every $\epsilon > 0$;

(ii) $X_n \xrightarrow{a.s.} X$ if $P(|X_n - X| > \epsilon \text{ i.o.}) = 0$ for every $\epsilon > 0$;

(iii) $X_n \xrightarrow{L^p} X$ if $\lim_{n\to\infty}E(|X_n - X|^p) = 0$.

Another mode of convergence which is very useful in statistics is convergence in distribution. We say $X_n \xrightarrow{d} X$ if
$$\lim_{n\to\infty}P(X_n \le x) = F(x) = P(X \le x)$$
for all $x$ in $C_F$, the set of continuity points of $F$.

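Example 1. Let $X_i$ be a sequence of i.i.d. random variables such that $E(X_1) = \mu$ and $Var(X_1) = \sigma^2 < \infty$. Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$
be the sample mean. Prove that $\bar{X}$ converges to $\mu$ in probability and in $L^p$ for $p \in [1, 2]$.

Proof. We have $E(\bar{X}) = \mu$ and
$$E(\bar{X} - \mu)^2 = Var(\bar{X}) = \sigma^2/n \to 0.$$
Therefore $\bar{X} \xrightarrow{L^2} \mu$. By Lyapunov's inequality, $E|\bar{X} - \mu|^p \le (E(\bar{X} - \mu)^2)^{p/2} \to 0$ for $p \in [1, 2]$. Also, by Chebyshev's inequality, for every $\epsilon > 0$,
$$P(|\bar{X} - \mu| > \epsilon) \le \frac{E(\bar{X} - \mu)^2}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0.$$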
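To see the Chebyshev bound of Example 1 at work, the sketch below estimates $P(|\bar{X} - \mu| > \epsilon)$ by simulation and prints it next to the bound $\sigma^2/(n\epsilon^2)$. Assumptions (all arbitrary illustration choices): standard library only, $X_i \sim U(0, 1)$ so that $\mu = 1/2$ and $\sigma^2 = 1/12$, $\epsilon = 0.05$, and 1000 replications per $n$.

```python
import random

def exceed_fraction(n, eps, reps, rng):
    # estimate P(|Xbar - mu| > eps) for Xbar the mean of n i.i.d. U(0, 1) draws (mu = 1/2)
    count = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        count += abs(xbar - 0.5) > eps
    return count / reps

sigma2, eps = 1 / 12, 0.05               # Var of U(0, 1) and the tolerance epsilon
rng = random.Random(42)
results = {}
for n in (10, 100, 1000):
    bound = sigma2 / (n * eps ** 2)      # Chebyshev: sigma^2 / (n eps^2)
    results[n] = (exceed_fraction(n, eps, 1000, rng), bound)
    print(n, *results[n])
```

The bound is loose (it even exceeds 1 for $n = 10$), but it already forces the probability to 0, which is all the proof needs.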




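Theorem 5.1. Let $X_n$ be a sequence of random variables such that $E(X_n) = \mu$ and $Var(X_n) \to 0$. Then
$$X_n \xrightarrow{P} \mu.$$

Proof. As in Example 1, Chebyshev's (Markov's) inequality gives $P(|X_n - \mu| > \epsilon) \le Var(X_n)/\epsilon^2 \to 0$ for every $\epsilon > 0$.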

Example 2. Let $X_n$ be a sequence of random variables with c.d.f. $F_n(x) = I(x \ge 2 + 1/n)$. We have $\lim_{n\to\infty}F_n(x) = I(x > 2)$. The limit is not a c.d.f. (it is not right-continuous at 2), but $\lim_{n\to\infty}F_n(x) = F(x) = I(x \ge 2)$ for $x \in C_F$ (the point 2 is excluded), so $X_n \xrightarrow{d} 2$.

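Theorem 5.2. Let $X_n$ be a sequence of random variables and $c \in \mathbb{R}$ a constant. Then $X_n \xrightarrow{P} c$ if and only if $X_n \xrightarrow{d} c$.

Proof. Let $F_n(x) = P(X_n \le x)$ and $F(x) = I(x \ge c)$, so that $C_F = \{x : x \ne c\}$. Suppose $X_n \xrightarrow{P} c$. Then for all $\epsilon > 0$,
$$P(|X_n - c| < \epsilon) = F_n((c + \epsilon)-) - F_n(c - \epsilon) \to 1.$$
Since both terms lie in $[0, 1]$,
$$\lim_{n\to\infty}F_n((c + \epsilon)-) = 1, \quad \lim_{n\to\infty}F_n(c - \epsilon) = 0.$$
Because $F_n$ is nondecreasing, this gives $\lim_{n\to\infty}F_n(x) = F(x)$ for every $x \in C_F$ (i.e. $x \ne c$). Therefore $X_n \xrightarrow{d} c$. Conversely, let $X_n \xrightarrow{d} c$, so that
$$\lim_{n\to\infty}F_n(x) = I(x \ge c), \quad x \ne c.$$
For all $\epsilon > 0$,
$$P(|X_n - c| < \epsilon) \ge F_n(c + \epsilon/2) - F_n(c - \epsilon) \to 1 - 0 = 1.$$
Therefore $X_n \xrightarrow{P} c$.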

Example 3. Let $X_n$ be a sequence of i.i.d. random variables from $U[0, \theta]$. Define
$$Y_n = \max(X_1, \ldots, X_n).$$

(i) Find the c.d.f. and p.d.f. of $Y_n$.

(ii) Show that $Y_n \xrightarrow{P} \theta$.

(iii) Find the limiting distribution of $n(\theta - Y_n)$.

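Solution. (i) For $y \in (0, \theta)$,
$$F_n(y) = P(\max(X_1, \ldots, X_n) \le y) = P(X_1 \le y, \ldots, X_n \le y) = (P(X_1 \le y))^n = (y/\theta)^n.$$
This gives
$$f_n(y) = F_n'(y) = \frac{ny^{n-1}}{\theta^n}I(y \in (0, \theta)).$$
(ii) Now calculate
$$E(Y_n) = \int_0^\theta\frac{ny^n}{\theta^n}\,dy = \frac{n\theta^{n+1}}{(n + 1)\theta^n} \to \theta$$
and
$$E(Y_n^2) = \int_0^\theta\frac{ny^{n+1}}{\theta^n}\,dy = \frac{n\theta^{n+2}}{(n + 2)\theta^n} \to \theta^2.$$
Therefore $Var(Y_n) \to 0$ and $E(Y_n - \theta)^2 = Var(Y_n) + (E(Y_n) - \theta)^2 \to 0$. Since $E(Y_n) \ne \theta$, Theorem 5.1 does not apply verbatim, but the same Chebyshev argument, $P(|Y_n - \theta| > \epsilon) \le E(Y_n - \theta)^2/\epsilon^2 \to 0$, gives $Y_n \xrightarrow{P} \theta$.

(iii) For $0 \le w \le n\theta$,
$$G_n(w) = P(n(\theta - Y_n) \le w) = 1 - P(Y_n < \theta - w/n) = 1 - \left(\frac{\theta - w/n}{\theta}\right)^n = 1 - \left(1 - \frac{w}{n\theta}\right)^n \to G(w) = 1 - \exp(-w/\theta)$$
as $n \to \infty$. We have
$$G'(w) = \frac{1}{\theta}\exp(-w/\theta), \quad w \ge 0,$$
which is an exponential distribution with mean $\theta$.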
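Part (i) of Example 3 also gives a fast way to simulate $Y_n$: since $P(Y_n \le y) = (y/\theta)^n$, the inverse-c.d.f. draw $Y_n = \theta U^{1/n}$ with $U \sim U(0, 1)$ has exactly the distribution of the maximum. The sketch below uses that shortcut (an implementation choice; simulating $n$ uniforms and taking the maximum would be equivalent but slower), with the standard library only and arbitrary $n = 500$, $\theta = 2$, and compares the empirical c.d.f. of $n(\theta - Y_n)$ with the exponential limit from part (iii).

```python
import math
import random

def scaled_gap(n, theta, rng):
    # Y_n = theta * U**(1/n) has c.d.f. (y/theta)^n on (0, theta), the law of the maximum
    y_n = theta * rng.random() ** (1.0 / n)
    return n * (theta - y_n)

n, theta = 500, 2.0
rng = random.Random(9)
sample = [scaled_gap(n, theta, rng) for _ in range(50_000)]
for w in (0.5, 2.0, 5.0):
    emp = sum(s <= w for s in sample) / len(sample)
    print(w, emp, 1 - math.exp(-w / theta))   # limit G(w) = 1 - exp(-w/theta)
```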

Example 4. Let $X_n$ be a sequence of i.i.d. random variables from a continuous c.d.f. $F$. Define $Y_n = \max(X_1, \ldots, X_n)$. Find the limiting distribution of $Z_n = n(1 - F(Y_n))$.

Solution. Since $F(X_i) \sim U[0, 1]$ for $i = 1, 2, \ldots, n$ and $F$ is nondecreasing, $F(Y_n) = \max(F(X_1), \ldots, F(X_n))$. The result follows from the previous example with $\theta = 1$, i.e.
$$n(1 - F(Y_n)) \xrightarrow{d} V,$$
where $V \sim Exp(1)$.

Example 5. Let $X_n$ be a sequence of i.i.d. random variables with p.d.f.
$$f(x) = \exp(-(x - \theta))I(x \ge \theta).$$
Find the limiting distribution of $Y_n = n(\min(X_1, \ldots, X_n) - \theta)$.

Solution. For $y \ge 0$,
$$P(Y_n \le y) = 1 - P(\min(X_1, \ldots, X_n) > \theta + y/n) = 1 - \left(\int_{\theta + y/n}^\infty\exp(-(x - \theta))\,dx\right)^n = 1 - \exp(-y),$$
which does not depend on $n$ and is the c.d.f. of an $\exp(1)$ distribution.

Example 6. Let $X_n$ be a sequence of i.i.d. random variables with Bernoulli distribution
$$P(X_1 = x) = p^x(1 - p)^{1 - x}, \quad x = 0, 1,$$
and $P(X_1 = x) = 0$ for $x \notin \{0, 1\}$, where $p \in [0, 1]$. Prove that
$$\hat{p} = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} p.$$

Proof. We have $E(\hat{p}) = p$ and $Var(\hat{p}) = Var(X_1)/n = p(1 - p)/n \to 0$. The result now follows from Theorem 5.1.

Example 7. Let $X_n$ be a sequence of i.i.d. random variables with c.d.f. $F$. Show that the empirical c.d.f. satisfies
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x) \xrightarrow{P} F(x).$$
Since $E(F_n(x)) = F(x)$ and $Var(F_n(x)) = F(x)(1 - F(x))/n \to 0$ as $n \to \infty$, Theorem 5.1 gives the result. More can be said about the empirical process, which will be discussed later.
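The empirical c.d.f. of Example 7 is straightforward to compute: sort the sample once and count the observations $\le x$ by binary search. In the sketch below (standard library only; the helper name `empirical_cdf`, the seed and the sample sizes are illustrative choices), the data are $U(0, 1)$, so $F(x) = x$ and the printed deviations $F_n(x) - x$ should shrink roughly like $1/\sqrt{n}$.

```python
import bisect
import random

def empirical_cdf(data):
    # returns x -> (1/n) * #{i : X_i <= x}, using a sorted copy and binary search
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

rng = random.Random(12)
for n in (100, 10_000, 100_000):
    F_n = empirical_cdf([rng.random() for _ in range(n)])   # X_i ~ U(0, 1), so F(x) = x
    print(n, [round(F_n(x) - x, 4) for x in (0.1, 0.3, 0.5, 0.9)])
```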




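Theorem 5.3. If $X_n \xrightarrow{a.s.} X$ then $X_n \xrightarrow{P} X$.

Proof. Since $X_n \xrightarrow{a.s.} X$, for all $\epsilon > 0$,
$$\lim_{n\to\infty}P\left(\bigcup_{m=n}^\infty\{|X_m - X| \ge \epsilon\}\right) = 0.$$
On the other hand,
$$P\left(\bigcup_{m=n}^\infty\{|X_m - X| \ge \epsilon\}\right) \ge P(|X_n - X| \ge \epsilon).$$
Therefore
$$0 \le \limsup_{n\to\infty}P(|X_n - X| \ge \epsilon) \le \lim_{n\to\infty}P\left(\bigcup_{m=n}^\infty\{|X_m - X| \ge \epsilon\}\right) = 0,$$
which shows that
$$\lim_{n\to\infty}P(|X_n - X| \ge \epsilon) = 0.$$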

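Remark (complete convergence). We say $X_n \xrightarrow{c} X$ if $\sum_{n=1}^\infty P(|X_n - X| \ge \epsilon) < \infty$ for every $\epsilon > 0$. If $X_n \xrightarrow{c} X$ then $X_n \xrightarrow{a.s.} X$. This is true since
$$P\left(\bigcup_{m=n}^\infty\{|X_m - X| \ge \epsilon\}\right) \le \sum_{m=n}^\infty P(|X_m - X| \ge \epsilon)$$
and $\sum_{m=n}^\infty P(|X_m - X| \ge \epsilon) \to 0$ as $n \to \infty$, being the tail of the convergent series $\sum_{n=1}^\infty P(|X_n - X| \ge \epsilon)$. Another important fact is that if $X_n \xrightarrow{a.s.} X$ and $g$ is a continuous function on $\mathbb{R}$, then $g(X_n) \xrightarrow{a.s.} g(X)$.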

Theorem 5.4. If $X_n \xrightarrow{P} X$ then $X_n \xrightarrow{d} X$.

Proof. Let $X \sim F$, $X_n \sim F_n$, and let $x \in C_F$ and $x'$ be two real numbers. We have
$$P(X \le x') = P(X \le x', X_n \le x) + P(X \le x', X_n > x) \le P(X_n \le x) + P(X \le x', X_n > x). \qquad (5.1)$$
If $x' < x$, then $X \le x'$ and $X_n > x$ imply $|X_n - X| > x - x'$, so
$$P(X \le x') \le P(X_n \le x) + P(|X_n - X| > x - x').$$
Since $X_n \xrightarrow{P} X$, taking limits as $n \to \infty$ gives
$$F(x') \le \liminf_{n\to\infty}F_n(x).$$
Now consider $x' > x$. Similarly,
$$F_n(x) \le F(x') + P(|X_n - X| \ge x' - x).$$
Letting $n \to \infty$ gives $\limsup F_n(x) \le F(x')$. Now let $x' \uparrow x$ in the first bound and $x' \downarrow x$ in the second (using the right-continuity of $F$) to get
$$F(x-) \le \liminf F_n(x) \le \limsup F_n(x) \le F(x).$$
Since $x \in C_F$, $F(x-) = F(x)$ and hence
$$\lim F_n(x) = F(x).$$

Theorem 5.5. Let $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$. Then

(i) $X_n + Y_n \xrightarrow{P} X + Y$,

(ii) $X_nY \xrightarrow{P} XY$,

(iii) $X_nY_n \xrightarrow{P} XY$.

Proof. (i) For any $\epsilon > 0$, we have
$$P(|X_n + Y_n - X - Y| \ge \epsilon) \le P(|X_n - X| \ge \epsilon/2) + P(|Y_n - Y| \ge \epsilon/2),$$
and the result follows as $n \to \infty$.

For (ii), given $\delta > 0$, let $k > 0$ be a constant such that $P(|Y| > k) < \delta$. Then
$$P(|X_nY - XY| > \epsilon) = P(|X_n - X||Y| > \epsilon, |Y| > k) + P(|X_n - X||Y| > \epsilon, |Y| \le k) \le \delta + P(|X_n - X| > \epsilon/k).$$
Letting $n \to \infty$ gives $\limsup_n P(|X_nY - XY| > \epsilon) \le \delta$, and $\delta > 0$ is arbitrary.

For (iii), write
$$X_nY_n - XY = (X_n - X)(Y_n - Y) + X(Y_n - Y) + Y(X_n - X).$$
The last two terms converge to 0 in probability by (ii), so by (i) it is enough to show the result when $X = 0$ and $Y = 0$, since $U_n \xrightarrow{P} U$ is equivalent to $U_n - U \xrightarrow{P} 0$. In that case, for $\delta > 0$ there exists a positive constant $k$ such that $P(|X_n| \ge k) < \delta$ for all large enough $n$. Therefore
$$P(|X_nY_n| > \epsilon) = P(|X_n||Y_n| > \epsilon, |X_n| \ge k) + P(|X_n||Y_n| > \epsilon, |X_n| < k) \le P(|X_n| \ge k) + P(|Y_n| > \epsilon/k).$$
This gives the desired result as $n \to \infty$.

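Remarks.

(1) If $X_n \xrightarrow{P} X$ and $X_n \xrightarrow{P} Y$ then $P(X = Y) = 1$. This is true because
$$P(|X - Y| > \epsilon) \le P(|X_n - X| > \epsilon/2) + P(|X_n - Y| > \epsilon/2),$$
and the result follows by letting $n \to \infty$ and then $\epsilon \downarrow 0$.

(2) If $X_n \xrightarrow{P} X$ then $aX_n \xrightarrow{P} aX$, and if $a \ne 0$ then $X_n/a \xrightarrow{P} X/a$.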

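We use the continuity theorem extensively; it is presented here without proof.

Theorem 5.6 (Continuity Theorem). Let $X_n$ be a sequence of random variables, with $X_n$ having distribution $F_n(\cdot)$ and characteristic function $\phi_n(\cdot)$. Then $X_n \xrightarrow{d} X$ if and only if $\phi_n(t) \to \phi(t)$ for all $t \in \mathbb{R}$, where $\phi$ is the characteristic function of $X$. Moreover, if $\phi_n(t) \to \phi(t)$ for all $t \in \mathbb{R}$ and $\phi$ is continuous at 0, then $\phi$ is the characteristic function of some random variable $X$ and $X_n \xrightarrow{d} X$.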

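Lemma 5.1. Let $a$ and $b$ be two constants and let $\varphi(n) \to 0$. Then $(1 + a/n + \varphi(n)/n)^{bn} \to e^{ab}$.

Proof. Since $\ln(1 + x) = x + O(x^2)$ as $x \to 0$,
$$bn\ln(1 + a/n + \varphi(n)/n) = bn\left(\frac{a}{n} + \frac{\varphi(n)}{n}\right) + O(1/n) \to ab.$$
The result now follows from the continuity of the exponential function.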



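Example 8. Let $X_n \sim \chi^2(n)$. Prove that
$$\frac{X_n - n}{\sqrt{2n}} \xrightarrow{d} Z \sim N(0, 1).$$

Proof. The m.g.f. of $X_n$ is $E(\exp(tX_n)) = (1 - 2t)^{-n/2}$, $t < 1/2$. Therefore
$$M_n(t) = E\left(\exp\left(\frac{t(X_n - n)}{\sqrt{2n}}\right)\right) = \exp(-nt/\sqrt{2n})\left(1 - 2t/\sqrt{2n}\right)^{-n/2}, \quad t < \sqrt{2n}/2.$$
Use the fact that
$$-\ln(1 - x) = x + x^2/2 + x^3/3 + \ldots, \quad -1 \le x < 1,$$
to show that
$$\ln(M_n(t)) = -\frac{nt}{\sqrt{2n}} + \frac{n}{2}\left(\frac{2t}{\sqrt{2n}} + \frac{4t^2}{2\cdot 2n} + O(n^{-3/2})\right) = \frac{t^2}{2} + O(n^{-1/2}) \to t^2/2.$$
Therefore $M_n(t) \to \exp(t^2/2)$, which is the m.g.f. of $Z \sim N(0, 1)$.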
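The expansion in Example 8 can be watched numerically: evaluating $\ln M_n(t) = -nt/\sqrt{2n} - \frac{n}{2}\ln(1 - 2t/\sqrt{2n})$ exactly for increasing $n$ should approach $t^2/2$, with an error of order $n^{-1/2}$. The sketch below uses only the standard library; `math.log1p` is used to keep the logarithm accurate when $2t/\sqrt{2n}$ is small.

```python
import math

def log_mgf_standardized_chi2(t, n):
    # ln M_n(t) = -n t / sqrt(2n) - (n/2) ln(1 - 2t / sqrt(2n)), valid for 2t / sqrt(2n) < 1
    s = math.sqrt(2 * n)
    return -n * t / s - (n / 2) * math.log1p(-2 * t / s)

for n in (10, 1_000, 100_000, 10_000_000):
    print(n, log_mgf_standardized_chi2(1.0, n))   # the limit at t = 1 is t^2 / 2 = 0.5
```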

The example above is a special case of the central limit theorem, which we prove in this section.

Theorem 5.7. If $E(|X|^m) < \infty$ for a given integer $m$, then, as $\theta \to 0$,
$$E(\exp(i\theta X)) = \phi(\theta) = \sum_{j=0}^m\frac{(i\theta)^j}{j!}E(X^j) + o(\theta^m).$$
Without proof (see page 127 of the textbook).

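Theorem 5.8. Let $X_i$ be a sequence of i.i.d. random variables such that $E(X_1) = \mu$. Then
$$\bar{X} \xrightarrow{P} \mu.$$

Proof. By Theorem 5.7 with $m = 1$, $\phi(\theta) = 1 + i\mu\theta + o(\theta)$, so
$$E(\exp(i\theta\bar{X})) = (\phi(\theta/n))^n = (1 + i\mu\theta/n + o(1/n))^n \to \exp(i\theta\mu)$$
(by Lemma 5.1), which is the characteristic function of the constant $\mu$. Therefore $\bar{X} \xrightarrow{d} \mu$, which is equivalent to $\bar{X} \xrightarrow{P} \mu$ (Theorem 5.2).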

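Theorem 5.9. Let $X_i$ be a sequence of i.i.d. random variables such that $E(X_1) = 0$ (without loss of generality) and $\sigma^2 = Var(X_1) < \infty$. Then
$$\frac{\bar{X}}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1).$$

Proof. By Theorem 5.7 with $m = 2$,
$$E(\exp(i\theta X_1)) = \phi(\theta) = 1 - \frac{\sigma^2\theta^2}{2} + o(\theta^2).$$
Therefore
$$E\left(\exp\left(i\theta\frac{\bar{X}}{\sigma/\sqrt{n}}\right)\right) = \left(\phi\left(\frac{\theta}{\sigma\sqrt{n}}\right)\right)^n = \left(1 - \frac{\theta^2}{2n} + o(1/n)\right)^n \to \exp(-\theta^2/2),$$
which is the characteristic function of $N(0, 1)$; the continuity theorem gives the result.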
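A quick simulation makes Theorem 5.9 concrete. The sketch below assumes, purely for illustration, $X_i = U_i - 1/2$ with $U_i \sim U(0, 1)$ (so $E(X_1) = 0$ and $\sigma^2 = 1/12$), $n = 50$, 10000 replications and a fixed seed; it compares the empirical c.d.f. of $\bar{X}/(\sigma/\sqrt{n})$ with the standard normal c.d.f. computed from `math.erf`. Only the standard library is used.

```python
import math
import random

SIGMA = math.sqrt(1 / 12)   # standard deviation of U(0, 1)

def standardized_mean(n, rng):
    # X_i = U_i - 1/2 with U_i ~ U(0, 1): mean 0, variance 1/12
    xbar = sum(rng.random() - 0.5 for _ in range(n)) / n
    return xbar / (SIGMA / math.sqrt(n))

def normal_cdf(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

rng = random.Random(17)
n, reps = 50, 10_000
z = [standardized_mean(n, rng) for _ in range(reps)]
for c in (-1.0, 0.0, 1.0, 2.0):
    print(c, sum(v <= c for v in z) / reps, normal_cdf(c))
```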

Remark. The above result generalizes easily to random vectors: it suffices to prove it for every real linear combination of the coordinates (the Cramér-Wold device).

Theorem 5.10. Let $X_n$ and $Y_n$ be two sequences of random variables such that $|X_n - Y_n| \xrightarrow{P} 0$ and $Y_n \xrightarrow{d} Y$. Then $X_n \xrightarrow{d} Y$.

Proof. Let $x$ be a continuity point of $F(y) = P(Y \le y)$ and let $\epsilon > 0$ be such that $x \pm \epsilon$ are also continuity points of $F$ (possible for arbitrarily small $\epsilon$, since $F$ has at most countably many jumps). Then
$$P(X_n \le x) = P(X_n \le x, Y_n - X_n \le \epsilon) + P(X_n \le x, Y_n - X_n > \epsilon) \le P(Y_n \le x + \epsilon) + P(|Y_n - X_n| \ge \epsilon).$$
Therefore
$$\limsup_{n\to\infty}P(X_n \le x) \le \lim_{n\to\infty}P(Y_n \le x + \epsilon) = F(x + \epsilon).$$
Similarly,
$$P(Y_n \le x - \epsilon) = P(Y_n \le x - \epsilon, X_n - Y_n \le \epsilon) + P(Y_n \le x - \epsilon, X_n - Y_n > \epsilon) \le P(X_n \le x) + P(|X_n - Y_n| > \epsilon).$$
Letting $n \to \infty$ gives
$$\liminf_{n\to\infty}P(X_n \le x) \ge \lim_{n\to\infty}P(Y_n \le x - \epsilon) = F(x - \epsilon).$$
Since $\epsilon > 0$ can be taken arbitrarily small and $x \in C_F$, letting $\epsilon \downarrow 0$ gives $\lim_{n\to\infty}P(X_n \le x) = F(x)$.

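Remark. If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$, then $X_n + Y_n \xrightarrow{d} X + c$. To see why, notice that $X_n + c \xrightarrow{d} X + c$ and
$$(X_n + Y_n) - (X_n + c) = Y_n - c \xrightarrow{P} 0.$$
This implies the result (use Theorem 5.10). We also have $X_nY_n \xrightarrow{d} cX$. To see this, first let $c = 0$. In this case, for any $k > 0$,
$$P(|X_nY_n| > \epsilon) = P(|X_nY_n| > \epsilon, |Y_n| \le \epsilon/k) + P(|X_nY_n| > \epsilon, |Y_n| > \epsilon/k) \le P(|X_n| > k) + P(|Y_n| > \epsilon/k).$$
Choosing $k$ so that $\pm k$ are continuity points of the c.d.f. of $X$ and letting $n \to \infty$, we have
$$\limsup_{n\to\infty}P(|X_nY_n| > \epsilon) \le P(|X| > k),$$
and choosing $k$ large shows that $X_nY_n \xrightarrow{P} 0$. If $c \ne 0$, then $cX_n \xrightarrow{d} cX$ and
$$X_nY_n - cX_n = X_n(Y_n - c) \xrightarrow{P} 0$$
by the case just proved. Use Theorem 5.10 to get the required result.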

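Example 9. Let $X_i$ be a sequence of i.i.d. random variables with $E(X_1) = \mu$, $\sigma^2 = Var(X_1) > 0$ and $E(X_1^2) < \infty$. Then
$$\frac{\sqrt{n}(\bar{X} - \mu)}{S} \xrightarrow{d} N(0, 1),$$
where $S^2$ is the sample variance.

Proof. We have
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^n(X_i - \bar{X})^2 \xrightarrow{P} \sigma^2$$
and
$$\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$
Since $\sigma/S \xrightarrow{P} 1$, the remark following Theorem 5.10 (which rests on Theorem 5.10) implies the result.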
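Example 9 is what justifies using the estimated standard deviation $S$ in large samples. The sketch below checks it by simulation under illustrative assumptions: standard library only, skewed $\exp(1)$ data (so $\mu = 1$, treated as the target, with $\sigma$ treated as unknown), $n = 200$, 4000 replications and a fixed seed. The fraction of replications with $|\sqrt{n}(\bar{X} - \mu)/S| \le 1.96$ should be close to $P(|Z| \le 1.96) \approx 0.95$.

```python
import math
import random

def t_statistic(sample, mu):
    # sqrt(n) (Xbar - mu) / S, with S^2 the (n - 1)-divisor sample variance
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2)

rng = random.Random(21)
n, reps, mu = 200, 4_000, 1.0          # exp(1) data: mu = 1
t_values = [t_statistic([rng.expovariate(1.0) for _ in range(n)], mu) for _ in range(reps)]
coverage = sum(abs(t) <= 1.96 for t in t_values) / reps
print(coverage)
```

With skewed data the agreement is only approximate at moderate $n$, which is consistent with the result being a limit statement.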

Theorem 5.11 (Skorohod Representation). Let $X_n$ and $X$ be defined on the probability space $(\Omega, \mathcal{F}, P)$, and let $X_n \xrightarrow{d} X$. Then there exist random variables $X_n^*$ and $X^*$ defined on $([0, 1], \mathcal{B}([0, 1]), P^*)$, where $P^*$ is the uniform probability measure on $[0, 1]$, such that $X_n^* \stackrel{d}{=} X_n$ for every fixed integer $n$, $X^* \stackrel{d}{=} X$ and $X_n^* \xrightarrow{a.s.} X^*$. (Note: each $X_n^*$ matches the distribution of $X_n$ but ignores the dependence structure of the sequence $X_n$.)

Without proof.
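Although the proof is omitted, one standard construction behind Theorem 5.11 is the quantile coupling $X_n^*(u) = F_n^{-1}(u)$, $X^*(u) = F^{-1}(u)$ for $u \in [0, 1]$ under $P^*$. We state this as background for illustration, not as the notes' own proof. The sketch below applies it to Example 3, where $n(\theta - Y_n)$ has c.d.f. $G_n(w) = 1 - (1 - w/(n\theta))^n$ and limit $G(w) = 1 - \exp(-w/\theta)$. Inverting gives $X_n^*(u) = n\theta(1 - (1-u)^{1/n})$ and $X^*(u) = -\theta\ln(1-u)$, and for each fixed $u$ the values converge, which is the pointwise (hence almost sure) convergence the theorem promises. The choice $\theta = 2$ is arbitrary; `math.expm1` and `math.log1p` are used only for numerical accuracy.

```python
import math

def x_star_n(u, n, theta):
    # quantile of G_n(w) = 1 - (1 - w/(n theta))^n: n theta (1 - (1 - u)^(1/n))
    return n * theta * -math.expm1(math.log1p(-u) / n)

def x_star(u, theta):
    # quantile of the limit G(w) = 1 - exp(-w / theta)
    return -theta * math.log1p(-u)

theta = 2.0
for u in (0.1, 0.5, 0.9, 0.999):
    path = [round(x_star_n(u, n, theta), 5) for n in (1, 10, 1000, 100_000)]
    print(u, path, round(x_star(u, theta), 5))
```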

Some applications.

Theorem 5.12 (Continuous mapping theorem). Let $X_n \xrightarrow{d} X$ and let $g(\cdot)$ be a continuous real-valued function. Then
$$g(X_n) \xrightarrow{d} g(X).$$

Proof. Since $X_n \xrightarrow{d} X$, by Theorem 5.11 there exist a sequence of random variables $X_n^*$ and a random variable $X^*$ defined on another probability space such that $X_n^* \stackrel{d}{=} X_n$, $X^* \stackrel{d}{=} X$ and $X_n^* \xrightarrow{a.s.} X^*$. Therefore $g(X_n^*) \xrightarrow{a.s.} g(X^*)$, which implies $g(X_n^*) \xrightarrow{d} g(X^*)$ (Theorems 5.3 and 5.4). Combining the continuity theorem and the uniqueness theorem shows that
$$E(\exp(i\theta g(X_n))) = E(\exp(i\theta g(X_n^*))) \to E(\exp(i\theta g(X^*))) = E(\exp(i\theta g(X))).$$
By the continuity theorem, $g(X_n) \xrightarrow{d} g(X)$.



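Example 10. Let $X_i$ be a sequence of i.i.d. random variables such that $E(X_1) = \mu$ and $Var(X_1) = \sigma^2 < \infty$. We have
$$\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \xrightarrow{d} Z,$$
where $Z \sim N(0, 1)$. Applying Theorem 5.12 with $g(x) = x^2$,
$$\frac{n(\bar{X} - \mu)^2}{\sigma^2} \xrightarrow{d} \chi^2(1),$$
since $Z^2 \stackrel{d}{=} \chi^2(1)$.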

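Theorem 5.13 (Delta method). Let $X_n$ be a sequence of random variables such that $a_n(X_n - \theta) \xrightarrow{d} X$ for constants $a_n \to \infty$. If $g$ is differentiable at $\theta$, then
$$a_n(g(X_n) - g(\theta)) \xrightarrow{d} g'(\theta)X.$$

Proof. Applying Theorem 5.11 to $a_n(X_n - \theta)$, there exist a sequence of random variables $X_n^*$ and a random variable $X^*$ defined on another probability space such that $X_n^* \stackrel{d}{=} X_n$, $X^* \stackrel{d}{=} X$ and $a_n(X_n^* - \theta) \xrightarrow{a.s.} X^*$; in particular $X_n^* \xrightarrow{a.s.} \theta$ because $a_n \to \infty$. Since
$$a_n(g(X_n^*) - g(\theta)) = a_n(X_n^* - \theta)\frac{g(X_n^*) - g(\theta)}{X_n^* - \theta}$$
(with the ratio read as $g'(\theta)$ when $X_n^* = \theta$) and
$$\frac{g(X_n^*) - g(\theta)}{X_n^* - \theta} \xrightarrow{a.s.} g'(\theta),$$
we have
$$a_n(g(X_n^*) - g(\theta)) \xrightarrow{a.s.} g'(\theta)X^*.$$
Therefore
$$a_n(g(X_n) - g(\theta)) \xrightarrow{d} g'(\theta)X.$$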

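Example 11. Let $X_i$ be a sequence of i.i.d. random variables such that $E(X_1) = \mu$ and $Var(X_1) = \sigma^2$. We have
$$\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \xrightarrow{d} Z.$$
Using the delta method with $g(x) = x^2$ and $g'(\mu) = 2\mu$, we get
$$\sqrt{n}(\bar{X}^2 - \mu^2) \xrightarrow{d} 2\mu\sigma Z.$$
Note that $2\mu\sigma Z \sim N(0, 4\mu^2\sigma^2)$.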
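The variance $4\mu^2\sigma^2$ predicted in Example 11 can be checked by simulation. For speed, the sketch below assumes normal data, for which $\bar{X} \sim N(\mu, \sigma^2/n)$ exactly, so $\bar{X}$ is drawn directly instead of averaging $n$ observations. The values $\mu = 1.5$, $\sigma = 2$, $n = 400$, the replication count and the seed are arbitrary, and only the standard library is used. The predicted variance here is $4 \cdot 1.5^2 \cdot 2^2 = 36$.

```python
import math
import random

def delta_method_draw(n, mu, sigma, rng):
    # for N(mu, sigma^2) data, Xbar ~ N(mu, sigma^2 / n) exactly, so draw it directly
    xbar = rng.gauss(mu, sigma / math.sqrt(n))
    return math.sqrt(n) * (xbar ** 2 - mu ** 2)

rng = random.Random(31)
n, reps, mu, sigma = 400, 50_000, 1.5, 2.0
vals = [delta_method_draw(n, mu, sigma, rng) for _ in range(reps)]
mean_v = sum(vals) / reps
var_v = sum((v - mean_v) ** 2 for v in vals) / reps
print(var_v, 4 * mu ** 2 * sigma ** 2)   # delta method: variance 4 mu^2 sigma^2
```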

Example 12. Let $X_i$ be a sequence of i.i.d. random variables such that $E(X_1) = 0$ and $Var(X_1) = \sigma^2$. Let $g(x) = \cos x$ and note that $g'(0) = 0$, so the delta method only gives the degenerate limit 0, and in its proof we need to expand beyond the mean value theorem. We have
$$\sqrt{n}\bar{X} \xrightarrow{d} \sigma Z, \quad Z \sim N(0, 1).$$
Now use the Skorohod representation theorem and the fact that $\cos x - 1 = -x^2/2 + o(x^2)$ as $x \to 0$ to conclude that
$$2n(1 - \cos\bar{X}) \xrightarrow{d} \sigma^2\chi^2(1).$$

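Theorem 5.14 (Multivariate delta method). Let $\mathbf{U}_n$ be a sequence of random vectors such that $a_n(\mathbf{U}_n - \mathbf{m}) \xrightarrow{d} N_p(\mathbf{0}, \Sigma)$ for constants $a_n \to \infty$. If $f : \mathbb{R}^p \to \mathbb{R}$ and the first and second derivatives of $f$ exist in a neighborhood of $\mathbf{m}$, then
$$a_n(f(\mathbf{U}_n) - f(\mathbf{m})) \xrightarrow{d} N(0, \mathbf{a}'(\mathbf{m})\Sigma\mathbf{a}(\mathbf{m})),$$
where
$$\mathbf{a}'(\mathbf{m}) = \left(\frac{\partial f(\mathbf{m})}{\partial x_1}, \ldots, \frac{\partial f(\mathbf{m})}{\partial x_p}\right).$$

Proof. An argument similar to the proof of Theorem 5.13 can be given here; it is omitted.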

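Example 13. Let $X_n$ be an i.i.d. sequence of random variables with $E(X_1) = \mu$, $E(X_1^2) = \mu^2 + \sigma^2$, $E(X_1^3) = \mu_3$ and $E(X_1^4) = \mu_4 < \infty$. Find sequences of constants $a_n$ and $b_n$ such that $a_n(S_n^2 - b_n)$ converges in distribution to a nontrivial random variable, where $S_n^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2$.

Solution. Let $\mathbf{Y}_i = (X_i, X_i^2)'$, $i = 1, 2, \ldots, n$. We have $\mathbf{m} = E(\mathbf{Y}_i) = (\mu, \mu^2 + \sigma^2)'$. Computing the variance-covariance matrix of $\mathbf{Y}_i$ gives
$$\Sigma = \begin{pmatrix}\sigma^2 & \mu_3 - \mu(\mu^2 + \sigma^2) \\ \mu_3 - \mu(\mu^2 + \sigma^2) & \mu_4 - (\mu^2 + \sigma^2)^2\end{pmatrix}.$$
We have
$$\sqrt{n}(\bar{\mathbf{Y}} - \mathbf{m}) \xrightarrow{d} N_2(\mathbf{0}, \Sigma).$$
Define $g(x, y) = y - x^2$, so that $S_n^2 = g(\bar{X}, \overline{X^2})$ with $\overline{X^2} = \frac{1}{n}\sum_{i=1}^n X_i^2$, and notice that
$$\sqrt{n}(g(\bar{X}, \overline{X^2}) - g(\mu, \mu^2 + \sigma^2)) \xrightarrow{d} N(0, \theta^2).$$
To find $\theta^2$, calculate $\theta^2 = \mathbf{a}'(\mathbf{m})\Sigma\mathbf{a}(\mathbf{m})$, where
$$\mathbf{a}'(\mathbf{m}) = \left(\frac{\partial g(\mathbf{m})}{\partial x}, \frac{\partial g(\mathbf{m})}{\partial y}\right).$$
We have $\left(\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}\right) = (-2x, 1)$, which shows that $\mathbf{a}'(\mathbf{m}) = (-2\mu, 1)$. Since $g(\mu, \mu^2 + \sigma^2) = \sigma^2$, we may take $a_n = \sqrt{n}$ and $b_n = \sigma^2$. For the case $\mu = 0$ we get $\theta^2 = \mathbf{a}'(\mathbf{m})\Sigma\mathbf{a}(\mathbf{m}) = \mu_4 - \sigma^4$.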




Chapter 6

Martingales.

Definition 6.1. A process $\{X_n : n = 0, 1, \ldots\}$ is a martingale if for $n = 0, 1, 2, \ldots$

(i) $E(|X_n|) < \infty$,

(ii) $E(X_{n+1} | X_0, \ldots, X_n) = X_n$.

If $X_n$ is a player's fortune at stage $n$ of a game, then the martingale property says that the game is fair.

Definition 6.2. Let $\{X_n : n = 0, 1, \ldots\}$ and $\{Y_n : n = 0, 1, \ldots\}$ be two stochastic processes. Then $\{X_n\}$ is said to be a martingale with respect to $\{Y_n\}$ if

(i) $E(|X_n|) < \infty$,

(ii) $E(X_{n+1} | Y_0, \ldots, Y_n) = X_n$.

Theorem 6.1. Let $X$ and $Y$ be two random variables. We have
$$E(E(Y|X)) = E(Y)$$
and
$$Var(Y) = E(Var(Y|X)) + Var(E(Y|X)).$$

Proof. We have
$$E(Y | X = x) = \int_{-\infty}^\infty y\,dF(y|x)$$
and
$$E(E(Y|X)) = \int_{-\infty}^\infty\int_{-\infty}^\infty y\,dF(y|x)\,dF(x) = \int_{-\infty}^\infty\int_{-\infty}^\infty y\,dF(x, y) = E(Y).$$
From
$$E(Var(Y|X)) = E(E(Y^2|X) - (E(Y|X))^2) = E(Y^2) - E((E(Y|X))^2)$$





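Proof. Obviously $E(|X_n|) < \infty$. Moreover,
$$E(X_{n+1} | Y_0, \ldots, Y_n) = E\left[\left(Y_{n+1} + \sum_{k=1}^n Y_k\right)^2 - (n + 1)\sigma^2 \,\Big|\, Y_0, \ldots, Y_n\right]$$
$$= E\left[Y_{n+1}^2 + \left(\sum_{k=1}^n Y_k\right)^2 + 2Y_{n+1}\sum_{k=1}^n Y_k - (n + 1)\sigma^2 \,\Big|\, Y_0, \ldots, Y_n\right] = \sigma^2 + X_n - \sigma^2 = X_n.$$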

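Example 3. A ball is drawn at random from an urn that initially contains one red and one green ball. The ball and one more ball of the same color are then returned to the urn. Repeat this experiment indefinitely. After $n$ draws, let
$$X_n = \frac{\text{Number of red balls}}{\text{Number of balls}}$$
and
$$Y_n = \text{Number of red balls} = (n + 2)X_n.$$
Given $Y_n = k$, we have
$$P(Y_{n+1} = k + 1 | Y_n = k) = \frac{k}{n + 2}, \quad P(Y_{n+1} = k | Y_n = k) = 1 - \frac{k}{n + 2}.$$
We have
$$E(Y_{n+1} | Y_n = k) = \frac{(k + 1)k + k(n + 2 - k)}{n + 2} = \frac{k(n + 3)}{n + 2}.$$
Therefore
$$E(Y_{n+1} | Y_n) = b_nY_n, \quad b_n = \frac{n + 3}{n + 2},$$
and so
$$E(Y_{n+1}/(n + 3) | Y_n) = Y_n/(n + 2),$$
that is, $E(X_{n+1} | Y_n) = X_n$. Since $Y_{n+1}$ depends on the past only through $Y_n$ (a Markov chain), conditioning on $Y_n$ is the same as conditioning on $Y_0, \ldots, Y_n$. Therefore $X_n$ is a martingale (with respect to $Y_n$).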
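The one-step identity of the urn example can be verified exactly with rational arithmetic, and the martingale property predicts $E(X_n) = E(X_0) = 1/2$ for every $n$. The sketch below does both, using only the standard library: `fractions.Fraction` for the exact check over all reachable states with $n < 50$, and a simulation of 100 draws (10000 runs, fixed seed; arbitrary choices) for the mean.

```python
import random
from fractions import Fraction

def conditional_mean_next(n, k):
    # E(X_{n+1} | Y_n = k) when k of the n + 2 balls are red after n draws
    p_red = Fraction(k, n + 2)
    return p_red * Fraction(k + 1, n + 3) + (1 - p_red) * Fraction(k, n + 3)

# exact check of E(X_{n+1} | Y_n = k) = k / (n + 2) = X_n for every reachable state
identity_holds = all(conditional_mean_next(n, k) == Fraction(k, n + 2)
                     for n in range(50) for k in range(1, n + 2))

def polya_fraction(n_draws, rng):
    red, total = 1, 2                      # one red and one green ball to start
    for _ in range(n_draws):
        if rng.random() < red / total:     # a red ball is drawn
            red += 1
        total += 1                         # drawn ball returned with one more of its color
    return red / total

rng = random.Random(4)
sample = [polya_fraction(100, rng) for _ in range(10_000)]
print(identity_holds, sum(sample) / len(sample))   # martingale: E(X_n) = E(X_0) = 1/2
```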

Example 4 (Doob's martingale). Let $Y_0 = 0$, let $Y_i$ be a sequence of i.i.d. random variables, and let $X$ be a random variable satisfying $E(|X|) < \infty$. Then
$$X_n = E(X | Y_0, \ldots, Y_n)$$
is a martingale with respect to $Y_i$.

Proof. We have
$$E(|X_n|) = E[|E(X | Y_0, \ldots, Y_n)|] \le E[E(|X| \,|\, Y_0, \ldots, Y_n)] = E(|X|) < \infty.$$
Also
$$E(X_{n+1} | Y_0, \ldots, Y_n) = E(E(X | Y_0, \ldots, Y_{n+1}) | Y_0, \ldots, Y_n) = E(X | Y_0, \ldots, Y_n) = X_n.$$
Note: $E[E(X | Y, Z) | Y] = E(X | Y)$.

Example 5 (Likelihood ratio). Let $Y_0 = 0$, let $Y_i$ be a sequence of i.i.d. random variables, and let $f_0$ and $f_1$ be two p.d.f.'s with $f_0(y) > 0$ wherever $f_1(y) > 0$. Define
$$X_n = \frac{f_1(Y_0)\cdots f_1(Y_n)}{f_0(Y_0)\cdots f_0(Y_n)}, \quad n = 0, 1, 2, \ldots.$$
If $Y_i \stackrel{i.i.d.}{\sim} f_0$, then
$$E(X_{n+1} | Y_0, \ldots, Y_n) = E\left(X_n\frac{f_1(Y_{n+1})}{f_0(Y_{n+1})} \,\Big|\, Y_0, \ldots, Y_n\right) = X_n\int_{-\infty}^\infty\frac{f_1(y)}{f_0(y)}f_0(y)\,dy = X_n.$$
Therefore $X_n$ is a martingale with respect to $Y_i$.
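The likelihood-ratio martingale can be checked exactly if, as an assumption made only so that expectations become finite sums, the densities $f_0$ and $f_1$ are replaced by two hypothetical probability mass functions on $\{0, 1, 2\}$. The sketch below (standard library only) starts the product at $Y_1$, since the factor for the fixed $Y_0 = 0$ is a constant that does not affect the martingale property. It confirms with exact fractions that $E_{f_0}(X_n) = 1$ and that $E(X_{n+1} | Y_1, \ldots, Y_n) = X_n$ for every history of length 3.

```python
from fractions import Fraction
from itertools import product

# hypothetical p.m.f.s on {0, 1, 2}, standing in for the densities f0 and f1
f0 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
f1 = {0: Fraction(1, 5), 1: Fraction(2, 5), 2: Fraction(2, 5)}

def likelihood_ratio(ys):
    # X_n = prod f1(y) / f0(y) over the observed values ys
    ratio = Fraction(1)
    for y in ys:
        ratio *= f1[y] / f0[y]
    return ratio

def expected_ratio(n):
    # E_{f0}(X_n): sum over all length-n outcomes of P_{f0}(ys) * X_n(ys)
    total = Fraction(0)
    for ys in product(f0, repeat=n):
        prob = Fraction(1)
        for y in ys:
            prob *= f0[y]
        total += prob * likelihood_ratio(ys)
    return total

def one_step_conditional_mean(ys):
    # E(X_{n+1} | Y_1..Y_n = ys) = sum over y of f0(y) * X_{n+1}(ys, y)
    return sum(f0[y] * likelihood_ratio(ys + (y,)) for y in f0)

print([expected_ratio(n) for n in range(1, 5)])
print(all(one_step_conditional_mean(ys) == likelihood_ratio(ys) for ys in product(f0, repeat=3)))
```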

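Example 6 (Wald martingale). Let $X_i$ be a sequence of i.i.d. random variables with $M(t) = E(\exp(tX_1)) < \infty$. Define
$$Y_n = \frac{\exp(\lambda(X_1 + \cdots + X_n))}{(M(\lambda))^n}.$$
We have $E(Y_{n+1} | X_1, \ldots, X_n) = Y_n\,E(\exp(\lambda X_{n+1}))/M(\lambda) = Y_n$.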
