
Part III

Stochastic Analysis


Figure 5.1: Binomial Tree for the Stock Price and the Option Value. [Figure omitted: at t = 0 the stock price is S_0 and the option value is V_0; at t = 1 the stock price is S_H or S_T and the option payoff is V_H = max(0, S_H − K) or V_T = max(0, S_T − K).]

Example 95 (Ordinary Least Squares Regression)
Consider the following econometric relation between the endogenous variable y_t and the explanatory variable(s) x_t:

y_t = x_t β + ε_t.

Standard assumptions are

E(ε_t) = 0,
Var(ε_t) = σ².

Let X, Y be a sample of data. The ordinary least squares (OLS) estimator β̂ of β is

β̂ = (X′X)⁻¹ X′Y.

Typical tasks are to show that β̂ is an unbiased, consistent, and asymptotically normally distributed estimator of β.
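As an illustration of how the OLS formula is applied in practice, here is a minimal numerical sketch (the data-generating parameters, sample size, and use of NumPy are assumptions made for the example, not part of the text):

```python
import numpy as np

# Simulate data from y_t = x_t * beta + eps_t with illustrative (made-up) parameters.
rng = np.random.default_rng(0)
n = 1_000
beta_true = np.array([2.0, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one regressor
eps = rng.normal(scale=1.0, size=n)                      # E(eps_t) = 0, Var(eps_t) = sigma^2
Y = X @ beta_true + eps

# OLS estimator: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # close to [2.0, -0.5]; the estimate approaches beta as n grows (consistency)
```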

Page 5: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 5/32

Chapter 6

Probability Theory

6.1 Probability Foundations

Example 96 (Dice)
Let us roll a die.

• "Rolling a die" is an experiment.

• The set of states (also called the sample space) is Ω = {1, 2, 3, 4, 5, 6}.

• Let ω ∈ Ω represent an outcome, or sample point. If you roll a 3, then ω = 3.

Example 97 (Coin Toss)
Let us flip a coin three times.

• The sample space is Ω = {HHH, TTT, HHT, TTH, HTH, HTT, THT, THH}.

• The event "First Toss is Head" = {ω ∈ Ω : ω_1 = H} = {HHH, HHT, HTH, HTT}.

Definition 119 (Event)
An event A is a subset of the sample space Ω.

Example 98 (Coin Toss, continued)
• If the probability P(H) = 1/2, then the probability of "All Three Tosses Come up Head" is P(HHH) = 1/2 · 1/2 · 1/2 = 1/8.

• More generally, if the probability P(H) = p, then

P(HHH) = p³,
P(HHT) = p · p · (1 − p) = p²(1 − p),
P("First toss is Head") = P({HHH, HHT, HTH, HTT})
                        = P(HHH) + P(HHT) + P(HTH) + P(HTT)
                        = p³ + p²(1 − p) + p²(1 − p) + p(1 − p)².
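These formulas can be checked by brute-force enumeration of the eight outcomes; the following sketch does so for an arbitrary illustrative value p = 0.6 (the value and the helper function are not from the text):

```python
from itertools import product

p = 0.6  # arbitrary illustrative value for P(H)

def prob(outcome):
    """Probability of a toss sequence such as ('H', 'H', 'T') under independent tosses."""
    result = 1.0
    for toss in outcome:
        result *= p if toss == 'H' else (1 - p)
    return result

omega = list(product('HT', repeat=3))                       # the sample space
p_hhh = prob(('H', 'H', 'H'))                                # p^3
p_first_head = sum(prob(w) for w in omega if w[0] == 'H')    # p^3 + 2 p^2 (1-p) + p (1-p)^2
print(p_hhh, p_first_head)                                   # 0.216 and 0.6 (= p, as expected)
```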


Figure 6.1: Binomial Tree Representation for Events. [Figure omitted: a tree whose nodes after one, two, and three tosses are H/T, then HH/HT/TH/TT, then HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.]

It is important to keep in mind that the probability measure P(·) is a function that assigns to an event A the chance P(A) that the event occurs.

6.1.1 How can we represent information?

Key questions:

• What events have been resolved at time t = 1, i.e., what do I know at time t = 1?

Answer: I know whether the following events A_H and A_T are true or false:

A_H = {HHH, HHT, HTH, HTT},
A_T = {THH, THT, TTH, TTT}.

• More generally, what do I know at time t = 0, 1, 2, 3?

Answer:

1. The knowledge at time t = 0 is F_0 = {∅, Ω}.

2. The knowledge at time t = 1 is F_1 = {A_H, A_T, ∅, Ω}.

3. The knowledge at time t = 2 is F_2 = {A_HH, A_HT, A_TH, A_TT, unions of these sets, A_T (= A_TH ∪ A_TT), A_H, ..., ∅, Ω}.

4. The knowledge at time t = 3 is F_3 = {A_HHH, A_HHT, ..., all subsets of Ω}.

Remember the definition of an algebra from one of the earlier classes? The collections F_i, i = 0, 1, ..., 3, are called σ-algebras. One can think of σ-algebras as "records of information."

Definition 120 (Sigma-Algebra and Filtration)
• A σ-algebra F is a collection of sets A (called events) such that

1. ∅ ∈ F.

Page 7: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 7/32

6.2. PROBABILITY MEASURE 159

2. If A ∈ F, then Aᶜ ∈ F.

3. If A_1, A_2, ... ∈ F, then ⋃_{i=1}^{∞} A_i ∈ F.

• A sub-σ-algebra F_i is a σ-algebra with the property that F_i ⊂ F.

• A filtration is a sequence of sub-σ-algebras F_i, i = 0, 1, ..., with the property

F_0 ⊂ F_1 ⊂ F_2 ⊂ ... ⊂ F.

Note that F_0 ⊂ F_1 ⊂ F_2 ⊂ F_3 in the above example. Hence, this sequence is a filtration. With every set F_i in the sequence, the amount of information increases; in other words, the information becomes finer.
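One concrete way to see this is to represent each F_t by the partition of Ω generated by the first t tosses (its atoms) and check that each partition refines the previous one. The following sketch is only an illustration of this idea; the representation by atoms is a choice made here, not the text's notation:

```python
from itertools import product

omega = set(product('HT', repeat=3))          # the eight outcomes of three tosses

def atoms(t):
    """Partition of Omega generated by the first t tosses (the atoms of F_t)."""
    blocks = {}
    for w in omega:
        blocks.setdefault(w[:t], set()).add(w)
    return list(blocks.values())

# F_0 has one atom (Omega), F_1 has the atoms A_H and A_T, F_2 has four atoms, F_3 has eight.
for t in range(4):
    print(t, sorted(sorted(''.join(w) for w in block) for block in atoms(t)))

def refines(fine, coarse):
    """True if every atom of the finer partition lies inside some atom of the coarser one."""
    return all(any(b <= c for c in coarse) for b in fine)

print(all(refines(atoms(t + 1), atoms(t)) for t in range(3)))   # True: the information gets finer
```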

6.2 Probability Measure

In the previous example we have already made use of the fact that a probability measure P(·) is a function that assigns to an event A the chance P(A) that the event occurs. Let us define probability measures succinctly.

Definition 121 (Probability Measure and Probability Space)
• A probability (measure) is a function P : F → [0, 1] such that

1. P(∅) = 0.

2. P(Ω) = 1.

3. If A_1, A_2, ... ∈ F are disjoint, then

P(⋃_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i).

• A probability space is the triple (Ω, F, P).

Example 99 (Dice)
Let A_1 = {2, 4, 6} and A_2 = {1}, so they are disjoint. If P(A_1) = 1/2 and P(A_2) = 1/6, then P(A_1 ∪ A_2) = 1/2 + 1/6 = 2/3.

Proposition 56 (Important Properties of Probability Measures)
1. P(A) ≤ 1.

2. P(A) + P(Aᶜ) = 1.

3. P(A ∩ B) + P(A ∩ Bᶜ) = P(A).

4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

5. If A ⊂ B, then P(A) ≤ P(B).

6. If {A_i} is a partition (i.e., a collection of subsets that are mutually exclusive and exhaustive), then

P(A) = Σ_{i=1}^{∞} P(A ∩ A_i).

Page 8: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 8/32

160 CHAPTER 6. PROBABILITY THEORY

6.2.1 Conditional Probability and Independence

In the following we introduce the concepts of conditional probability and independence, and we explain the important Bayes' Rule.

Definition 122 (Conditional Probability)
If A, B are events in F and P(B) > 0, then the conditional probability of A given B is

P(A | B) = P(A ∩ B) / P(B).

Example 100 (Dice)
What is the probability of rolling a 6 conditional on rolling an even number?

P(6 | {2, 4, 6}) = P(A ∩ B) / P(B) = P({6}) / P({2, 4, 6}) = (1/6) / (1/2) = 1/3.

Proposition 57
Some useful results:

1. Partition Equation: If {B_i} is a partition, then

P(A) = Σ_{i=1}^{∞} P(A ∩ B_i) = Σ_{i=1}^{∞} P(A | B_i) P(B_i).

2. Bayes' Rule (Simple Version):

P(B | A) = P(A | B) P(B) / P(A).

3. Bayes' Rule (General Version): Suppose P(A) > 0 and {B_i} is a partition. Then

P(B_n | A) = P(A | B_n) P(B_n) / Σ_{i=1}^{∞} P(A | B_i) P(B_i).

Note: Bayes’ Rule allows you to switch the conditioning!

Example 101 (Corporate Finance)
Consider a firm that has a project. The project can be either a good or a bad project. Let P(good project) = 3/7 and P(bad project) = 4/7. The manager of the firm misstates the quality of the project with probability 1/8. Then

P(good project | manager says it is good)
= P(manager says it is good | good project)(∗) · P(good project) / P(manager says it is good)(∗∗)
= (7/8 × 3/7) / (25/56)
= 21/25.

Page 9: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 9/32

6.2. PROBABILITY MEASURE 161

One obtains

(∗) = 7/8,

(∗∗) = P(manager says good ∩ good project) + P(manager says good ∩ bad project)
     = P(manager says good | good) P(good) + P(manager says good | bad) P(bad)
     = 7/8 · 3/7 + 1/8 · 4/7
     = 25/56.
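A small sketch that reproduces this computation with exact fractions (the variable names are ours; the numbers are those of the example):

```python
from fractions import Fraction as F

p_good, p_bad = F(3, 7), F(4, 7)
p_misstate = F(1, 8)                              # the manager misstates the quality

p_says_good_given_good = 1 - p_misstate           # 7/8
p_says_good_given_bad = p_misstate                # 1/8

# Partition equation for the denominator, then Bayes' rule.
p_says_good = p_says_good_given_good * p_good + p_says_good_given_bad * p_bad
p_good_given_says_good = p_says_good_given_good * p_good / p_says_good

print(p_says_good, p_good_given_says_good)        # 25/56 and 21/25
```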

Definition 123 (Independence)
• Two events A and B are independent if

P(A ∩ B) = P(A) P(B).

• A sequence of events A_1, A_2, ... ∈ F is mutually independent if, for any finite subcollection A_{i_1}, ..., A_{i_k} of events,

P(⋂_{j=1}^{k} A_{i_j}) = Π_{j=1}^{k} P(A_{i_j}).

Example 102 (Coin Toss)
P(HH) = P(H) · P(H) = 1/2 · 1/2 = 1/4.

Page 10: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 10/32

162 CHAPTER 6. PROBABILITY THEORY

Figure 6.2: Binomial Tree for the Stock Price.

6.3 Random Variable and Distribution

Example 103 (Binomial Stock Pricing)
Let the stock price at time t be S_t with S_0 = 4. Assume the stock price either increases from S_t to S_{t+1} = u S_t for all t (a change by the factor u) if the coin flip at time t comes up H, or changes from S_t to S_{t+1} = d S_t for all t (a change by the factor d) if the coin flip comes up T. See Figure 6.2 for an illustration of the associated binomial tree with three coin tosses. The stock price at t = 3 equals

S_3(ω_1 ω_2 ω_3) =
  32   if ω_1 = ω_2 = ω_3 = H,
  8    if two heads and one tail,
  2    if one head and two tails,
  0.5  if ω_1 = ω_2 = ω_3 = T.

Hence, S_3 is a random variable (as of time t = 0, 1, 2).

Note that a function is measurable relative to F if the function does not convey more information than is contained in F. In particular, a measurable function may not take different values for ω_1 and ω_2 if they are not distinguishable from each other in F (i.e., if there do not exist two disjoint A_1 ∈ F and A_2 ∈ F such that, w.l.o.g., ω_1 ∈ A_1, ω_2 ∈ A_2 but ω_1 ∉ A_2, ω_2 ∉ A_1).
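A minimal sketch of this price process, using u = 2 and d = 1/2 (these factors are not stated explicitly above but are consistent with the listed values 32, 8, 2, 0.5 for S_0 = 4):

```python
from itertools import product

S0, u, d = 4.0, 2.0, 0.5      # u and d chosen to reproduce the values 32, 8, 2, 0.5

def stock_price(path):
    """Stock price after applying the up/down factors along a path such as ('H', 'T', 'H')."""
    S = S0
    for toss in path:
        S *= u if toss == 'H' else d
    return S

for path in product('HT', repeat=3):
    print(''.join(path), stock_price(path))
# HHH -> 32.0, paths with two heads -> 8.0, paths with one head -> 2.0, TTT -> 0.5
```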

Page 11: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 11/32

Page 12: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 12/32

Page 13: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 13/32

6.3. RANDOM VARIABLE AND DISTRIBUTION 165

4. More generally, the rth moment of X, for r = 1, 2, 3, ..., is E(X^r) = Σ_ω [X(ω)]^r P(ω) if Ω is countable, and E(X^r) = ∫_Ω [X(ω)]^r dP(ω) if Ω is uncountable.

5. More generally, the rth central moment of X, for r = 1, 2, 3, ..., is µ^r_X ≡ E[(X − E(X))^r].

6. The skewness of X is µ³_X / (µ²_X)^{3/2}. The skewness measures the asymmetry in the distribution.

7. The kurtosis of X is µ⁴_X / (µ²_X)². The kurtosis measures the peakedness or flatness of the distribution.

Note that ∫ h(x) dF(x) in the above definition is not a standard Riemann integral, but a more general type of integral known as the Lebesgue integral.

Example 110 (Poisson Distribution)
The expectation of a Poisson distribution with parameter λ is

E(X) = Σ_{j=0}^{∞} j P(X = j)
     = Σ_{j=0}^{∞} j e^{−λ} λ^j / j!
     = λ e^{−λ} Σ_{j=1}^{∞} λ^{j−1} / (j − 1)!
     = λ e^{−λ} e^{λ}
     = λ.
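As a quick numerical check, the series above can be truncated and summed directly (λ = 3.5 is an arbitrary illustrative choice):

```python
import math

lam = 3.5   # arbitrary illustrative parameter

# E(X) = sum_{j >= 0} j * P(X = j) with P(X = j) = exp(-lam) * lam**j / j!,
# truncated once the terms are negligible.
mean = sum(j * math.exp(-lam) * lam**j / math.factorial(j) for j in range(60))
print(mean)   # approximately 3.5, i.e. E(X) = lambda
```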

Figure 6.3 illustrates the effect on the shape of the probability density function of changing the moments of the distribution.

6.3.3 Distribution Function and Density Function

Definition 127
The cumulative distribution function (cdf) of X is F_X(x) ≡ P(X ≤ x) for all x.

Note that P(X ≤ x_1) ≤ P(X ≤ x_2) if x_1 ≤ x_2.

Proposition 58 (Properties of the cdf)
A function F(x) is a cumulative distribution function if and only if

1. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1,

2. F (x) is nondecreasing in x,

3. F (x) is right-continuous.

Page 14: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 14/32

166 CHAPTER 6. PROBABILITY THEORY

Figure 6.3: The Effect on the Shape of the Probability Density Function of Changing the Moments of the Distribution. [Figure omitted; panels: (a) two pdfs with different mean, (b) two pdfs with different variance, (c) two pdfs with different skewness, (d) two pdfs with different kurtosis.]

Reminder: A function is right-continuous if it is continuous from the right. A function is left-continuous if it is continuous from the left. See Figure 3.11 for an illustration.

Definition 128
The probability density function (pdf) for a continuous random variable X is the function f_X that satisfies

F_X(x) = ∫_{−∞}^{x} f_X(t) dt.

Hence, by the Fundamental Theorem of Calculus,

f_X(x) ≡ d/dx F_X(x).

Proposition 59 (Properties of the pdf)
A function f(x) is a probability density function if and only if

1. f(x) ≥ 0 for all x,

Page 15: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 15/32

6.3. RANDOM VARIABLE AND DISTRIBUTION 167

2. ∫_{−∞}^{∞} f(t) dt = 1.

Figure 6.4: How to Obtain the Probability of an Event from the Probability Density Function. [Figure omitted: the probability P(a < X < b) is the area under the pdf between a and b.]

Lemma 6
The probability of the event {ω : X(ω) ∈ [a, b]} can be computed as

P(a ≤ X ≤ b) = ∫_{a}^{b} f_X(t) dt = F_X(b) − F_X(a).

(See Figure 6.4.)

Note that a random variable X is called a continuous random variable if F_X(x) (its cdf) is a continuous function of x, while X is a discrete random variable if F_X(x) is a step function of x.
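The lemma can be illustrated numerically for a standard normal random variable: the difference of cdf values and the integral of the pdf over [a, b] agree. This sketch assumes SciPy is available; the interval is arbitrary:

```python
from scipy.stats import norm
from scipy.integrate import quad

a, b = -1.0, 2.0                        # arbitrary interval
p_from_cdf = norm.cdf(b) - norm.cdf(a)  # F_X(b) - F_X(a)
p_from_pdf, _ = quad(norm.pdf, a, b)    # integral of f_X over [a, b]
print(p_from_cdf, p_from_pdf)           # both approximately 0.8186
```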

Example 111 (Uniform Distribution)
Let X be a uniformly distributed random variable on the interval [a, b], denoted X ∼ U[a, b] (see Figure 6.5). The pdf of X is

f_X(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise.

Page 16: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 16/32

168 CHAPTER 6. PROBABILITY THEORY

Figure 6.5: Probability Density Function (top) and Cumulative Distribution Function (bottom) for a Uniform[−2, 2] (left) and a Normal(0, 1) (right) Distribution. [Figure omitted.]

Example 112 (Normal Distribution, Gaussian Distribution)
Let X be a random variable with normal distribution, denoted X ∼ N(µ, σ²) (see Figure 6.5). The pdf of a normally distributed random variable with mean µ and standard deviation σ is

f_X(x) = 1 / (√(2π) σ) · e^{−(1/2)((x − µ)/σ)²}.
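A short sketch evaluating this density directly and comparing it with SciPy's implementation (the parameter values µ = 1, σ = 2 are arbitrary):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0
x = np.linspace(-5.0, 7.0, 5)

pdf_formula = (1.0 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(pdf_formula, pdf_scipy))   # True: the formula matches the library density
```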

6.3.4 Comparison of Random Variables

Reminder: A random variable is a function X : Ω → R that is measurable with respect to F. A random variable X is continuous if F_X(x) (its cdf) is a continuous function of x, while it is discrete if F_X(x) is a step function of x.

Example 113 (Dice)

See Figure 6.6.

Example 114 (Coin Toss)
When are two random variables the same? Set Ω = {H, T, E}. Let

X =
  1   if ω = H,
 −1   if ω = T,
  0   if ω = E,

Page 17: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 17/32

6.3. RANDOM VARIABLE AND DISTRIBUTION 169

Figure 6.6: CDF for the Dice example. [Figure omitted: a step function jumping by 1/6 at each of x = 1, 2, ..., 6, taking the values 1/6, 1/3, 1/2, 2/3, 5/6, 1.]

and

Y =
  1    if ω = H,
 −1    if ω = T,
  109  if ω = E.

Are X and Y the same? Note that E has zero probability, P(E) = 0. We therefore say that X = Y almost surely (a.s.).

Now, let

Z =
 −1   if ω = H,
  1   if ω = T,
  0   if ω = E.

Note that X ≠ Z although they have the same probabilities of winning and losing $1.

Example 115 (Three Coin Tosses)
Take an experiment in which you toss a coin 3 times. Define X = #heads and Y = #tails:

ω      X(ω)   Y(ω)
HHH     3      0
HHT     2      1
HTH     2      1
THH     2      1
HTT     1      2
THT     1      2
TTH     1      2
TTT     0      3

Page 18: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 18/32

170 CHAPTER 6. PROBABILITY THEORY

Figure 6.7: CDF for the Three Coin Tosses example. [Figure omitted.]

Note that X(ω) ≠ Y(ω) for all ω ∈ Ω. The pdfs of X and Y are given by

X = x   P(X = x)     Y = y   P(Y = y)
  0       1/8           0      1/8
  1       3/8           1      3/8
  2       3/8           2      3/8
  3       1/8           3      1/8

Now, let us determine the cdf of X:

x            F_X(x) = P(X ≤ x)
(−∞, 0)             0
[0, 1)             1/8
[1, 2)             1/2
[2, 3)             7/8
[3, ∞)              1

Note that the cdf F_Y(y) of Y is the same. In this case, we say that X = Y in distribution.
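The following sketch enumerates the eight outcomes of the experiment (fair coin assumed) and confirms that X and Y disagree on every path yet induce the same distribution:

```python
from itertools import product
from collections import Counter

omega = [''.join(w) for w in product('HT', repeat=3)]   # eight equally likely paths

X = {w: w.count('H') for w in omega}                     # number of heads
Y = {w: w.count('T') for w in omega}                     # number of tails

print(any(X[w] == Y[w] for w in omega))                  # False: X(w) != Y(w) for every w
print(Counter(X.values()) == Counter(Y.values()))        # True: X and Y are equal in distribution
```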

Definition 129
For random variables X and Y, we say

1. X = Y almost surely, written X =_{a.s.} Y, if P({ω : X(ω) = Y(ω)}) = 1.

2. X = Y in distribution (or in law, or X and Y are identically distributed), written X =_d Y, if they induce the same probability, i.e., P_X = P_Y.

Page 19: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 19/32

Page 20: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 20/32

Page 21: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 21/32

6.3. RANDOM VARIABLE AND DISTRIBUTION 173

Example 117
Let X be a continuous random variable and define Y ≡ X². We want to find F_Y(y) and f_Y(y).

F_Y(y) = P(Y ≤ y) = P(X² ≤ y)
       = P(−√y ≤ X ≤ √y)
       = P(X ≤ √y) − P(X ≤ −√y)
       = F_X(√y) − F_X(−√y).

Then, the pdf of Y is given by

f_Y(y) = dF_Y(y)/dy = (1 / (2√y)) [f_X(√y) + f_X(−√y)].
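A Monte Carlo check of this change-of-variables formula, taking X ∼ N(0, 1) as an illustrative choice (so that Y = X² is chi-squared with one degree of freedom); it compares a simulated probability with the integral of the derived density:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)     # X ~ N(0, 1), an illustrative choice
y = x**2                               # Y = X^2

def f_Y(t):
    # Density of Y from the change-of-variables formula above.
    return (norm.pdf(np.sqrt(t)) + norm.pdf(-np.sqrt(t))) / (2.0 * np.sqrt(t))

p_simulated = np.mean((y >= 0.5) & (y <= 1.5))
p_formula, _ = quad(f_Y, 0.5, 1.5)
print(p_simulated, p_formula)          # both approximately 0.26
```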

Page 22: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 22/32

174 CHAPTER 6. PROBABILITY THEORY

6.4 Multivariate Random Variables

Definition 130A d-dimensional random vector X = (X 1,...,X d) is a vector-valued function X : Ω → Rd suchthat each component X i, i = 1,...,d, is a random variable.

Example 118
Let X ∼ N(µ_X, σ²_X) and Y ∼ N(µ_Y, σ²_Y). In Figure 6.9 we plot three cases for the joint distribution of X and Y. In each case, the marginal distributions of both X and Y are Gaussian. Yet, the joint distribution is different in each of the three cases. In the first plot, X and Y appear unrelated, whereas in the second (third) plot X and Y move together (in opposite directions).

Example 118 illustrates that we need to know the joint density f_{X,Y}(x, y) in order to fully characterize the joint probability distribution of X and Y, in particular the comovement of the two variables.

Definition 131
In the case d = 2, the joint cdf of X and Y is

F_{X,Y}(a, b) = P(X ≤ a, Y ≤ b),

and the probability of the joint event (x, y) ∈ A ⊂ R² is given by

P((X, Y) ∈ A) = ∫∫_A f_{X,Y}(x, y) dy dx,

which, equivalently, gives the joint cdf of X and Y as

F_{X,Y}(a, b) = ∫_{−∞}^{a} ∫_{−∞}^{b} f_{X,Y}(x, y) dy dx.

We can obtain the joint pdf of X and Y by the Fundamental Theorem of Calculus,

f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / (∂x ∂y).

The following plots show the pdf and level curves for a bivariate normal random vector.

6.4.1 How does one go from the joint distribution to the marginal distribution?

The marginal probability that X ≤ a is given by

P(X ≤ a) = P(X ≤ a, Y anything) = ∫_{−∞}^{a} [ ∫_{−∞}^{∞} f_{X,Y}(x, y) dy ] dx.

Proposition 62
The marginal density f_X(x) is obtained by integrating the joint density f_{X,Y}(x, y) with respect to y,

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy.

Page 23: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 23/32

6.4. MULTIVARIATE RANDOM VARIABLES 175

Figure 6.9: Joint Distribution Function and Level Surfaces for a Bivariate Normal Random Vector. [Figure omitted; panels: (a) µ_X = µ_Y = 0, σ_X = σ_Y = 0.5, ρ = 0; (b) µ_X = µ_Y = 0, σ_X = σ_Y = 0.6, ρ = 0.7; (c) µ_X = µ_Y = 0, σ_X = 0.9, σ_Y = 0.6, ρ = −0.9.]

Page 24: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 24/32

176 CHAPTER 6. PROBABILITY THEORY

Similarly,

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx.

Example 119
Let

f_{X,Y}(x, y) = 6xy²  if (x, y) ∈ (0, 1)²,  and 0 otherwise.

1. Is f_{X,Y}(x, y) a pdf? We need to check two features:

(a) Nonnegativity: f_{X,Y}(x, y) ≥ 0 for all (x, y).

(b) The density must integrate up to 1:

∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dy dx = ∫_{0}^{1} ∫_{0}^{1} 6xy² dx dy = ∫_{0}^{1} [3x²y²]_{x=0}^{x=1} dy = ∫_{0}^{1} 3y² dy = [y³]_{0}^{1} = 1.

2. What is the marginal density f_X(x)?

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_{0}^{1} 6xy² dy = [2xy³]_{y=0}^{y=1} = 2x.

3. What is the marginal probability P(X ∈ (1/2, 3/4))?

P(X ∈ (1/2, 3/4)) = ∫_{1/2}^{3/4} 2x dx = 5/16.

4. What is the probability P(X + Y ≥ 1)?

A = {(X, Y) | X + Y ≥ 1, 0 < X < 1, 0 < Y < 1}
  = {(X, Y) | X ≥ 1 − Y, 0 < X < 1, 0 < Y < 1}
  = {(X, Y) | 1 − Y < X < 1, 0 < Y < 1}.

Hence,

P(X + Y ≥ 1) = ∫∫_A f_{X,Y}(x, y) dx dy = ∫_{0}^{1} ∫_{1−y}^{1} 6xy² dx dy = 9/10.

5. Define Z ≡ X + Y. What is P(Z ≥ 1)? From before, P(Z ≥ 1) = 9/10.
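The numbers in Example 119 can be reproduced by numerical integration; the following sketch uses SciPy's quad and dblquad (the limits mirror the example, and the code is only a check, not part of the text):

```python
from scipy.integrate import quad, dblquad

def pdf(x, y):
    """Joint density 6 x y^2 on the unit square."""
    return 6.0 * x * y**2

# Total mass over (0,1)^2 (dblquad integrates its first argument as the inner variable).
total, _ = dblquad(lambda y, x: pdf(x, y), 0, 1, 0, 1)

# Marginal density of X at a few points: should equal 2x.
f_X = lambda x: quad(lambda y: pdf(x, y), 0, 1)[0]

# P(X + Y >= 1): outer variable y in [0, 1], inner variable x in [1 - y, 1].
p_sum, _ = dblquad(lambda x, y: pdf(x, y), 0, 1, lambda y: 1 - y, 1)

print(total, f_X(0.25), f_X(0.5), p_sum)   # 1.0, 0.5, 1.0, 0.9
```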

Example 120 (Sample Average)
X̄_n = (1/n)(X_1 + ... + X_n).

Page 25: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 25/32

6.5. (HIGHER ORDER) MOMENTS AND INTEGRATION 177

6.5 (Higher Order) Moments and Integration

Example 121
Let X ∼ N(µ_X, σ²_X), and define Y = exp(X). We want to compute E(X^r) and E(Y^r), for r = 1, 2, 3, ....

Definition 132 (Moments of Continuous and Discrete Random Variables)
• The rth moment of a random variable X, r = 1, 2, 3, ..., is defined by

µ^r_X ≡ E(X^r) =
  Σ_i x_i^r P(X = x_i),         if X is discrete,
  ∫_{−∞}^{∞} x^r f_X(x) dx,     if X is continuous and its pdf exists,
  ∫_{−∞}^{∞} x^r dF_X(x),       if X is continuous and its pdf does not exist.

• The expectation µ_X is the first moment of X and defined by µ_X ≡ E(X).

• The variance σ²_X is the second central moment of X and defined by σ²_X ≡ Var(X) = E[(X − µ)²].

• Reminder: A random variable X is called a continuous random variable if F_X(x) (its cdf) is a continuous function of x, while X is a discrete random variable if F_X(x) is a step function of x.

• Note that ∫ h(x) dF(x) is not a standard Riemann integral, but a more general type of integral known as the Lebesgue integral.

• Taking expectations is the same as (Lebesgue) integration over the realizations of the random variable X (properly weighted by f_X(x)).

• A random variable X is integrable if E |X | < ∞.

6.5.1 Properties of Expectations

Proposition 63 (Properties of Expectations)
Suppose that X and Y are integrable (E|X| < ∞, E|Y| < ∞). The expectation operator satisfies the following properties:

1. Linearity:

E (a + bX + cY ) = a + bE (X ) + cE (Y ).

2. σ-Additivity: Let A, B ∈ F with A ∩ B = ∅. Then

E[1_{A∪B} X] = ∫_{A∪B} X(ω) dP(ω) = ∫_A X(ω) dP(ω) + ∫_B X(ω) dP(ω) = E[1_A X] + E[1_B X],

where the indicator function is defined as 1_A = 1 if ω ∈ A, and 0 otherwise.

3. For each event A, E [1A] = P(A).

Page 26: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 26/32

178 CHAPTER 6. PROBABILITY THEORY

4. Comparison and Monotonicity:

(a) If X = Y a.s., then E (X ) = E (Y ).

(b) If X ≥ Y a.s., then E (X ) ≥ E (Y ).

(c) |E (X )| ≤ E (|X |).

5. Transformation: Let Y = g(X). Then

E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx = ∫_{−∞}^{∞} y f_Y(y) dy.

6. Jensen's Inequality: If g is convex, then E[g(X)] ≥ g(E(X)); if g is concave, then E[g(X)] ≤ g(E(X)).

7. For multivariate random variables,

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{XY}(x, y) dx dy.

8. Dominated Convergence, Monotone Convergence and Fatou's Lemma: They describe conditions under which the limit of a sequence of expectations equals the expectation of the limit.

Proposition 64 (Properties of Variances)
1. Var(a + bX) = b² Var(X).

2. Var(X) = E(X²) − [E(X)]² = E(X²) − µ².

3. Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y), where Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)].

6.5.2 Moment Generating Function

How many moments are needed to characterize a distribution? We will see that knowing all moments is not equivalent to knowing the cdf.

A nice representation of all moments of a distribution is the moment generating function (mgf).

Definition 133

The moment generating function of X is a function Φ_X : R → R defined by

Φ_X(u) = E[e^{uX}], u ∈ R.

Proposition 65 (Properties of the MGF)
1. The nth moment of X can be constructed from the mgf as E(X^n) = Φ_X^{(n)}(0), where Φ_X^{(n)}(0) is the nth derivative of the mgf evaluated at zero. Hence, if the mgf exists, it characterizes an infinite number of moments.

Page 27: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 27/32

6.5. (HIGHER ORDER) MOMENTS AND INTEGRATION 179

2. Knowing all the moments of X is not sufficient to identify the distribution of X. That is, E(X^n) = E(Y^n) for all n = 0, 1, 2, ... does not imply X =_d Y.

3. Sufficient conditions for the mgf to identify the distribution of X are:

(a) If F_X and F_Y have bounded support, then

X =_d Y ⇔ (E(X^n) = E(Y^n) for all n = 0, 1, 2, ...).

(b) If Φ_X(u) = Φ_Y(u) in a neighborhood of u = 0, then X =_d Y, i.e., F_X = F_Y.

Proof: Take the first derivative

∂Φ_X(u)/∂u = ∂/∂u ∫_{−∞}^{∞} e^{ux} f_X(x) dx
           = ∫_{−∞}^{∞} (∂e^{ux}/∂u) f_X(x) dx
           = ∫_{−∞}^{∞} x e^{ux} f_X(x) dx
           = E[X e^{uX}],

and evaluate it at u = 0 to obtain

∂Φ_X(u)/∂u |_{u=0} = E(X).

More generally,

∂^n Φ_X(u)/(∂u)^n |_{u=0} = E(X^n) for all n = 1, 2, ....
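A numerical sketch of this property for X ∼ N(2, 1.5²) (illustrative parameters): the mgf is computed by integrating e^{ux} against the normal pdf, and finite-difference derivatives at u = 0 recover E(X) = 2 and E(X²) = µ² + σ² = 6.25:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 2.0, 1.5   # illustrative parameters

def mgf(u):
    """Phi_X(u) = E[exp(uX)], computed by numerical integration against the normal pdf."""
    integrand = lambda x: np.exp(u * x) * norm.pdf(x, loc=mu, scale=sigma)
    return quad(integrand, -np.inf, np.inf)[0]

h = 1e-3
first_moment = (mgf(h) - mgf(-h)) / (2 * h)                    # ~ E(X)   = 2.0
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2       # ~ E(X^2) = 6.25
print(first_moment, second_moment)
```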

Example 122
Take f_1(x) = (1 / (√(2π) x)) e^{−(log x)²/2} and f_2(x) = f_1(x)[1 + sin(2π log x)], x > 0. We can see from Figure 6.10 that the two pdfs are different, i.e., f_1 ≠ f_2. Nevertheless, all the moments are the same, i.e., E(X_1^n) = E(X_2^n) for n = 0, 1, 2, ....

Page 28: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 28/32

180 CHAPTER 6. PROBABILITY THEORY

Figure 6.10: An example of two different distributions that have identical moments. [Figure omitted.]

6.6 Conditioning and Information

Definition 134
The conditional density of Y given X = x is

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x)  if f_X(x) > 0,  and 0 otherwise.     (6.1)

• Note that f_{Y|X}(y|x) is a pdf because it fulfills the conditions f_{Y|X}(y|x) ≥ 0 and ∫_{−∞}^{∞} f_{Y|X}(y|x) dy = 1.

• Eq. (6.1) can be used to write the joint density as the product of the marginal and the conditional density:

f_{X,Y}(x, y) = f_X(x) f_{Y|X}(y|x).

Definition 135 (Conditional Moments)
• The conditional expectation of Y given X = x is given by

E[Y | X = x] = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy.

• The conditional variance of Y given X = x is given by

Var(Y | X = x) = E[Y² | X = x] − (E[Y | X = x])².

Page 29: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 29/32

6.6. CONDITIONING AND INFORMATION 181

Figure 6.11: Binomial Tree for the Stock Price.

Example 123 (Binomial Tree Model for Asset Prices)
Some features of the model: As in the previous class, let us consider a sequence of three tosses of the coin. The collection of all possible outcomes (i.e., sequences of tosses of length 3) is

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

A typical element of Ω is denoted ω, which in this case has three components, ω = (ω_1, ω_2, ω_3). Let S_n(ω) denote the stock price at time n. Thus, for instance, S_1(ω) ≡ S_1(ω_1, ω_2, ω_3) ≡ S_1(ω_1), and S_2(ω) ≡ S_2(ω_1, ω_2, ω_3) ≡ S_2(ω_1, ω_2). The collection of sets determined by the first n tosses is denoted F_n (see previous lecture). For more details, see your notes from the exercise sessions.

Some general comments: The price of the stock is computed as S_n = (1/(1 + r)) E^Q[S_{n+1} | F_n], where r is the risk-free rate and F_n is the σ-algebra as of time n. In the case of two states, the risk-neutral probabilities are P(up move) = p̃ = (1 + r − d)/(u − d) and P(down move) = q̃ = (u − 1 − r)/(u − d), and thus the expected value, given information up to n, is

E^Q[S_{n+1} | F_n] = p̃ S_{n+1}(ω_1, ..., ω_n, H) + q̃ S_{n+1}(ω_1, ..., ω_n, T).

Note that this expected value is under the measure Q. The expected value under the objective probability measure is

E[S_{n+1} | F_n] = p S_{n+1}(ω_1, ..., ω_n, H) + q S_{n+1}(ω_1, ..., ω_n, T).
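A one-period sketch of this pricing rule. The parameter values (u = 2, d = 1/2, r = 1/4, S_0 = 4) are illustrative assumptions, not taken from the text; with them the discounted risk-neutral expectation of S_1 recovers S_0 exactly:

```python
S0, u, d, r = 4.0, 2.0, 0.5, 0.25       # illustrative parameters

p_tilde = (1 + r - d) / (u - d)          # risk-neutral probability of an up move
q_tilde = (u - 1 - r) / (u - d)          # risk-neutral probability of a down move
print(p_tilde, q_tilde)                  # 0.5 and 0.5 for these parameters

# One-period check: the discounted risk-neutral expectation of S_1 equals S_0.
S1_up, S1_down = u * S0, d * S0
S0_implied = (p_tilde * S1_up + q_tilde * S1_down) / (1 + r)
print(S0_implied)                        # 4.0 = S_0
```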

Page 30: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 30/32

182 CHAPTER 6. PROBABILITY THEORY

Application of conditional probabilities:

What is the best estimate of S_1 given S_2? Think of S_2 as a random variable X and S_1 as a random variable Y. Note that E[Y | X] = E[S_1 | S_2] is a random variable itself.

1) E[S_1 | S_2 = y], where y = S_2(ω), is a random variable (i.e., it depends on ω). Thus, we can write E[S_1 | S_2](ω).

2) E[S_1 | S_2] is F_2-measurable; that is, if the value of S_2 is known, the value of E[S_1 | S_2] should also be known.

a) Given ω ∈ {HHH, HHT}, we know S_2(ω) = u²S_0, thus

E[S_1 | S_2](HHH) = E[S_1 | S_2](HHT) = uS_0.

b) Given ω ∈ {TTT, TTH}, we know S_2(ω) = d²S_0, thus

E[S_1 | S_2](TTT) = E[S_1 | S_2](TTH) = dS_0.

c) Given ω ∈ A = {HTH, HTT, THH, THT}, we know S_2(ω) = udS_0. However, we do not know whether S_1 = uS_0 or S_1 = dS_0. The value of E[S_1 | S_2](ω) is given by

E[S_1 | S_2](ω) = uS_0 P(uS_0 | ω ∈ A) + dS_0 P(dS_0 | ω ∈ A)
                = uS_0 P(S_1 = uS_0 ∩ ω ∈ A) / P(ω ∈ A) + dS_0 P(S_1 = dS_0 ∩ ω ∈ A) / P(ω ∈ A).

Note that

P(ω ∈ A) = p²q + pq² + p²q + pq² = 2pq(p + q) = 2pq      (since p + q = 1),

P(S_1 = uS_0 ∩ ω ∈ A) = p²q (from HTH) + pq² (from HTT) = pq(p + q) = pq,

P(S_1 = dS_0 ∩ ω ∈ A) = p²q (from THH) + pq² (from THT) = pq(p + q) = pq.

Finally,

E[S_1 | S_2](ω ∈ A) = uS_0 pq / P(ω ∈ A) + dS_0 pq / P(ω ∈ A) = (1/2)(u + d) S_0.

In summary, E[S_1 | S_2] is a random variable with the following realizations:

E[S_1 | S_2](ω) =
  uS_0,             if ω ∈ {HHH, HHT},
  (1/2)(u + d)S_0,  if ω ∈ A,
  dS_0,             if ω ∈ {TTT, TTH}.

We will see later that E [S 1|S 2] is the “best” predictor of S 1 given S 2.
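The realizations above can be reproduced by grouping the eight paths according to the value of S_2 and averaging S_1 within each group. In this sketch p, u, d and S_0 are illustrative choices; note that the value on A equals (u + d)S_0/2 for any p, exactly as derived above:

```python
from itertools import product
from collections import defaultdict

S0, u, d, p = 4.0, 2.0, 0.5, 0.4            # illustrative parameters; q = 1 - p

def prob(path):
    """Probability of a path such as ('H', 'T', 'H') under the objective measure."""
    result = 1.0
    for toss in path:
        result *= p if toss == 'H' else (1 - p)
    return result

def S(path):
    """Stock price after the tosses in `path`."""
    out = S0
    for toss in path:
        out *= u if toss == 'H' else d
    return out

# Group paths by the value of S_2 (first two tosses) and average S_1 within each group.
groups = defaultdict(list)
for w in product('HT', repeat=3):
    groups[S(w[:2])].append(w)

for s2, paths in sorted(groups.items(), reverse=True):
    total = sum(prob(w) for w in paths)
    cond_exp = sum(S(w[:1]) * prob(w) for w in paths) / total
    print(s2, cond_exp)
# S_2 = 16 -> 8 (= u S_0), S_2 = 4 -> 5 (= (u + d) S_0 / 2), S_2 = 1 -> 2 (= d S_0)
```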

Notation 1
In the following we oftentimes condition on a σ-algebra F_X instead of a random variable X. What we mean by this is E[Y | X] = E[Y | F_X], where F_X is the sub-σ-algebra generated by the random variable X, that is,

F_X = X⁻¹(A) = {F ∈ F : F = X⁻¹(A), A ∈ A},

where A is the smallest σ-algebra containing all open sets of the form (a, b). In words, knowing the realization of X is equivalent to knowing whether X(ω) ∈ A ∈ A and, by definition of F_X, is equivalent to knowing whether ω ∈ F = X⁻¹(A) ∈ F_X has occurred.

Page 31: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 31/32

6.6. CONDITIONING AND INFORMATION 183

6.6.1 Properties of Conditional Expectations

Proposition 66 (Properties of Conditional Expectations)
Let (Ω, F, P) be a probability space. The following properties apply to conditional expectations:

1. E [X |F ] is a random variable and E [X |F ] is F -measurable.

2. Linearity:

E [a + bX + cY |F ] = a + bE [X |F ] + cE [Y |F ] .

3. Positivity: If X ≥ 0 a.s., then E [X |F ] ≥ 0.

4. Take out what is known: If Z is F -measurable, then

E [ZX |F ] = ZE [X |F ] .

5. If F 0 ⊂ F 1, then E [X |F 0] is F 1-measurable. Hence,

E [E [X |F 0] |F 1] = E [X |F 0] .

6. Law of Iterated Expectations: If F 0 ⊂ F 1, then

E [E [X |F 1] |F 0] = E [X |F 0] .

6.6.2 Independence and Correlation

There are different ways in which random variables can depend on each other.

Definition 136 (Independence)

Let (X, Y) be random variables with joint pdf f_{X,Y}(x, y) and marginal pdfs f_X(x) and f_Y(y), respectively. X and Y are independent random variables if

f X,Y (x, y) = f X(x)f Y (y),

or, equivalently,

F X,Y (x, y) = F X(x)F Y (y).

Alternative measures of the dependence between random variables are the covariance and the correlation coefficient.

Definition 137 (Covariance and Correlation)

• The covariance of X and Y is given by

Cov(X, Y ) = E [(X − µX) (Y − µY )] = E [XY ] − E [X ] E [Y ] .

• The correlation of X and Y is defined as

ρ_{X,Y} = Corr(X, Y) = Cov(X, Y) / (√Var(X) √Var(Y)).

Page 32: Chap5&6

8/14/2019 Chap5&6

http://slidepdf.com/reader/full/chap56 32/32

184 CHAPTER 6. PROBABILITY THEORY

The correlation coefficient ρ_{X,Y} lies between −1 and +1 (Proof: use the Cauchy-Schwarz Inequality). We say

ρ = −1      : X and Y are perfectly negatively correlated.
ρ ∈ (−1, 0) : X and Y are negatively correlated.
ρ = 0       : X and Y are uncorrelated.
ρ ∈ (0, 1)  : X and Y are positively correlated.
ρ = 1       : X and Y are perfectly positively correlated.

Proposition 67 (Properties of Independent Random Variables)
• If X and Y are independent, the conditional pdf is equal to the marginal pdf, i.e.,

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x) = f_X(x) f_Y(y) / f_X(x) = f_Y(y).

• If X and Y are independent,

E[g(X) h(Y)] = E[g(X)] E[h(Y)],     (6.2)

and

E[h(Y) | X] = E[h(Y)].

• If X and Y are independent, (6.2) implies E[XY] = E[X] E[Y] and, hence,

Cov(X, Y) = 0,
Var(X + Y) = Var(X) + Var(Y),
Φ_{X+Y}(u) = Φ_X(u) Φ_Y(u).
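A closing simulation sketch that illustrates these consequences of independence for two independent standard normal samples (an illustrative choice; the sample estimates are only approximately equal to the population quantities):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.standard_normal(n)    # independent draws, illustrative choice
Y = rng.standard_normal(n)

print(np.cov(X, Y)[0, 1])                          # ~0: Cov(X, Y) = 0
print(np.var(X + Y), np.var(X) + np.var(Y))        # both ~2: Var(X + Y) = Var(X) + Var(Y)

u = 0.3
mgf = lambda Z: np.mean(np.exp(u * Z))             # sample estimate of Phi_Z(u)
print(mgf(X + Y), mgf(X) * mgf(Y))                 # both ~exp(u^2) ~ 1.094
```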