Law of Large Numbers (tcs.inf.kyushu-u.ac.jp/~kijima/GPS20/GPS20-05.pdf)

Law of Large Numbers

June 10, 2020

来嶋秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

Today's topics

Law of large numbers w/ proof

• Markov’s ineq.

• Chebyshev’s ineq.

Central limit theorem

• w/ affine trans. of r.v.

Advanced Topics in Probability and Statistics (確率統計特論)

Lesson 5


Midterm exam

Date/time: June 24 (6/24), 13:00-14:30

Place: on Moodle.

Submit electronic files (photos are acceptable and recommended), 10 MB or less.

Keep your original paper/data at hand (I may ask you to submit them later).

Topics:

Fundamental probability (May 13 – June 17).

Check the course page:

http://tcs.inf.kyushu-u.ac.jp/~kijima/

Books, notes, Google, etc. may be used.

Communication (e-mail, SNS, BBS) is prohibited; you may not consult others.

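Today’s Summary

Thm. (law of large numbers)

Suppose $X_1, \dots, X_n$ are i.i.d. with expectation $\mu$ and variance $\sigma^2$. Then $\frac{X_1 + \cdots + X_n}{n}$ converges to $\mu$ in probability; i.e.,

$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} \Pr\left[\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| < \varepsilon\right] = 1.$$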

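Thm. (central limit theorem)

Suppose $X_1, \dots, X_n$ are i.i.d. with expectation $\mu$ and variance $\sigma^2$. Then $Z_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma}$ converges to $N(0,1)$ in distribution; i.e.,

$$\lim_{n \to \infty} \Pr[Z_n < z] = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, \mathrm{d}x.$$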

Prove it.

Make sense?

1. Road to Law of Large Numbers

w/ coupon collector

1.1. Markov’s inequality

1.2. Chebyshev’s inequality

1.3. Proof of law of large numbers

Ex. Coupon collector

There are $n$ kinds of coupons.

How many coupons do you need to draw, in expectation, before having drawn each coupon at least once?

• Bikkuriman stickers (ビックリマンシール)

• Pokémon cards (ポケモンカード)





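Suppose you have already drawn $k - 1$ kinds of coupons.

Let $X_k$ denote the number of draws needed to go from $k - 1$ kinds to $k$ kinds.

Each draw yields a new kind with probability $p_k := \frac{n - (k - 1)}{n}$.

The expected number of draws is

$$\mathrm{E}[X_k] = \frac{1}{p_k} = \frac{n}{n - k + 1}.$$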



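Thm.

$$n \ln n \le \mathrm{E}[X] \le n (1 + \ln n)$$

Here $X = X_1 + \cdots + X_n$ is the total number of draws, so

$$\mathrm{E}[X] = \mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i'=1}^{n} \frac{1}{i'},$$

where $\sum_{i'=1}^{n} \frac{1}{i'}$ is the harmonic number. It is bounded by

$$\ln n = \int_{1}^{n} \frac{1}{x} \, \mathrm{d}x \le \sum_{k=1}^{n} \frac{1}{k} = 1 + \sum_{k=2}^{n} \frac{1}{k} \le 1 + \int_{1}^{n} \frac{1}{x} \, \mathrm{d}x = 1 + \ln n.$$

e.g., $n = 100$: then $\ln 100 \simeq 4.605$, and hence

$$460 \le \mathrm{E}[X] \le 561.$$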
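The bounds above can be checked numerically. The following is a minimal sketch that evaluates $n$ times the harmonic number directly; the function name `expected_draws` is an illustrative choice, not part of the lecture.

```python
import math

def expected_draws(n: int) -> float:
    """E[X] = n * H_n, where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number."""
    return n * sum(1.0 / k for k in range(1, n + 1))

n = 100
lower = n * math.log(n)          # n ln n
upper = n * (1 + math.log(n))    # n (1 + ln n)
exact = expected_draws(n)

# The exact expectation lies between the two bounds of the theorem.
print(f"{lower:.1f} <= {exact:.1f} <= {upper:.1f}")
```

For $n = 100$ the exact value is about 518.7, roughly in the middle of the interval given by the theorem.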





What is the probability of completion after 𝑚 trials?



1.1. Markov’s inequality

Thm. Markov’s inequality

Let $X$ be a nonnegative random variable. Then

$$\Pr[X \ge a] \le \frac{\mathrm{E}[X]}{a}$$

holds for any $a > 0$.

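Proof (for a continuous $X$ with pdf $f$).

$$\mathrm{E}\left[\frac{X}{a}\right] = \int_{0}^{\infty} \frac{x}{a} f(x) \, \mathrm{d}x = \int_{0}^{a} \frac{x}{a} f(x) \, \mathrm{d}x + \int_{a}^{\infty} \frac{x}{a} f(x) \, \mathrm{d}x \ge \int_{a}^{\infty} \frac{x}{a} f(x) \, \mathrm{d}x \ge \int_{a}^{\infty} f(x) \, \mathrm{d}x = \Pr[X \ge a].$$

Thus,

$$\Pr[X \ge a] \le \mathrm{E}\left[\frac{X}{a}\right] = \frac{\mathrm{E}[X]}{a}.$$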
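As a sanity check of Markov's inequality, here is a small sketch using a fair six-sided die. The die is an assumption chosen for illustration; it is a discrete variable, for which the same argument works with sums in place of integrals. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

# pmf of a fair six-sided die (illustrative choice, not from the lecture)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())  # E[X]

rows = []
for a in range(1, 7):
    tail = sum(p for x, p in pmf.items() if x >= a)  # Pr[X >= a]
    bound = mean / a                                  # Markov bound E[X] / a
    rows.append((a, tail, bound))
    print(f"a={a}: Pr[X >= a] = {tail} <= {bound}")
```

The bound exceeds 1 for small $a$, where it carries no information, and becomes meaningful only once $a$ exceeds $\mathrm{E}[X]$.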





What is the probability of completion after 𝑚 trials?



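Using Markov’s inequality,

$$\Pr[X \ge m] \le \frac{\mathrm{E}[X]}{m} \le \frac{n (1 + \ln n)}{m}.$$

Below, $\mathrm{comp}_n(m)$ denotes the event that all $n$ kinds have been collected within $m$ draws.

e.g., $n = 100$, $m = 1000$:

$$\Pr[\mathrm{comp}_{100}(1000)] = 1 - \Pr[X \ge 1001] \ge 1 - \frac{100 \times (1 + \ln 100)}{1001} \simeq 0.44$$

e.g., $n = 100$, $m = 10000$:

$$\Pr[\mathrm{comp}_{100}(10000)] = 1 - \Pr[X \ge 10001] \ge 1 - \frac{100 \times (1 + \ln 100)}{10001} \simeq 0.94$$

too loose?

Recall:

$$n \ln n \le \mathrm{E}[X] \le n (1 + \ln n)$$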
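To see how loose the Markov bound is, the sketch below compares it with the exact completion probability. The exact value is computed by inclusion-exclusion over the kinds that are still missing, which is a standard formula but not part of the lecture; both function names are illustrative.

```python
import math

def markov_completion_lower_bound(n: int, m: int) -> float:
    """Lower bound on Pr[complete within m draws] = 1 - Pr[X >= m + 1],
    using Markov's inequality and E[X] <= n (1 + ln n)."""
    return 1.0 - n * (1.0 + math.log(n)) / (m + 1)

def exact_completion_probability(n: int, m: int) -> float:
    """Exact Pr[all n kinds appear within m draws], by inclusion-exclusion.
    Exact integers are used so that the alternating sum does not lose precision."""
    favourable = sum((-1) ** j * math.comb(n, j) * (n - j) ** m for j in range(n + 1))
    return favourable / n ** m

for m in (1000, 10000):
    print(m, markov_completion_lower_bound(100, m), exact_completion_probability(100, m))
```

For $n = 100$ and $m = 1000$ the exact probability is above 0.99, whereas Markov's inequality only guarantees about 0.44.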

1.2. Chebyshev’s inequality

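Thm. Chebyshev’s inequality

For any $a > 0$,

$$\Pr\left[\,|X - \mathrm{E}[X]| \ge a\,\right] \le \frac{\mathrm{Var}[X]}{a^2}.$$

Proof.

Remark that

$$\Pr\left[\,|X - \mathrm{E}[X]| \ge a\,\right] = \Pr\left[(X - \mathrm{E}[X])^2 \ge a^2\right].$$

Using Markov’s inequality,

$$\Pr\left[(X - \mathrm{E}[X])^2 \ge a^2\right] \le \frac{\mathrm{E}\left[(X - \mathrm{E}[X])^2\right]}{a^2} = \frac{\mathrm{Var}[X]}{a^2}.$$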

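Cor. Chebyshev’s inequality

For any $t > 0$ (assuming $\mathrm{E}[X] > 0$),

$$\Pr\left[X \ge (1 + t)\,\mathrm{E}[X]\right] \le \frac{\mathrm{Var}[X]}{(t\,\mathrm{E}[X])^2}.$$

Proof.

$$\Pr\left[X \ge (1 + t)\,\mathrm{E}[X]\right] = \Pr\left[X - \mathrm{E}[X] \ge t\,\mathrm{E}[X]\right] \le \Pr\left[\,|X - \mathrm{E}[X]| \ge t\,\mathrm{E}[X]\,\right] \le \frac{\mathrm{Var}[X]}{(t\,\mathrm{E}[X])^2}.$$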


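There are $n$ kinds of coupons.

What is the probability of completion after $m$ trials?

Using Chebyshev’s inequality,

$$\Pr\left[X \ge (1 + t)\,\mathrm{E}[X]\right] \le \frac{\mathrm{Var}[X]}{(t\,\mathrm{E}[X])^2}.$$

Recall that $X = X_1 + \cdots + X_n$, where the $X_i$ are independent and $X_i$ is geometrically distributed with success probability $p_i = \frac{n - i + 1}{n}$. Hence

$$\mathrm{Var}[X] = \sum_{i=1}^{n} \mathrm{Var}[X_i] = \sum_{i=1}^{n} \frac{1 - p_i}{p_i^2} \le \sum_{i=1}^{n} \frac{1}{p_i^2} = \sum_{i=1}^{n} \left(\frac{n}{n - i + 1}\right)^2 = n^2 \sum_{i=1}^{n} \frac{1}{i^2} \le n^2 \frac{\pi^2}{6}.$$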

Ex. 2.








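Using Chebyshev’s inequality and $\mathrm{E}[X] \ge n \ln n$,

$$\Pr\left[X \ge (1 + t)\,\mathrm{E}[X]\right] \le \frac{\mathrm{Var}[X]}{(t\,\mathrm{E}[X])^2} \le \frac{n^2 \pi^2}{6 t^2 (n \ln n)^2} = \frac{\pi^2}{6 t^2 (\ln n)^2}.$$

e.g., $n = 100$, $m = 1000$: since $\mathrm{E}[X] \le n (1 + \ln n) \simeq 561$, we have $1001 \ge 1.78\,\mathrm{E}[X]$, so we can take $t = 0.78$ ($t \simeq \frac{m}{n (1 + \ln n)} - 1$). Then

$$\Pr[X \ge 1001] \le \Pr\left[X \ge 1.78\,\mathrm{E}[X]\right] \le \frac{\pi^2}{6 \times 0.78^2 \times (\ln 100)^2} \le 0.128$$

$$\Pr[\mathrm{comp}_{100}(1000)] \ge 1 - 0.128 \simeq 0.87$$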
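The Chebyshev estimate for the coupon collector can be reproduced with a few lines. This is a sketch under the assumption that $t$ is chosen as $t = \frac{m + 1}{n (1 + \ln n)} - 1$, so that $(1 + t)\,\mathrm{E}[X] \le m + 1$ follows from $\mathrm{E}[X] \le n (1 + \ln n)$; the function names are illustrative.

```python
import math

def chebyshev_tail_bound(n: int, t: float) -> float:
    """Upper bound pi^2 / (6 t^2 (ln n)^2) on Pr[X >= (1 + t) E[X]]."""
    return math.pi ** 2 / (6.0 * t ** 2 * math.log(n) ** 2)

def exact_variance(n: int) -> float:
    """Var[X] = sum over i of (1 - p_i) / p_i^2 with p_i = (n - i + 1) / n."""
    return sum((1 - (n - i + 1) / n) / ((n - i + 1) / n) ** 2 for i in range(1, n + 1))

n, m = 100, 1000
t = (m + 1) / (n * (1 + math.log(n))) - 1   # guarantees (1 + t) E[X] <= m + 1
bound = chebyshev_tail_bound(n, t)

print(f"t = {t:.3f}, Pr[X >= {m + 1}] <= {bound:.3f}, completion >= {1 - bound:.3f}")
print(f"Var[X] = {exact_variance(n):.1f} <= {n ** 2 * math.pi ** 2 / 6:.1f}")
```

The result, a completion probability of at least about 0.87, is much better than the Markov bound but still well below the true value.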

still loose?

Chernoff’s bound

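1.3. Law of large numbers

Def.

A sequence $\{Y_n\}$ converges to $Y$ in probability if

$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} \Pr\left[\,|Y_n - Y| < \varepsilon\,\right] = 1.$$

Thm. (law of large numbers)

Suppose $X_1, \dots, X_n$ are i.i.d. (independent and identically distributed) with expectation $\mu$ and variance $\sigma^2$. Then $\frac{X_1 + \cdots + X_n}{n}$ converges to $\mu$ in probability; i.e.,

$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} \Pr\left[\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| < \varepsilon\right] = 1.$$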

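Proof.

Let $Y_n := \frac{X_1 + \cdots + X_n}{n}$, for simplicity. Then

$$\mathrm{E}[Y_n] = \mathrm{E}\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{\mathrm{E}[X_1] + \cdots + \mathrm{E}[X_n]}{n} = \frac{\mu + \cdots + \mu}{n} = \mu,$$

$$\mathrm{Var}[Y_n] = \mathrm{Var}\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{\mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_n]}{n^2} = \frac{\sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n},$$

where the variance of the sum splits because the $X_i$ are independent.

By Chebyshev’s inequality, $\forall \varepsilon > 0$, $\forall n > 0$,

$$\Pr\left[\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right] \le \frac{\mathrm{Var}[Y_n]}{\varepsilon^2} = \frac{\sigma^2}{n \varepsilon^2}.$$

Hence, $\forall \varepsilon > 0$,

$$\Pr\left[\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| < \varepsilon\right] \ge 1 - \frac{\sigma^2}{n \varepsilon^2} \xrightarrow[n \to \infty]{} 1.$$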
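The statement can be illustrated by simulation. The sketch below assumes fair die rolls ($\mu = 3.5$, $\sigma^2 = 35/12$) as the i.i.d. variables and compares the observed frequency of $|Y_n - \mu| \ge \varepsilon$ with the Chebyshev bound $\frac{\sigma^2}{n \varepsilon^2}$ used in the proof. The distribution, the seed, and the trial counts are arbitrary illustrative choices.

```python
import random

MU = 3.5            # expectation of a fair die roll
SIGMA2 = 35 / 12    # variance of a fair die roll

def deviation_frequency(n: int, eps: float, trials: int, rng: random.Random) -> float:
    """Fraction of trials in which the mean of n die rolls deviates from MU by at least eps."""
    count = 0
    for _ in range(trials):
        y_n = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(y_n - MU) >= eps:
            count += 1
    return count / trials

rng = random.Random(0)   # fixed seed so that the run is reproducible
eps = 0.25
results = []
for n in (10, 100, 1000):
    freq = deviation_frequency(n, eps, trials=500, rng=rng)
    bound = min(1.0, SIGMA2 / (n * eps ** 2))   # Chebyshev bound, capped at 1
    results.append((n, freq, bound))
    print(f"n={n}: observed {freq:.3f}, Chebyshev bound {bound:.3f}")
```

The observed frequency shrinks towards 0 as $n$ grows and stays below the bound, which is itself far from tight.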

2. Central Limit Theorem

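Def.

A sequence $\{Y_n\}$ with distribution functions $F_n$ converges to $Y$ in distribution if

$$\lim_{n \to \infty} F_n(y) = F(y)$$

at every point $y$ where $F$ is continuous, where $F$ is the distribution function of $Y$.

Thm. Central limit theorem

Suppose $X_1, \dots, X_n$ are i.i.d. with expectation $\mu$ and variance $\sigma^2$. Then $Z_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma}$ converges to $N(0,1)$ in distribution; i.e.,

$$\lim_{n \to \infty} \Pr[Z_n < z] = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, \mathrm{d}x.$$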
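Before any proof, the central limit theorem can be observed empirically. The sketch below assumes Uniform(0, 1) variables ($\mu = 1/2$, $\sigma^2 = 1/12$), builds samples of $Z_n$, and compares the empirical $\Pr[Z_n < z]$ with the standard normal distribution function, written via `math.erf`. The choice of distribution, $n$, seed, and sample size are illustrative assumptions.

```python
import math
import random

def standard_normal_cdf(z: float) -> float:
    """Integral of the N(0,1) density from -infinity to z, expressed with erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample_z_n(n: int, rng: random.Random) -> float:
    """One sample of Z_n = (1 / sqrt(n)) * sum of (X_i - mu) / sigma for X_i ~ Uniform(0, 1)."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    return sum((rng.random() - mu) / sigma for _ in range(n)) / math.sqrt(n)

rng = random.Random(0)   # fixed seed so that the run is reproducible
n, trials = 30, 5000
samples = [sample_z_n(n, rng) for _ in range(trials)]

empirical = {}
for z in (-1.0, 0.0, 1.0):
    empirical[z] = sum(s < z for s in samples) / trials
    print(f"z={z:+.1f}: empirical {empirical[z]:.3f}, normal {standard_normal_cdf(z):.3f}")
```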

[Figure: pdf of the normal distribution. Source: http://en.wikipedia.org/wiki/Normal_distribution]

[Figure: distribution function of the normal distribution. Source: http://en.wikipedia.org/wiki/Normal_distribution]




Before the proof...


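Corollary

Suppose $X_1, \dots, X_n$ are i.i.d. with expectation $\mu$ and variance $\sigma^2$. Then, for large $n$, $\frac{X_1 + \cdots + X_n}{n}$ is approximately distributed as $N\left(\mu, \frac{\sigma^2}{n}\right)$.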



Affine transform. of a normal distribution

Prop.

Let $a \in \mathbf{R}_{>0}$, $b \in \mathbf{R}$. Suppose that $X \sim N(\mu, \sigma^2)$, and let $Y := aX + b$. Then $Y \sim N(a\mu + b, a^2 \sigma^2)$.

2.1. Affine transform. of a random variable

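Affine transform. of a discrete random variable

Prop.

Let $a \in \mathbf{R}_{>0}$, $b \in \mathbf{R}$. Suppose that $X$ is a discrete random variable with pmf $f_X(x)$, and let $Y := aX + b$. Then $Y$ follows the pmf

$$f_Y(y) = f_X\left(\frac{y - b}{a}\right).$$

Proof.

Since $Y := aX + b$,

$$[Y = y] \Leftrightarrow [aX + b = y] \Leftrightarrow \left[X = \frac{y - b}{a}\right],$$

so $\Pr[Y = y] = \Pr\left[X = \frac{y - b}{a}\right]$, i.e.,

$$f_Y(y) = f_X\left(\frac{y - b}{a}\right).$$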
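The discrete case amounts to relabelling the support. As a minimal sketch, assume the pmf is stored as a dictionary from values to probabilities; this representation, the die example, and the helper `affine_pmf` are illustrative choices.

```python
from fractions import Fraction

def affine_pmf(pmf_x: dict, a, b) -> dict:
    """pmf of Y = a*X + b, given the pmf of X as {value: probability}. Requires a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return {a * x + b: p for x, p in pmf_x.items()}

pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die
a, b = 2, 3
pmf_y = affine_pmf(pmf_x, a, b)

# Check f_Y(y) = f_X((y - b) / a) on the support of Y.
for y, p in pmf_y.items():
    assert p == pmf_x[Fraction(y - b, a)]
print(sorted(pmf_y))
```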

Affine transform. of a continuous random variable

Prop.

Let $a \in \mathbf{R}_{>0}$, $b \in \mathbf{R}$. Suppose that $X$ is a continuous random variable with pdf $f_X(x)$, and let $Y := aX + b$. Then $Y$ follows the pdf

$$f_Y(y) = \frac{1}{a} f_X\left(\frac{y - b}{a}\right).$$

Proof.

Since $Y := aX + b$ and $a > 0$,

$$[Y \le y] \Leftrightarrow [aX + b \le y] \Leftrightarrow \left[X \le \frac{y - b}{a}\right],$$

so $\Pr[Y \le y] = \Pr\left[X \le \frac{y - b}{a}\right]$, i.e.,

$$F_Y(y) = F_X\left(\frac{y - b}{a}\right).$$

By differentiating both sides with respect to $y$, we obtain

$$f_Y(y) = \frac{1}{a} f_X\left(\frac{y - b}{a}\right).$$
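The differentiation step can be checked numerically. The sketch below assumes $X$ is exponentially distributed with rate 1 (an illustrative choice), forms $F_Y(y) = F_X\left(\frac{y - b}{a}\right)$, and compares a central-difference derivative of $F_Y$ with the claimed pdf.

```python
import math

def exp_cdf(x: float) -> float:
    """Distribution function of the Exp(1) distribution."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def exp_pdf(x: float) -> float:
    """Density of the Exp(1) distribution."""
    return math.exp(-x) if x >= 0 else 0.0

a, b = 2.0, 1.0

def cdf_y(y: float) -> float:
    return exp_cdf((y - b) / a)          # F_Y(y) = F_X((y - b) / a)

def pdf_y_formula(y: float) -> float:
    return exp_pdf((y - b) / a) / a      # claimed f_Y(y) = (1/a) f_X((y - b) / a)

def numeric_derivative(f, y: float, h: float = 1e-5) -> float:
    """Central-difference approximation of f'(y)."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

checks = [(y, numeric_derivative(cdf_y, y), pdf_y_formula(y)) for y in (2.0, 3.0, 5.0)]
for y, numeric, formula in checks:
    print(f"y={y}: d/dy F_Y = {numeric:.6f}, formula = {formula:.6f}")
```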

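Affine transform. of a normal distribution

Prop.

Let $a \in \mathbf{R}_{>0}$, $b \in \mathbf{R}$. Suppose that $X \sim N(\mu, \sigma^2)$, and let $Y := aX + b$. Then $Y \sim N(a\mu + b, (a\sigma)^2)$.

Proof.

Recall

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$

By the proposition on the previous page, $Y$ follows the pdf

$$f_Y(y) = \frac{1}{a} f_X\left(\frac{y - b}{a}\right) = \frac{1}{a} \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(\frac{y - b}{a} - \mu\right)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left(-\frac{(y - (a\mu + b))^2}{2 (a\sigma)^2}\right).$$

The pdf of $N(a\mu + b, a^2 \sigma^2)$ is given by

$$f(t) = \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left(-\frac{(t - (a\mu + b))^2}{2 (a\sigma)^2}\right).$$

This implies $Y \sim N(a\mu + b, a^2 \sigma^2)$.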
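The algebra in this proof can be spot-checked numerically: the transformed density $\frac{1}{a} f_X\left(\frac{y - b}{a}\right)$ should coincide with the density of $N(a\mu + b, (a\sigma)^2)$ at every point. The parameter values in this sketch are arbitrary illustrative choices.

```python
import math

def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of N(mean, std^2) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2)) / (math.sqrt(2.0 * math.pi) * std)

mu, sigma = 1.0, 2.0    # parameters of X
a, b = 3.0, -4.0        # affine map Y = a*X + b with a > 0

pairs = []
for y in (-10.0, -1.0, 0.0, 2.5, 7.0):
    via_transform = normal_pdf((y - b) / a, mu, sigma) / a   # (1/a) f_X((y - b) / a)
    direct = normal_pdf(y, a * mu + b, a * sigma)            # density of N(a mu + b, (a sigma)^2)
    pairs.append((via_transform, direct))
    print(f"y={y}: {via_transform:.8f} vs {direct:.8f}")
```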


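Corollary

For large $n$, $\frac{X_1 + \cdots + X_n}{n}$ is approximately distributed as $N\left(\mu, \frac{\sigma^2}{n}\right)$.

This follows from the central limit theorem and the proposition above: since $\frac{X_1 + \cdots + X_n}{n} = \frac{\sigma}{\sqrt{n}} Z_n + \mu$ and $Z_n$ is approximately $N(0,1)$, taking $a = \frac{\sigma}{\sqrt{n}}$ and $b = \mu$ gives $N\left(\mu, \frac{\sigma^2}{n}\right)$.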

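Appendix. Affine transform. of a normal distribution

Prop.

Let $a \in \mathbf{R}_{>0}$, $b \in \mathbf{R}$. Suppose that $X \sim N(\mu, \sigma^2)$, and let $Y := aX + b$. Then $Y \sim N(a\mu + b, a^2 \sigma^2)$.

Another proof. Since $\Pr[Y \le y] = \Pr\left[X \le \frac{y - b}{a}\right]$,

$$F_Y(y) = F_X\left(\frac{y - b}{a}\right) = \int_{-\infty}^{\frac{y - b}{a}} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right) \mathrm{d}t \quad (\ast)$$

Let $s = at + b$; then $\mathrm{d}s = a\,\mathrm{d}t$, and as $t$ runs over $-\infty \to \frac{y - b}{a}$, $s = at + b$ runs over $-\infty \to y$. Hence

$$(\ast) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(\frac{s - b}{a} - \mu\right)^2}{2\sigma^2}\right) \frac{1}{a} \, \mathrm{d}s = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left(-\frac{(s - (a\mu + b))^2}{2 a^2 \sigma^2}\right) \mathrm{d}s,$$

where the last integrand is the density function of $N(a\mu + b, (a\sigma)^2)$.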

Next week:

Sum of random variables

…for a proof of the central limit theorem

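Ex. Normal distr.

Suppose $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent.

Compute the density function of $Z := X + Y$.

$$f_Z(x) = \int_{-\infty}^{\infty} f_X(t)\, f_Y(x - t) \, \mathrm{d}t = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left(-\frac{(t - \mu_1)^2}{2\sigma_1^2}\right) \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left(-\frac{(x - t - \mu_2)^2}{2\sigma_2^2}\right) \mathrm{d}t = \cdots$$

Hard!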
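Although the integral is hard to evaluate by hand, it is easy to approximate numerically. The sketch below evaluates the convolution by a Riemann sum and compares it with the density of $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$, which is the standard closed form for a sum of independent normal variables; that closed form is quoted here as a known fact rather than derived in these slides. Parameter values, step size, and integration range are illustrative assumptions.

```python
import math

def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of N(mean, std^2) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2)) / (math.sqrt(2.0 * math.pi) * std)

def convolution_density(x, mu1, s1, mu2, s2, step=0.01, width=12.0):
    """Riemann-sum approximation of the integral of f_X(t) * f_Y(x - t) dt,
    taken over [mu1 - width*s1, mu1 + width*s1], outside which f_X is negligible."""
    lo = mu1 - width * s1
    steps = int(2 * width * s1 / step)
    total = 0.0
    for k in range(steps + 1):
        t = lo + k * step
        total += normal_pdf(t, mu1, s1) * normal_pdf(x - t, mu2, s2)
    return total * step

mu1, s1, mu2, s2 = 1.0, 1.0, -2.0, 2.0
comparisons = []
for x in (-1.0, 0.0, 2.0):
    numeric = convolution_density(x, mu1, s1, mu2, s2)
    closed_form = normal_pdf(x, mu1 + mu2, math.sqrt(s1 ** 2 + s2 ** 2))
    comparisons.append((numeric, closed_form))
    print(f"x={x}: numeric {numeric:.6f}, closed form {closed_form:.6f}")
```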
