41
Probability and Statistics Review Thursday Mar 12

Probability and Statistics Review

Embed Size (px)

DESCRIPTION

Probability and Statistics Review. Thursday Mar 12. מודל למידה. מה המשתנים החשובים? מה הטווח שלהם? מהן הקומבינציות החשובות?. חזרה מהירה על הסתברות. מאורע , אוסף מאורעות משתנה מקרי הסתברות מותנית , חוק ההסתברות השלמה , חוק השרשרת ,חוק בייס תלות ואי תלות שונות ,תוחלת ושונות משותפת - PowerPoint PPT Presentation

Citation preview

Page 1: Probability and Statistics Review

Probability and Statistics Review

Thursday Mar 12

Page 2: Probability and Statistics Review

מודל למידה

?מה המשתנים החשובים?מה הטווח שלהם?מהן הקומבינציות החשובות

Page 3: Probability and Statistics Review

חזרה מהירה על הסתברות

מאורע , אוסף מאורעות•משתנה מקרי •הסתברות מותנית , חוק ההסתברות השלמה , •

חוק השרשרת ,חוק בייסתלות ואי תלות •שונות ,תוחלת ושונות משותפת••Moments

Page 4: Probability and Statistics Review

Sample space and Events

• Sample Space, result of an experiment• If you toss a coin twice

• Event: a subset of • First toss is head = {HH,HT}

• S: event space, a set of events• Closed under finite union and complements

• Entails other binary operation: union, diff, etc.• Contains the empty event and

Page 5: Probability and Statistics Review

Probability Measure

• Defined over (Ss.t.• P() >= 0 for all in S• P() = 1• If are disjoint, then

• P( U ) = p() + p()• We can deduce other axioms from the above ones

• Ex: P( U ) for non-disjoint event

Page 6: Probability and Statistics Review

Visualization

• We can go on and define conditional probability, using the above visualization

Page 7: Probability and Statistics Review

הסתברות מותנית

-P(F|H) = Fraction of worlds in which H is true that also have F true

)(

)()|(

Hp

HFpHFp

Page 8: Probability and Statistics Review

Rule of total probability

A

B1

B2B3

B4

B5

B6B7

ii BAPBPAp |

Page 9: Probability and Statistics Review

Bayes Rule

• We know that P(smart) = .7• If we also know that the students grade is

A+, then how this affects our belief about his intelligence?

• Where this comes from?

)(

)|()(|

yP

xyPxPyxP

Page 10: Probability and Statistics Review

דוגמא

מתוצרת המפעל מיוצרת B . 10% ו-Aבמפעל פעולות שתי מכונות 5% ו A מהמוצרים המיוצרים במכונה B. 1% במכונה 90% ו-Aבמכונה

הם פגומים.Bמהמוצרים המיוצרים במכונה ?נבחר מוצר אקראי, מה ההסתברות שהוא פגום נמצא מוצר שהוא פגום, מה ההסתברות שיוצר במכונהA? אחרי ביקור של טכנאי שמטפל במכונהB ממוצרי 1.9% , מוצאים ש

Bהמפעל הם פגומים. מה עכשיו ההסתברות שמוצר המיוצר במכונה יהיה פגום?

פתרון:-המוצר A , B-המוצר הנבחר יוצר במכונה Aנגדיר את המאורעות הבאים:

-המוצר שנבחר פגום.B, Cהנבחר יוצר במכונה P(C|A)*P(A)+P(C|B)*P(B)א. עפ"י נוסחת ההסתברות השלמה מתקיים:

0.046=0.01*0.1 +0.05*0.9|P(A|C)=P(C|A)*P(A)/P(C) => P(A ב. עפ"י נוסחת בייס:

C)=0.01*0.1/0.046=0.021

Page 11: Probability and Statistics Review

משתנה מקרי בדיד

מ"מ הוא פונקציה ממרחב המאורעות הכללי (העולם) •למרחב המאפיינים

בעצם ניתן לייצג התפלגות חדשה על פי המשתנה המקרי •

• Modeling students (Grade and Intelligence):• all possible students• What are events

• Grade_A = all students with grade A• Grade_B = all students with grade A• Intelligence_High = … with high intelligence

Page 12: Probability and Statistics Review

Random Variables

High

low

A

B A+

I:Intelligence

G:Grade

Page 13: Probability and Statistics Review

Random Variables

High

low

A

B A+

I:Intelligence

G:Grade

P(I = high) = P( {all students whose intelligence is high})

Page 14: Probability and Statistics Review

הסתברות משותפת

• Joint probability distributions quantify this • P( X= x, Y= y) = P(x, y)

• How probable is it to observe these two attributes together?• How can we manipulate Joint probability distributions?

.1,2,3,4דוגמא:מרכיבים באקראי מס' דו סיפרתי מהספרות 1 מס' הפעמים שהספרה Y מס' הספרות השונות המופיעות במס' ו Xיהי

מופיעה.?)X,Y(מהי ההתפלגות המשותפת של הזוג

XY

1 2

0 3/16 6/16

1 0 6/16

2 1/16 0

Page 15: Probability and Statistics Review

חוק השרשרת

• Always true• P(x,y,z) = p(x) p(y|x) p(z|x, y)

= p(z) p(y|z) p(x|y, z) =…

כדי לסבך קצת את העניינים נוסיף לשאלה מקודם

יהיה מס הפעמים שמופיע ספרה גדולה ממש zאת הנתון הבא 2מ

P(x=2,y=1,z=1)=P(x=2)*P(y=1|x=2)*P(z=1|y=1,x=2)=0.75*[(6/16)/P(x=2)]*[(4/16)/(6/16)]

Page 16: Probability and Statistics Review

Conditional Probability

P X YP X Y

P Y

x yx y

y

)(

),(|

yp

yxpyxP

But we will always write it this way:

events

Page 17: Probability and Statistics Review

הסתברות השולית

• We know P(X,Y), what is P(X=x)?• We can use the low of total probability, why?

y

y

yxPyP

yxPxp

|

,

A

B1

B2B3B4

B5

B6B7

Page 18: Probability and Statistics Review

Marginalization Cont.

• Another example

yz

zy

zyxPzyP

zyxPxp

,

,

,|,

,,

Page 19: Probability and Statistics Review

Bayes Rule cont.

• You can condition on more variables

)|(

),|()|(,|

zyP

zxyPzxPzyxP

Page 20: Probability and Statistics Review

אי תלות

• X is independent of Y means that knowing Y does not change our belief about X.• P(X|Y=y) = P(X) • P(X=x, Y=y) = P(X=x) P(Y=y)

• Why this is true?• The above should hold for all x, y• It is symmetric and written as X Y

Page 21: Probability and Statistics Review

CI: Conditional Independence

• X Y | Z if once Z is observed, knowing the value of Y does not change our belief about X• The following should hold for all x,y,z• P(X=x | Z=z, Y=y) = P(X=x | Z=z) • P(Y=y | Z=z, X=x) = P(Y=y | Z=z) • P(X=x, Y=y | Z=z) = P(X=x| Z=z) P(Y=y| Z=z)

We call these factors : very useful concept !!

Page 22: Probability and Statistics Review

Properties of CI

• Symmetry: – (X Y | Z) (Y X | Z)

• Decomposition: – (X Y,W | Z) (X Y | Z)

• Weak union: – (X Y,W | Z) (X Y | Z,W)

• Contraction: – (X W | Y,Z) & (X Y | Z) (X Y,W | Z)

• Intersection: – (X Y | W,Z) & (X W | Y,Z) (X Y,W | Z) – Only for positive distributions!– P()>0, 8, ;

Page 23: Probability and Statistics Review

Monty Hall Problem

You're given the choice of three doors: Behind one door is a car; behind the others, goats.

You pick a door, say No. 1 The host, who knows what's behind the doors,

opens another door, say No. 3, which has a goat. Do you want to pick door No. 2 instead?

Page 24: Probability and Statistics Review

                        

                        

Host mustreveal Goat B

                        

                        

Host mustreveal Goat A

                       

                             

Host revealsGoat A

orHost reveals

Goat B

           

             

Page 25: Probability and Statistics Review

Monty Hall Problem: Bayes Rule

: the car is behind door i, i = 1, 2, 3 : the host opens door j after you pick door i

iC

ijH 1 3iP C

0

0

1 2

1 ,

ij k

i j

j kP H C

i k

i k j k

Page 26: Probability and Statistics Review

Monty Hall Problem

WLOG, i=1, j=3

13 1 11 13

13

P H C P CP C H

P H

13 1 11 1 1

2 3 6P H C P C

Page 27: Probability and Statistics Review

Monty Hall Problem: Bayes Rule cont.

13 13 1 13 2 13 3

13 1 1 13 2 2

, , ,

1 11

6 31

2

P H P H C P H C P H C

P H C P C P H C P C

1 131 6 1

1 2 3P C H

Page 28: Probability and Statistics Review

Monty Hall Problem: Bayes Rule cont.

1 131 6 1

1 2 3P C H

You should switch!

2 13 1 131 2

13 3

P C H P C H

Page 29: Probability and Statistics Review

Moments

Mean (Expectation): Discrete RVs:

Continuous RVs:

Variance: Discrete RVs:

Continuous RVs:

XE X P X

ii iv

E v v XE xf x dx

2X XV E 2X P X

ii iv

V v v 2XV x f x dx

Page 30: Probability and Statistics Review

Properties of Moments

Mean If X and Y are independent,

Variance If X and Y are independent,

X Y X YE E E X XE a aE

XY X YE E E

2X XV a b a V

X Y (X) (Y)V V V

Page 31: Probability and Statistics Review

The Big Picture

Model Data

Probability

Estimation/learning

Page 32: Probability and Statistics Review

Statistical Inference

Given observations from a model What (conditional) independence assumptions

hold? Structure learning

If you know the family of the model (ex, multinomial), What are the value of the parameters: MLE, Bayesian estimation. Parameter learning

Page 33: Probability and Statistics Review

MLE

Maximum Likelihood estimation Example on board

Given N coin tosses, what is the coin bias ( )? Sufficient Statistics: SS

Useful concept that we will make use later In solving the above estimation problem, we only

cared about Nh, Nt , these are called the SS of this model. All coin tosses that have the same SS will result in the

same value of Why this is useful?

Page 34: Probability and Statistics Review

Statistical Inference

Given observation from a model What (conditional) independence assumptions

holds? Structure learning

If you know the family of the model (ex, multinomial), What are the value of the parameters: MLE, Bayesian estimation. Parameter learning

We need some concepts from information theory

Page 35: Probability and Statistics Review

Information Theory

• P(X) encodes our uncertainty about X• Some variables are more uncertain that others

• How can we quantify this intuition?• Entropy: average number of bits required to encode X

P(X) P(Y)

X Y

xP xP

xPxp

EXH1

log1

log

Page 36: Probability and Statistics Review

Information Theory cont.

• Entropy: average number of bits required to encode X

• We can define conditional entropy similarly

• We can also define chain rule for entropies (not surprising)

xP xP

xPxp

EXH1

log1

log

YHYXHyxp

EYXH PPP

,

|

1log|

YXZHXYHXHZYXH PPPP ,||,,

Page 37: Probability and Statistics Review

Mutual Information: MI

• Remember independence?• If XY then knowing Y won’t change our belief about X• Mutual information can help quantify this! (not the only

way though)• MI:

• Symmetric• I(X;Y) = 0 iff, X and Y are independent!

YXHXHYXI PPP |;

Page 38: Probability and Statistics Review

Continuous Random Variables

What if X is continuous? Probability density function (pdf) instead of

probability mass function (pmf) A pdf is any function that describes the

probability density in terms of the input variable x.

f x

Page 39: Probability and Statistics Review

PDF Properties of pdf

Actual probability can be obtained by taking the integral of pdf E.g. the probability of X being between 0 and 1 is

0,f x x

1f x

1

0P 0 1X f x dx

1 ???f x

Page 40: Probability and Statistics Review

Cumulative Distribution Function

Discrete RVs

Continuous RVs

X P XF v v

X P Xi

ivF v v

X

vF v f x dx

X

dF x f x

dx

Page 41: Probability and Statistics Review

Acknowledgment

Andrew Moore Tutorial: http://www.autonlab.org/tutorials/prob.html

Monty hall problem: http://en.wikipedia.org/wiki/Monty_Hall_problem http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html