Blackboard CLT

  • 8/4/2019 Blackboard CLT

    The Central Limit Theorem

    Mitja Stadje

    September, 2011

    Mitja Stadje, QM Additional Material


    Central Limit Theorem, CLT

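    $X_1, X_2, \ldots$ i.i.d. with $E X_i = \mu$ and $0 < \operatorname{Var} X_i = \sigma^2 < \infty$. Then, with $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$,

    $$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.$$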


    CLT, Application I: Calculating probabilities of sums

    In case the distribution is known (in particular, the true mean $\mu$ and the true variance $\sigma^2$ are known), the CLT can be used to compute probabilities involving the sum of i.i.d. observations. For instance, suppose that the $X_i$ are i.i.d. with an exponential, $t$-, or $F$-distribution, or any other standard distribution. Then to compute $P[X_1 + X_2 + \ldots + X_{100} \le 5]$ we need to know the cdf of $X_1 + X_2 + \ldots + X_{100}$, i.e.,

    $$P[X_1 + X_2 + \ldots + X_{100} \le 5] = F_{X_1 + X_2 + \ldots + X_{100}}(5).$$

    The problem is that $F_{X_1 + X_2 + \ldots + X_{100}}$ is often very complicated to compute. However, using the CLT we know that $\sum_{i=1}^{100} X_i \approx N(100\mu, 100\sigma^2)$. Therefore

    $$P\Big[\sum_{i=1}^{100} X_i \le 5\Big] = P\Big[\frac{\sum_{i=1}^{100} X_i - 100\mu}{10\sigma} \le \frac{5 - 100\mu}{10\sigma}\Big] \approx \Phi\Big(\frac{5 - 100\mu}{10\sigma}\Big).$$
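The approximation above can be checked numerically in a case where the exact cdf of the sum happens to be tractable. The following is a minimal sketch, not part of the original slides: it assumes the $X_i$ are exponential with a hypothetical rate of 20 (so $\mu = \sigma = 0.05$), for which the sum of 100 observations has an Erlang distribution with a closed-form cdf. The function names `clt_sum_cdf` and `erlang_cdf` are illustrative.

```python
import math
from statistics import NormalDist


def clt_sum_cdf(x, n, mu, sigma):
    """CLT approximation of P[X_1 + ... + X_n <= x] for i.i.d. X_i
    with mean mu and standard deviation sigma."""
    z = (x - n * mu) / (math.sqrt(n) * sigma)
    return NormalDist().cdf(z)


def erlang_cdf(x, n, rate):
    """Exact P[S <= x] where S is the sum of n i.i.d. Exp(rate) variables.
    Uses P[S <= x] = 1 - sum_{k=0}^{n-1} exp(-rate*x) * (rate*x)^k / k!."""
    lx = rate * x
    term = math.exp(-lx)  # k = 0 term
    tail = term
    for k in range(1, n):
        term *= lx / k    # build (rate*x)^k / k! iteratively to avoid overflow
        tail += term
    return 1.0 - tail


# Hypothetical example: rate 20, so mu = sigma = 1/20 and 100 * mu = 5.
rate = 20.0
mu = sigma = 1.0 / rate
approx = clt_sum_cdf(5.0, 100, mu, sigma)
exact = erlang_cdf(5.0, 100, rate)
print(f"CLT approximation: {approx:.4f}, exact Erlang cdf: {exact:.4f}")
```

For most other distributions no such closed form is available, which is exactly the situation in which the normal approximation is useful.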


    CLT, Application II: Confidence intervals for $\mu$

    In case that $\mu$ is not known and we have i.i.d. observations $X_1, \ldots, X_n$, we can always estimate $\mu$ with $\bar{X}_n$. However, the question is: how good is this estimate? To answer this question one typically gives a whole interval around $\bar{X}_n$ in which $\mu$ should lie with probability, say, $1 - \alpha = 95\%$. This is called a confidence interval.

    Confidence intervals:

    • Give a better idea about the possible values of $\mu$ compared to just a point estimator $\bar{X}_n$.

    • Can be used to evaluate the quality of the estimate. Larger intervals correspond to bad estimates and smaller intervals correspond to good estimates.


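    CLT, Application II: Confidence intervals for $\mu$ if $\mu$ is unknown

    Let $z_{\alpha/2}$ denote the upper $\alpha/2$ quantile of the standard normal distribution. Since

    $$1 - \alpha = P\big[-z_{\alpha/2} \le Z \le z_{\alpha/2}\big]$$

    for $Z \sim N(0, 1)$, by the CLT also

    $$1 - \alpha \approx P\Big[-z_{\alpha/2} \le \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le z_{\alpha/2}\Big]. \quad (1)$$

    Solving $\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \ge -z_{\alpha/2}$ for $\mu$ we get $\mu \le \bar{X}_n + z_{\alpha/2}\,\sigma/\sqrt{n}$.

    Solving $\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le z_{\alpha/2}$ we get $\mu \ge \bar{X}_n - z_{\alpha/2}\,\sigma/\sqrt{n}$. Therefore,

    $$1 - \alpha \approx P\Big[\bar{X}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big].$$

    So $\mu$ lies with probability approximately $1 - \alpha$ in the interval

    $$\Big[\bar{X}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big].$$

    In case $\sigma$ is not known, $\sigma^2$ is typically replaced by $S^2$.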
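The interval derived above is straightforward to compute. The following is a minimal sketch under stated assumptions: the function name `clt_confidence_interval` and the sample data are hypothetical, and when `sigma` is omitted the sample standard deviation $S$ is plugged in, as the last remark on the slide suggests.

```python
import math
from statistics import NormalDist, fmean, stdev


def clt_confidence_interval(sample, alpha=0.05, sigma=None):
    """Approximate (1 - alpha) confidence interval for the mean,
    based on the CLT: X_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    n = len(sample)
    x_bar = fmean(sample)
    if sigma is None:
        # sigma unknown: replace it by S, the square root of the sample variance S^2
        sigma = stdev(sample)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical data: 100 observations with sample mean 1.0, sigma assumed known.
sample = [0.0, 2.0] * 50
lower, upper = clt_confidence_interval(sample, alpha=0.05, sigma=1.0)
print(f"95% confidence interval for the mean: [{lower:.4f}, {upper:.4f}]")
```

The half-width shrinks like $1/\sqrt{n}$, so quadrupling the sample size halves the interval.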


    CLT, Application III: Hypothesis testing

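    Suppose that $\mu$ is unknown but $\sigma$ is known. We are given a constant $\mu_0$ (for instance $\mu_0 = 0$) and we want to test whether the true expectation of our observations is equal to $\mu_0$. That is, we want to test the hypothesis

    $$H_0: \mu = \mu_0 \quad \text{against} \quad H_1: \mu \ne \mu_0.$$

    Strategy: Assume for a moment that $H_0$ were true. The CLT yields

    $$T = \frac{\sqrt{n}\,(\bar{X}_n - \mu_0)}{\sigma} \approx Z \sim N(0, 1).$$

    So if $H_0$ is true, $T$ lies in $[-z_{\alpha/2}, z_{\alpha/2}]$ with probability approximately $1 - \alpha$:

    $$1 - \alpha \approx P\big[-z_{\alpha/2} \le T \le z_{\alpha/2}\big].$$

    Typically $\alpha$ is chosen such that $1 - \alpha = 99\%$ or $95\%$. Therefore, if we observe in our outcomes that $T \notin [-z_{\alpha/2}, z_{\alpha/2}]$, it seems unlikely that the hypothesis $H_0$ is true. (If it were true, then with 95% probability $T \in [-z_{\alpha/2}, z_{\alpha/2}]$, which was not the case.) So in this case we reject $H_0$.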
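The decision rule described above can be sketched in a few lines. This is an illustration only, assuming $\sigma$ is known as on the slide; the function name `z_test` and the sample data are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided test of H0: mu = mu0 with known sigma, based on the CLT.
    Returns the statistic T, the critical value z_{alpha/2}, and the decision."""
    n = len(sample)
    t_stat = math.sqrt(n) * (fmean(sample) - mu0) / sigma
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    reject = abs(t_stat) > z_crit  # T falls outside [-z_{alpha/2}, z_{alpha/2}]
    return t_stat, z_crit, reject


# Hypothetical data: 100 observations with sample mean 1.0 and known sigma = 1.
sample = [0.0, 2.0] * 50
for mu0 in (1.0, 0.5):
    t_stat, z_crit, reject = z_test(sample, mu0=mu0, sigma=1.0)
    decision = "reject H0" if reject else "do not reject H0"
    print(f"mu0 = {mu0}: T = {t_stat:.3f}, z = {z_crit:.3f} -> {decision}")
```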


    CLT, Application III: Hypothesis testing

    So the CLT can be used to test the hypothesis $H_0$. If $H_0$ is rejected we can conclude that there is evidence supporting $H_1$. (The other direction does not hold.)

    The region $[-z_{\alpha/2}, z_{\alpha/2}]$ (which under $H_0$ has probability $1 - \alpha$, e.g. 95%) is also called the acceptance region. The region $\mathbb{R} \setminus [-z_{\alpha/2}, z_{\alpha/2}]$ is called the rejection region of the test. $\alpha$ is called the significance level of the test.

    (a) In the setting above, what do you do if you do not know the mean and do not know the variance?

    (b) How to test a one-sided hypothesis such as $H_0: \mu \le \mu_0$?

    (c) Above we only assumed that the observations $X_1, \ldots, X_n$ are i.i.d. What happens if we do not know the variance but we additionally know that our observations are normal? Could we then do even better?


    Excursion

    To prove the CLT we will need moment generating functions:

    Definition

    Suppose that $X$ is a random variable. The moment generating function of $X$ is defined as

    $$M_X(t) = E[\exp(tX)].$$

    Why the name? $M'(t) = E[X \exp(tX)]$, so $M'(0) = E[X]$. $M''(t) = E[X^2 \exp(tX)]$, so $M''(0) = E[X^2]$; and so on.

    Example: If $X \sim N(0, 1)$ then

    $$M_X(t) = E[\exp(tX)] = \int_{-\infty}^{\infty} \exp(tx)\, f_X(x)\, dx = \int_{-\infty}^{\infty} \exp(tx)\, \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)\, dx$$


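    $$M_X(t) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-(x^2 - 2tx)/2\big)\, dx$$

    $$= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-(x^2 - 2tx + t^2 - t^2)/2\big)\, dx$$

    $$= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-((x - t)^2 - t^2)/2\big)\, dx$$

    $$= e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\big(-(x - t)^2/2\big)\, dx = e^{t^2/2}.$$

    Other properties: If $X$ and $Y$ are independent then

    $$M_{X+Y}(t) = M_X(t)\, M_Y(t).$$

    Why? Therefore, if $X_1, \ldots, X_n$ is a random sample,

    $$M_{X_1 + X_2 + \ldots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t) = \big(M_{X_1}(t)\big)^n.$$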
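Both results above can be sanity-checked numerically. The sketch below is an illustration rather than part of the original material: it approximates $E[\exp(tX)]$ by a trapezoidal rule on a truncated range (the range and step count are arbitrary choices), compares the result for $N(0,1)$ with $e^{t^2/2}$, and checks the product rule using the known fact that the sum of two independent $N(0,1)$ variables is $N(0,2)$. The helper names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mgf_numeric(t, pdf, lo=-20.0, hi=20.0, steps=40_000):
    """Trapezoidal approximation of E[exp(tX)] = integral of exp(t*x) * pdf(x) dx.
    The integration range is truncated to [lo, hi], which is an assumption that
    only makes sense for densities with negligible mass outside that range."""
    h = (hi - lo) / steps
    total = 0.5 * (math.exp(t * lo) * pdf(lo) + math.exp(t * hi) * pdf(hi))
    for k in range(1, steps):
        x = lo + k * h
        total += math.exp(t * x) * pdf(x)
    return total * h


t = 0.5
m_x = mgf_numeric(t, normal_pdf)
print(f"numeric M_X({t}) = {m_x:.6f}, closed form exp(t^2/2) = {math.exp(t * t / 2):.6f}")

# X + Y for independent X, Y ~ N(0, 1) is N(0, 2); compare with M_X(t) * M_Y(t).
m_sum = mgf_numeric(t, lambda x: normal_pdf(x, 0.0, math.sqrt(2.0)))
print(f"numeric M_(X+Y)({t}) = {m_sum:.6f}, M_X(t) * M_Y(t) = {m_x * m_x:.6f}")
```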


    We will need the following result:

    Theorem

    Let $Y_1, Y_2, \ldots$ be a sequence of random variables with distribution functions $F_1, F_2, \ldots$ and mgfs $M_1, M_2, \ldots$. If the random variable $Y$ has distribution function $F$ and mgf $M$, then $\lim_{n \to \infty} M_n(t) = M(t)$ for $t \in (-h, h)$, for some $h > 0$, implies $Y_n \xrightarrow{d} Y$.
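To see how this theorem connects to the CLT, it can help to watch mgfs converge in one concrete case. The sketch below is an illustration under an assumption not made in the slides: the $X_i$ are taken to be exponential with rate 1 (mean 1, variance 1, mgf $1/(1-s)$ for $s < 1$). The standardized sum $(X_1 + \ldots + X_n - n)/\sqrt{n}$ then has mgf $e^{-t\sqrt{n}}\,(1 - t/\sqrt{n})^{-n}$ for $t < \sqrt{n}$, which should approach the $N(0,1)$ mgf $e^{t^2/2}$. The function name is hypothetical.

```python
import math


def mgf_standardized_exp_sum(t, n):
    """mgf of (X_1 + ... + X_n - n) / sqrt(n) for i.i.d. Exp(1) variables.
    Equals exp(-t*sqrt(n)) * (1 - t/sqrt(n))^(-n), finite only for t < sqrt(n).
    Computed on the log scale with log1p for numerical stability."""
    root = math.sqrt(n)
    if t >= root:
        raise ValueError("mgf is finite only for t < sqrt(n)")
    log_m = -t * root - n * math.log1p(-t / root)
    return math.exp(log_m)


t = 1.0
limit = math.exp(t * t / 2.0)  # mgf of N(0, 1) at t
for n in (10, 100, 10_000, 1_000_000):
    m_n = mgf_standardized_exp_sum(t, n)
    print(f"n = {n:>9}: M_n({t}) = {m_n:.6f}, |M_n - limit| = {abs(m_n - limit):.6f}")
```

The gap to $e^{t^2/2}$ shrinks as $n$ grows, which is the convergence of mgfs that the theorem turns into convergence in distribution.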
