Lecture ANOVA


  • 7/24/2019 Lecture ANOVA

    DESIGN OF EXPERIMENTS

    ANALYSIS OF VARIANCE

    Normal Distribution

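    Gaussian Equation:

    $f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$

    $P(a \le y \le b) = \int_a^b f(y)\,dy$

    $\int_{-\infty}^{\infty} f(y)\,dy = 1$

    Measurement: $y_i = \mu + e_i$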
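The two integral properties above can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of `mu` and `sigma` and the interval limits are hypothetical, chosen only for illustration, and the trapezoidal rule is one of several ways the integral could be approximated.

```python
import math
from statistics import NormalDist


def gaussian_pdf(y, mu, sigma):
    """Normal density f(y) for mean mu and standard deviation sigma."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def prob_between(a, b, mu, sigma, steps=10_000):
    """Approximate P(a <= y <= b) with the trapezoidal rule applied to f(y)."""
    h = (b - a) / steps
    total = 0.5 * (gaussian_pdf(a, mu, sigma) + gaussian_pdf(b, mu, sigma))
    for k in range(1, steps):
        total += gaussian_pdf(a + k * h, mu, sigma)
    return total * h


mu, sigma = 10.0, 2.0  # hypothetical population parameters

# P(mu - sigma <= y <= mu + sigma), computed two ways
p_numeric = prob_between(8.0, 12.0, mu, sigma)
dist = NormalDist(mu, sigma)
p_exact = dist.cdf(12.0) - dist.cdf(8.0)

# The total area under f(y); +/- 10 sigma stands in for +/- infinity
total_area = prob_between(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)

print(f"P(8 <= y <= 12): numeric {p_numeric:.4f}, closed form {p_exact:.4f}")
print(f"Total area: {total_area:.6f}")
```

The numeric and closed-form probabilities should agree closely, and the total area should be very close to 1.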

    Estimation of Parameters

    The average (mean):

    $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ estimates the population mean $\mu$

    The Sample Variance (Standard Deviation)$^2$:

    $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ estimates the variance $\sigma^2$

    Measurement: $y_i = \mu + e_i$

    The Standard Normal Distribution

    One way to simplify calculating probabilities is to use a Normal Deviate (z)

    Subtract $\mu$ from all y observations and the new observations will have a mean of 0

    Divide these new observations by $\sigma$ and the new standard deviation becomes 1.

    $z = \frac{y - \mu}{\sigma}$

    Result: f(z) = a standard normal distribution

    $f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$
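A short sketch of the standardization step, under the assumption that the small list below is the entire population (so that its own mean and standard deviation play the roles of the population mean and standard deviation). The numbers are hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical population of observations (illustration only)
y = [12.0, 15.0, 9.0, 18.0, 11.0]

mu = fmean(y)          # population mean
sigma = pstdev(y, mu)  # population standard deviation

# Normal deviate: subtract the mean, then divide by the standard deviation
z = [(yi - mu) / sigma for yi in y]

z_mean = fmean(z)
z_sd = pstdev(z)
print(f"mean of z = {z_mean:.6f}, standard deviation of z = {z_sd:.6f}")
```

After the transformation the z values should have mean 0 and standard deviation 1, up to floating-point rounding.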

    Areas under the standard normal distribution curve

        Y Area        Y Area        Y Area        Y Area        Y Area        Y Area        Y Area        Y Area
    -4.00 0.0000  -3.00 0.0013  -2.00 0.0228  -1.00 0.1587   0.00 0.5000   1.00 0.8413   2.00 0.9772   3.00 0.9987
    -3.99 0.0000  -2.99 0.0014  -1.99 0.0233  -0.99 0.1611   0.01 0.5040   1.01 0.8438   2.01 0.9778   3.01 0.9987
    -3.98 0.0000  -2.98 0.0014  -1.98 0.0239  -0.98 0.1635   0.02 0.5080   1.02 0.8461   2.02 0.9783   3.02 0.9987
    -3.97 0.0000  -2.97 0.0015  -1.97 0.0244  -0.97 0.1660   0.03 0.5120   1.03 0.8485   2.03 0.9788   3.03 0.9988
    -3.96 0.0000  -2.96 0.0015  -1.96 0.0250  -0.96 0.1685   0.04 0.5160   1.04 0.8508   2.04 0.9793   3.04 0.9988
    -3.95 0.0000  -2.95 0.0016  -1.95 0.0256  -0.95 0.1711   0.05 0.5199   1.05 0.8531   2.05 0.9798   3.05 0.9989
    -3.94 0.0000  -2.94 0.0016  -1.94 0.0262  -0.94 0.1736   0.06 0.5239   1.06 0.8554   2.06 0.9803   3.06 0.9989
    -3.93 0.0000  -2.93 0.0017  -1.93 0.0268  -0.93 0.1762   0.07 0.5279   1.07 0.8577   2.07 0.9808   3.07 0.9989
    -3.92 0.0000  -2.92 0.0018  -1.92 0.0274  -0.92 0.1788   0.08 0.5319   1.08 0.8599   2.08 0.9812   3.08 0.9990
    -3.91 0.0000  -2.91 0.0018  -1.91 0.0281  -0.91 0.1814   0.09 0.5359   1.09 0.8621   2.09 0.9817   3.09 0.9990

    Area $= \int_{-\infty}^{Y} f(z)\,dz$, with $\mu_z = 0$ and $\sigma_z = 1$
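The tabulated areas are values of the cumulative standard normal distribution, so individual entries can be regenerated rather than looked up. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `area` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)


def area(Y):
    """Area under the standard normal curve from minus infinity up to Y."""
    return std_normal.cdf(Y)


# Print a few entries in the same Y / Area layout as the table
for Y in (-3.00, -1.96, -1.00, 0.00, 1.00, 2.00, 3.00):
    print(f"{Y:6.2f}  {area(Y):.4f}")
```

The printed values should agree with the corresponding table entries to four decimal places.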

    Portland Cement Formulation

    Observation (Sample), j    Formula 1, y_{1,j}    Formula 2, y_{2,j}
    1                          16.85                 17.50
    2                          16.40                 17.63
    3                          17.21                 18.25
    4                          16.35                 18.00
    5                          16.52                 17.86
    6                          17.04                 17.75
    7                          16.96                 18.22
    8                          17.15                 17.90
    9                          16.59                 17.96
    10                         16.57                 18.15
    Average:                   16.76                 17.92

    Summary Statistics

    Formulation 1: $\bar{y}_1 = 16.76$, $S_1^2 = 0.100$, $S_1 = 0.316$, $n_1 = 10$

    Formulation 2: $\bar{y}_2 = 17.92$, $S_2^2 = 0.061$, $S_2 = 0.247$, $n_2 = 10$

    $y_{ij} = \mu_i + \varepsilon_{ij}$ (i = 1, 2; j = 1, ..., 10)
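The summary statistics can be recomputed from the twenty observations in the table. This is a minimal sketch with the Python standard library; the function name `summarize` is an illustrative choice. Small differences in the last digit relative to the slide are expected, because the slide appears to take the square root of the already rounded variance.

```python
from statistics import fmean, stdev, variance

# Observations copied from the Portland cement table
formula_1 = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
formula_2 = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def summarize(sample):
    """Return n, the sample mean, the sample variance S^2 (n - 1 divisor) and S."""
    return {
        "n": len(sample),
        "mean": fmean(sample),
        "S2": variance(sample),
        "S": stdev(sample),
    }


stats_1 = summarize(formula_1)
stats_2 = summarize(formula_2)

for name, s in (("Formulation 1", stats_1), ("Formulation 2", stats_2)):
    print(f"{name}: n = {s['n']}, mean = {s['mean']:.2f}, S^2 = {s['S2']:.3f}, S = {s['S']:.3f}")
```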

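    Cement Formulation Data

    [Figure: frequency distributions of the observations for the two formulations (x-axis: Observation, 14 to 21; y-axis: Frequency). Formulation 1: $\bar{y}_1 = 16.76$, $S_1^2 = 0.100$, $S_1 = 0.316$, $n_1 = 10$. Formulation 2: $\bar{y}_2 = 17.92$, $S_2^2 = 0.061$, $S_2 = 0.247$, $n_2 = 10$.]

    Do these formulations differ?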

    The Two-Sample t-Test

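    $t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

    $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$

    $t_0$ follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom

    [Figure: t-distribution with k degrees of freedom]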

    The Two-Sample t-Test

    Cement Formulation: $t_0 = -9.13$

    If $\mu_1 = \mu_2$, observing $|t_0| > 2.101$ is considered to be a rare event (2.5% in each tail of the t distribution with 18 DOF).

    The P-value ($3.68 \times 10^{-8}$) is the risk of wrongly rejecting the null hypothesis of equal means
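The pooled two-sample t statistic can be recomputed from the Portland cement observations listed earlier in the lecture. This is a minimal sketch; `pooled_t` is an illustrative name, and the critical value 2.101 is taken from the slide rather than computed. Working from the unrounded data gives a value near -9.11, slightly different from the slide's -9.13, which appears to be based on rounded summary statistics.

```python
import math
from statistics import fmean, variance

# Observations copied from the Portland cement table
formula_1 = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
formula_2 = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def pooled_t(sample_1, sample_2):
    """Return (t0, pooled variance Sp^2, degrees of freedom) for two samples."""
    n1, n2 = len(sample_1), len(sample_2)
    dof = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / dof
    t0 = (fmean(sample_1) - fmean(sample_2)) / (math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2))
    return t0, sp2, dof


t0, sp2, dof = pooled_t(formula_1, formula_2)

T_CRIT = 2.101  # two-sided 5% critical value for 18 DOF, as quoted on the slide
reject_equal_means = abs(t0) > T_CRIT

print(f"Sp^2 = {sp2:.4f}, t0 = {t0:.2f}, DOF = {dof}, reject equal means: {reject_equal_means}")
```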

    What If There Are More Than Two Factor Levels?

    The t-test does not directly apply

    There are lots of practical situations where there are either more than two levels of interest, or there are several factors of simultaneous interest

    The analysis of variance (ANOVA) is the appropriate analysis engine for these types of experiments

    Used extensively today for industrial experiments

    Cotton Fiber Example

    New synthetic fiber to make cloth for shirts

    Response variable: tensile strength

    Cotton content varies between 15 and 35%

    Each experiment is replicated 5 times

    What is the best weight % of cotton to use?

    Does changing the cotton weight percent change the mean tensile strength?

    Is there an optimum level for cotton content?

    In general, there will be $a$ levels of the factor, or $a$ treatments, and $n$ replicates of the experiment, run in random order, i.e. a completely randomized design

    $N = an$ total runs

    Objective is to test for differences between the $a$ means
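A completely randomized design can be sketched as a shuffled list of all N = an runs. In the sketch below the five cotton levels (15, 20, 25, 30, 35%) are an assumption: the slides give only the 15 to 35% range and 5 replicates. The function name and the `seed` parameter are illustrative.

```python
import random


def randomized_run_order(levels, n_replicates, seed=None):
    """Return all a*n runs (one entry per run, labeled by factor level) in random order."""
    runs = [level for level in levels for _ in range(n_replicates)]
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    rng.shuffle(runs)
    return runs


cotton_levels = [15, 20, 25, 30, 35]  # assumed levels, in weight percent
order = randomized_run_order(cotton_levels, n_replicates=5, seed=1)

print(f"N = {len(order)} runs")
print(order)
```

Each level still appears exactly n times; only the sequence in which the runs are carried out is randomized.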

    Models for the Data

    Consider the normal (means) model:

    $y_{ij} = \mu_i + \varepsilon_{ij}$ ($\mu_i$: mean for the i-th treatment)

    If we define: $\mu_i = \mu + \tau_i$

    We will get an effects model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$

    where $\tau_i$ is the i-th treatment effect
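The step from the means model to the effects model can be illustrated numerically. The sketch below assumes the common convention that the overall mean is the average of the treatment means, so that the treatment effects sum to zero; the slides do not state this constraint. The two input values are the cement sample averages from earlier in the lecture, used here as stand-ins for the unknown treatment means.

```python
from statistics import fmean


def to_effects(treatment_means):
    """Split treatment means mu_i into an overall mean mu and effects tau_i = mu_i - mu."""
    mu = fmean(treatment_means)
    tau = [m - mu for m in treatment_means]
    return mu, tau


treatment_means = [16.76, 17.92]  # sample averages standing in for mu_1, mu_2
mu, tau = to_effects(treatment_means)

print(f"mu = {mu:.2f}")
for i, t in enumerate(tau, start=1):
    print(f"tau_{i} = {t:+.2f}")
```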

    Analysis of Variance (ANOVA)

    The basic single-factor ANOVA model is

    $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, a$ (Treatments), $j = 1, 2, \ldots, n$ (Replications)

    The name "analysis of variance" is derived from a partitioning of the total variability in the response variable into its component parts

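    ANOVA

    $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n$

    Definitions:

    Observation Mean (within treatments): $\bar{y}_{i.} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$

    Overall Mean (all measurements): $\bar{y}_{..} = \frac{1}{N} \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}$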

    The Analysis of Variance

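    Total Corrected Sum of Squares:

    $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$, a measure of the total variability

    The basic ANOVA partitioning is:

    $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} [(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2$

    where $\bar{y}_{i.}$ is the Treatment Average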

    The Analysis of Variance

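    $\sum_{i=1}^{a} \sum_{j=1}^{n} [(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2$

    $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2 + 2 \sum_{i=1}^{a} \sum_{j=1}^{n} (\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.})$

    First term: $\sum_{i=1}^{a} \sum_{j=1}^{n} (\bar{y}_{i.} - \bar{y}_{..})^2 = n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2$

    Cross term: vanishes, because $\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.}) = 0$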

    The Analysis of Variance

    $SS_T = n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2 = SS_{Treatments} + SS_{Error}$
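The partition of the total sum of squares, and the vanishing cross term, can be verified on real numbers. The sketch below uses the two Portland cement formulations from earlier in the lecture as a = 2 treatments with n = 10 replicates each; the function name `ss_partition` is an illustrative choice.

```python
from statistics import fmean

# Observations copied from the Portland cement table (two treatments)
formula_1 = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
formula_2 = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def ss_partition(groups):
    """Return (SS_T, SS_Treatments, SS_Error, cross term) for a list of treatment groups."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)                # overall mean
    means = [fmean(g) for g in groups]    # treatment averages
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    cross = sum((m - grand) * (y - m) for g, m in zip(groups, means) for y in g)
    return ss_total, ss_treat, ss_error, cross


ss_total, ss_treat, ss_error, cross = ss_partition([formula_1, formula_2])

print(f"SS_T = {ss_total:.4f}")
print(f"SS_Treatments + SS_Error = {ss_treat:.4f} + {ss_error:.4f} = {ss_treat + ss_error:.4f}")
print(f"cross term = {cross:.2e}")
```

The two totals should match and the cross term should be zero up to floating-point rounding.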

    The Analysis of Variance

    $SS_T = SS_{Treatments} + SS_E$

    A large value of $SS_{Treatments}$ reflects large differences in treatment means

    A small value of $SS_{Treatments}$ likely indicates no differences in treatment means

    Formal statistical hypotheses are:

    $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$

    $H_1$: At least one mean is different

    The Analysis of Variance

    While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can.

    A mean square (MS) is a sum of squares divided by its degrees of freedom:

    $MS_{Treatments} = \frac{SS_{Treatments}}{a - 1} = \frac{n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2}{a - 1}$

    $MS_{Error} = \frac{SS_{Error}}{N - a} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2}{N - a}$

    Recall the equation for the sample variance:

    $S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$

    The Analysis of Variance

    If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal.

    $\mu_1 = \mu_2 = \cdots = \mu_a \Rightarrow MS_{Treatments} \approx MS_{Error}$

    If treatment means differ, the treatment mean square will be larger than the error mean square.

    $\mu_i \ne \mu_k \Rightarrow MS_{Treatments} > MS_{Error}$

    If the error mean square is larger than the treatment mean square, there is a PROBLEM

    The Analysis of Variance

    So the test statistic for the hypothesis of no difference in means is:

    $F_0 = \frac{MS_{Treatments}}{MS_{Error}}$

    It turns out that, when the treatment means are equal, the ratio of $MS_{Treatments}$ and $MS_{Error}$ follows the F distribution with (a-1) and (N-a) degrees of freedom

    Reject the null hypothesis (equal treatment means) if $F_0 > F_{\alpha,\, a-1,\, a(n-1)}$
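The whole chain from raw observations to the test statistic can be sketched in a few lines. The example below reuses the Portland cement data from earlier in the lecture as a = 2 treatments; `one_way_anova` is an illustrative name, and no critical value is computed because the standard library has no F distribution.

```python
from statistics import fmean

# Observations copied from the Portland cement table (two treatments)
formula_1 = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
formula_2 = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def one_way_anova(groups):
    """Return (F0, numerator DOF a-1, denominator DOF N-a) for a list of treatment groups."""
    a = len(groups)
    all_obs = [y for g in groups for y in g]
    N = len(all_obs)
    grand = fmean(all_obs)
    means = [fmean(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_treat = ss_treat / (a - 1)   # treatment mean square
    ms_error = ss_error / (N - a)   # error mean square
    return ms_treat / ms_error, a - 1, N - a


f0, df_treat, df_error = one_way_anova([formula_1, formula_2])
print(f"F0 = {f0:.2f} with ({df_treat}, {df_error}) degrees of freedom")
```

With only two treatments, F0 equals the square of the pooled two-sample t statistic, which gives a quick consistency check against the t-test slides.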

    Cotton Fiber Example

    ANOVA

    Source of Variation    SS        df    MS        F           P-value     F crit
    Between Groups         475.76    4     118.94    14.75682    9.13E-06    2.866081
    Within Groups          161.2     20    8.06
    Total                  636.96    24

    $F_0 = \frac{MS_{Treatments}}{MS_{Error}} = 14.76$
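The entries of the ANOVA table are linked by simple arithmetic, which the sketch below reproduces from the two sums of squares. The values a = 5 and n = 5 follow from the table's degrees of freedom and the 5 replicates stated earlier; the 5% critical value is copied from the table, not computed.

```python
# Sums of squares copied from the ANOVA table
ss_between = 475.76   # SS_Treatments
ss_within = 161.2     # SS_Error

a, n = 5, 5           # treatments and replicates
N = a * n             # total runs

df_between = a - 1
df_within = N - a
df_total = N - 1

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f0 = ms_between / ms_within

F_CRIT_05 = 2.866081  # "F crit" column of the table
reject_h0 = f0 > F_CRIT_05

print(f"SS_T = {ss_between + ss_within:.2f}, df = {df_between} + {df_within} = {df_total}")
print(f"MS_Treatments = {ms_between:.2f}, MS_Error = {ms_within:.2f}, F0 = {f0:.5f}")
print(f"Reject H0 at the 5% level: {reject_h0}")
```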

    The F Distribution

    $F_0 = 14.76 \gg F_{0.01,4,20}$

    $F_{0.05,4,20} = 2.87$ (5% probability)

    $F_{0.01,4,20} = 4.43$ (1% probability)

    With a-1 (5-1=4) and N-a (25-5=20) Degrees of Freedom ($F_{4,20}$)

    MORE FACTORS?

    The Statistical Effects Model

    $y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, $k = 1, 2, \ldots, n$

    $\tau_i$ = effect of treatment A

    $\beta_j$ = effect of treatment B

    $(\tau\beta)_{ij}$ = synergistic effect of treatments A and B (interaction term)

    Extension of ANOVA to Factorials

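    $\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{...})^2 = bn \sum_{i=1}^{a} (\bar{y}_{i..} - \bar{y}_{...})^2 + an \sum_{j=1}^{b} (\bar{y}_{.j.} - \bar{y}_{...})^2$

    $\qquad + n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{ij.})^2$

    That is, $SS_T = SS_A + SS_B + SS_{AB} + SS_E$

    Degrees of Freedom:

    $SS_T$: abn-1

    $SS_A$: a-1

    $SS_B$: b-1

    $SS_{AB}$: (a-1)(b-1)

    $SS_E$: ab(n-1)

    Can calculate mean squares as before to determine if the variables have an effect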
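The degrees of freedom of the two-factor partition add up in the same way as the sums of squares. A minimal sketch; the function name and the example values a = 3, b = 3, n = 4 are hypothetical.

```python
def factorial_dof(a, b, n):
    """Degrees of freedom for a two-factor factorial with a and b levels and n replicates."""
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }


dof = factorial_dof(a=3, b=3, n=4)  # hypothetical design
component_sum = dof["A"] + dof["B"] + dof["AB"] + dof["Error"]

for source, value in dof.items():
    print(f"{source:>5}: {value}")
print(f"A + B + AB + Error = {component_sum}")
```

The component degrees of freedom should sum to the total, abn - 1, for any choice of a, b and n.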

    ANOVA Table: Fixed Effects Case

    Available computer software can easily perform these calculations.

    Factorials with More Than Two Factors

    Basic procedure is similar to the two-factor case; all a, b, c, ..., k treatment combinations are run in random order

    ANOVA identity is also similar:

    $SS_T = SS_A + SS_B + \cdots + SS_{AB} + SS_{AC} + \cdots + SS_{ABC} + \cdots + SS_{AB \cdots K} + SS_E$

    Computers are great, are they not?

    Minitab Statistical Software

    Download 30-day trial version from web site:

    http://www.minitab.com/

    1) Click on Products tab

    2) Minitab 17: Click: Learn More

    3) Click: Try Minitab 17, then 30-day Free trial
