Lesson 2
Chapter 1: Basic Statistical Concepts

Michael Akritas
Department of Statistics
The Pennsylvania State University



Outline

1 Comparative Studies
    Terminology and Comparative Graphics
    Randomization, Confounding and Simpson’s Paradox
    Causation: Experiments and Observational Studies
    Factorial Experiments
2 The Role of Probability


• Comparative studies aim at discerning and explaining differences between two or more populations. Examples include:
  • the comparison of two methods of cloud seeding for hail and fog suppression at international airports,
  • the comparison of two or more cement mixtures in terms of compressive strength,
  • the comparison of the survival times of a type of root system under different watering regimens,
  • the comparison of the effectiveness of three cleaning products in removing four different types of stains.

Terminology and Comparative Graphics

Jargon used in comparative studies

• One-factor studies
  • Factor levels; treatments; populations
  • Response variable

Example
In the comparison of the survival times of a type of root system under different watering regimens:
  • Watering is the factor.
  • The different watering regimens are called factor levels or treatments. Treatments correspond to populations.
  • The survival time is the response variable.


More jargon

• Experimental units: These are the subjects or objects on which measurements are made.
  • In the previous example, the roots are the experimental units.
• Multi-factor studies
  • Factor-level combinations; treatments; populations

                   Factor B
Factor A     1       2       3       4
   1        Tr11    Tr12    Tr13    Tr14
   2        Tr21    Tr22    Tr23    Tr24


Example
To study how five different temperature levels and five different humidity levels affect the yield of a chemical reaction:
  • The factors are temperature and humidity, with 5 levels each.
  • The treatments are the different factor-level combinations, which, again, correspond to the different populations.
  • The response is the yield of the chemical reaction.
  • The experimental units are the sets of materials used for the chemical reaction.
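As a small illustration of how treatments arise as factor-level combinations, the following Python sketch enumerates them for the temperature and humidity example. The level labels (T1, ..., H5) are hypothetical placeholders; the example only states that each factor has five levels.

```python
from itertools import product

# Hypothetical level labels; the example only says each factor has five levels.
temperature_levels = ["T1", "T2", "T3", "T4", "T5"]
humidity_levels = ["H1", "H2", "H3", "H4", "H5"]

# Each treatment is one (temperature, humidity) factor-level combination.
treatments = list(product(temperature_levels, humidity_levels))

print(len(treatments))   # number of treatments, i.e. populations
print(treatments[:3])    # the first few combinations
```

With five levels per factor the list has 5 x 5 entries, one per population being compared.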


Example
A study will compare the level of radiation emitted by five kinds of cell phones at each of three volume settings. State the factors involved in this study, the number of levels for each factor, the total number of populations or treatments, the response variable and the experimental units.


Final piece of jargon: Contrasts

• Comparisons typically focus on differences (e.g. of means, or proportions). Such differences are called contrasts.
  • The comparison of two different cloud seeding methods may focus on the contrast $\mu_1 - \mu_2$.
• In studies involving more than two populations a number of different contrasts may be of interest. For example, in a study aimed at comparing the mean tread life of four types of high performance tires, possible sets of contrasts of interest are
  1. $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$ (control vs treatment)
  2. $\frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$ (brand A vs brand B)
  3. $\mu_1 - \bar{\mu}$, $\mu_2 - \bar{\mu}$, $\mu_3 - \bar{\mu}$, $\mu_4 - \bar{\mu}$ (tire effects), where $\bar{\mu}$ denotes the average of the four means.
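The three sets of contrasts can be computed mechanically once the means are known. The sketch below uses made-up mean tread lives purely for illustration, and it assumes that the µ subtracted in the third set is the average of the four means. Each contrast is written as a linear combination of the means whose coefficients sum to zero.

```python
# Hypothetical mean tread lives (in thousands of miles) for four tire types;
# these numbers are invented for illustration only.
mu = [40.0, 42.0, 45.0, 43.0]

def contrast(coeffs, means):
    """Linear combination of the means; a contrast has coefficients summing to zero."""
    assert abs(sum(coeffs)) < 1e-12, "contrast coefficients must sum to zero"
    return sum(c * m for c, m in zip(coeffs, means))

# Set 1: type 1 (control) versus each of the other types.
control_vs_treatment = [
    contrast(c, mu) for c in ([1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1])
]

# Set 2: average of types 1 and 2 versus average of types 3 and 4.
brand_a_vs_b = contrast([0.5, 0.5, -0.5, -0.5], mu)

# Set 3: each mean minus the overall average of the four means (assumed meaning of mu-bar).
mu_bar = sum(mu) / len(mu)
tire_effects = [m - mu_bar for m in mu]

print(control_vs_treatment, brand_a_vs_b, tire_effects)
```

In practice the population means are unknown and these contrasts are estimated from sample means.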


The Comparative Boxplot

• The comparative boxplot consists of side-by-side individual boxplots for the data sets from each population.
• It is useful for providing a visual impression of differences in the median and percentiles.

Example
Iron concentration measurements from four different iron ore formations are given in http://www.stat.psu.edu/~mga/401/Data/anova.fe.data.txt. The comparative boxplot can be seen in http://www.stat.psu.edu/~mga/401/fig/BoxplotComp_Fe.pdf
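The quantities that a comparative boxplot displays can also be tabulated directly. The sketch below computes a five-number summary per group using only the Python standard library. The data are hypothetical (they are not the iron ore measurements linked above), and the "inclusive" quartile rule is one of several conventions, so plotting software may report slightly different quartiles.

```python
import statistics

# Hypothetical measurements for two groups (not the iron ore data from the course site).
samples = {
    "formation_1": [23, 25, 26, 27, 28, 29, 30, 33],
    "formation_2": [30, 31, 33, 34, 36, 37, 39, 40],
}

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the quantities a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

summaries = {name: five_number_summary(vals) for name, vals in samples.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Placing the summaries next to each other gives the same comparison of medians and percentiles that the side-by-side boxes convey visually.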


The Comparative Bar Graph

The comparative bar graph generalizes the bar graph in that for each category it plots several bars, one for the category’s proportion in each of the populations being compared; different colors are used to distinguish bars that correspond to different populations.

Example
The light vehicle market share of car companies for the month of November in 2010 and 2011 is given in http://www.stat.psu.edu/~mga/401/Data/MarketShareLightVehComp.txt. The comparative bar graph can be seen in http://stat.psu.edu/~mga/401/fig/LvMsBarComp.pdf

Randomization, Confounding and Simpson’s Paradox

To avoid comparing apples with oranges, the experimental units for the different treatments must be homogeneous.
  • If fabric age affects the effectiveness of cleaning products then, unless the fabrics used in different treatments are age-homogeneous, the comparison of treatments will be distorted.
  • If the meditation group in the diet study consists mainly of those subjects who had practiced meditation before, the comparison will be distorted.

To mitigate the distorting effects, or confounding, of other possible factors, called lurking variables, it is recommended that the allocation of units to treatments be randomized.


  • Randomizing the allocation of fabric pieces to the different treatments (cleaning product and stain) avoids confounding with the factor age of fabric.
  • Randomizing the allocation of subjects to the control (diet alone) and treatment (diet plus meditation) groups avoids confounding with the experience factor.
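One simple way to randomize an allocation is to shuffle the units and cut the shuffled list into equal-sized groups. The sketch below does this for the diet study. The subject identifiers, the group size and the function name `randomize_allocation` are hypothetical, and other randomization schemes are possible.

```python
import random

def randomize_allocation(units, treatments, seed=None):
    """Randomly split the units into equal-sized groups, one group per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * group_size:(i + 1) * group_size]
        for i, treatment in enumerate(treatments)
    }

# Hypothetical subject identifiers for the diet study.
subjects = [f"subject_{i}" for i in range(1, 21)]
allocation = randomize_allocation(subjects, ["diet alone", "diet plus meditation"], seed=401)

for treatment, group in allocation.items():
    print(treatment, group)
```

Because the assignment ignores prior meditation experience, experienced subjects are not systematically concentrated in either group. The fixed seed is there only to make the example reproducible.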


The distortion caused by lurking variables in the comparison of proportions is called Simpson’s Paradox.

Example
The success rates of two treatments, Treatments A and B, for kidney stones are:

    Treatment A       Treatment B
    78% (273/350)     83% (289/350)

The obvious conclusion is that Treatment B is more effective. However, there is a lurking variable here: the size of the kidney stone.


Example (Kidney Stone Example Continued)
When the size of the treated kidney stone is taken into consideration, the success rates are as follows:

          Small             Large             Combined
Tr. A     81/87 or .93      192/263 or .73    273/350 or .78
Tr. B     234/270 or .87    55/80 or .69      289/350 or .83

Now we see that Treatment A has a higher success rate for both small and large stones.
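The reversal can be checked directly from the counts in the kidney stone table. The short Python sketch below recomputes the success rates by stone size and after combining the sizes; only the data layout and the helper names are assumptions.

```python
# (successes, trials) for each treatment and stone size, from the kidney stone example.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Success rate for one group."""
    return successes / trials

def combined_rate(by_size):
    """Success rate after pooling small and large stones."""
    successes = sum(s for s, _ in by_size.values())
    trials = sum(n for _, n in by_size.values())
    return successes / trials

for size in ("small", "large"):
    print(size, round(rate(*counts["A"][size]), 2), round(rate(*counts["B"][size]), 2))
print("combined", round(combined_rate(counts["A"]), 2), round(combined_rate(counts["B"]), 2))
```

The counts also show why the combined rates mislead: Treatment A was used mostly on large stones (263 of 350 cases), which have lower success rates under either treatment, while Treatment B was used mostly on small stones (270 of 350 cases).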


Batting Averages

Example
The overall batting averages of baseball players Derek Jeter and David Justice during the years 1995 and 1996 were 0.310 and 0.270, respectively. But looking at each year separately we get a different picture:

           1995               1996               Combined
Jeter      12/48 or .250      183/582 or .314    195/630 or .310
Justice    104/411 or .253    45/140 or .321     149/551 or .270

Justice had a higher batting average than Jeter in both 1995 and 1996.

Causation: Experiments and Observational Studies

Definition
A study is called a statistical experiment if the investigator controls the allocation of units to treatments or factor-level combinations, and this allocation is done in a randomized fashion. Otherwise the study is called observational.

• Causation can only be established via a statistical experiment. Thus, an observed relation between salary increase and productivity does not imply that salary increases cause increased productivity.
• Observational studies cannot establish causation, unless there is additional corroborating evidence. For example, the link between smoking and health has been established through observational studies supported by such evidence.

Factorial Experiments

A statistical experiment involving several factors is called a factorial experiment if all factor-level combinations are considered. Thus,

                   Factor B
Factor A     1       2       3       4
   1        Tr11    Tr12    Tr13    Tr14
   2        Tr21    Tr22    Tr23    Tr24

is a factorial experiment if all 8 treatments are included in the study.


Main Effects and Interactions

• In factorial experiments it is not enough to consider differences between the levels within each factor separately. Synergistic effects or interactions are also of interest.

Definition
When there are synergistic effects among the levels of two different factors, i.e., when a change in the level of factor A has different effects at different levels of factor B, we say that there is interaction between the two factors. The absence of interaction is called additivity.


Example
An experiment considers two types of corn, used for bio-fuel, and two types of fertilizer. The following two tables give possible population mean yields for the four combinations of seed type and fertilizer type.


                                    Fertilizer
                         I                        II                         Row Averages                      Main Row Effects
Seed A                   $\mu_{11} = 107$         $\mu_{12} = 111$           $\mu_{1\cdot} = 109$              $\alpha_1 = -0.25$
Seed B                   $\mu_{21} = 109$         $\mu_{22} = 110$           $\mu_{2\cdot} = 109.5$            $\alpha_2 = 0.25$
Column Averages          $\mu_{\cdot 1} = 108$    $\mu_{\cdot 2} = 110.5$    $\mu_{\cdot\cdot} = 109.25$
Main Column Effects      $\beta_1 = -1.25$        $\beta_2 = 1.25$

Here the factors interact.


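                                    Fertilizer
                         I                        II                       Row Averages                   Main Row Effects
Seed A                   $\mu_{11} = 107$         $\mu_{12} = 111$         $\mu_{1\cdot} = 109$           $\alpha_1 = -1$
Seed B                   $\mu_{21} = 109$         $\mu_{22} = 113$         $\mu_{2\cdot} = 111$           $\alpha_2 = 1$
Column Averages          $\mu_{\cdot 1} = 108$    $\mu_{\cdot 2} = 112$    $\mu_{\cdot\cdot} = 110$
Main Column Effects      $\beta_1 = -2$           $\beta_2 = 2$

Here the factors do not interact.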


• Under additivity:
  • There is an indisputably best level for each factor, and
  • The best factor-level combination is that of the best level of factor A with the best level of factor B.
  • What is the best level of each factor in the above design?
• Under additivity, the comparison of the levels of each factor is based on the main effects:

\[ \alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot}, \qquad \beta_j = \mu_{\cdot j} - \mu_{\cdot\cdot} \]

  See the main effects in the above two designs.
• Under additivity,

\[ \mu_{ij} = \mu_{\cdot\cdot} + \alpha_i + \beta_j \]

  See the additive design above.


• When the factors interact, the cell means are not given in terms of the main effects as above.
• The difference

\[ \gamma_{ij} = \mu_{ij} - (\mu_{\cdot\cdot} + \alpha_i + \beta_j) \]

  quantifies the interaction effect.
• For example, in the above non-additive design,

\[ \gamma_{11} = \mu_{11} - \mu_{\cdot\cdot} - \alpha_1 - \beta_1 = 107 - 109.25 + 0.25 + 1.25 = -0.75. \]
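The main effects and interaction effects of both seed-by-fertilizer designs can be recomputed from the cell means alone. The Python sketch below applies the definitions of the main effects and of the interaction effect to each table; the function name `effects` and the nested-list layout are assumptions made for illustration.

```python
def effects(cell_means):
    """Main effects and interaction effects for a table of population cell means.

    cell_means[i][j] is the mean for level i of the row factor
    and level j of the column factor.
    """
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    row_avg = [sum(row) / n_cols for row in cell_means]
    col_avg = [sum(cell_means[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_avg) / n_rows
    alpha = [r - overall for r in row_avg]      # main row effects
    beta = [c - overall for c in col_avg]       # main column effects
    gamma = [
        [cell_means[i][j] - (overall + alpha[i] + beta[j]) for j in range(n_cols)]
        for i in range(n_rows)
    ]                                           # interaction effects
    return alpha, beta, gamma

interacting = [[107, 111], [109, 110]]   # first seed-by-fertilizer table
additive = [[107, 111], [109, 113]]      # second seed-by-fertilizer table

alpha, beta, gamma = effects(interacting)
print(alpha, beta, gamma[0][0])

_, _, gamma_additive = effects(additive)
print(gamma_additive)
```

For the second table every interaction effect is zero, which is exactly the additivity condition that each cell mean equals the overall mean plus its row and column main effects.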


Data Versions of Main Effects and Interactions

Data from a two-factor factorial experiment use three subscripts:

                                                        Factor B
Factor A    1                                       2                                       3
   1        $x_{11k}$, $k = 1, \ldots, n_{11}$      $x_{12k}$, $k = 1, \ldots, n_{12}$      $x_{13k}$, $k = 1, \ldots, n_{13}$
   2        $x_{21k}$, $k = 1, \ldots, n_{21}$      $x_{22k}$, $k = 1, \ldots, n_{22}$      $x_{23k}$, $k = 1, \ldots, n_{23}$


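Sample versions of main effects and interactions are defined using

\[ \bar{x}_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} x_{ijk} \]

instead of $\mu_{ij}$:

\[ \hat{\alpha}_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}, \qquad \hat{\beta}_j = \bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot} \qquad \text{(sample main row and column effects)} \]

\[ \hat{\gamma}_{ij} = \bar{x}_{ij} - \left( \bar{x}_{\cdot\cdot} + \hat{\alpha}_i + \hat{\beta}_j \right) \qquad \text{(sample interaction effects)} \]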
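The sample versions can be computed the same way as the population quantities, with cell sample means in place of the population cell means. The sketch below uses invented observations in a 2 x 3 layout with unequal cell sizes. It assumes that the row, column and overall averages are unweighted averages of the cell means, mirroring how the row and column averages of the population tables are formed.

```python
from statistics import mean

# Hypothetical observations: x[i][j] is the list of x_ijk for cell (i, j); 2 x 3 layout.
x = [
    [[10.1, 9.8, 10.4], [12.0, 11.6], [13.2, 13.9, 13.5]],
    [[11.0, 11.3], [12.9, 13.1, 12.7], [14.8, 15.1]],
]

# Cell sample means, one per factor-level combination.
cell_mean = [[mean(cell) for cell in row] for row in x]
a, b = len(cell_mean), len(cell_mean[0])

# Assumption: marginal averages are unweighted averages of the cell means.
row_mean = [mean(cell_mean[i]) for i in range(a)]
col_mean = [mean(cell_mean[i][j] for i in range(a)) for j in range(b)]
grand_mean = mean(row_mean)

alpha_hat = [r - grand_mean for r in row_mean]   # sample main row effects
beta_hat = [c - grand_mean for c in col_mean]    # sample main column effects
gamma_hat = [
    [cell_mean[i][j] - (grand_mean + alpha_hat[i] + beta_hat[j]) for j in range(b)]
    for i in range(a)
]                                                # sample interaction effects

print(alpha_hat, beta_hat, gamma_hat)
```

Under this convention the sample main effects sum to zero over the levels of each factor, and the sample interaction effects sum to zero along every row and column.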


• Sample versions of main effects and interactions estimate their population counterparts but, in general, they are not equal to them.
• Thus, even if the data have come from an additive design, the sample interaction effects will, in general, not be exactly zero.
• The interaction plot is a graphical technique that can help assess whether the sample interaction effects are significantly different from zero.
  • For each level of, say, factor B, the interaction plot traces the cell means along the levels of factor A. See http://stat.psu.edu/~mga/401/fig/CloudSeedInterPlot.pdf for an example.
  • For data coming from additive designs, these traces (or profiles) should be approximately parallel.


The Role of Probability

Probability and Statistics

Probability plays a central role in statistics, but the two differ:

  • In a probability problem, the properties of the population of interest are assumed known, whereas statistics is concerned with learning those properties.
  • Thus probability uses properties of the population to infer those of the sample, while statistical inference does the opposite.


Example (Examples of Probability Questions)
  • If 5% of electrical components have a certain defect, what are the chances that a batch of 500 such components will contain fewer than 20 defective ones?
  • If the highway mileage achieved by the 2011 Toyota Prius cars has population mean and standard deviation of 51 and 1.5 miles per gallon, respectively, what are the chances that in a sample of 10 cars the average highway mileage is less than 50 miles per gallon?
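Both questions can be answered numerically once a probability model is fixed. The sketch below is one possible treatment, not the lesson's own solution. For the first question it assumes the components are defective independently, so the count of defectives is binomial. For the second it assumes the sample mean is normally distributed, which with only 10 cars requires the individual mileages to be roughly normal.

```python
from math import comb, sqrt
from statistics import NormalDist

# Question 1: X = number of defective components, assumed Binomial(n=500, p=0.05).
n, p = 500, 0.05
prob_fewer_than_20 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(20))

# Question 2: the sample mean of 10 cars is assumed normal with
# mean 51 and standard deviation 1.5 / sqrt(10).
mean_dist = NormalDist(mu=51, sigma=1.5 / sqrt(10))
prob_mean_below_50 = mean_dist.cdf(50)

print(round(prob_fewer_than_20, 4), round(prob_mean_below_50, 4))
```

In both cases the population properties (the 5% defect rate, the mean and standard deviation of mileage) are taken as known and used to deduce what a sample is likely to look like, which is what makes these probability questions rather than statistical ones.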


Go to previous lesson http://www.stat.psu.edu/~mga/401/course.info/lesson1.pdf

Go to next lesson http://www.stat.psu.edu/~mga/401/course.info/lesson3.pdf

Go to the Stat 401 home page http://www.stat.psu.edu/~mga/401/course.info/
