Lecture 5: Measures of Variation

Measures of Variation

We’re not looking for the most typical number here

◦ Instead, we’re looking for a way to describe numerically how spread out the members of a data set are

If a variable is at the nominal level of measurement, we can’t talk about spread at all, so these variables have no measure of variation

For the other three levels there’s the very simplest of measures, the range, which is the spread itself

◦ This is the difference between the highest and lowest members of the set

◦ Range = max – min

◦ The range is simply the difference!

What we really need, though, is a more sophisticated way to measure variation, and fortunately there is one that is still simple to work with.

◦ The Standard Deviation

Measures of Variation Standard Deviation Cvar Activity

Measures of the Spread of the Data

An important characteristic of any set of data is the variation in the data

◦ In some data sets, the data values are concentrated closely near the mean

◦ In other data sets, the data values are more widely spread out from the mean

The most common measure of variation, or spread, is the standard deviation

◦ Standard Deviation: a number that measures how far data values are from their mean

The Standard Deviation

◦ Provides a numerical measure of the overall amount of variation in a data set, and

◦ Can be used to determine whether a particular data value is close to or far from the mean

Standard Deviation

The standard deviation:

◦ provides a numerical measure of the overall variation in a data set

◦ is always positive or zero

◦ can be used to determine whether a particular data value is close to or far from the mean

◦ is small when the data are concentrated close to the mean, exhibiting little variation or spread

◦ is larger when the data values are more spread out from the mean, exhibiting more variation

Wait times at In N Out

Let x be the number of minutes a person waits for an order at In N Out at lunchtime, Monday through Friday

x: 23, 18, 21, 32, 19

We can say that the range of the wait times is 32 – 18 = 14 minutes, but this doesn’t really describe the variation; it’s simply the largest value minus the smallest value.

One really big order in front of you could change the range drastically!
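As a rough illustration of how sensitive the range is, the sketch below computes the range of the five wait times, then again after swapping the longest wait for a hypothetical 60-minute one. The 60 is made up for illustration and is not part of the lecture data; `data_range` is just an illustrative helper name.

```python
# Wait times (minutes) from the lecture example.
wait_times = [23, 18, 21, 32, 19]


def data_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)


print(data_range(wait_times))  # 32 - 18 = 14

# Hypothetical: one very large order turns the 32-minute wait into 60 minutes.
with_big_order = [23, 18, 21, 60, 19]
print(data_range(with_big_order))  # 60 - 18 = 42
```

Changing a single value triples the range here, even though the other four waits are untouched.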

We’re going to do most of this with calculators, but let’s build it up by hand so we can see what it’s all about.

We’re going to need the mean of the data set

It’s 22.6 (x̄ = 22.6)

And then we’re going to add another column to the table, which is the difference between each data value and the mean

The x − x̄ values are called deviations from the mean, because they express how far each datum is from the mean

If x is less than the mean, its deviation from the mean is negative

x     x − x̄
23    0.4
18    -4.6
21    -1.6
32    9.4
19    -3.6
Sum:  0

Notice that the x − x̄ column adds up to zero

This is how it should be, since the mean is like a balancing point

Next, we’ll add the final column, the (x − x̄)² values

x     x − x̄    (x − x̄)²
23    0.4      0.16
18    -4.6     21.16
21    -1.6     2.56
32    9.4      88.36
19    -3.6     12.96

This column holds the squared deviations from the mean (obviously!)

Remember, the numbers in this column can never be negative, because squaring removes the sign
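The table can be rebuilt in a few lines of code. This is only a sketch of the by-hand method; the variable names are illustrative, and the sum of the deviations is checked against a small tolerance because floating-point arithmetic may leave a tiny nonzero remainder.

```python
# Wait times (minutes) from the lecture example.
wait_times = [23, 18, 21, 32, 19]

mean = sum(wait_times) / len(wait_times)  # 22.6

# One row per data value: (x, deviation from the mean, squared deviation).
rows = [(x, x - mean, (x - mean) ** 2) for x in wait_times]

for x, dev, sq_dev in rows:
    print(f"{x:>3} {dev:>6.1f} {sq_dev:>7.2f}")

# The deviations cancel out (up to floating-point rounding),
# while the squared deviations are never negative.
sum_of_deviations = sum(dev for _, dev, _ in rows)
print(abs(sum_of_deviations) < 1e-9)            # True
print(all(sq_dev >= 0 for _, _, sq_dev in rows))  # True
```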

Now we work with this third column, (x − x̄)².

First, we add it up:

$\sum (x - \bar{x})^2 = 125.2$

Then, we divide this sum by one less than the sample size, n − 1:

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{125.2}{4} = 31.3$

Why minus one? It turns out that dividing by n − 1 gives a better estimate of the variance of the population from which this data set is a sample (try not to worry about this)

We label this quotient s², and call it the sample variance
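The two steps (add up the squared deviations, then divide by n − 1) might be sketched as follows. The last line compares the result with `statistics.variance` from Python’s standard library, which also divides by n − 1; using that module is an assumption of this sketch, since the lecture itself works with calculators.

```python
import statistics

# Wait times (minutes) from the lecture example.
wait_times = [23, 18, 21, 32, 19]
n = len(wait_times)
mean = sum(wait_times) / n

# Step 1: add up the squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in wait_times)

# Step 2: divide by one less than the sample size.
sample_variance = sum_sq_dev / (n - 1)

print(round(sum_sq_dev, 1))       # 125.2
print(round(sample_variance, 1))  # 31.3

# The standard library's sample variance also divides by n - 1.
print(round(statistics.variance(wait_times), 1))  # 31.3
```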

So, 31.3 is the sample variance.

Next, we find the sample standard deviation, which we label s, because…

$s = \sqrt{s^2}$

That is, the sample standard deviation is the square root of the sample variance.

$s = \sqrt{31.3} \approx 5.5946$

We’ll round this to the nearest tenth: s ≈ 5.6

In summary, the formula for the sample standard deviation is:

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
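As a quick check of the square-root step, the sketch below takes the square root of the sample variance and compares it with `statistics.stdev` from Python’s standard library (again an assumption of this sketch, not something the lecture uses).

```python
import math
import statistics

# Wait times (minutes) from the lecture example.
wait_times = [23, 18, 21, 32, 19]

sample_variance = statistics.variance(wait_times)  # divides by n - 1
s = math.sqrt(sample_variance)

print(round(s, 4))  # 5.5946
print(round(s, 1))  # 5.6

# statistics.stdev computes the same quantity in one call.
print(round(statistics.stdev(wait_times), 4))  # 5.5946
```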

So, the sample variance is s² = 31.3

And the sample standard deviation is s = 5.6

But what does this mean?

In general, a data value that is two standard deviations from the average is on the borderline for what many statisticians would consider to be far from average

In this sample, x̄ = 22.6

So, one standard deviation either side of the mean is 22.6 ± 5.6, which gives 17.0 and 28.2 minutes

Two standard deviations either side is 22.6 ± 2(5.6), which gives 11.4 and 33.8 minutes

If you were to wait less than 11.4 minutes, or more than 33.8 minutes, that would be far from average
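The "two standard deviations" borderline can be turned into a small check. The helper names `interval` and `is_far_from_average` are illustrative, and the 40-minute wait is a hypothetical value that does not appear in the data.

```python
mean = 22.6  # sample mean from the lecture (minutes)
s = 5.6      # sample standard deviation, rounded to the nearest tenth


def interval(mean, s, k):
    """Return (low, high): the values k standard deviations either side of the mean."""
    return (mean - k * s, mean + k * s)


def is_far_from_average(x, mean, s, k=2):
    """True if x lies more than k standard deviations from the mean."""
    low, high = interval(mean, s, k)
    return x < low or x > high


low1, high1 = interval(mean, s, 1)
low2, high2 = interval(mean, s, 2)
print(round(low1, 1), round(high1, 1))  # 17.0 28.2
print(round(low2, 1), round(high2, 1))  # 11.4 33.8

print(is_far_from_average(32, mean, s))  # False: 32 is inside 11.4..33.8
print(is_far_from_average(40, mean, s))  # True (40 is a hypothetical wait)
```

Even the longest observed wait, 32 minutes, is not far from average by this borderline.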

In general

value = mean + (# of STDEV)(standard deviation)

◦ # of STDEV does not need to be an integer

Sample: x = x̄ + (# of STDEV)(s)

Population: x = µ + (# of STDEV)(σ), where µ is the population mean and σ is the population standard deviation
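The formula can be used in both directions: to find the value a given number of standard deviations from the mean, or to find how many standard deviations a given value is from the mean. A minimal sketch, with hypothetical function names:

```python
def value_from_stdevs(mean, sd, num_stdevs):
    """value = mean + (# of STDEV)(standard deviation)."""
    return mean + num_stdevs * sd


def stdevs_from_value(value, mean, sd):
    """Solve the same formula for # of STDEV."""
    return (value - mean) / sd


mean, s = 22.6, 5.6  # sample mean and standard deviation from the lecture

# 1.5 standard deviations above the mean (a non-integer # of STDEV).
print(round(value_from_stdevs(mean, s, 1.5), 1))  # 31.0

# How many standard deviations is the 32-minute wait from the mean?
print(round(stdevs_from_value(32, mean, s), 2))  # 1.68
```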

Formulas for the Standard Deviations

Sample Standard Deviation

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Population Standard Deviation

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

If the sample has the same characteristics as the population, then s should be a good estimate of σ

σ² represents the population variance, just as s² represents the sample variance

Also note that if we have a census (that is, data for the whole population), we divide by N, the number of items in the population
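To see the effect of the divisor, the sketch below applies both formulas to the same five wait times. Treating those five values as an entire population is purely for illustration; `statistics.stdev` and `statistics.pstdev` are the standard-library functions that divide by n − 1 and N respectively.

```python
import statistics

# Wait times (minutes) from the lecture example.
wait_times = [23, 18, 21, 32, 19]

# Treating the five values as a sample: divide by n - 1.
s = statistics.stdev(wait_times)

# Treating the same five values as an entire population: divide by N.
sigma = statistics.pstdev(wait_times)

print(round(s, 2))      # 5.59
print(round(sigma, 2))  # 5.0
```

Dividing by the larger number N always gives the smaller result, and the gap shrinks as the data set grows.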

A few facts about what the Standard Deviation tells us

For ANY data set, no matter what the distribution of the data is:

◦ At least 75% of the data is within two standard deviations of the mean

◦ At least 89% of the data is within three standard deviations of the mean

◦ At least 95% of the data is within 4.5 standard deviations of the mean

This is known as Chebyshev’s Rule
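The slide lists three cases without a formula. They match the usual statement of Chebyshev’s theorem, that at least 1 − 1/k² of the data lies within k standard deviations of the mean for any k > 1. Under that assumption, the percentages can be reproduced as follows (the function name is illustrative); note that the 89% and 95% figures are rounded.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4.5):
    print(k, round(chebyshev_lower_bound(k) * 100, 1))
# 2 75.0
# 3 88.9
# 4.5 95.1
```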

A few facts about what the Standard Deviation tells us

For data having a distribution that is Bell-Shaped and Symmetric:

◦ Approximately 68% of the data is within one standard deviation of the mean

◦ Approximately 95% of the data is within two standard deviations of the mean

◦ More than 99% of the data is within three standard deviations of the mean

This is known as the Empirical Rule

It is important to note that this rule only applies when the shape of the distribution of the data is bell shaped and symmetric
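If "bell-shaped and symmetric" is modelled as an exactly normal distribution (an assumption of this sketch, not a claim from the slides), the three percentages can be checked with `statistics.NormalDist` from Python’s standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

These are noticeably tighter than the Chebyshev figures, which is the price of the extra assumption about the shape of the data.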

Coefficient of Variation

The coefficient of variation, also known as Cvar, is used to see how the sample standard deviation measures up against the sample mean.

◦ That is, is the standard deviation large or small compared to the mean?

◦ Suppose you are looking at several different samples of data: Cvar is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from one another

◦ $\text{Cvar} = \dfrac{\text{sample standard deviation}}{\text{sample mean}} = \dfrac{s}{\bar{x}}$

From our In N Out example:

◦ $\text{Cvar} = \dfrac{s}{\bar{x}} = \dfrac{5.5946}{22.6} \approx 0.2475 \approx 25\%$

◦ You may have noticed I used more decimal places for s

◦ This is because once you start rounding, and then use the rounded numbers to compute other numbers, you get further and further away from an accurate figure
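The point about rounding can be seen by computing Cvar three ways: with s at full floating-point precision, with s rounded to four decimal places, and with s rounded to one decimal place. This is only a sketch using Python’s standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Wait times (minutes) from the lecture example.
wait_times = [23, 18, 21, 32, 19]

mean = statistics.mean(wait_times)  # 22.6
s = statistics.stdev(wait_times)    # about 5.59464

cvar = s / mean
print(round(cvar, 4))  # 0.2476 (s at full precision)
print(f"{cvar:.0%}")   # 25%

# The more s is rounded first, the more the fourth decimal place drifts.
print(round(5.5946 / 22.6, 4))  # 0.2475 (s rounded to four decimals)
print(round(5.6 / 22.6, 4))     # 0.2478 (s rounded to one decimal)
```

All three versions still round to 25%, but the drift in the fourth decimal place shows why it is safer to round only at the end.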

Activity 4 & 5: Finding measures of central tendency and variation

Using the first five men’s ages in the Class Data Base, find:

1. The mode

2. The median

3. The mean

4. The midrange

5. The range

6. The sample variance (to the nearest tenth)

7. The sample standard deviation (to the nearest tenth)

8. The coefficient of variation (to the nearest whole percent)

Use the table method for the sample variance and standard deviation
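Once the table method has been done by hand, the answers can be checked with a short script. The Class Data Base is not reproduced in these slides, so the sketch below uses the lecture’s wait times as stand-in data; `summarize` is a hypothetical helper, and the rounding follows the activity’s instructions.

```python
import statistics


def summarize(values):
    """Return the eight measures from the activity for a small sample."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        # multimode lists every value when all values occur equally often.
        "modes": statistics.multimode(values),
        "median": statistics.median(values),
        "mean": mean,
        "midrange": (min(values) + max(values)) / 2,
        "range": max(values) - min(values),
        "sample_variance": round(statistics.variance(values), 1),
        "sample_stdev": round(s, 1),
        "cvar_percent": round(100 * s / mean),
    }


# Stand-in data: the lecture's wait times, not the class ages.
summary = summarize([23, 18, 21, 32, 19])
for name, value in summary.items():
    print(f"{name}: {value}")
```

When every value appears exactly once, as here, the data set is usually described as having no mode, even though `multimode` returns all five values.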
