jason-edington
Lecture 5: Measures of Variation
Measures of Variation
We’re not looking for the most typical number here
◦ Instead, we’re looking for a way to describe numerically how spread out the members of a data set are
If a variable is at the nominal level of measurement, we can’t talk about spread at all, so these variables have no measure of variation
For the other three levels there’s the very simplest of measures, the range, which is the spread itself
◦ This is the difference of the highest and lowest members of the set
◦ Range = max – min
◦ The range is simply the difference!
What we really need though is a more sophisticated way to measure variation, and fortunately there is a very simple, yet sophisticated way of doing this.
◦ The Standard Deviation
Measures of Variation Standard Deviation Cvar Activity
Measures of the Spread of the Data
An important characteristic of any set of data is the variation in the data
◦ In some data sets, the data values are concentrated closely near the mean
◦ In other data sets, the data values are more widely spread out from the mean
The most common measure of variation, or spread, is the standard deviation
◦ Standard Deviation – a number that measures how far data values are from their mean
The Standard Deviation
◦ Provides a numerical measure of the overall amount of variation in a data set, and
◦ Can be used to determine whether a particular data value is close to or far from the mean
Standard Deviation
The standard deviation:
provides a numerical measure of the overall variation in a data set
is always positive or zero
can be used to determine whether a particular data value is close to or far from the mean
is small when the data are concentrated close to the mean, exhibiting little variation or spread
is larger when the data values are more spread out from the mean, exhibiting more variation
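A small sketch of these properties, using three made-up data sets (the numbers below are hypothetical, chosen only so that all three share a mean of 22). It relies on `statistics.stdev` from Python's standard library:

```python
import statistics

# Hypothetical data sets, all with mean 22 (made up for illustration)
identical = [22, 22, 22, 22, 22]   # no variation at all
tight = [21, 22, 23, 22, 22]       # concentrated close to the mean
spread = [5, 40, 12, 33, 20]       # widely spread out from the mean

sd_identical = statistics.stdev(identical)
sd_tight = statistics.stdev(tight)
sd_spread = statistics.stdev(spread)

print(round(sd_identical, 1))  # 0.0
print(round(sd_tight, 1))      # 0.7
print(round(sd_spread, 1))     # 14.5
```

The standard deviation is zero only when every value is the same, and it grows as the values move away from the mean.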
Wait times at In N Out
Let x be the number of minutes a person waits for an order at In N Out at lunchtime Monday through Friday
x
23
18
21
32
19
We can say that the range of time they waited is 32-18 = 14 minutes, but this doesn’t really describe the variation
it’s simply the largest value minus the smallest value.
One really big order in front of you could really change the range drastically!
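To see how sensitive the range is, here is a minimal sketch. The helper name `data_range` and the 60-minute wait are assumptions made up for illustration; only the first list comes from the example above.

```python
# Wait times (minutes) from the In N Out example
wait_times = [23, 18, 21, 32, 19]

def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)

print(data_range(wait_times))  # 14

# Hypothetical: one really big order turns the 32-minute wait into 60 minutes
with_big_order = [23, 18, 21, 60, 19]
print(data_range(with_big_order))  # 42
```

Changing a single value tripled the range, even though the other four waits are exactly the same.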
We’re going to do most of this with calculators, but let’s build it up by hand so we can see what it’s all about.
We’re going to need the mean of the data set
it’s 22.6 ($\bar{x} = 22.6$)
And then we’re going to add another column to the table, which is the difference between each data value and the mean
The $x - \bar{x}$ values are called deviations from the mean, because they express how far each datum is from the mean
If x is less than the mean, its deviation from the mean is negative
x      $x - \bar{x}$
23      0.4
18     -4.6
21     -1.6
32      9.4
19     -3.6
Sum:    0
Notice that the $x - \bar{x}$ column adds up to zero
This is how it should be, since the mean is like a balancing point
Next, we’ll add the final column, the $(x - \bar{x})^2$ values
x      $x - \bar{x}$      $(x - \bar{x})^2$
23      0.4       0.16
18     -4.6      21.16
21     -1.6       2.56
32      9.4      88.36
19     -3.6      12.96
This column holds the squared deviations from the mean
Remember, the numbers in this column will be positive or zero, never negative
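The same table can be built in a few lines of Python. This is only a sketch of the table method; the variable names are my own.

```python
wait_times = [23, 18, 21, 32, 19]
mean = sum(wait_times) / len(wait_times)  # 22.6

deviations = [x - mean for x in wait_times]   # x minus the mean
squared = [d ** 2 for d in deviations]        # squared deviations

print(f"{'x':>4} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, d, sq in zip(wait_times, deviations, squared):
    print(f"{x:>4} {d:>10.1f} {sq:>14.2f}")

# Floating-point arithmetic can leave a tiny remainder, so we round it away
print("Sum of deviations:", round(abs(sum(deviations)), 10))
```

The deviations sum to zero (up to floating-point rounding), and no squared deviation is negative.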
Now we do various things with this third column, $(x - \bar{x})^2$.
First, we add it up:
$\sum (x - \bar{x})^2 = 125.2$
Then, we divide this sum by one less than the sample size, n - 1
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{125.2}{4} = 31.3$
Why minus one? It turns out that dividing by n - 1 gives a better estimate of the variance of the population from which this data set is a sample (try not to worry about this)
We label this quotient $s^2$, and call it the sample variance
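As a check on the arithmetic, here is a short sketch that repeats the calculation and compares it with `statistics.variance` from Python's standard library, which also divides by n - 1.

```python
import statistics

wait_times = [23, 18, 21, 32, 19]
n = len(wait_times)
mean = sum(wait_times) / n

sum_sq = sum((x - mean) ** 2 for x in wait_times)  # sum of squared deviations
sample_variance = sum_sq / (n - 1)                 # divide by n - 1, not n

print(round(sum_sq, 1))           # 125.2
print(round(sample_variance, 1))  # 31.3

# Cross-check against the standard library
print(round(statistics.variance(wait_times), 1))  # 31.3
```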
So, 31.3 is the sample variance.
Next, we find the sample standard deviation, which we label s, because…
$s = \sqrt{s^2}$
That is, the sample standard deviation is the square root of the sample variance.
$s = \sqrt{31.3} \approx 5.5946$
We’ll round this to the nearest tenth…
$s \approx 5.6$
In summary, the formula for the sample standard deviation is:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
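A minimal sketch of the summary formula, checked against `statistics.stdev` from Python's standard library (variable names are my own):

```python
import math
import statistics

wait_times = [23, 18, 21, 32, 19]
n = len(wait_times)
mean = sum(wait_times) / n

# Square root of the sample variance
s = math.sqrt(sum((x - mean) ** 2 for x in wait_times) / (n - 1))

print(round(s, 4))  # 5.5946
print(round(s, 1))  # 5.6

# The standard library gives the same answer
print(round(statistics.stdev(wait_times), 4))  # 5.5946
```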
So, the sample variance is $s^2 = 31.3$
And the sample standard deviation is s = 5.6
But what does this mean?
In general, a data value that is two standard deviations from the average is on the borderline for what many statisticians would consider to be far from average
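In this sample, $\bar{x} = 22.6$
So, one standard deviation from the mean is 22.6 ± 5.6, which gives 17.0 and 28.2 minutes
Two standard deviations from the mean is 22.6 ± 2(5.6), which gives 11.4 and 33.8 minutes
If you were to wait less than 11.4 minutes, or more than 33.8 minutes, that would be far from average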
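Here is a rough sketch of that two-standard-deviation test. The function names `band` and `far_from_average` are my own, and the 35-minute wait is a made-up value for illustration.

```python
mean, s = 22.6, 5.6  # rounded values from the worked example

def band(mean, s, k):
    """Return the (low, high) values k standard deviations from the mean."""
    return mean - k * s, mean + k * s

low1, high1 = band(mean, s, 1)
low2, high2 = band(mean, s, 2)
print(round(low1, 1), round(high1, 1))  # 17.0 28.2
print(round(low2, 1), round(high2, 1))  # 11.4 33.8

def far_from_average(x, mean, s, k=2):
    """True if x lies more than k standard deviations from the mean."""
    return abs(x - mean) > k * s

print(far_from_average(32, mean, s))  # False: 32 is between 11.4 and 33.8
print(far_from_average(35, mean, s))  # True: 35 is a hypothetical wait
```

Note that even the longest wait in the sample, 32 minutes, is not far from average by this test.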
In general
value = mean + (# of STDEV)(standard deviation)
# of STDEV does not need to be an integer
Sample: $x = \bar{x} + (\#\text{ of STDEV})(s)$
Population: $x = \mu + (\#\text{ of STDEV})(\sigma)$
Here µ is the population mean and σ is the population standard deviation
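A small sketch of the general formula, plus the same formula rearranged to solve for the number of standard deviations. The rearrangement and the function names are my own additions, not something from the slides.

```python
def value_at(mean, sd, num_stdev):
    """value = mean + (# of STDEV)(standard deviation)."""
    return mean + num_stdev * sd

def num_stdev_of(value, mean, sd):
    """The same formula solved for # of STDEV."""
    return (value - mean) / sd

mean, s = 22.6, 5.6  # sample mean and standard deviation from the example

print(round(value_at(mean, s, 1.5), 1))     # 31.0
print(round(value_at(mean, s, -0.5), 1))    # 19.8
print(round(num_stdev_of(32, mean, s), 2))  # 1.68
```

A negative # of STDEV simply means the value is below the mean, and the 32-minute wait turns out to be about 1.68 standard deviations above it.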
Formulas for the Standard Deviations
Sample Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
If the sample has the same characteristics as the population…
then s should be a good estimate of σ
$\sigma^2$ represents the population variance just as $s^2$ represents the sample variance
Also note that if we have a census (so, the whole population), we divide by N, the number of items in the population
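To compare the two formulas, this sketch runs both on the five wait times. Treating those five numbers as a whole population is purely an assumption for illustration; in the example they are a sample. `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by N.

```python
import statistics

wait_times = [23, 18, 21, 32, 19]

s = statistics.stdev(wait_times)       # sample formula: divides by n - 1
sigma = statistics.pstdev(wait_times)  # population formula: divides by N

print(round(s, 4))      # 5.5946
print(round(sigma, 4))  # 5.004
```

Dividing by the larger number N always gives the smaller result, and the gap shrinks as the data set gets bigger.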
A few facts about what the Standard Deviation tells us
For ANY data set, no matter what the distribution of the data is:
◦ At least 75% of the data is within two standard deviations of the mean
◦ At least 89% of the data is within three standard deviations of the mean
◦ At least 95% of the data is within 4.5 standard deviations of the mean
This is known as Chebyshev’s Rule
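These three percentages all come from one expression: Chebyshev’s Rule guarantees that at least 1 - 1/k² of the data lies within k standard deviations of the mean, for any k greater than 1. The slides do not state that formula, so take the sketch below as a supplement; the function name is my own.

```python
def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4.5):
    print(k, round(chebyshev_bound(k) * 100, 1))
# 2 75.0
# 3 88.9
# 4.5 95.1
```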
A few facts about what the Standard Deviation tells us
For data having a distribution that is Bell-Shaped and Symmetric:
◦ Approximately 68% of the data is within one standard deviation of the mean
◦ Approximately 95% of the data is within two standard deviations of the mean
◦ More than 99% of the data is within three standard deviations of the mean
This is known as the Empirical Rule
It is important to note that this rule only applies when the shape of the distribution of the data is bell shaped and symmetric
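For the curious, the Empirical Rule percentages can be reproduced if we assume the bell-shaped distribution is exactly a normal distribution (an assumption; real data is only approximately normal). The sketch uses `math.erf` from Python's standard library, and the function name is my own.

```python
import math

def normal_within(k):
    """Fraction of a normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Compare these with the Chebyshev figures: knowing the shape of the distribution lets us say much more than "at least 75%" for two standard deviations.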
Coefficient of Variation
The coefficient of variation, also known as Cvar, is used to see how the sample standard deviation measures up against the sample mean.
◦ That is, is the standard deviation large or small compared to the mean?
◦ It is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from one another
◦ Suppose you are looking at several different samples of data
◦ $Cvar = \frac{\text{sample standard deviation}}{\text{sample mean}} = \frac{s}{\bar{x}}$
From our In N Out example:
◦ $Cvar = \frac{s}{\bar{x}} = \frac{5.5946}{22.6} \approx 0.2475 \approx 25\%$
◦ You may have noticed I used more decimal places for s
◦ This is because once you start rounding and use the rounded numbers to get other numbers you get further and further away from an accurate figure
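A short sketch of the rounding effect, comparing Cvar computed from the unrounded s with Cvar computed from s already rounded to 5.6 (variable names are my own):

```python
import statistics

wait_times = [23, 18, 21, 32, 19]
mean = statistics.mean(wait_times)
s = statistics.stdev(wait_times)  # unrounded, about 5.5946

cvar = s / mean
cvar_from_rounded_s = round(s, 1) / mean  # uses 5.6 instead

print(round(cvar, 5))                 # 0.24755
print(round(cvar_from_rounded_s, 5))  # 0.24779
print(f"{cvar:.0%}")                  # 25%
```

Here both versions still round to 25%, but the later digits already disagree, and the gap can grow when rounded numbers feed into further calculations.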
Activity 4 & 5: Finding measures of central tendency and variation
Using the first five men’s ages in the Class Data Base, find:
1. The mode
2. The median
3. The mean
4. The midrange
5. The range
6. The sample variance (to the nearest tenth)
7. The sample standard deviation (to the nearest tenth)
8. The coefficient of variation (to the nearest whole percent)
Use the table method for the sample variance and standard deviation
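Once you have worked the table by hand, a sketch like the one below can be used to check your answers. The function name `summarize` is my own, the midrange is assumed to be (max + min) / 2, and the data shown is the In N Out wait times as a stand-in, since the class ages are not reproduced here.

```python
import statistics

def summarize(values):
    """Return the eight measures from the activity for a list of numbers."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mode": statistics.multimode(values),  # every most-common value
        "median": statistics.median(values),
        "mean": mean,
        "midrange": (max(values) + min(values)) / 2,  # assumed definition
        "range": max(values) - min(values),
        "variance": round(statistics.variance(values), 1),
        "stdev": round(s, 1),
        "cvar_percent": round(100 * s / mean),
    }

# Stand-in data: the In N Out wait times, not the class ages
result = summarize([23, 18, 21, 32, 19])
for name, value in result.items():
    print(name, value)
```

When no value repeats, as with the wait times, `multimode` returns every value, which is another way of saying there is no single mode.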