If you can't read please download the document

View

25Download

0

Embed Size (px)

DESCRIPTION

Numerical Descriptive Techniques. Chapter 4. Introduction. Recall Chapter 2, where we used graphical techniques to describe data:. While this histogram provides some new insight, other interesting questions (e.g. what is the class average? what is the mark spread?) go unanswered. - PowerPoint PPT Presentation

Numerical Descriptive TechniquesChapter 4

2007()

IntroductionRecall Chapter 2, where we used graphical techniques to describe data:While this histogram provides some new insight, other interesting questions (e.g. what is the class average? what is the mark spread?) go unanswered.

2007()

Numerical Descriptive TechniquesMeasures of Central LocationMean, Median, Mode

Measures of VariabilityRange, Standard Deviation, Variance, Coefficient of Variation

Measures of Relative StandingPercentiles, Quartiles

Measures of Linear RelationshipCovariance, Correlation, Least Squares Line

2007()

4.1 Measures of Central LocationUsually, we focus our attention on two types of measures when describing population characteristics:Central location (e.g. average)Variability or spreadThe measure of central location reflects the locations of all the actual data points.

2007()

4.1 Measures of Central LocationThe measure of central location reflects the locations of all the actual data points.How?But if the third data point appears on the left hand-sideof the midrange, it should pullthe central location to the left.With two data points,the central location should fall in the middlebetween them (in order to reflect the location ofboth of them).

2007()

The Arithmetic MeanThis is the most popular and useful measure of central location

2007()

NotationWhen referring to the number of observations in a population, we use uppercase letter N

When referring to the number of observations in a sample, we use lower case letter n

The arithmetic mean for a population is denoted with Greek letter mu:

The arithmetic mean for a sample is denoted with an x-bar.

2007()

Statistics is a pattern language

PopulationSampleSizeNnMean

2007()

The Arithmetic MeanSample meanPopulation meanSample sizePopulation size

2007()

Statistics is a pattern language

PopulationSampleSizeNnMean

2007()

The Arithmetic Mean Example 4.1The reported time on the Internet of 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time on the Internet. 072211.042.1938.4545.7743.59

2007()

Additional Example The Arithmetic Mean

2007()

The Arithmetic Meanis appropriate for describing measurement data, e.g. heights of people, marks of student papers, etc.

is seriously affected by extreme values called outliers. E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously!

2007()

The MedianThe Median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude.Odd number of observations0, 0, 5, 7, 8 9, 12, 14, 220, 0, 5, 7, 8, 9, 12, 14, 22, 338.5,8

2007()

The ModeThe Mode of a set of observations is the value that occurs most frequently.Set of data may have one mode (or modal class), or two or more modes.For large data setsthe modal class is much more relevant than a single-value mode.

2007()

The ModeExample 4.5 Find the mode for the data in Example 4.1. Here are the data again: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22

SolutionAll observation except 0 occur once. There are two 0. Thus, the mode is zero. Is this a good measure of central location?The value 0 does not reside at the center of this set (compare with the mean = 11.0 and the mode = 8.5).

2007()

The ModeAdditional exampleThe manager of a mens store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40.The mode of this data set is 34 in. This information seems to be valuable (for example, for the design of a new display in the store), much more than the median is 33.5 in.

2007()

Measures of Central LocationThe mode of a set of observations is the value that occurs most frequently.

A set of data may have one mode (or modal class), or two, or more modes.

Mode is a useful for all data types, though mainly used for nominal data.

For large data sets the modal class is much more relevant than a single-value mode. Sample and population modes are computed the same way.

2007()

=MODE(range) in ExcelNote: if you are using Excel for your data analysis and your data is multi-modal (i.e. there is more than one mode), Excel only calculates the smallest one.

You will have to use other techniques (i.e. histogram) to determine if your data is bimodal, trimodal, etc.

2007()

The Mean, Median and ModeAdditional example A professor of statistics wants to report the results of a midterm exam, taken by 100 students. The mean of the test marks is 73.90 The median of the test marks is 81 The mode of the test marks is 84 Describe the information each one provides.The mean provides informationabout the over-all performance level of the class. It can serve as a tool for making comparisons with other classes and/or other exams. The Median indicates that half of the class received a grade below 81%, and half of the class received a grade above 81%. A student can use this statistic to place his mark relative to other students in the class.The mode must be used when data are nominal If marks are classified by letter grade, the frequency of each grade can be calculated. Then, the mode becomes a logical measure to compute.

2007()

Relationship among Mean, Median, and Mode If a distribution is symmetrical, the mean, median and mode coincide If a distribution is asymmetrical, and skewed to the left or to the right, the three measures differ.A positively skewed distribution(skewed to the right)MeanMedianMode

2007()

Relationship among Mean, Median, and ModeIf a distribution is symmetrical, the mean, median and mode coincideIf a distribution is non symmetrical, and skewed to the left or to the right, the three measures differ.A positively skewed distribution(skewed to the right)MeanMedianModeMeanMedianModeA negatively skewed distribution(skewed to the left)

2007()

Mean, Median, ModeIf data are symmetric, the mean, median, and mode will be approximately the same.

If data are multimodal, report the mean, median and/or mode for each subgroup.

If data are skewed, report the median.

2007()

Mean, Median, & Modes for Ordinal & Nominal DataFor ordinal and nominal data the calculation of the mean is not valid.

Median is appropriate for ordinal data.

For nominal data, a mode calculation is useful for determining highest frequency but not central location.

2007()

The Geometric MeanThis is a measure of the average growth rate.Let Ri denote the the rate of return in period i (i=1,2,n). The geometric mean of the returns R1, R2, ,Rn is the constant Rg that produces the same terminal wealth at the end of period n as do the actual returns for the n periods.

2007()

The Geometric MeanIf the rate of return was Rg in everyperiod, the nth period return wouldbe calculated by:For the given series of rate of returns the nth period return iscalculated by:=Rg is selected such that

2007()

Finance ExampleSuppose a 2-year investment of $1,000 grows by 100% to $2,000 in the first year, but loses 50% from $2,000 back to the original $1,000 in the second year. What is your average return?

Using the arithmetic mean, we have

This would indicate we should have $1,250 at the end of our investment, not $1,000.

Solving for the geometric mean yields a rate of 0%, which is correct.The upper case Greek Letter Pi represents a product of terms

2007()

The Geometric MeanAdditional Example A firms sales were $1,000,000 three years ago.Sales have grown annually by 20%, 10%, -5%.Find the geometric mean rate of growth in sales.SolutionSince Rg is the geometric mean (1+Rg)3 = (1+.2)(1+.1)(1-.05)= 1.2540Thus,

2007()

Measures of Central Location SummaryCompute the Mean to Describe the central location of a single set of interval data

Compute the Median toDescribe the central location of a single set of interval or ordinal data

Compute the Mode to Describe a single set of nominal data

Compute the Geometric Mean to Describe a single set of interval data based on growth rates

2007()

4.2 Measures of variabilityMeasures of central location fail to tell the whole story about the distribution.A question of interest still remains unanswered:How much are the observations spread outaround the mean value?

2007()

4.2 Measures of variabilityObserve two hypothetical data sets:The average value provides a good representation of theobservations in the data set.Small variability

This data set is now changing to...

2007()

4.2 Measures of variabilityObserve two hypothetical data sets:The average value provides a good representation of theobservations in the data set.Small variabilityLarger variabilityThe same average value does not provide as good representation of theobservations in the data set as before.

2007()

Measures of VariabilityMeasures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?For example, two sets of class grades are shown. The mean (=50) is the same in each case

But, the red class has greater variability than the blue class.

2007()

The rangeThe range of a set of observations is the difference between the largest and smallest observations.

Its major advantage is the ease with which it can be computed.

Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points. But, how do all the observations spread out?SmallestobservationLargestobservationThe range cannot assis