1. Descriptive Statistics II, Dr Mahmoud Alhussami

2. Shapes of Distribution
A third important property of data, after location and dispersion, is its shape. Distributions of quantitative variables can be described in terms of a number of features, many of which are related to the distribution's physical appearance, or shape, when presented graphically:
- Modality
- Symmetry and skewness
- Degree of skewness
- Kurtosis

3. Modality
The modality of a distribution concerns how many peaks, or high points, there are. A distribution with a single peak, one value with a high frequency, is a unimodal distribution.

4. Modality
A distribution with two or more peaks is called a multimodal distribution.

5. Symmetry and Skewness
A distribution is symmetric if it could be split down the middle to form two halves that are mirror images of one another. In asymmetric distributions, the peaks are off center, with a bulk of scores clustering at one end and a tail trailing off at the other end. Such distributions are often described as skewed. When the longer tail trails off to the right, the distribution is positively skewed (e.g., annual income). When the longer tail trails off to the left, it is negatively skewed (e.g., age at death).

6. Symmetry and Skewness
Shape can be described by the degree of asymmetry (i.e., skewness):
- mean > median: positive or right skewness
- mean = median: symmetric or zero skewness
- mean < median: negative or left skewness
Positive skewness can arise when the mean is increased by some unusually high values; negative skewness can arise when the mean is decreased by some unusually low values.

7. [Figure: left-skewed, right-skewed, and symmetric distributions]

8. Shapes of the Distribution
Three common shapes of frequency distributions: (A) symmetrical and bell shaped, (B) positively skewed (skewed to the right), (C) negatively skewed (skewed to the left). (March 28, 2013)

9. Shapes of the Distribution
Three less common shapes of frequency distributions: (A) bimodal, (B) uniform, (C) reverse J-shaped.

10. [Cartoon: "This guy took a VERY long time!"]

11.
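The mean/median rule on slide 6 is easy to verify numerically. A minimal Python sketch, using made-up illustrative data rather than anything from the slides:

```python
from statistics import mean, median

# Illustrative right-skewed sample (hypothetical data): a few
# unusually high values pull the mean above the median.
incomes = [20, 22, 25, 25, 28, 30, 120]

print(mean(incomes))                    # about 38.57, pulled up by the outlier
print(median(incomes))                  # 25
print(mean(incomes) > median(incomes))  # True -> positive (right) skew
```

The single extreme value of 120 raises the mean well above the median, exactly the pattern the slide associates with positive skew.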
Degree of Skewness
A skewness index can readily be calculated by most statistical computer programs in conjunction with frequency distributions. The index has a value of 0 for a perfectly symmetric distribution, a positive value if there is a positive skew, and a negative value if there is a negative skew. A skewness index that is more than twice the value of its standard error can be interpreted as a departure from symmetry.

12. Measures of Skewness or Symmetry
- Pearson's skewness coefficient: it is nonalgebraic and easily calculated, and it is useful for quick estimates of symmetry. It is defined as: skewness = (mean - median) / SD.
- Fisher's measure of skewness: it is based on deviations from the mean to the third power.

13. Pearson's Skewness Coefficient
For a perfectly symmetrical distribution, the mean will equal the median and the skewness coefficient will be zero. If the distribution is positively skewed, the mean will be greater than the median and the coefficient will be positive. If the coefficient is negative, the distribution is negatively skewed and the mean is less than the median. Skewness values will fall between -1 and +1 SD units; values falling outside this range indicate a substantially skewed distribution. Hildebrand (1986) states that skewness values above 0.2 or below -0.2 indicate severe skewness.

14. Assumption of Normality
Many of the statistical methods that we will apply require the assumption that a variable or variables are normally distributed. With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution. Since there is no direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.

15. Evaluating Normality
There are both graphical and statistical methods for evaluating normality. Graphical methods include the histogram and the normality plot.
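Pearson's coefficient from slide 12 can be computed directly. A small sketch in Python; the data sets are invented for illustration:

```python
from statistics import mean, median, stdev

def pearson_skew(data):
    """Pearson's skewness coefficient: (mean - median) / SD."""
    return (mean(data) - median(data)) / stdev(data)

symmetric = [1, 2, 3, 4, 5]          # mean == median
right_skewed = [1, 1, 2, 2, 3, 10]   # mean > median

print(pearson_skew(symmetric))     # 0.0
print(pearson_skew(right_skewed))  # positive, about 0.34
```

Both results land inside Pearson's -1 to +1 range, but by Hildebrand's rule of thumb the second value (above 0.2) would already count as severe skewness.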
Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0. None of these methods is absolutely definitive.

16. Transformations
When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis. Three common transformations are the logarithmic transformation, the square root transformation, and the inverse transformation. All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.

17. Types of Data Transformations
- For moderate skewness, use a square root transformation.
- For substantial skewness, use a log transformation.
- For severe skewness, use an inverse transformation.

18. Computing Explore Descriptive Statistics
To compute the statistics needed for evaluating the normality of a variable, select the Explore command from the Descriptive Statistics menu.

19. Adding the Variable to Be Evaluated
First, click on the variable to be included in the analysis to highlight it. Second, click on the right-arrow button to move the highlighted variable to the Dependent List.

20. Selecting Statistics to Be Computed
To select the statistics for the output, click on the Statistics command button.

21. Including Descriptive Statistics
First, click on the Descriptives checkbox to select it, and clear the other checkboxes. Second, click on the Continue button to complete the request for statistics.

22. Selecting Charts for the Output
To select the diagnostic charts for the output, click on the Plots command button.

23.
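The effect of the transformations on slides 16 and 17 can be seen by applying them to a skewed sample and recomputing Pearson's coefficient from slide 12. A sketch with made-up data (not from the slides):

```python
import math
from statistics import mean, median, stdev

def pearson_skew(data):
    # Pearson's skewness coefficient from slide 12: (mean - median) / SD
    return (mean(data) - median(data)) / stdev(data)

raw = [1, 1, 2, 2, 3, 10]             # positively skewed (illustrative)
sqrt_t = [math.sqrt(x) for x in raw]  # square root: for moderate skewness
log_t = [math.log(x) for x in raw]    # log: for substantial skewness
inv_t = [1 / x for x in raw]          # inverse: for severe skewness

print(pearson_skew(raw))    # about 0.34
print(pearson_skew(sqrt_t)) # smaller in magnitude
print(pearson_skew(log_t))  # smaller still, about 0.12
```

One caveat worth keeping in mind: the inverse transformation reverses the ordering of scores (large values become small), which affects how the transformed variable is interpreted.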
Including Diagnostic Plots and Statistics
First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality. Second, click on the "Normality plots with tests" checkbox to include normality plots and the hypothesis tests for normality. Third, click on the Histogram checkbox to include a histogram in the output. (You may want to examine the stem-and-leaf plot as well, though I find it less useful.) Finally, click on the Continue button to complete the request.

24. Completing the Specifications for the Analysis
Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.

25. The Histogram
[Figure: histogram of TOTAL TIME SPENT ON THE INTERNET; Std. Dev = 15.35, Mean = 10.7, N = 93]
An initial impression of the normality of the distribution can be gained by examining the histogram. In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution.

26. The Normality Plot
[Figure: Normal Q-Q plot of TOTAL TIME SPENT ON THE INTERNET]
The problem with the normality of this variable's distribution is reinforced by the normality plot. If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.

27. The Test of Normality
Tests of Normality:

                              Kolmogorov-Smirnov(a)     Shapiro-Wilk
                              Statistic  df   Sig.      Statistic  df   Sig.
  TOTAL TIME SPENT            .246       93   .000      .606       93   .000
  ON THE INTERNET

  a. Lilliefors Significance Correction

Problem 1 asks about the results of the test of normality. Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.
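The decision rule on slide 27 can be written out as two small helpers. These are hypothetical functions sketching the rule from the slides, not any SPSS API:

```python
def normality_test_for(n):
    """Pick the test the slides recommend by sample size:
    Kolmogorov-Smirnov when n > 50, Shapiro-Wilk otherwise."""
    return "Kolmogorov-Smirnov" if n > 50 else "Shapiro-Wilk"

def normality_decision(p, alpha=0.01):
    """Reject the null hypothesis of normality when p <= alpha."""
    return "reject" if p <= alpha else "retain"

print(normality_test_for(93))    # Kolmogorov-Smirnov (n = 93 > 50)
print(normality_decision(0.000)) # reject at the 0.01 level
```

With n = 93 and a reported significance of .000, this reproduces the conclusion drawn in the slides: reject normality.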
The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as < 0.001.)

85. [Table: deaths by age group, from under 1 year through 85+; total deaths 34,524]

86. Frequency Table

  Data Intervals   Frequency   Cumulative Frequency
  10-19             5           5
  20-29            18          23
  30-39            10          33
  40-49            13          46
  50-59             4          50
  60-69             4          54
  70-79             2          56
  Total            56

87. Cumulative Relative Frequency
Cumulative relative frequency is the percentage of persons having a measurement less than or equal to the upper boundary of the class interval. For example, the cumulative relative frequency for the 3rd interval of our data example is 8.8 + 33.3 + 17.5 = 59.6%; we say that 59.6% of the children have weights below 39.5 pounds.

88. Number of Intervals
There is no clear-cut rule on the number of intervals or classes that should be used. With too many intervals, the data may not be summarized enough for a clear visualization of how they are distributed. With too few intervals, the data may be over-summarized and some of the details of the distribution may be lost.

89. Presenting Data
Chart: a visual representation of a frequency distribution that helps to gain insight about what the data mean. Charts are built with lines, areas, and text. Examples: bar chart, pie chart.

90. Bar Chart
The simplest form of chart, used to display nominal or ordinal data.
[Figure: bar chart of Ethical Issues Scale, item 8 ("Acting against your own personal/religious views"), percent responding Never, Seldom, Sometimes, Frequently]

91.
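The cumulative and cumulative relative frequencies defined on slide 87 can be computed directly from the counts on slide 86. A minimal Python sketch; the percentages are recomputed from these counts, so they will not match the worked example on slide 87, which uses a different (weights) dataset:

```python
# Frequency table from slide 86; relative and cumulative relative
# frequencies are recomputed from the counts.
freqs = [
    ("10-19", 5), ("20-29", 18), ("30-39", 10), ("40-49", 13),
    ("50-59", 4), ("60-69", 4), ("70-79", 2),
]

total = sum(f for _, f in freqs)  # 56
running = 0
for interval, f in freqs:
    running += f
    print(f"{interval}: f={f:2d}  cum={running:2d}  "
          f"rel={100 * f / total:5.1f}%  cum rel={100 * running / total:5.1f}%")
```

The last cumulative relative frequency is always 100%, and each cumulative frequency matches the table's cumulative column.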
Horizontal Bar Chart
[Figure: horizontal bar chart of Clinical Practice Area. Categories include Acute Care, Critical Care, Gerontology, Post Anesthesia, Perinatal, Clinical Research, Family Nursing, Neonatal, Psych/Mental Health, Community Health, General Practice, Orthopedics, Primary Care, Operating Room, Medical, Oncolog