UDM MSc Course in Education & Development 2013, NicholasSpaull@gmail.com, www.nicspaull.com/teaching

Day 2: Core statistics 101


Page 1: Day 2: Core  statistics 101

UDM MSc Course in Education & Development 2013

NicholasSpaull@gmail.com – www.nicspaull.com/teaching

Day 2: Core statistics 101


Introduction

What are statistics? “The practice or science of collecting and analysing numerical data in large quantities.”

Why do we need descriptive statistics? When we look at large amounts of data, there is very little “face value” information. If you had a dataset listing the income of 10,000 people and someone asked you whether the income of the group was high or low, it would be difficult to answer that question without using summary statistics (mean, median, mode, etc.).


Types of Data

Data

  • Categorical (defined categories)
    Examples: marital status, political party, eye color
  • Numerical
      • Discrete (counted items)
        Examples: number of children, defects per hour
      • Continuous (measured characteristics)
        Examples: weight, voltage


Collecting Data

  • Primary sources (data collection)
    Observation, survey, experimentation
  • Secondary sources (data compilation)
    Print or electronic


Sampling

  • What is a sample? A sample is “a small part or quantity intended to show what the whole is like”
  • Why do we use samples rather than the population?


Descriptive Statistics

  • Collect data, e.g., survey
  • Present data, e.g., tables and graphs
  • Characterize data, e.g., sample mean = $\frac{\sum X_i}{n}$


Measures of Central Tendency

Central Tendency

  • Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
  • Median: midpoint of ranked values
  • Mode: most frequently observed value


Mean

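  • The most common measure of central tendency
  • Mean = sum of values divided by the number of values
  • Affected by extreme values (outliers)

Values 1, 2, 3, 4, 5 (on a 0 to 10 axis): $\frac{1+2+3+4+5}{5} = \frac{15}{5} = 3$, so Mean = 3

Values 1, 2, 3, 4, 10 (on a 0 to 10 axis): $\frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$, so Mean = 4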
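The effect of a single outlier on the mean can be checked with a short Python sketch. It assumes the two data sets behind the number lines are 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10, which is consistent with the stated means of 3 and 4; the variable names are illustrative only.

```python
from statistics import mean

# Assumed data sets, chosen to match the stated means of 3 and 4
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the 5 is replaced by an extreme value

mean_without = mean(without_outlier)  # sum 15 divided by 5 values
mean_with = mean(with_outlier)        # sum 20 divided by 5 values

print("Mean without outlier:", mean_without)
print("Mean with outlier:", mean_with)
```

Changing one value out of five moves the mean by a full unit, which is what "affected by extreme values" means in practice.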


Median

  • In an ordered array, the median is the “middle” number (50% above, 50% below)
  • Not affected by extreme values

[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]


Finding the Median

The location of the median:

$\text{Median position} = \frac{n+1}{2}$ position in the ordered data

  • If the number of values is odd, the median is the middle number
  • If the number of values is even, the median is the average of the two middle numbers

Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data.
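As a rough illustration of the position rule, the sketch below finds the median by first locating position (n + 1) / 2 in the sorted data. The function name `median_by_position` and the sample lists are hypothetical, chosen only for this example.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")

    position = (n + 1) / 2  # 1-based position in the ordered data
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value
        return ordered[int(position) - 1]

    # Even n: the position falls halfway between two values, so average them
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([1, 2, 3, 4, 5]))   # 5 values, position 3
print(median_by_position([1, 2, 3, 4, 10]))  # outlier does not move the median
print(median_by_position([4, 1, 3, 2]))      # 4 values, position 2.5
```

The position is only an index into the ranked data; the median itself is whatever value sits there.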


Mode

  • A measure of central tendency
  • Value that occurs most often
  • Not affected by extreme values
  • Used for either numerical or categorical (nominal) data
  • There may be no mode
  • There may be several modes

[Figure: two dot plots; on a 0 to 14 axis, Mode = 9; on a 0 to 6 axis, No Mode]
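The three cases (one mode, no mode, several modes) can be sketched in Python. The helper `modes` and all three sample data sets are hypothetical. Treating "every value occurs equally often" as "no mode" is one common convention, assumed here.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    # Convention assumed here: if all distinct values tie, there is no mode
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([5, 7, 9, 9, 9, 12]))                        # one mode
print(modes([1, 2, 3, 4, 5, 6]))                         # no mode
print(modes(["red", "blue", "blue", "red", "green"]))    # several modes, categorical data
```

The last call shows that the mode, unlike the mean or median, also works for categorical data.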


Review Example

Five houses on a hill by the beach

House Prices:

  • $2,000,000
  • $500,000
  • $300,000
  • $100,000
  • $100,000


Review Example: Summary Statistics

Mean: ($3,000,000/5) = $600,000

Median: middle value of ranked data = $300,000

Mode: most frequent value = $100,000

House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (Sum: $3,000,000)
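The review example can be reproduced with Python's standard `statistics` module. This is only a quick check of the arithmetic above; the variable names are illustrative.

```python
from statistics import mean, median, mode

# House prices from the review example, in dollars
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

total = sum(house_prices)
mean_price = mean(house_prices)      # total divided by 5
median_price = median(house_prices)  # middle value of the ranked data
mode_price = mode(house_prices)      # most frequent value

print(f"Sum:    ${total:,}")
print(f"Mean:   ${mean_price:,}")
print(f"Median: ${median_price:,}")
print(f"Mode:   ${mode_price:,}")
```

The single $2,000,000 house pulls the mean up to twice the median, so the three measures give quite different pictures of a "typical" price.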


Mean, median, mode and range

  • Mean = the average value
  • Median = the middle value in an ordered list of data
  • Mode = the most common value
  • Range = difference between highest and lowest value

Example: If we measured the heights of a class and found:

In cm: 160, 160, 162, 163, 164, 164, 165, 165, 165, 180, 190

  • Mean = (160+160+162+163+164+164+165+165+165+180+190)/11 ≈ 167
  • Median = 164 (the 6th of the 11 ordered values)
  • Mode = 165 (occurs three times)
  • Range = 190 - 160 = 30

If you are still confused about how to calculate the mean, median and mode, watch this 4-minute video on YouTube: http://www.youtube.com/watch?v=k3aKKasOmIw


Which measure of location is the “best”?

  • Mean is generally used, unless extreme values (outliers) exist
  • Then median is often used, since the median is not sensitive to extreme values
    Example: median home prices may be reported for a region, since the median is less sensitive to outliers


Range

  • Simplest measure of variation
  • Difference between the largest and the smallest values in a set of data:

$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$

Example:

[Figure: dot plot on a 0 to 14 axis; Range = 14 - 1 = 13]


Disadvantages of the Range

  • Ignores the way in which data are distributed

    [Figure: two dot plots on a 7 to 12 axis with differently distributed data; in both, Range = 12 - 7 = 5]

  • Sensitive to outliers

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
    Range = 5 - 1 = 4

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
    Range = 120 - 1 = 119
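The outlier example can be reproduced in a few lines of Python. The helper name `value_range` is hypothetical; the two lists are the 25-value data sets shown above, written compactly.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, one 4, and a final value of 5 or 120
typical = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(typical))
print(value_range(with_outlier))
```

Only one of the 25 values differs between the two lists, yet the range jumps from 4 to 119, because the range looks only at the two most extreme observations.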


Getting from the real world to a distribution

When we collect data from the ‘real world’ we need to then represent it in numerically and graphically useful ways. This is where graphical analysis and numerical statistical analysis are helpful.

Say we went into one classroom and observed 22 students with the following reading and mathematics scores.

To help understand the distribution of performance in this class we will calculate the mean, median and mode and also create a histogram of the data. (Do UDM Tut1) UDM Tutorial 1 – Mean, median, mode

student_id   reading_score   math_score
 1           508             483
 2           437             454
 3           378             454
 4           355             469
 5           388             353
 6           378             439
 7           399             439
 8           437             454
 9           447             469
10           355             454
11           399             424
12           490             483
13           437             469
14           419             353
15           516             535
16           456             439
17           525             522
18           447             353
19           437             454
20           456             454
21           456             424
22           551             454
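The tutorial itself is done in Excel, but the same summary statistics can be cross-checked with a short Python sketch. It assumes the 22 scores are as read from the table above; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Scores for students 1 to 22, in table order
reading_scores = [
    508, 437, 378, 355, 388, 378, 399, 437, 447, 355, 399,
    490, 437, 419, 516, 456, 525, 447, 437, 456, 456, 551,
]
math_scores = [
    483, 454, 454, 469, 353, 439, 439, 454, 469, 454, 424,
    483, 469, 353, 535, 439, 522, 353, 454, 454, 424, 454,
]

for subject, scores in [("reading", reading_scores), ("math", math_scores)]:
    print(
        f"{subject}: mean={mean(scores):.1f}, "
        f"median={median(scores)}, mode={mode(scores)}"
    )
```

With an even number of students (22), the median is the average of the 11th and 12th ranked scores, following the rule given earlier for even-sized data sets.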


Create a histogram

To create a histogram, first ensure that the Analysis ToolPak add-in is enabled in Excel:

  • File > Options > Add-Ins > Analysis ToolPak (click Analysis ToolPak and click “Go” at the bottom)
  • Under the “Data” tab in Excel you should now have a button which says “Data Analysis” on the far right

Then build the histogram:

  • Click “Data Analysis”
  • Click “Histogram”
  • Highlight the reading marks for the input range
  • Highlight the bin ranges for the bin range
  • Click OK
  • Relabel the bin ranges 0-299, 300-399, 400-449 and so on
  • Insert graph

If you are still confused about how to create a histogram in Excel, watch this 4-minute video on YouTube: http://www.youtube.com/watch?v=RyxPp22x9PU
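For readers without Excel, the binning step can be sketched in Python. The slide names only the first three bins (0-299, 300-399, 400-449), so the remaining upper edges (499, 549, 599) and the overflow bin "More" are assumptions made for this example, as are the variable names. The scores are the 22 reading scores from the classroom data.

```python
from bisect import bisect_left

reading_scores = [
    508, 437, 378, 355, 388, 378, 399, 437, 447, 355, 399,
    490, 437, 419, 516, 456, 525, 447, 437, 456, 456, 551,
]

# Upper edge of each bin; edges after 449 are assumed, not taken from the slide
bin_upper_edges = [299, 399, 449, 499, 549, 599]
bin_labels = ["0-299", "300-399", "400-449", "450-499", "500-549", "550-599", "More"]

# One extra slot catches any score above the last edge
counts = [0] * (len(bin_upper_edges) + 1)
for score in reading_scores:
    # Index of the first bin whose upper edge is >= score
    counts[bisect_left(bin_upper_edges, score)] += 1

# Text histogram: one '#' per student in the bin
for label, count in zip(bin_labels, counts):
    print(f"{label:>8} | {'#' * count} ({count})")
```

Each score is counted in the first bin whose upper edge it does not exceed, so a score of 399 falls in 300-399 and a score of 400 in 400-449.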


The normal distribution

In a perfect normal distribution the mean, median and mode are equal to each other – 75 here.


Skewness

Negative/Left skew

Positive/Right skew

TIP: To remember if it is positive skew or negative skew, think of the distribution like a door-stop. Does the door touch the positive side or the negative side of the distribution?


Shape of a Distribution

  • Describes how data are distributed
  • Measures of shape: symmetric or skewed

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
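The mean-versus-median comparison can be turned into a small rule-of-thumb check. This is a sketch, not a formal skewness measure: the function name `skew_direction` is hypothetical, the third data set is made up for illustration, and equal mean and median do not by themselves guarantee a symmetric distribution.

```python
from statistics import mean, median


def skew_direction(values):
    """Rule of thumb: compare the mean with the median."""
    average, middle = mean(values), median(values)
    if average > middle:
        return "right-skewed"  # a long right tail pulls the mean above the median
    if average < middle:
        return "left-skewed"   # a long left tail pulls the mean below the median
    return "symmetric"


house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(skew_direction(house_prices))        # mean 600,000 vs median 300,000
print(skew_direction([1, 2, 3, 4, 5]))     # mean 3 vs median 3
print(skew_direction([1, 8, 9, 10, 10]))   # hypothetical data: mean 7.6 vs median 9
```

The mean is dragged toward the long tail while the median stays put, which is why the direction of the gap between them hints at the direction of the skew.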


Positive and negative skew


Example question

For this graph, will:

  • The mean > mode?
  • The median < mean?
  • The mean = mode?
  • The mean = median?

The “highest” point in the distribution is always the mode.


Tutorial quiz 1

  • Go to http://quizstar.4teachers.org/indexs.jsp
  • Enter your username and password
  • Click on the “Basic Stats 101” quiz and complete the quiz
  • If you have any questions, raise your hand and I will come and help you

For those not already registered, you can register as a student on http://quizstar.4teachers.org/indexs.jsp and then search for my class “UDM Msc Education”. Anyone can join the class.


End of Lecture 1

For questions email me at NicholasSpaull@gmail.com

All slides/tutorials available at www.nicspaull.com/teaching


Exploratory Data Analysis

Box-and-Whisker Plot: a graphical display of data using the 5-number summary:

Minimum -- Q1 -- Median -- Q3 -- Maximum

Example:

[Figure: box-and-whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile and Maximum; each of the four segments contains 25% of the data]
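The 5-number summary behind a box-and-whisker plot can be computed with Python's standard library. This sketch uses the 22 reading scores from the classroom data. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice below is an assumption and other tools may report slightly different Q1 and Q3 values. The function name is illustrative.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


reading_scores = [
    508, 437, 378, 355, 388, 378, 399, 437, 447, 355, 399,
    490, 437, 419, 516, 456, 525, 447, 437, 456, 456, 551,
]

minimum, q1, med, q3, maximum = five_number_summary(reading_scores)
print("Min:", minimum, "Q1:", q1, "Median:", med, "Q3:", q3, "Max:", maximum)
```

The box of the plot spans Q1 to Q3 with a line at the median, and the whiskers run out to the minimum and maximum.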

Shape of Box-and-Whisker Plots

  • The box and central line are centered between the endpoints if data are symmetric around the median
  • A box-and-whisker plot can be shown in either vertical or horizontal format

[Figure: symmetric box-and-whisker plot marking Min, Q1, Median, Q3 and Max]


Distribution Shape and Box-and-Whisker Plot

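[Figure: box-and-whisker plots for three distribution shapes, each marking Q1, Q2 and Q3]

  • Left-Skewed: Q2 sits closer to Q3
  • Symmetric: Q2 sits midway between Q1 and Q3
  • Right-Skewed: Q2 sits closer to Q1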