
Statistics, Spring 2004


Statistics, Spring 2004. Instructor: 余清祥, Department of Statistics. Date: March 23, 2004. Week 6: Tests of Goodness of Fit and Independence. Chapter 12: Tests of Goodness of Fit and Independence. Goodness of Fit Test: A Multinomial Population. Tests of Independence: Contingency Tables.


Page 1: Statistics, Spring 2004

Instructor: 余清祥, Department of Statistics
Date: March 23, 2004
Week 6: Tests of Goodness of Fit and Independence

Page 2

Chapter 12: Tests of Goodness of Fit and Independence

• Goodness of Fit Test: A Multinomial Population
• Tests of Independence: Contingency Tables
• Goodness of Fit Test: Poisson and Normal Distributions

Page 3

Goodness of Fit Test: A Multinomial Population

1. Set up the null and alternative hypotheses.

2. Select a random sample and record the observed frequency, $f_i$, for each of the $k$ categories.

3. Assuming $H_0$ is true, compute the expected frequency, $e_i$, in each category by multiplying the category probability by the sample size.

continued

Page 4

Goodness of Fit Test: A Multinomial Population

4. Compute the value of the test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject $H_0$ if $\chi^2 > \chi^2_\alpha$ (where $\alpha$ is the significance level and there are $k - 1$ degrees of freedom).
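The five steps above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are invented for the sketch and do not come from the slides, and the critical value is assumed to be looked up in a chi-square table rather than computed.

```python
def expected_frequencies(probabilities, sample_size):
    """Step 3: e_i = (category probability under H0) * (sample size)."""
    return [p * sample_size for p in probabilities]


def chi_square_statistic(observed, expected):
    """Step 4: sum of (f_i - e_i)^2 / e_i over the k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def reject_h0(statistic, critical_value):
    """Step 5: reject H0 when the statistic exceeds the tabled critical value."""
    return statistic > critical_value


# Hypothetical data (invented for this sketch, not taken from the slides).
hypothetical_probabilities = [0.4, 0.3, 0.2, 0.1]
hypothetical_observed = [18, 17, 9, 6]

expected = expected_frequencies(hypothetical_probabilities, sum(hypothetical_observed))
statistic = chi_square_statistic(hypothetical_observed, expected)

# With k = 4 categories there are k - 1 = 3 degrees of freedom;
# 7.81 is the tabled chi-square value for alpha = .05 and 3 d.f.
decision = reject_h0(statistic, 7.81)
print(round(statistic, 4), "reject H0" if decision else "do not reject H0")
```

Keeping the decision rule separate from the statistic makes it easy to reuse the same statistic with a different significance level or a different number of degrees of freedom.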

Page 5

Example: Finger Lakes Homes (A)

Multinomial Distribution Goodness of Fit Test

Finger Lakes Homes manufactures four models of prefabricated homes: a two-story colonial, a ranch, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.

Page 6

Example: Finger Lakes Homes (A)

Multinomial Distribution Goodness of Fit Test

The number of homes sold of each model for 100 sales over the past two years is shown below.

Model     Colonial   Ranch   Split-Level   A-Frame
# Sold       30        20        35           15

Page 7

Example: Finger Lakes Homes (A)

Multinomial Distribution Goodness of Fit Test

• Notation

$p_C$ = population proportion that purchase a colonial

$p_R$ = population proportion that purchase a ranch

$p_S$ = population proportion that purchase a split-level

$p_A$ = population proportion that purchase an A-frame

• Hypotheses

$H_0$: $p_C = p_R = p_S = p_A = .25$

$H_a$: The population proportions are not $p_C = .25$, $p_R = .25$, $p_S = .25$, and $p_A = .25$

Page 8

Example: Finger Lakes Homes (A)

Multinomial Distribution Goodness of Fit Test

• Expected Frequencies

$e_1 = .25(100) = 25$    $e_2 = .25(100) = 25$

$e_3 = .25(100) = 25$    $e_4 = .25(100) = 25$

• Test Statistic

$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$$

Page 9

Example: Finger Lakes Homes (A)

Multinomial Distribution Goodness of Fit Test

• Rejection Rule

With $\alpha = .05$ and $k - 1 = 4 - 1 = 3$ degrees of freedom, the critical value is $\chi^2_{.05} = 7.81$.

[Figure: chi-square distribution with the critical value 7.81 on the $\chi^2$ axis separating the "Do Not Reject $H_0$" region from the "Reject $H_0$" region]

Page 10

Example: Finger Lakes Homes (A)

Multinomial Distribution Goodness of Fit Test

• Conclusion

$\chi^2 = 10 > 7.81$, so we reject the assumption there is no home style preference, at the .05 level of significance.
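The Finger Lakes Homes (A) calculation can be reproduced with a short script. The sales counts, the null proportion of .25, and the critical value 7.81 are taken from the slides; the variable names are only illustrative.

```python
# Observed sales from the slides (100 sales over the past two years).
observed = {"Colonial": 30, "Ranch": 20, "Split-Level": 35, "A-Frame": 15}

n = sum(observed.values())   # sample size
p_null = 0.25                # H0: every model has proportion .25

# Expected frequency under H0 for each model: e_i = .25 * n
expected = {model: p_null * n for model in observed}

# Contribution of each category to the chi-square statistic.
terms = {
    model: (f - expected[model]) ** 2 / expected[model]
    for model, f in observed.items()
}
chi_square = sum(terms.values())

# Tabled critical value from the slides: alpha = .05, k - 1 = 3 d.f.
CRITICAL_VALUE = 7.81
reject = chi_square > CRITICAL_VALUE

print(terms)
print(chi_square, "reject H0" if reject else "do not reject H0")
```

Printing the per-category terms shows where the statistic comes from: the split-level and A-frame categories each contribute 4, the other two contribute 1 each.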

Page 11

Test of Independence: Contingency Tables

1. Set up the null and alternative hypotheses.

2. Select a random sample and record the observed frequency, $f_{ij}$, for each cell of the contingency table.

3. Compute the expected frequency, $e_{ij}$, for each cell:

$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$

Page 12

Test of Independence: Contingency Tables

4. Compute the test statistic:

$$\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

5. Reject $H_0$ if $\chi^2 > \chi^2_\alpha$ (where $\alpha$ is the significance level and with $n$ rows and $m$ columns there are $(n - 1)(m - 1)$ degrees of freedom).
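The expected-frequency formula and the double sum can be sketched as two small functions. The function names and the 2 x 2 table below are invented for illustration and are not part of the slides.

```python
def expected_table(observed):
    """e_ij = (row i total) * (column j total) / (sample size)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    sample_size = sum(row_totals)
    return [[r * c / sample_size for c in column_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_table(observed)
    statistic = sum(
        (f - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for f, e in zip(observed_row, expected_row)
    )
    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 2 table (numbers invented for this sketch).
hypothetical_table = [[10, 20], [30, 40]]
print(expected_table(hypothetical_table))
print(chi_square_independence(hypothetical_table))
```

The table is passed as a list of rows, so `zip(*observed)` yields the columns without needing any external library.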

Page 13

Example: Finger Lakes Homes (B)

Contingency Table (Independence) Test

Each home sold can be classified according to price and to style. Finger Lakes Homes' manager would like to determine if the price of the home and the style of the home are independent variables.

The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either $65,000 or less or more than $65,000.

Price        Colonial   Ranch   Split-Level   A-Frame
≤ $65,000       18         6        19           12
> $65,000       12        14        16            3

Page 14

Example: Finger Lakes Homes (B)

Contingency Table (Independence) Test

• Hypotheses

$H_0$: Price of the home is independent of the style of the home that is purchased

$H_a$: Price of the home is not independent of the style of the home that is purchased

• Expected Frequencies

Observed frequencies with the row and column totals used to compute each $e_{ij}$:

Price        Colonial   Ranch   Split-Level   A-Frame   Total
≤ $65,000       18         6        19           12       55
> $65,000       12        14        16            3       45
Total           30        20        35           15      100

Page 15

Example: Finger Lakes Homes (B)

Contingency Table (Independence) Test

• Test Statistic

$$\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.1486$$

• Rejection Rule

With $\alpha = .05$ and $(2 - 1)(4 - 1) = 3$ d.f., $\chi^2_{.05} = 7.81$.

Reject $H_0$ if $\chi^2 > 7.81$

• Conclusion

We reject $H_0$, the assumption that the price of the home is independent of the style of the home that is purchased.
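As a check on the Finger Lakes Homes (B) arithmetic, the following sketch recomputes the expected frequencies and the test statistic from the observed table. The counts and the critical value 7.81 come from the slides; the variable names are illustrative.

```python
# Observed counts from the slides.
# Rows: $65,000 or less, more than $65,000.
# Columns: Colonial, Ranch, Split-Level, A-Frame.
observed = [
    [18, 6, 19, 12],
    [12, 14, 16, 3],
]

row_totals = [sum(row) for row in observed]              # [55, 45]
column_totals = [sum(col) for col in zip(*observed)]     # [30, 20, 35, 15]
sample_size = sum(row_totals)                            # 100

# e_ij = (row i total)(column j total) / sample size
expected = [[r * c / sample_size for c in column_totals] for r in row_totals]

chi_square = sum(
    (f - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for f, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabled critical value from the slides: alpha = .05, 3 d.f.
CRITICAL_VALUE = 7.81
reject = chi_square > CRITICAL_VALUE

for row in expected:
    print(row)
print(round(chi_square, 4), degrees_of_freedom, reject)
```

The first and last expected cells are 16.5 and 6.75, which are the denominators that appear in the first and last terms of the slide's test statistic.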

Page 16

Goodness of Fit Test: Poisson Distribution

1. Set up the null and alternative hypotheses.

2. Select a random sample and

a. Record the observed frequency, $f_i$, for each of the $k$ values of the Poisson random variable.

b. Compute the mean number of occurrences, $\mu$.

3. Compute the expected frequency of occurrences, $e_i$, for each value of the Poisson random variable.

continued

Page 17

Goodness of Fit Test: Poisson Distribution

4. Compute the value of the test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject $H_0$ if $\chi^2 > \chi^2_\alpha$ (where $\alpha$ is the significance level and there are $k - 2$ degrees of freedom).

Page 18

Example: Troy Parking Garage

Poisson Distribution Goodness of Fit Test

In studying the need for an additional entrance to a city parking garage, a consultant has recommended an approach that is applicable only in situations where the number of cars entering during a specified time period follows a Poisson distribution.

Page 19

Example: Troy Parking Garage

Poisson Distribution Goodness of Fit Test

A random sample of 100 one-minute time intervals resulted in the customer arrivals listed below. A statistical test must be conducted to see if the assumption of a Poisson distribution is reasonable.

# Arrivals   0   1   2   3   4   5   6   7   8   9  10  11  12
Frequency    0   1   4  10  14  20  12  12   9   8   6   3   1

Page 20

Example: Troy Parking Garage

Poisson Distribution Goodness of Fit Test

• Hypotheses

$H_0$: Number of cars entering the garage during a one-minute interval is Poisson distributed.

$H_a$: Number of cars entering the garage during a one-minute interval is not Poisson distributed.

Page 21

Example: Troy Parking Garage

Poisson Distribution Goodness of Fit Test

• Estimate of Poisson Probability Function

Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600

Total Time Periods = 100

Estimate of $\mu$ = 600/100 = 6

Hence,

$$f(x) = \frac{6^x e^{-6}}{x!}$$
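The estimate of the mean and the resulting probability function can be checked with a few lines of Python. The arrival frequencies are those listed for the 100 one-minute intervals; the helper name `poisson_pmf` is an illustrative choice, not something defined in the slides.

```python
from math import exp, factorial

# Number of arrivals per one-minute interval and how often each was observed.
arrivals = list(range(13))                                   # 0, 1, ..., 12
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]

total_periods = sum(frequency)                               # 100 intervals
total_arrivals = sum(x * f for x, f in zip(arrivals, frequency))
mean_estimate = total_arrivals / total_periods               # 600 / 100


def poisson_pmf(x, mean):
    """Poisson probability f(x) = mean^x * e^(-mean) / x!"""
    return mean ** x * exp(-mean) / factorial(x)


print(total_arrivals, total_periods, mean_estimate)
for x in range(5):
    print(x, round(poisson_pmf(x, mean_estimate), 4))
```

Probabilities computed directly this way may differ in the third or fourth decimal place from values read off a printed Poisson table.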

Page 22

Example: Troy Parking Garage

Poisson Distribution Goodness of Fit Test

• Expected Frequencies (with $n = 100$)

  x      f(x)     n f(x)
  0     .0025       .25
  1     .0149      1.49
  2     .0446      4.46
  3     .0892      8.92
  4     .1339     13.39
  5     .1620     16.20
  6     .1606     16.06
  7     .1389     13.89
  8     .1041     10.41
  9     .0694      6.94
 10     .0417      4.17
 11     .0227      2.27
 12     .0155      1.55
Total  1.0000    100.00

Page 23

Example: Troy Parking Garage

Poisson Distribution Goodness of Fit Test

• Observed and Expected Frequencies

i              f_i     e_i    f_i - e_i
0 or 1 or 2      5     6.20     -1.20
3               10     8.92      1.08
4               14    13.39       .61
5               20    16.20      3.80
6               12    16.06     -4.06
7               12    13.89     -1.89
8                9    10.41     -1.41
9                8     6.94      1.06
10 or more      10     7.99      2.01

Page 24

Example: Troy Parking Garage

Poisson Distribution Goodness of Fit Test

• Test Statistic

$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(2.01)^2}{7.99} = 3.42$$

• Rejection Rule

With $\alpha = .05$ and $k - p - 1 = 9 - 1 - 1 = 7$ d.f. (where $k$ = number of categories and $p$ = number of population parameters estimated), $\chi^2_{.05} = 14.07$.

Reject $H_0$ if $\chi^2 > 14.07$

• Conclusion

We cannot reject $H_0$. There's no reason to doubt the assumption of a Poisson distribution.
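The pooling of sparse categories and the resulting statistic can be sketched as follows. The observed counts, the expected frequencies, and the critical value 14.07 are taken as given from the slides; the `pool` helper and its arguments are assumptions made for this illustration.

```python
# Observed counts and expected frequencies for 0, 1, ..., 12 arrivals,
# both copied from the slides.
observed = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]
expected = [0.25, 1.49, 4.46, 8.92, 13.39, 16.20, 16.06,
            13.89, 10.41, 6.94, 4.17, 2.27, 1.55]


def pool(values, low_count, high_count):
    """Merge the first `low_count` and the last `high_count` entries."""
    middle = values[low_count:len(values) - high_count]
    return [sum(values[:low_count])] + middle + [sum(values[-high_count:])]


# Combine "0 or 1 or 2" and "10 or more", as in the slides' table.
pooled_observed = pool(observed, 3, 3)
pooled_expected = pool(expected, 3, 3)

chi_square = sum(
    (f - e) ** 2 / e for f, e in zip(pooled_observed, pooled_expected)
)

k = len(pooled_observed)          # 9 categories after pooling
p = 1                             # one estimated parameter (the mean)
degrees_of_freedom = k - p - 1    # 7

# Tabled critical value from the slides: alpha = .05, 7 d.f.
CRITICAL_VALUE = 14.07
reject = chi_square > CRITICAL_VALUE

print(pooled_observed)
print([round(e, 2) for e in pooled_expected])
print(round(chi_square, 2), degrees_of_freedom, reject)
```

Pooling the tails raises the smallest expected frequencies (.25, 1.49, 1.55, and so on) to 6.20 and 7.99, which is what makes the chi-square approximation reasonable here.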

Page 25

Goodness of Fit Test: Normal Distribution

1. Set up the null and alternative hypotheses.

2. Select a random sample and

a. Compute the mean and standard deviation.

b. Define intervals of values so that the expected frequency is at least 5 for each interval.

c. For each interval record the observed frequencies.

3. Compute the expected frequency, $e_i$, for each interval.

continued

Page 26

Goodness of Fit Test: Normal Distribution

4. Compute the value of the test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject $H_0$ if $\chi^2 > \chi^2_\alpha$ (where $\alpha$ is the significance level and there are $k - 3$ degrees of freedom).

Page 27

Example: Victor Computers

Normal Distribution Goodness of Fit Test

Victor Computers manufactures and sells a general purpose microcomputer. As part of a study to evaluate sales personnel, management wants to determine if the annual sales volume (number of units sold by a salesperson) follows a normal probability distribution.

Page 28

Example: Victor Computers

Normal Distribution Goodness of Fit Test

A simple random sample of 30 of the salespeople was taken and their numbers of units sold are below.

33   43   44   45   52   52   56   58   63   64
64   65   66   68   70   72   73   73   74   75
83   84   85   86   91   92   94   98  102  105

(mean = 71, standard deviation = 18.54)

Page 29

Example: Victor Computers

Normal Distribution Goodness of Fit Test

• Hypotheses

$H_0$: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54.

$H_a$: The population of number of units sold does not have a normal distribution with mean 71 and standard deviation 18.54.

Page 30

Example: Victor Computers

Normal Distribution Goodness of Fit Test

• Interval Definition

To satisfy the requirement of an expected frequency of at least 5 in each interval we will divide the normal distribution into 30/5 = 6 equal probability intervals.

Page 31

Example: Victor Computers

Normal Distribution Goodness of Fit Test

• Interval Definition

[Figure: normal curve centered at 71 and divided into six intervals, each with area 1.00/6 = .1667; the interval boundaries are 53.02, 63.03, 71, 78.97, and 88.98]

88.98 = 71 + .97(18.54)
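The interval boundaries can be reproduced with Python's standard library. This sketch assumes the slides round each standard normal quantile to two decimals (as in 71 + .97(18.54)); the rounding step and the variable names are choices made for this illustration.

```python
from statistics import NormalDist

MEAN = 71
STD_DEV = 18.54
NUM_INTERVALS = 6   # 30 observations / expected frequency of 5

# Standard normal z values that cut the curve into six equal areas,
# rounded to two decimals as a printed z table would give them.
z_values = [
    round(NormalDist().inv_cdf(i / NUM_INTERVALS), 2)
    for i in range(1, NUM_INTERVALS)
]

# Convert each z value to the sales scale: boundary = mean + z * std dev.
boundaries = [round(MEAN + z * STD_DEV, 2) for z in z_values]

print(z_values)
print(boundaries)
```

Without the two-decimal rounding of z, the boundaries shift slightly in the second decimal place, which does not change which interval any of the 30 observations falls into.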

Page 32

Example: Victor Computers

Normal Distribution Goodness of Fit Test

• Observed and Expected Frequencies

i                   f_i   e_i   f_i - e_i
Less than 53.02      6     5        1
53.02 to 63.03       3     5       -2
63.03 to 71.00       6     5        1
71.00 to 78.97       5     5        0
78.97 to 88.98       4     5       -1
More than 88.98      6     5        1
Total               30    30

Page 33

Example: Victor Computers

Normal Distribution Goodness of Fit Test

• Test Statistic

$$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.60$$

• Rejection Rule

With $\alpha = .05$ and $k - p - 1 = 6 - 2 - 1 = 3$ d.f., $\chi^2_{.05} = 7.81$.

Reject $H_0$ if $\chi^2 > 7.81$

• Conclusion

We cannot reject $H_0$. There is little evidence to support rejecting the assumption the population is normally distributed with $\mu = 71$ and $\sigma = 18.54$.
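The Victor Computers test can be checked end to end with a short script. The 30 observations, the interval boundaries, and the critical value 7.81 come from the slides; the use of `bisect_right` to assign observations to intervals is simply one convenient way to do the counting.

```python
from bisect import bisect_right

# Units sold by the 30 sampled salespeople (from the slides).
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]

# Boundaries of the six equal-probability intervals (from the slides).
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]

# Count the observations in each interval; bisect_right returns the
# index of the interval a value belongs to.
observed = [0] * (len(boundaries) + 1)
for value in units_sold:
    observed[bisect_right(boundaries, value)] += 1

# Equal-probability intervals give the same expected frequency everywhere.
expected = [len(units_sold) / len(observed)] * len(observed)   # 5.0 each

chi_square = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

k = len(observed)                 # 6 intervals
p = 2                             # mean and standard deviation estimated
degrees_of_freedom = k - p - 1    # 3

# Tabled critical value from the slides: alpha = .05, 3 d.f.
CRITICAL_VALUE = 7.81
reject = chi_square > CRITICAL_VALUE

print(observed)
print(round(chi_square, 2), degrees_of_freedom, reject)
```

None of the observations equals a boundary value, so the choice between `bisect_right` and `bisect_left` does not affect the counts for this sample.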

Page 34

End of Chapter 12