Chapter 7 Confidence Intervals

• z-Based Confidence Intervals for a Population Mean: σ Known
• t-Based Confidence Intervals for a Population Mean: σ Unknown
• Sample Size Determination
• Confidence Intervals for a Population Proportion


Statistical Inferences

– Estimation
  • Point estimation
  • Confidence interval estimation
– Hypothesis testing


Example

The Dean of the Business School wants to estimate the mean number of hours worked per week by students. A sample of 49 students showed a mean of 24 hours with a standard deviation of 4 hours. What is the population mean?

The value of the population mean is not known. Our best estimate of this value is the sample mean of 24.0 hours. This value is called a point estimate.


A confidence interval (CI) is a range of values within which the population parameter is expected to occur at a specified probability (confidence level).

The two most widely used confidence levels are 95% and 99%.

An interval estimate states the range within which a population parameter probably lies.

A point estimate is a single value (statistic) used to estimate a population value (parameter).

z-Based Confidence Intervals for a Population Mean

Consider the sampling distribution of the sample mean $\bar{x}$

– Recall from Chapter 6 that if a population is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{x}$ is normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

– Even if the population being sampled is non-normal, when the sample size n is large, the sampling distribution of the sample mean is again approximately normal (by the Central Limit Theorem)


• Recall the empirical rule, so…

– 68.26% of all possible sample means are within one standard deviation of the population mean

– 95.44% of all possible sample means are within two standard deviations of the population mean

– 99.73% of all possible sample means are within three standard deviations of the population mean
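
The three percentages above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `central_area` is an illustrative choice, not something from the slides. The computed areas come out near 68.27%, 95.45% and 99.73%; the slide figures 68.26% and 95.44% reflect rounding in a four-decimal normal table.

```python
from statistics import NormalDist


def central_area(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {central_area(k):.4%}")
```

Because the sampling distribution of the sample mean is normal (or approximately normal for large n), the same areas apply to sample means measured in units of the standard deviation of the sample mean.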

Example 7.1 The Car Mileage Case

• Recall that the population of car mileages is normally distributed with mean $\mu$. We assume that the standard deviation is known: $\sigma = 0.8$ mpg
  – Note that $\mu$ is unknown and is to be estimated
• Taking samples of size n = 5, the sampling distribution of sample mean mileages is normal with mean $\mu_{\bar{x}} = \mu$ (which is also unknown) and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.8}{\sqrt{5}} = 0.35777$$

• The probability is 0.9544 that $\bar{x}$ will be within plus or minus $2\sigma_{\bar{x}} = 2 \cdot 0.35777 = 0.7155$ of $\mu$

The Car Mileage Case #2

• Then there is a 0.9544 probability that $\bar{x}$ will be a value so that the interval $[\bar{x} \pm 0.7155]$ contains $\mu$
  – In other words, $P(\bar{x} - 0.7155 \le \mu \le \bar{x} + 0.7155) = 0.9544$
  – The interval $[\bar{x} \pm 0.7155]$ is referred to as the 95.44% confidence interval for $\mu$
  – We take three samples with means $\bar{x}$ = 31.3, 31.72, and 32.5, respectively, and compare each interval $[\bar{x} - 0.7155, \bar{x} + 0.7155]$ with the actual value of $\mu$

The Car Mileage Case #3

95.44% Confidence Intervals for $\mu$

• Three intervals shown
• Two contain $\mu$
• One does not

[Figure: the three intervals, centered at $\bar{x}_1$, $\bar{x}_2$, and $\bar{x}_3$]

• According to the 95.44% confidence interval, we know even before we sample that, of all the possible samples that could be selected …
• … there is a 95.44% probability that the sample mean $\bar{x}$ that is calculated is such that the interval $[\bar{x} \pm 0.7155]$ will contain the actual (but unknown) population mean $\mu$
  – In other words, of all possible sample means, 95.44% of all the corresponding intervals will contain the population mean $\mu$
  – Note that there is a 4.56% probability that the interval does not contain $\mu$
    • The sample mean is either too high or too low
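
The "of all possible samples" interpretation can be illustrated by simulation. The sketch below draws many samples of size n = 5 from a normal population with σ = 0.8 and counts how often the interval with half-width 2σ/√n contains the population mean. The true mean `MU_ASSUMED`, the number of trials, and the seed are hypothetical choices made only for this illustration; in practice the mean is unknown.

```python
import random
from math import sqrt


def coverage(mu: float, sigma: float, n: int, half_width: float,
             trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated samples whose interval [x_bar ± half_width] contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


SIGMA, N = 0.8, 5                 # values from the car mileage case
HALF_WIDTH = 2 * SIGMA / sqrt(N)  # two standard deviations of the sample mean, about 0.7155
MU_ASSUMED = 31.5                 # hypothetical true mean, chosen only for the simulation

print(f"half-width = {HALF_WIDTH:.4f}")
print(f"estimated coverage = {coverage(MU_ASSUMED, SIGMA, N, HALF_WIDTH):.4f}")
```

The estimated coverage should land close to 0.95, and it does not depend on which value is assumed for the mean: the probability statement is about the sampling procedure, not about any single computed interval.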

Generalizing

• In the example, we found the probability that $\mu$ is contained in an interval of integer multiples of $\sigma_{\bar{x}}$
• More usual to specify a round-number probability (such as 95%) and find the corresponding number of $\sigma_{\bar{x}}$
• The probability that the confidence interval will not contain the population mean $\mu$ is denoted by $\alpha$
  – In the example, $\alpha = 0.0456$

Generalizing Continued

• The probability that the confidence interval will contain the population mean $\mu$ is denoted by $1 - \alpha$
  – $1 - \alpha$ is referred to as the confidence coefficient
  – $(1 - \alpha) \times 100\%$ is called the confidence level
  – In the example, $1 - \alpha = 0.9544$
• Usual to use two decimal point probabilities for $1 - \alpha$
  – Here, focus on $1 - \alpha = 0.95$ or $0.99$

General Confidence Interval

• In general, the probability is $1 - \alpha$ that the population mean $\mu$ is contained in the interval

$$\left[\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}\right] = \left[\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

  – The normal point $z_{\alpha/2}$ gives a right-hand tail area under the standard normal curve equal to $\alpha/2$
  – The normal point $-z_{\alpha/2}$ gives a left-hand tail area under the standard normal curve equal to $\alpha/2$
  – The area under the standard normal curve between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is $1 - \alpha$

z-Based Confidence Intervals for a Mean with σ Known

• If a population has standard deviation $\sigma$ (known),
• and if the population is normal or if sample size is large (n ≥ 30), then …
• … a $(1 - \alpha)100\%$ confidence interval for $\mu$ is

$$\left[\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

95% Confidence Level

• For a 95% confidence level, $1 - \alpha = 0.95$, so $\alpha = 0.05$ and $\alpha/2 = 0.025$
• For 95% confidence, need the normal point $z_{0.025}$
• The area under the standard normal curve between $-z_{0.025}$ and $z_{0.025}$ is 0.95
• Then the area under the standard normal curve between 0 and $z_{0.025}$ is 0.475
• From the standard normal table, the area is 0.475 for z = 1.96
• Then $z_{0.025} = 1.96$

The Effect of α on Confidence Interval Width

[Figure: for 95% confidence, $z_{\alpha/2} = z_{0.025} = 1.96$; for 99% confidence, $z_{\alpha/2} = z_{0.005} = 2.575$]
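
The normal points read from the table can also be computed directly. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` (the helper name `z_point` is illustrative): the point with right-hand tail area α/2 is the value below which a fraction 1 − α/2 of the standard normal distribution lies.

```python
from statistics import NormalDist


def z_point(alpha: float) -> float:
    """Normal point z_{alpha/2}: right-hand tail area alpha/2 under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.95, 0.99):
    alpha = 1 - level
    print(f"{level:.0%} confidence: z = {z_point(alpha):.4f}")
```

The 99% value computed this way is about 2.5758; the slides' 2.575 comes from reading between entries in a printed table, and the difference is negligible for the intervals in this chapter.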

95% Confidence Interval

The 95% confidence interval is

$$\left[\bar{x} \pm z_{0.025}\,\sigma_{\bar{x}}\right] = \left[\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}\right] = \left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$$

99% Confidence Interval

• For 99% confidence, need the normal point $z_{0.005}$
• Reading between table entries in the standard normal table, the area is 0.495 for $z_{0.005} = 2.575$
• The 99% confidence interval is

$$\left[\bar{x} \pm z_{0.005}\,\sigma_{\bar{x}}\right] = \left[\bar{x} \pm 2.575\frac{\sigma}{\sqrt{n}}\right] = \left[\bar{x} - 2.575\frac{\sigma}{\sqrt{n}},\; \bar{x} + 2.575\frac{\sigma}{\sqrt{n}}\right]$$

Example 7.2 The Car Mileage Case

Given: $\bar{x} = 31.5531$ mpg, $\sigma = 0.8$ mpg, $n = 49$

95% Confidence Interval:

$$\left[\bar{x} \pm z_{0.025}\frac{\sigma}{\sqrt{n}}\right] = \left[31.5531 \pm 1.96\frac{0.8}{\sqrt{49}}\right] = [31.5531 \pm 0.224] = [31.33, 31.78]$$

99% Confidence Interval:

$$\left[\bar{x} \pm z_{0.005}\frac{\sigma}{\sqrt{n}}\right] = \left[31.5531 \pm 2.575\frac{0.8}{\sqrt{49}}\right] = [31.5531 \pm 0.294] = [31.26, 31.85]$$
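
The two intervals in Example 7.2 can be reproduced with a few lines of Python. This is a sketch only; the function name `z_interval` is an illustrative choice, and the normal points 1.96 and 2.575 are passed in as the table values used in the slides.

```python
from math import sqrt


def z_interval(x_bar: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """Return [x_bar - z*sigma/sqrt(n), x_bar + z*sigma/sqrt(n)]."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


X_BAR, SIGMA, N = 31.5531, 0.8, 49  # values given in Example 7.2

for label, z in (("95%", 1.96), ("99%", 2.575)):
    low, high = z_interval(X_BAR, SIGMA, N, z)
    print(f"{label}: [{low:.2f}, {high:.2f}]")
```

Only the normal point changes between the two calls, which is why the 99% interval is wider than the 95% interval for the same sample.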

• The 99% confidence interval is slightly wider than the 95% confidence interval
  – The higher the confidence level, the wider the interval
• Reasoning from the intervals:
  – The target mean mileage should be at least 31 mpg
  – Both confidence intervals lie entirely above this target
  – According to the 95% confidence interval, we can be 95% confident that the mean mileage is between 31.33 and 31.78 mpg
  – So we can be 95% confident that the mean mileage exceeds the target by at least 0.33 mpg and at most 0.78 mpg

Section 7.2 t-Based Confidence Intervals for a Population Mean

• If $\sigma$ is unknown (which is usually the case), we can construct a confidence interval for $\mu$ based on the sampling distribution of

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

• If the population is normal, then for any sample size n, this sampling distribution is called the t distribution

The t Distribution

• The curve of the t distribution is similar to that of the standard normal curve
  – Symmetrical and bell-shaped
  – The t distribution is more spread out than the standard normal distribution
  – The spread of the t is given by the number of degrees of freedom
    • Denoted by df
    • For a sample of size n, there are one fewer degrees of freedom, that is, df = n – 1

Degrees of Freedom and the t Distribution

As the number of degrees of freedom increases, the spread of the t distribution decreases and the t curve approaches the standard normal curve

The t Distribution and Degrees of Freedom

• For a t distribution with n – 1 degrees of freedom,
  – As the sample size n increases, the degrees of freedom also increase
  – As the degrees of freedom increase, the spread of the t curve decreases
  – As the degrees of freedom increase indefinitely, the t curve approaches the standard normal curve
    • If n ≥ 30, so df = n – 1 ≥ 29, the t curve is very similar to the standard normal curve
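
To see the t curve approach the standard normal curve, one can compare the two densities at a point in the tail as df grows. The sketch below evaluates the standard t density formula with the standard library only (`math.lgamma` keeps the gamma-function ratio stable for large df). The function name `t_pdf`, the evaluation point 2.5, and the list of df values are illustrative choices, not taken from the slides.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom at x."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2))
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


X = 2.5  # a point in the right-hand tail, chosen only for illustration
print(f"standard normal density at {X}: {NormalDist().pdf(X):.5f}")
for df in (1, 4, 14, 29, 120):
    print(f"t density at {X}, df = {df:>3}: {t_pdf(X, df):.5f}")
```

For small df the t density in the tail is noticeably larger than the normal density, which is the "more spread out" behavior described above; by df = 29 the two are already close.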

t and Right-Hand Tail Areas

• Use a t point denoted by $t_\alpha$
  – $t_\alpha$ is the point on the horizontal axis under the t curve that gives a right-hand tail area equal to $\alpha$
  – So the value of $t_\alpha$ in a particular situation depends on the right-hand tail area $\alpha$ and the number of degrees of freedom
    • df = n – 1
    • The tail area follows from the specified confidence coefficient $1 - \alpha$

Using the t Distribution Table

• Rows correspond to the different values of df
• Columns correspond to different values of $\alpha$
• See Table 7.3, Tables A.4 and A.20 in Appendix A and the table on the inside cover
  – Tables 7.3 and A.4 give t points for df 1 to 30, then for df = 40, 60, 120, and ∞
    • On the row for ∞, the t points are the z points
  – Table A.20 gives t points for df from 1 to 100
    • For df greater than 100, t points can be approximated by the corresponding z points on the bottom row for df = ∞
  – Always look at the accompanying figure for guidance on how to use the table

Example 7.3

Find the t point for a sample of size n = 15 and a right-hand tail area of 0.025
  – For n = 15, df = 14
  – Right-hand tail area = 0.025
    • Note that a right-hand tail area of 0.025 (that is, $\alpha/2 = 0.025$) corresponds to a confidence level of 0.95
  – In Table 7.3, along the row labeled 14 and under the column labeled 0.025, read a table entry of 2.145
  – So $t_{0.025} = 2.145$

t-Based Confidence Intervals for a Mean: σ Unknown

If the sampled population is normally distributed with mean $\mu$, then a $(1 - \alpha)100\%$ confidence interval for $\mu$ is

$$\left[\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}\right]$$

$t_{\alpha/2}$ is the t point giving a right-hand tail area of $\alpha/2$ under the t curve having n – 1 degrees of freedom

Example 7.4 Debt-to-Equity Ratio

• Estimate the mean debt-to-equity ratio of the loan portfolio of a bank
• Select a random sample of 15 commercial loan accounts
  – Box plot is given in figure below
• Know: $\bar{x} = 1.343$, $s = 0.192$, $n = 15$
• Want a 95% confidence interval for the mean ratio
• Assume all ratios are normally distributed, with $\sigma$ unknown

• Have to use the t distribution
• At 95% confidence,
  – $1 - \alpha = 0.95$ so $\alpha = 0.05$ and $\alpha/2 = 0.025$
• For n = 15,
  – df = 15 – 1 = 14
• Use the t table to find $t_{\alpha/2}$ for df = 14
  – $t_{\alpha/2} = t_{0.025} = 2.145$ for df = 14
• The 95% confidence interval:

$$\left[\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}}\right] = \left[1.343 \pm 2.145\frac{0.192}{\sqrt{15}}\right] = [1.343 \pm 0.106] = [1.237, 1.449]$$
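
The debt-to-equity calculation can be checked with a short sketch. The function name `t_interval` is illustrative, the t point 2.145 is supplied from the table rather than computed (the Python standard library has no t quantile function), and the sample mean is taken as 1.343, the value that appears in the worked calculation.

```python
from math import sqrt


def t_interval(x_bar: float, s: float, n: int, t_point: float) -> tuple[float, float]:
    """Return [x_bar - t*s/sqrt(n), x_bar + t*s/sqrt(n)] for a t point read from a table."""
    margin = t_point * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 7.4: n = 15 loans, so df = 14 and the table gives t_0.025 = 2.145
low, high = t_interval(x_bar=1.343, s=0.192, n=15, t_point=2.145)
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

The structure is the same as the z-based interval; the only changes are that s replaces σ and the t point for n – 1 degrees of freedom replaces the normal point.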

Section 7.3 Sample Size Determination

Let E denote the desired margin of error. If $\sigma$ is known, then a sample of size

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

makes the margin of error equal to E, so that $\bar{x}$ is within E units of $\mu$, with $100(1 - \alpha)\%$ confidence.

If $\sigma$ is unknown and is estimated from s, then a sample of size

$$n = \left(\frac{t_{\alpha/2}\,s}{E}\right)^2$$

is used, so that $\bar{x}$ is within E units of $\mu$, with $100(1 - \alpha)\%$ confidence. The number of degrees of freedom for the $t_{\alpha/2}$ point is the size of the preliminary sample minus 1.

Example 7.5 Car Mileage Case

We wish to find the sample size that is needed to make the margin of error in a 95% confidence interval for the mean $\mu$ equal to 0.3. We regard the previously discussed sample of five mileages as a preliminary sample, with s = 0.7583 calculated from these 5 cars. With df = 5 – 1 = 4, $t_{0.025} = 2.776$.

We find that the appropriate sample size is:

$$n = \left(\frac{t_{0.025}\,s}{E}\right)^2 = \left(\frac{2.776(0.7583)}{0.3}\right)^2 = 49.24$$

Rounding up, we employ a sample of size 50.
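
The sample-size arithmetic can be reproduced as follows. This is a minimal sketch; the function name `sample_size_for_mean` is an illustrative choice, and the t point 2.776 (df = 4, right-hand tail area 0.025) is supplied as a table value.

```python
from math import ceil


def sample_size_for_mean(critical_point: float, std_dev: float, margin: float) -> float:
    """Unrounded sample size n = (critical_point * std_dev / margin)**2."""
    return (critical_point * std_dev / margin) ** 2


# Example 7.5: preliminary sample of 5 cars, so df = 4 and t_0.025 = 2.776
n_raw = sample_size_for_mean(critical_point=2.776, std_dev=0.7583, margin=0.3)
print(f"n = {n_raw:.2f}, rounded up to {ceil(n_raw)}")
```

The result is always rounded up rather than to the nearest integer, since rounding down would give a margin of error slightly larger than the one requested. The same function covers the σ-known case when a normal point and σ are passed instead.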

Example 7.5 Car Mileage Case (continued)

Given: $\bar{x} = 31.5531$ mpg, $s = 0.7992$ mpg, $n = 49$

t-based 95% Confidence Interval:

$$\left[\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}}\right] = \left[31.5531 \pm 2.0106\frac{0.7992}{\sqrt{49}}\right] = [31.5531 \pm 0.2296] = [31.32, 31.78]$$

where $t_{\alpha/2} = t_{0.025} = 2.0106$ for df = 49 – 1 = 48 degrees of freedom

Note: the error bound B = 0.2296 mpg, within the maximum error of 0.3 mpg

Section 7.4 Confidence Intervals for a Population Proportion

If the sample size n is large*, then a $(1 - \alpha)100\%$ confidence interval for p is

$$\left[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$$

* Here n should be considered large if both $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$

Example 7.6 Phe-Mycin Side Effects

Given: n = 200, 35 patients experience nausea.

$$\hat{p} = \frac{35}{200} = 0.175$$

Note: $n\hat{p} = 200(0.175) = 35$ and $n(1 - \hat{p}) = 200(0.825) = 165$, so both quantities are > 5

For 95% confidence, $z_{\alpha/2} = z_{0.025} = 1.96$ and

$$\left[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right] = \left[0.175 \pm 1.96\sqrt{\frac{0.175(0.825)}{200}}\right] = [0.175 \pm 0.053] = [0.122, 0.228]$$
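
The Phe-Mycin interval can be verified with a short sketch. The function name `proportion_interval` and the choice to raise an error when the large-sample condition fails are illustrative decisions, not part of the slides.

```python
from math import sqrt


def proportion_interval(successes: int, n: int, z: float) -> tuple[float, float]:
    """Large-sample interval p_hat ± z*sqrt(p_hat*(1 - p_hat)/n)."""
    p_hat = successes / n
    # Large-sample condition from the slide: n*p_hat and n*(1 - p_hat) both at least 5
    if min(n * p_hat, n * (1 - p_hat)) < 5:
        raise ValueError("sample too small for the large-sample interval")
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(successes=35, n=200, z=1.96)
print(f"95% interval for p: [{low:.3f}, {high:.3f}]")
```

Checking the condition inside the function keeps the interval from being reported in cases where the normal approximation behind it is not justified.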

Determining Sample Size for Confidence Interval for p

A sample size

$$n = p(1 - p)\left(\frac{z_{\alpha/2}}{E}\right)^2$$

will yield an estimate $\hat{p}$, precisely within E units of p, with $100(1 - \alpha)\%$ confidence

Note that the formula requires a preliminary estimate of p. The conservative value of p = 0.5 is generally used when there is no prior information on p
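
A brief sketch shows why p = 0.5 is the conservative choice: the product p(1 − p) is largest at 0.5, so that value gives the largest required sample size. The margin of error E = 0.03 and the alternative values of p below are hypothetical inputs chosen only for illustration, and the function name `sample_size_for_proportion` is likewise an assumption.

```python
from math import ceil


def sample_size_for_proportion(p: float, z: float, margin: float) -> int:
    """Smallest whole n with n >= p*(1 - p)*(z/margin)**2."""
    return ceil(p * (1 - p) * (z / margin) ** 2)


Z_95 = 1.96  # normal point for 95% confidence
E = 0.03     # hypothetical margin of error, chosen only for illustration
for p in (0.5, 0.3, 0.1):
    print(f"p = {p}: n = {sample_size_for_proportion(p, Z_95, E)}")
```

Any preliminary estimate further from 0.5 lowers the required n, so planning with p = 0.5 guarantees the requested margin of error whatever the true proportion turns out to be.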

A Comparison of Confidence Intervals and Tolerance Intervals

A tolerance interval contains a specified percentage of individual population measurements
• Often 68.26%, 95.44%, 99.73%

A confidence interval is an interval constructed to contain the population mean $\mu$, and the confidence level expresses how sure we are that this interval contains $\mu$
• Often confidence level is set high (e.g., 95% or 99%)
  – Because such a level is considered high enough to provide convincing evidence about the value of $\mu$

Example 7.7 Car Mileage Case

Tolerance intervals shown:

$[\bar{x} \pm s]$ contains 68.26% of all individual cars

$[\bar{x} \pm 2s]$ contains 95.44% of all individual cars

$[\bar{x} \pm 3s]$ contains 99.73% of all individual cars

The t-based 95% confidence interval for $\mu$ is [31.32, 31.78], so we can be 95% confident that $\mu$ lies between 31.32 and 31.78 mpg
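
The contrast between the two kinds of interval is easiest to see side by side. The sketch below uses the car mileage sample values quoted earlier in the chapter (sample mean 31.5531, s = 0.7992, n = 49, t point 2.0106); the helper name `interval` is illustrative, and treating the sample mean and s as stand-ins for the population values in the tolerance intervals is an approximation that assumes a roughly normal population.

```python
from math import sqrt

X_BAR, S, N = 31.5531, 0.7992, 49  # sample values from the car mileage case
T_POINT = 2.0106                   # t_0.025 for df = 48, as quoted in the slides


def interval(center: float, half_width: float) -> tuple[float, float]:
    """Return [center - half_width, center + half_width]."""
    return center - half_width, center + half_width


# Tolerance intervals describe individual cars: half-width k*s
for k in (1, 2, 3):
    low, high = interval(X_BAR, k * S)
    print(f"tolerance [x_bar ± {k}s]: [{low:.2f}, {high:.2f}]")

# The confidence interval describes the mean: half-width t*s/sqrt(n)
ci_low, ci_high = interval(X_BAR, T_POINT * S / sqrt(N))
print(f"95% confidence interval: [{ci_low:.2f}, {ci_high:.2f}]")
```

The confidence interval is much narrower than any of the tolerance intervals because its half-width is divided by the square root of n: it locates the mean, not the spread of individual mileages.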

Summary: Selecting an Appropriate Confidence Interval for a Population Mean