Chapter 7 Distributions of Sampling Statistics


7.1 A Preview

7.2 Introduction

7.3 Sample Mean

7.4 Central Limit Theorem

7.5 Sampling Proportions from a Finite Population

7.6 Distribution of the Sample Variance of a Normal Population

A Preview

If you bet $1 on a number at a roulette table in a U.S. casino, then either you will win $35 if your number appears on the roulette wheel or you will lose $1 if it does not.

Since the wheel has 38 slots (numbered 0, 00, and each of the integers from 1 to 36), it follows that the probability that your number appears is 1/38.

As a result, your expected gain on the bet is

\[ E[\text{gain}] = 35 \cdot \frac{1}{38} - 1 \cdot \frac{37}{38} = -\frac{2}{38} \approx -0.0526 \]

That is, your expected loss on each spin of the wheel is approximately 5.3 cents.

Suppose you continually place bets at the roulette table. How lucky do you have to be in order to be winning money at the end of your play?

It depends on how long you continue to play.

◦ After 100 plays you will be ahead with probability 0.4916.

◦ After 1000 plays your chance of being ahead drops to 0.39.

◦ After 100,000 plays not only will you almost certainly be losing (your probability of being ahead is approximately 0.002), but also you can be 95 percent certain that your average loss per play will be 5.26 ± 1.13 cents.

◦ …

◦ If you play long enough, you will learn that the average loss per game is around 5.26 cents.
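The probabilities of being ahead quoted above can be checked numerically. The following is a minimal sketch, not the textbook's own computation: it assumes an exact binomial count of wins for small n and a normal approximation to the total gain for large n. The function names are illustrative.

```python
from math import comb, erf, sqrt

P_WIN = 1 / 38           # probability that the chosen number comes up
WIN, LOSS = 35.0, -1.0   # gain in dollars on a win and on a loss


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def gain_mean_sd() -> tuple[float, float]:
    """Mean and standard deviation of the gain on a single $1 bet."""
    mean = WIN * P_WIN + LOSS * (1 - P_WIN)
    second_moment = WIN**2 * P_WIN + LOSS**2 * (1 - P_WIN)
    return mean, sqrt(second_moment - mean**2)


def prob_ahead_exact(n: int) -> float:
    """Exact probability of a positive total gain after n bets.

    With w wins the total gain is 35*w - (n - w) = 36*w - n,
    which is positive exactly when w > n / 36.
    """
    min_wins = n // 36 + 1
    return sum(
        comb(n, w) * P_WIN**w * (1 - P_WIN) ** (n - w)
        for w in range(min_wins, n + 1)
    )


def prob_ahead_normal(n: int) -> float:
    """Normal approximation to the probability of a positive total gain."""
    mean, sd = gain_mean_sd()
    z = (0.0 - n * mean) / (sd * sqrt(n))
    return 1.0 - normal_cdf(z)


if __name__ == "__main__":
    print("n = 100 (exact):        ", round(prob_ahead_exact(100), 4))
    print("n = 1000 (normal):      ", round(prob_ahead_normal(1000), 4))
    print("n = 100000 (normal):    ", round(prob_ahead_normal(100_000), 4))
```

The exact sum is practical for n = 100; for the larger values of n the normal approximation is used, which is the approach developed later in this chapter.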

Introduction

One of the key concerns of statistics is the drawing of conclusions from a set of observed data.

◦ These data consist of a sample of certain elements of a population.

◦ The objective is to use the sample to draw conclusions about the entire population.

To use sample data to make inferences about the entire population, we make two assumptions:

◦ There is an underlying probability distribution for the population values.

◦ The sample data are independent values from this distribution.

Definition (sample)

If X1, . . . , Xn are independent random variables having a common probability distribution, we say they constitute a sample from that distribution.

In most applications, the population distribution will not be completely known, and one will attempt to use the sample data to make inferences about it.

Two important statistics that we will consider are

◦ the sample mean and

◦ the sample variance.

Sample Mean

The value associated with any member of the population can be regarded as being the value of a random variable having expectation μ and variance σ².

The quantities μ and σ² are called the population mean and the population variance, respectively.

Let X1, X2, . . . , Xn be a sample of values from this population.

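The sample mean is

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

Its expectation is

\[ E[\bar{X}] = \mu \]

The variance of the sample mean is

\[ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \]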
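The expectation and variance of the sample mean can be derived from linearity of expectation and the independence of the sample values. A short derivation, written in LaTeX (the `align*` environment assumes the amsmath package), with \(\bar{X} = (X_1 + \cdots + X_n)/n\):

```latex
\begin{align*}
E[\bar{X}]
  &= E\!\left[\frac{X_1 + \cdots + X_n}{n}\right]
   = \frac{1}{n}\bigl(E[X_1] + \cdots + E[X_n]\bigr)
   = \frac{n\mu}{n} = \mu, \\
\operatorname{Var}(\bar{X})
  &= \frac{1}{n^2}\operatorname{Var}(X_1 + \cdots + X_n)
   = \frac{1}{n^2}\bigl(\operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)\bigr)
   = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
\end{align*}
```

The second line uses independence: the variance of a sum of independent random variables is the sum of their variances.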

Example 7.1

Let us check the preceding formulas for the expected value and variance of the sample mean by considering a sample of size 2 from a population whose values are equally likely to be either 1 or 2.

The pair of values X1, X2 can assume any of four possible pairs of values: (1, 1), (1, 2), (2, 1), (2, 2).


Figure 7.2 plots the population probability distribution alongside the probability distribution of the sample mean of a sample of size 2.
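The check in Example 7.1 can be carried out by enumerating the four equally likely pairs. The following is a small sketch using exact fractions; the helper names are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

# Population of Example 7.1: values 1 and 2, each with probability 1/2.
population = {1: Fraction(1, 2), 2: Fraction(1, 2)}


def mean_and_variance(dist: dict) -> tuple:
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var


def sample_mean_distribution(dist: dict, n: int) -> dict:
    """Exact distribution of the mean of n independent draws from dist."""
    result: dict = {}
    for draws in product(dist.items(), repeat=n):
        xbar = Fraction(sum(v for v, _ in draws), n)
        prob = Fraction(1)
        for _, p in draws:
            prob *= p
        result[xbar] = result.get(xbar, Fraction(0)) + prob
    return result


mu, sigma2 = mean_and_variance(population)
xbar_dist = sample_mean_distribution(population, n=2)
xbar_mean, xbar_var = mean_and_variance(xbar_dist)

if __name__ == "__main__":
    print("population mean, variance:", mu, sigma2)
    print("distribution of the sample mean:", xbar_dist)
    print("sample mean: mean, variance:", xbar_mean, xbar_var)
```

The sample mean takes the values 1, 3/2, and 2 with probabilities 1/4, 1/2, and 1/4, so its mean equals the population mean and its variance is the population variance divided by the sample size 2.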


Discussion

The standard deviation of the sample mean is equal to the population standard deviation divided by the square root of the sample size.

Exercises (p. 303, 2, 3)

(Take home) Suppose that X1 and X2 constitute a sample of size 2 from a population in which a typical value X is equal to either 1 or 2 with respective probabilities

P{X = 1} = 0.7    P{X = 2} = 0.3

(a) Compute E[X].
(b) Compute Var(X).
(c) What are the possible values of \(\bar{X}\) = (X1 + X2)/2?
(d) Determine the probabilities that \(\bar{X}\) assumes the values in (c).
(e) Using (d), directly compute \(E[\bar{X}]\) and \(\mathrm{Var}(\bar{X})\).

Consider a population whose probabilities are given by p(1) = p(2) = p(3) = 1/3.

(a) Determine E[X].
(b) Determine SD(X).
(c) Let \(\bar{X}\) denote the sample mean of a sample of size 2 from this population. Determine the possible values of \(\bar{X}\) along with their probabilities.
(d) Use the result of part (c) to compute \(E[\bar{X}]\) and \(SD(\bar{X})\).

Central Limit Theorem

The central limit theorem states that the sum of a large number of independent random variables is approximately normally distributed.

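The central limit theorem partially explains why many data sets related to biological characteristics tend to be approximately normal.

◦ For example, the central limit theorem can be used to explain why the heights of the many daughters of a particular pair of parents will follow a normal curve.

PS. The central limit theorem says that, whatever the distribution of the population, the distribution of its sample mean tends toward a normal distribution. For this reason the central limit theorem is also called the normal convergence theorem.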

Example 7.2

An insurance company has 10,000 (= 10^4) automobile policyholders. If the expected yearly claim per policyholder is $260 with a standard deviation of $800, approximate the probability that the total yearly claim exceeds $2.8 million (= $2.8 × 10^6).

Solution
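One way to carry out the approximation in Example 7.2 is sketched below. It assumes the claims of the policyholders are independent, so that by the central limit theorem the total claim is approximately normal with mean n × 260 and standard deviation 800 × √n. The function name is illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_total_exceeds(n: int, mean: float, sd: float, threshold: float) -> float:
    """Normal approximation to P{X1 + ... + Xn > threshold}.

    Assumes the n summands are independent with the given mean and sd.
    """
    total_mean = n * mean          # mean of the sum
    total_sd = sd * sqrt(n)        # standard deviation of the sum
    z = (threshold - total_mean) / total_sd
    return 1.0 - normal_cdf(z)


z_value = (2.8e6 - 10_000 * 260.0) / (800.0 * sqrt(10_000))
probability = prob_total_exceeds(10_000, 260.0, 800.0, 2.8e6)

if __name__ == "__main__":
    print("standardized threshold:", z_value)
    print("approximate probability:", round(probability, 4))
```

Here the total has mean 2.6 × 10^6 and standard deviation 8 × 10^4, so the threshold lies 2.5 standard deviations above the mean and the probability is about 0.0062.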

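Distribution of the Sample Mean

Let X1, . . . , Xn be a sample from a population having mean μ and variance σ², and let

\[ \bar{X} = \frac{X_1 + \cdots + X_n}{n} \]

be the sample mean.

Since \(\bar{X}\) has expectation μ and standard deviation \(\sigma/\sqrt{n}\), the standardized variable

\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

has an approximately standard normal distribution when the sample size n is large.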

Example 7.3

The blood cholesterol levels of a population of workers have mean 202 and standard deviation 14.

(a) If a sample of 36 workers is selected, approximate the probability that the sample mean of their blood cholesterol levels will lie between 198 and 206.

(b) Repeat (a) for a sample size of 64.

Solution
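Both parts of Example 7.3 reduce to standardizing the sample mean by its standard deviation, 14/√n. The following is a minimal sketch of that calculation using the normal approximation; the function name is illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_sample_mean_between(mu: float, sigma: float, n: int,
                             low: float, high: float) -> float:
    """Normal approximation to P{low < sample mean < high}."""
    se = sigma / sqrt(n)  # standard deviation of the sample mean
    return normal_cdf((high - mu) / se) - normal_cdf((low - mu) / se)


p_36 = prob_sample_mean_between(202.0, 14.0, 36, 198.0, 206.0)
p_64 = prob_sample_mean_between(202.0, 14.0, 64, 198.0, 206.0)

if __name__ == "__main__":
    print("(a) n = 36:", round(p_36, 4))
    print("(b) n = 64:", round(p_64, 4))
```

The probability is roughly 0.91 for n = 36 and roughly 0.98 for n = 64. The larger sample gives a smaller standard deviation of the sample mean, so the mean is more likely to fall in the same interval.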

Example 7.4

An astronomer is interested in measuring, in units of light-years, the distance from her observatory to a distant star.

However, the astronomer knows that due to differing atmospheric conditions and normal errors, each time a measurement is made, it will yield not the exact distance, but an estimate of it.

As a result, she is planning on making a series of 10 measurements and using the average of these measurements as her estimated value for the actual distance.

If the values of the measurements constitute a sample from a population having mean d (the actual distance) and a standard deviation of 3 light-years, approximate the probability that the astronomer’s estimated value of the distance will be within 0.5 light-years of the actual distance.

Solution
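A sketch of the calculation for Example 7.4 follows. The first function applies the normal approximation to the average of 10 measurements. The second is a simulation check that additionally assumes each measurement error is itself normal with mean 0, which the example does not state. Both function names are illustrative.

```python
import random
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_within(sigma: float, n: int, tolerance: float) -> float:
    """Normal approximation to P{|sample mean - true mean| <= tolerance}."""
    z = tolerance / (sigma / sqrt(n))
    return 2.0 * normal_cdf(z) - 1.0


def simulate_within(sigma: float, n: int, tolerance: float,
                    reps: int, seed: int = 0) -> float:
    """Fraction of simulated averages within tolerance of the true value.

    Assumption (not from the example): errors are normal with mean 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        avg_error = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        if abs(avg_error) <= tolerance:
            hits += 1
    return hits / reps


approx = prob_within(sigma=3.0, n=10, tolerance=0.5)
simulated = simulate_within(sigma=3.0, n=10, tolerance=0.5, reps=50_000)

if __name__ == "__main__":
    print("normal approximation:", round(approx, 4))
    print("simulation estimate: ", round(simulated, 4))
```

With a standard deviation of 3 light-years, the average of 10 measurements lands within 0.5 light-years of the true distance only about 40 percent of the time.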

How Large a Sample Is Needed?

The central limit theorem leaves open the question of how large the sample size n needs to be for the normal approximation to be valid, and indeed the answer depends on the population distribution of the sample data.

◦ For instance, if the underlying population distribution is normal, then the sample mean will also be normal, no matter what the sample size is.

◦ A general rule of thumb is that you can be confident of the normal approximation whenever the sample size n is at least 30.

Figure 7.3 presents the distribution of the sample means from a certain underlying population distribution (known as the exponential distribution) for sample sizes n = 1, 5, and 10.
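The effect of the sample size on the shape of the distribution can be explored by simulation. The sketch below assumes an exponential population with rate 1 (the figure's actual parameter is not given here) and uses sample skewness as a rough measure of departure from the symmetric normal shape. The helper names and the number of replications are illustrative choices.

```python
import random
from statistics import fmean


def sample_skewness(values: list) -> float:
    """Moment-based skewness: third central moment / (variance ** 1.5)."""
    m = fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_sample_means(n: int, reps: int, rng: random.Random) -> list:
    """Draw `reps` sample means, each from n exponential(1) observations."""
    return [
        fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(12345)
skewness_by_n = {
    n: sample_skewness(simulate_sample_means(n, reps=20_000, rng=rng))
    for n in (1, 5, 10)
}

if __name__ == "__main__":
    for n, skew in skewness_by_n.items():
        # Theory: the mean of n exponentials has skewness 2 / sqrt(n).
        print(f"n = {n:2d}: simulated skewness = {skew:.3f}")
```

The skewness shrinks as n grows, which matches the sample mean's distribution becoming progressively more symmetric and closer to normal.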

Exercises (p. 311, 4; p. 312, 10)

If you place a $1 bet on a number of a roulette wheel, then either you win $35, with probability 1/38, or you lose $1, with probability 37/38.

Let X denote your gain on a bet of this type.

(a) Find E[X] and SD(X).

Suppose you continually place bets of the preceding type. Show that

(b) The probability that you will be winning after 1000 bets is approximately 0.39.
(c) The probability that you will be winning after 100,000 bets is approximately 0.002.

A six-sided die, in which each side is equally likely to appear, is repeatedly rolled until the total of all rolls exceeds 400. What is the approximate probability that this will require more than 140 rolls? (Hint: Relate this to the probability that the sum of the first 140 rolls is less than 400.)

Sampling Proportions from a Finite Population

Consider a population of size N in which certain elements have a particular characteristic of interest.

Let p denote the proportion of the population having this characteristic.

So Np elements of the population have it and N(1 − p) do not.

Example 7.5

Suppose that 60 out of a total of 900 students of a particular school are left handed.

If left-handedness is the characteristic of interest, then N = 900 and

p = 60/900 = 1/15.

Definition

A sample of size n selected from a population of N elements is said to be a random sample if it is selected in such a manner that the sample chosen is equally likely to be any of the subsets of size n.

Suppose that a random sample of size n has been chosen from a population of size N.

For i = 1, . . . , n, let

\[ X_i = \begin{cases} 1 & \text{if the } i\text{th member of the sample has the characteristic} \\ 0 & \text{otherwise} \end{cases} \]

Consider now the sum of the Xi:

\[ X = X_1 + X_2 + \cdots + X_n \]

◦ Xi contributes 1 to the sum if the ith member of the sample has the characteristic and contributes 0 otherwise.

◦ The sum is equal to the number of members of the sample that possess the characteristic.

Similarly, the sample mean

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

will equal the proportion of members of the sample who possess the characteristic.

Let us consider the probabilities associated with the statistic \(\bar{X}\).

Since the ith member of the sample is equally likely to be any of the N members of the population, of which Np have the characteristic, each Xi is equal to either 1 or 0 with respective probabilities p and 1 − p.

Note that the random variables X1, X2, . . . , Xn are not independent.

◦ For instance, without any knowledge of the outcome of the first selection,

P {X2 = 1} = p

◦ However, the conditional probability that X2 = 1, given that the first selection has the characteristic, is

P {X2 = 1|X1 = 1} = (Np − 1)/( N − 1)

and

P {X2 = 1|X1 = 0} = (Np)/( N − 1)

Thus, knowing whether the first element of the random sample has the characteristic changes the probability for the next element.

However, when the population size N is large in relation to the sample size n, this change will be very slight.


When the population size N is large with respect to the sample size n, then X1, X2, . . . , Xn are approximately independent.
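To see how slight the dependence is, the conditional probabilities can be evaluated for the school of Example 7.5 (N = 900 students, 60 of them left-handed). The following is a small sketch with exact fractions; the function name is illustrative.

```python
from fractions import Fraction


def second_draw_probabilities(N: int, with_char: int) -> tuple:
    """Probabilities that the second sampled member has the characteristic.

    Returns (unconditional, given first has it, given first lacks it)
    for sampling without replacement from a population of size N in
    which `with_char` members have the characteristic.
    """
    p = Fraction(with_char, N)
    given_first_has = Fraction(with_char - 1, N - 1)
    given_first_lacks = Fraction(with_char, N - 1)
    return p, given_first_has, given_first_lacks


p, p_given_1, p_given_0 = second_draw_probabilities(N=900, with_char=60)

# Law of total probability: averaging over the first draw recovers p.
recovered = p * p_given_1 + (1 - p) * p_given_0

if __name__ == "__main__":
    print("P{X2 = 1}          =", p, "=", float(p))
    print("P{X2 = 1 | X1 = 1} =", p_given_1, "=", float(p_given_1))
    print("P{X2 = 1 | X1 = 0} =", p_given_0, "=", float(p_given_0))
```

The three values are 1/15, 59/899, and 60/899, which differ by about 0.001 at most. This is why the Xi can be treated as approximately independent when N is large relative to n.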

Let X denote the number of members of the sample who have the characteristic. If the population size N is large in relation to the sample size n, then the distribution of X is approximately a binomial distribution with parameters n and p.

For the remainder of this text we will suppose that the underlying population is large in relation to the sample size, and we will take the distribution of X to be binomial.

The mean and standard deviation of a binomial random variable with parameters n and p are

\[ E[X] = np, \qquad SD(X) = \sqrt{np(1-p)} \]

Consequently, the sample proportion \(\bar{X} = X/n\) has mean p and standard deviation \(\sqrt{p(1-p)/n}\).

Example 7.6

Suppose that 50 percent of the population is planning on voting for candidate A in an upcoming election.

If a random sample of size 100 is chosen, then the proportion of those in the sample who favor candidate A has expected value

\[ E[\bar{X}] = 0.50 \]

and standard deviation

\[ SD(\bar{X}) = \sqrt{\frac{0.50(1 - 0.50)}{100}} = 0.05 \]
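For a sample proportion from a large population, the mean is p and the standard deviation is √(p(1 − p)/n). The following sketch evaluates these for the setting of Example 7.6 and for two larger sample sizes chosen only for illustration; the function name is an assumption.

```python
from math import sqrt


def proportion_mean_sd(p: float, n: int) -> tuple[float, float]:
    """Mean and standard deviation of the sample proportion.

    Assumes the population is large relative to n, so that the number
    in the sample with the characteristic is binomial(n, p).
    """
    return p, sqrt(p * (1.0 - p) / n)


mean_100, sd_100 = proportion_mean_sd(0.50, 100)

# Quadrupling the sample size halves the standard deviation.
sd_by_n = {n: proportion_mean_sd(0.50, n)[1] for n in (100, 400, 1600)}

if __name__ == "__main__":
    print("n = 100: mean =", mean_100, " sd =", sd_100)
    for n, sd in sd_by_n.items():
        print(f"n = {n:4d}: sd = {sd:.4f}")
```

Because the standard deviation is proportional to 1/√n, a fourfold increase in sample size is needed to cut it in half.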

Exercises (p. 319, 1)

Suppose that 60 percent of the residents of a city are in favor of teaching evolution in high school.

Determine the mean and the standard deviation of the proportion of a random sample of size n that is in favor when

(a) n = 10
(c) n = 1,000

Probabilities Associated with Sample Proportions: The Normal Approximation to the Binomial Distribution

From a historical point of view, one of the most important applications of the central limit theorem was in computing binomial probabilities.

Let X denote a binomial random variable having parameters n and p.

Since X can be viewed as the sum of n independent random variables that each equal 1 with probability p and 0 otherwise, the central limit theorem implies that, for large n, the standardized variable

\[ \frac{X - np}{\sqrt{np(1-p)}} \]

has an approximately standard normal distribution.

Example 7.7

Suppose that exactly 46 percent of the population favors a particular candidate. If a random sample of size 200 is chosen, what is the probability that at least 100 favor this candidate?

Solution

If X is the number who favor the candidate, then X is a binomial random variable with parameters n = 200 and p = 0.46.

The desired probability is P{X ≥ 100}.

◦ Note that since the binomial is a discrete and the normal is a continuous random variable, it is best to compute P{X = i} as P{i − 0.5 ≤ X ≤ i + 0.5} when applying the normal approximation (this is called the continuity correction).
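With the continuity correction, P{X ≥ 100} is approximated by P{X ≥ 99.5} under the normal curve. The sketch below computes that approximation for Example 7.7 and compares it with the exact binomial tail. The function names are illustrative.

```python
from math import comb, erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_tail_exact(n: int, p: float, k: int) -> float:
    """Exact P{X >= k} for X binomial with parameters n and p."""
    return sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


def binomial_tail_normal(n: int, p: float, k: int) -> float:
    """Normal approximation to P{X >= k} with continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1.0 - p))
    z = (k - 0.5 - mean) / sd   # P{X >= k} is treated as P{X >= k - 0.5}
    return 1.0 - normal_cdf(z)


approx = binomial_tail_normal(200, 0.46, 100)
exact = binomial_tail_exact(200, 0.46, 100)

if __name__ == "__main__":
    print("normal approximation:", round(approx, 4))
    print("exact binomial tail: ", round(exact, 4))
```

Here np = 92 and the standard deviation is about 7.05, so 99.5 is about 1.06 standard deviations above the mean and the approximate probability is about 0.144.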

Exercises (p. 322, 15; p. 321, 8)

Let X be a binomial random variable with parameters n = 100 and p = 0.2. Approximate the following probabilities.

(a) P{X ≤ 25}
(b) P{X > 30}
(c) P{15 < X < 22}

If 65 percent of the population of a certain community is in favor of a proposed increase in school taxes, find the approximate probability that a random sample of 100 people will contain

(a) At least 45 who are in favor of the proposition
(b) Fewer than 60 who are in favor
(c) Between 55 and 75 who are in favor

Distribution of the Sample Variance of a Normal Population

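If Z1, . . . , Zn are independent standard normal random variables, then the sum of their squares is said to have a chi-squared distribution with n degrees of freedom.

The expected value of a chi-squared random variable is equal to its number of degrees of freedom.

If S² is the sample variance of a sample of size n from a normal population having variance σ², then (n − 1)S²/σ² has a chi-squared distribution with n − 1 degrees of freedom.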

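Explanation

Consider the standardized variables (Xi − μ)/σ, i = 1, . . . , n, where μ is the population mean and X1, . . . , Xn is a sample from a normal population. The sum of their squares

\[ \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \]

has a chi-squared distribution with n degrees of freedom.

If we substitute the sample mean \(\bar{X}\) for the population mean μ, the resulting sum

\[ \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}, \qquad \text{where } S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \text{ is the sample variance,} \]

will remain a chi-squared random variable. However, it will lose 1 degree of freedom, leaving n − 1, because the population mean (μ) is replaced by its estimator (the sample mean).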
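The loss of one degree of freedom can be checked by simulation. The sketch below draws many samples from a normal population and computes the sum of squared deviations from the sample mean, divided by σ². A chi-squared variable with n − 1 degrees of freedom has mean n − 1 and variance 2(n − 1), so the simulated values should match those moments. The population parameters, sample size, and function names are illustrative choices.

```python
import random
from statistics import fmean


def scaled_sum_of_squares(sample: list, sigma: float) -> float:
    """Sum of (x - sample mean)^2 divided by sigma^2, i.e. (n-1)S^2/sigma^2."""
    xbar = fmean(sample)
    return sum((x - xbar) ** 2 for x in sample) / sigma**2


def simulate(n: int, mu: float, sigma: float, reps: int,
             rng: random.Random) -> list:
    """Simulate `reps` values of (n-1)S^2/sigma^2 from normal samples."""
    return [
        scaled_sum_of_squares([rng.gauss(mu, sigma) for _ in range(n)], sigma)
        for _ in range(reps)
    ]


n = 5
values = simulate(n=n, mu=10.0, sigma=2.0, reps=20_000, rng=random.Random(2024))

sim_mean = fmean(values)
sim_var = fmean([(v - sim_mean) ** 2 for v in values])

if __name__ == "__main__":
    print("simulated mean:    ", round(sim_mean, 3), " (theory:", n - 1, ")")
    print("simulated variance:", round(sim_var, 3), " (theory:", 2 * (n - 1), ")")
```

With n = 5 the simulated mean is close to 4 rather than 5, consistent with n − 1 degrees of freedom.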

KEY TERMS

A sample from a population distribution: If X1, . . . , Xn are independent random variables having a common distribution F, we say that they constitute a sample from the population distribution F.

Statistic: A numerical quantity whose value is determined by the sample.

Sample mean: The average of the sample values, \(\bar{X} = (X_1 + \cdots + X_n)/n\).

Sample variance: The statistic \(S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)\).

Central limit theorem: A theorem stating that the sum of a sample of size n from a population will approximately have a normal distribution when n is large.

Random sample: A sample of n members of a population is a random sample if it is obtained in such a manner that each of the possible subsets of n members is equally likely to be the chosen sample.

Chi-squared distribution with n degrees of freedom: The distribution of the sum of the squares of n independent standard normals.
