Biomedical Statistics Final Report

Distributions

Student: 劉俊成

Student ID: 993002014

Instructor: 蔡章仁

Negative Binomial and Geometric Distributions

Under the same assumptions as for the binomial distribution, let x be a discrete random variable and y a positive integer. The pdf for the negative binomial distribution is the probability of getting x failures before y successes, where p = the probability of success on any single trial; i.e.

$$f(x) = \binom{x+y-1}{x} p^y (1-p)^x$$

The geometric distribution is a special case of the negative binomial distribution, where y = 1, namely:

$$f(x) = p(1-p)^x$$

This represents the probability of getting x failures before the first success.
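As a rough illustration, both pmfs can be evaluated directly with the Python standard library. The function names `negbinom_pmf` and `geometric_pmf` are chosen for this sketch (they are not Excel or library functions), and the formula assumed is the standard one for x failures before the y-th success.

```python
from math import comb

def negbinom_pmf(x: int, y: int, p: float) -> float:
    """P(exactly x failures before the y-th success), success probability p."""
    return comb(x + y - 1, x) * p**y * (1 - p)**x

def geometric_pmf(x: int, p: float) -> float:
    """Special case y = 1: x failures before the first success."""
    return negbinom_pmf(x, 1, p)

# 3 failures before the 2nd success, and before the 1st success, with p = 0.5
print(negbinom_pmf(3, 2, 0.5))
print(geometric_pmf(3, 0.5))
```

Summing either function over x = 0, 1, 2, … should give a total close to 1, which is a quick sanity check on the formula.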

Excel Functions: Excel provides the following function regarding the negative binomial distribution:

NEGBINOMDIST(x, y, p) = the probability of getting x failures before y successes where p = the probability of success on any single trial; i.e. the pdf of the negative binomial distribution.

Excel 2010/2013 provide the following additional function: NEGBINOM.DIST(x, y, p, cum), where cum takes the values TRUE or FALSE. In particular, NEGBINOM.DIST(x, y, p, FALSE) = NEGBINOMDIST(x, y, p), while NEGBINOM.DIST(x, y, p, TRUE) = the probability of getting at most x failures before y successes, where p = the probability of success on any single trial; i.e. the cumulative probability function.

Observation: The geometric distribution is memoryless, which means that if you intend to repeat an experiment until the first success, then, given that the first success has not yet occurred, the conditional probability distribution of the number of additional trials required until the first success does not depend on how many failures have already occurred. The die one throws or the coin one tosses does not have a “memory” of these failures. The geometric distribution is in fact the only memoryless discrete distribution.

The cumulative distribution function of the geometric distribution is

$$F(x) = 1 - (1-p)^{x+1}$$
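The memoryless property can be checked numerically. The sketch below assumes the "failures before the first success" convention, so that P(X ≥ x) = (1 – p)^x; the helper names `geometric_cdf` and `survival` and the values of p, s and t are illustrative choices only.

```python
def geometric_cdf(x: int, p: float) -> float:
    """P(X <= x) for X = number of failures before the first success."""
    return 1 - (1 - p) ** (x + 1)

def survival(x: int, p: float) -> float:
    """P(X >= x): the first x trials are all failures."""
    return (1 - p) ** x

p = 0.2
s, t = 4, 3  # s failures already seen, t further failures asked about

# P(X >= s + t | X >= s) should equal P(X >= t)
conditional = survival(s + t, p) / survival(s, p)
unconditional = survival(t, p)
print(conditional, unconditional)
```

The two printed values agree up to floating-point rounding, whatever values of s and t are chosen.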

Other key statistical properties of the geometric distribution are:

Mean = (1 – p) ⁄ p

Mode = 0

Range = [0, ∞)

Variance = (1 – p) ⁄ p^2

Skewness = (2 – p) ⁄ (1 – p)^0.5

Kurtosis (excess) = 6 + p^2 ⁄ (1 – p)
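These closed forms can be checked by summing the pmf p(1 – p)^x over a long but finite range. This is only a numerical sketch: the function name `geometric_moments` and the truncation at 2000 terms are assumptions, and the kurtosis is taken as excess kurtosis, 6 + p^2 ⁄ (1 – p).

```python
from math import sqrt

def geometric_moments(p: float, terms: int = 2000):
    """Mean, variance, skewness and excess kurtosis from a truncated sum."""
    pmf = [p * (1 - p) ** x for x in range(terms)]
    mean = sum(x * f for x, f in enumerate(pmf))
    m2 = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
    m3 = sum((x - mean) ** 3 * f for x, f in enumerate(pmf))
    m4 = sum((x - mean) ** 4 * f for x, f in enumerate(pmf))
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

p = 0.3
mean, var, skew, kurt = geometric_moments(p)
print(mean, (1 - p) / p)
print(var, (1 - p) / p ** 2)
print(skew, (2 - p) / sqrt(1 - p))
print(kurt, 6 + p ** 2 / (1 - p))
```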

Hypergeometric Distribution

Under the same assumptions as for the binomial distribution, from a population of size m of which k are successes, a sample of size n is drawn. Let x be a random variable whose value is the number of successes in the sample. The pdf for x, called the hypergeometric distribution, is given by

$$f(x) = \frac{\binom{k}{x}\binom{m-k}{n-x}}{\binom{m}{n}}$$

Observations: Let p = k/m. Then the situation is the same as for the binomial distribution B(n, p), except that in the binomial case each selection (success or failure) is put back in the population after the trial, while in the hypergeometric case the selection is not put back and so can’t be drawn again. When the population size m is large relative to the sample size n, the hypergeometric and binomial distributions yield more or less the same result, but this is not necessarily true when the sample is a substantial fraction of the population.
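A small comparison makes the effect of population size visible. In this sketch the argument order (x, n, k, m) follows the Excel convention used in this report; the function names, the sample size of 10 and the success fraction of 25% are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(x: int, n: int, k: int, m: int) -> float:
    """P(x successes in a sample of n drawn without replacement)."""
    return comb(k, x) * comb(m - k, n - x) / comb(m, n)

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, x = 10, 3
for m in (20, 200, 20000):
    k = m // 4  # keep p = k/m fixed at 0.25 as the population grows
    print(m, hypergeom_pmf(x, n, k, m), binom_pmf(x, n, k / m))
```

With the sample size held at 10, the gap between the two columns shrinks as m grows, since removing a drawn item changes the remaining proportions less and less.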

Excel Functions: Excel provides the following function:

HYPERGEOMDIST(x, n, k, m) = the probability of getting x successes from a sample of size n, where the population has size m of which k are successes; i.e. the pdf of the hypergeometric distribution.

Excel 2010/2013 provide the following additional function: HYPERGEOM.DIST(x, n, k, m, cum), where cum takes the values TRUE or FALSE. HYPERGEOM.DIST(x, n, k, m, FALSE) = HYPERGEOMDIST(x, n, k, m), while HYPERGEOM.DIST(x, n, k, m, TRUE) = the probability of getting at most x successes from a sample of size n, where the population has size m of which k are successes; i.e. the cumulative probability function.

Example

Mary and Jane both attend the same university, but don’t know each other. Each has about 200 friends at the university. Assuming that each of these groups of friends represents a random sample from the 50,000 students who attend the university, what is the probability that Mary and Jane will have at least one friend in common?

It turns out that this problem is equivalent to picking 200 balls at random (representing Mary’s friends) from a bag containing 49,998 balls (representing the 50,000 students less Mary and Jane), 200 of which are blue (representing Jane’s friends), and getting at least one blue ball. We first calculate the probability that none of the balls will be blue as follows:

HYPERGEOMDIST(0, 200, 200, 49998) = .448

Thus the answer is 1 – .448 = 55.2%.
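The same calculation can be reproduced outside Excel. This is a sketch using exact integer binomial coefficients from the Python standard library; `hypergeom_pmf` is a helper written for this example, with the argument order of HYPERGEOMDIST.

```python
from math import comb

def hypergeom_pmf(x: int, n: int, k: int, m: int) -> float:
    """P(x successes in a sample of n) for a population of m with k successes."""
    return comb(k, x) * comb(m - k, n - x) / comb(m, n)

# 200 draws (Mary's friends) from 49,998 students, 200 of whom are Jane's friends
p_none = hypergeom_pmf(0, 200, 200, 49998)
p_common = 1 - p_none
print(round(p_none, 3), round(p_common, 3))
```

Python's integers have arbitrary precision, so the very large binomial coefficients are computed exactly before the final division.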

Beta Distribution

For the binomial distribution the number of successes x is the random variable and the number of trials n and the probability of success p on any single trial are parameters (i.e. constants). Instead we would like to view the probability of success on any single trial as the random variable, and the number of trials n and the total number of successes in n trials as constants.

Let α = # of successes in n trials and β = # of failures in n trials (and so α + β = n). The pdf for x = the probability of success on any single trial is given by

$$f(x) = \frac{(n-1)!}{(\alpha-1)!\,(\beta-1)!}\, x^{\alpha-1} (1-x)^{\beta-1}$$

This is a special case of the pdf of the beta distribution

$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$$

where Γ is the gamma function.

Excel Functions: Excel provides the following functions:

BETADIST(x, α, β) = the cumulative distribution function F(x) at x for the pdf given above.

BETAINV(p, α, β) = x such that BETADIST(x, α, β) = p. Thus BETAINV is the inverse of BETADIST.

Excel 2010/2013 provide two additional functions: BETA.INV, which is equivalent to BETAINV, and BETA.DIST(x, α, β, cum), where cum takes the values TRUE or FALSE. BETA.DIST(x, α, β, TRUE) = BETADIST(x, α, β), while BETA.DIST(x, α, β, FALSE) is the pdf of the beta distribution at x (as described above).

Example 1

A lottery organization claims that at least one out of every ten people wins. Of the last 500 lottery tickets sold 37 were winners. Based on this sample, what is the probability that the lottery organization’s claim is true: namely players have at least a 10% probability of buying a winning ticket? What is the 95% confidence interval?

To answer the first question we use the cumulative beta distribution function as follows:

BETA.DIST(.1, 37, 463, TRUE) = 98.1%

This is the probability that the organization’s claim is false (i.e. that the probability of winning is less than 10%). Thus the probability that the organization’s claim is true is 100% – 98.1% = 1.9%.

Example 1(cont.)

The lower bound of the 95% confidence interval is

BETAINV(.025, 37, 463) = 5.3%

The upper bound of the 95% confidence interval is

BETAINV(.975, 37, 463) = 9.8%

Since 10% is not in the 95% confidence interval (5.3%, 9.8%), we conclude (with 95% confidence) that the lottery’s claim is not accurate.
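For readers without Excel, the figures in this example can be approximated with the Python standard library alone. The sketch below is not how Excel computes them: it integrates the beta pdf with Simpson's rule and inverts the cdf by bisection. The names `beta_pdf`, `beta_cdf` and `beta_inv`, the step count and the tolerance are assumptions, and the code assumes α > 1 and β > 1 so that the pdf is finite at the endpoints.

```python
from math import exp, lgamma, log

def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta pdf, computed in log space; assumes a > 1 and b > 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def beta_cdf(x: float, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3

def beta_inv(q: float, a: float, b: float, tol: float = 1e-6) -> float:
    """Bisection search for x with beta_cdf(x) = q."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_false = beta_cdf(0.1, 37, 463)   # probability the win rate is below 10%
lower = beta_inv(0.025, 37, 463)   # lower bound of the 95% interval
upper = beta_inv(0.975, 37, 463)   # upper bound of the 95% interval
print(round(p_false, 3), round(lower, 3), round(upper, 3))
```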

Multinomial Distribution

Given an experiment with the following characteristics:

the experiment consists of n independent trials, each with k mutually exclusive outcomes Ei

for each trial the probability of outcome Ei is pi

Let x1, …, xk be discrete random variables whose values are the number of times outcome Ei occurs in n trials. Then the probability distribution function for x1, …, xk is called the multinomial distribution and is defined as follows:

$$f(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$$

Here

$$\sum_{i=1}^{k} x_i = n, \qquad \sum_{i=1}^{k} p_i = 1$$

The case where k = 2 is equivalent to the binomial distribution.

Example 1

Suppose that a bag contains 8 balls: 3 red, 1 green and 4 blue. You reach into the bag, pull out a ball at random, then put the ball back in the bag and pull out another ball. This experiment is repeated a total of 10 times. What is the probability that the outcome will result in exactly 4 reds and 6 blues?

The possible outcomes for each trial in this experiment are E1 = a red ball is drawn, E2 = a green ball is drawn and E3 = a blue ball is drawn. Thus p1 = 3/8, p2 = 1/8 and p3 = 4/8, x1 = 4, x2 = 0 and x3 = 6.
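With these values the probability can be computed directly from the multinomial formula. The following is a minimal sketch in Python; `multinomial_pmf` is a name chosen here, not a built-in function.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs) -> float:
    """n! / (x1! ... xk!) * p1^x1 * ... * pk^xk, with n = sum of counts."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)  # exact at every step
    return coef * prod(p**x for p, x in zip(probs, counts))

# 4 reds, 0 greens, 6 blues in 10 draws with replacement
answer = multinomial_pmf([4, 0, 6], [3/8, 1/8, 4/8])
print(round(answer, 6))
```

The multinomial coefficient here is 10! / (4! 0! 6!) = 210, which is then multiplied by (3/8)^4 (1/8)^0 (4/8)^6.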

Excel Function: While Excel does not provide a function for the multinomial distribution, it does provide the following function:

MULTINOMIAL(x1, …, xk) = n! / (x1!∙…∙xk!), where n = x1 + … + xk

Thus we can calculate the answer to Example 1 by using the formula

MULTINOMIAL(4,0,6)*(3/8)^4*(1/8)^0*(4/8)^6 = .064888

We can also use a range as the argument of MULTINOMIAL as in Figure 1.

Figure 1.

Actually, we can use the following more complicated Excel formula to calculate the same result:

=B9*EXP(SUMPRODUCT(B3:B5,LN(B6:B8)))

Real Statistics Excel Function: The following supplemental function in the Real Statistics Resource Pack can be used to calculate the multinomial distribution.

MULTINOMDIST(R1, R2) = the value of the multinomial pdf where R1 is a range containing the values x1, …, xk and R2 is a range containing the values p1, …, pk

Referring to Figure 1, we have MULTINOMDIST(B3:B5,B6:B8) = 0.064888.

Poisson Distribution

The Poisson distribution has pdf given by

$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$

The parameter μ is often replaced by λ.

Observation: Some key statistical properties of the Poisson distribution are:

Mean = µ

Variance = µ

Median ≈ µ

Skewness = 1 ⁄ √µ

Kurtosis (excess) = 1 ⁄ µ

Excel Function: Excel provides the following function for the Poisson distribution:

POISSON(x, μ, cum) where μ = the mean of the distribution and cum takes the values TRUE and FALSE

POISSON(x, μ, FALSE) = probability density function value f(x) at the value x for the Poisson distribution with mean μ.

POISSON(x, μ, TRUE) = cumulative probability distribution function F(x) at the value x for the Poisson distribution with mean μ.

Excel 2010/2013 provide the additional function POISSON.DIST which is equivalent to POISSON.

Theorem 1

If the probability p of success of a single trial approaches 0 while the number of trials n approaches infinity and the value μ = np stays fixed, then the binomial distribution B(n, p) approaches the Poisson distribution with mean μ.

Observation: Based on Theorem 1 the Poisson distribution can be used to estimate the binomial distribution when n ≥ 50 and p ≤ .01, preferably with np ≤ 5.

Example 1

A company produces high precision bolts so that the probability of a defect is .05%. In a sample of 4,000 units what is the probability of having more than 3 defects?

We can solve this problem using the distribution B(4000, .0005), namely the desired probability is

1 – BINOMDIST(3, 4000, .0005, TRUE) = 1 – 0.857169 = 0.142831

We can also use the Poisson approximation as follows:

μ = np = 4000(.0005) = 2

1 – POISSON(3, 2, TRUE) = 1 – 0.857123 = 0.142877

As you can see the approximation is quite accurate.
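The comparison can be reproduced with a few lines of Python. This is a sketch with hand-written helpers (`binom_cdf` and `poisson_cdf` are names assumed here), using n = 4000 and p = .0005 from the problem statement.

```python
from math import comb, exp, factorial

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) for X ~ Poisson(mu), by direct summation."""
    return sum(mu**i * exp(-mu) / factorial(i) for i in range(x + 1))

n, p = 4000, 0.0005
exact = 1 - binom_cdf(3, n, p)        # more than 3 defects, binomial
approx = 1 - poisson_cdf(3, n * p)    # same event, Poisson with mu = np = 2
print(round(exact, 6), round(approx, 6))
```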

Example 1(cont.)

Observation: If the average number of occurrences of a particular event in an hour (or some other unit of time) is μ and the arrival times are random without any tendency to bunch up, then the probability of x events occurring in an hour is given by

$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$

Example 2

A large department store sells on average 100 MP3 players a week. Assuming that purchases are as described in the above observation, what is the probability that the store will run out of MP3 players in a week if they stock 120 players? How many MP3 players should the store stock in order to make sure that it has a 99% probability of being able to supply a week’s demand?

The probability that they will sell ≤ 120 MP3 players in a week is

POISSON(120, 100, TRUE) = 0.977331

Example 2(cont.)

Thus, the answer to the first problem is 1 – 0.977331 = 0.022669, i.e. about a 2.3% probability of running out. We can answer the second question by using successive approximations until we arrive at the correct answer. E.g. we could try x = 130, which is higher than 120. The cumulative Poisson is 0.998293, which is too high. We then pick x = 125 (halfway between 120 and 130). This yields 0.993202, which is a little too high, and so we try 123. This yields 0.988756, which is a little too low, and so we finally arrive at 124, which has cumulative Poisson of 0.991226.
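The trial-and-error search can also be automated. The sketch below takes the probability of running out to be P(demand > 120) = 1 – F(120) and then walks up from 0 until the cumulative Poisson probability first reaches 99%. The helper names are assumptions, and the pmf is computed in log space only to avoid very large intermediate numbers.

```python
from math import exp, lgamma, log

def poisson_pmf(x: int, mu: float) -> float:
    """Poisson pmf via logs: exp(x ln(mu) - mu - ln(x!))."""
    return exp(x * log(mu) - mu - lgamma(x + 1))

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) by direct summation."""
    return sum(poisson_pmf(i, mu) for i in range(x + 1))

def min_stock(mu: float, target: float) -> int:
    """Smallest x with P(X <= x) >= target."""
    x, total = 0, poisson_pmf(0, mu)
    while total < target:
        x += 1
        total += poisson_pmf(x, mu)
    return x

p_run_out = 1 - poisson_cdf(120, 100)  # demand exceeds a stock of 120
stock = min_stock(100, 0.99)           # stock needed for 99% coverage
print(round(p_run_out, 4), stock)
```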

Observation: We have observed that under the appropriate conditions the binomial distribution can be approximated by either the Poisson or normal distribution. We conclude this section by stating that the Poisson distribution can be approximated by the normal distribution.

Theorem 2

For μ sufficiently large (usually μ ≥ 20), if x has a Poisson distribution with mean μ, then x is approximately distributed as N(μ, μ), i.e. normal with mean μ and variance μ.
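As a rough check of this approximation, the cumulative Poisson probability for a mean of 100 can be compared with the corresponding normal cdf. The sketch assumes a continuity correction of 0.5, which is a common convention but is not stated in the theorem; the helper names are illustrative.

```python
from math import erf, exp, lgamma, log, sqrt

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) for X ~ Poisson(mu), summing the pmf in log space."""
    return sum(exp(i * log(mu) - mu - lgamma(i + 1)) for i in range(x + 1))

def normal_cdf(x: float, mean: float, var: float) -> float:
    """Cdf of N(mean, var) using the error function."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

mu = 100
exact = poisson_cdf(120, mu)
approx = normal_cdf(120.5, mu, mu)  # 0.5 added as a continuity correction
print(round(exact, 4), round(approx, 4))
```

The two values differ by a few thousandths here; the agreement is generally worse for small means, where the Poisson distribution is noticeably skewed.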

The End

Thanks for your attention!