
Statistics: Making Sense of Data
Assignment 2: Answers

Grading instructions:
Below are model answers for each of the questions. For questions requiring some calculation, an excellent answer does not need to show all of the steps of the calculation. A correct final answer is sufficient for the calculation part of the question. Please rate each of the answers you are evaluating as one of:
• Excellent – Fully correct and addresses all required aspects of the solution
• Good – Mostly correct, or correct but misses a small aspect of the solution
• Fair – Partly correct but with a substantial error, or correct but misses an important aspect of the solution
• Poor – Incorrect or misses the point of the question
Please also provide some constructive written feedback explaining why you gave the answer the rating you did. Remember to be as positive and constructive as possible in your feedback.

1. Answer: The required test is a one-sided test (with a less-than alternative) of the null hypothesis that the true proportion of people with the A variant is 0.5. The test statistic is −4.195. The p-value is the probability of getting a value smaller than this from a standard normal distribution, which is very small. So we have very strong evidence that the proportion of people with the A variant is less than one-half. The estimated value is 30% or 0.30.

When giving feedback: Check whether all of the required components of the test were addressed and whether the conclusion is appropriate. An excellent answer notes that the evidence here is very strong and states the conclusion in practical terms; that is, it doesn't just conclude that we should reject the null hypothesis but goes on to conclude that less than half the population has the A variant.

Calculation (not required to be shown) using numbers from Table 3:
Calculation of the test statistic, with $\hat{p} = 33/110 = 0.30$:
$$z = \frac{\hat{p} - 0.5}{\sqrt{0.5(0.5)/110}} = -4.195$$
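The Question 1 calculation can be reproduced with a short Python sketch using only the standard library. The helper name `one_proportion_z_test` is hypothetical, and the p-value uses the standard normal distribution, as in the answer above; the lower tail is computed from the identity Φ(z) = 0.5·erfc(−z/√2).

```python
from math import erfc, sqrt

def one_proportion_z_test(successes, n, p0):
    """One-sample z-test for a proportion with a less-than alternative.

    Returns the estimate p_hat, the z statistic, and the lower-tail p-value.
    """
    p_hat = successes / n
    # The standard error under the null hypothesis uses p0, not p_hat.
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Lower-tail probability of a standard normal: Phi(z) = 0.5 * erfc(-z / sqrt(2)).
    p_value = 0.5 * erfc(-z / sqrt(2))
    return p_hat, z, p_value

# 33 of the 110 subjects have the A variant (Table 3); null value 0.5.
p_hat, z, p_value = one_proportion_z_test(33, 110, 0.5)
print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, p-value = {p_value:.1e}")
```

This prints an estimate of 0.30 and z = −4.195, with a lower-tail p-value on the order of 10⁻⁵, which is the "very small" value the answer refers to.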

2. Answer: Although we have two proportions, which estimate the proportion of the population with the A variant at each of the two positions, we cannot use the two-sample test of proportions used in this class because it requires the two groups to be independent. This is a matched pairs situation (the same people contribute to both proportions), but we haven't learned a matched pairs test for proportions.

When giving feedback: The key point here is that the proportions quoted (as percentages) in the question are not based on independent samples of people.

3. Answer: The 95% confidence interval for the difference in proportions of the A variant between males and females is [−0.193, 0.149]. Since the confidence interval includes 0, it is consistent with no difference in the proportion of the A variant between the sexes.

(Note that the confidence interval above is for $p_M - p_F$, where $p_M$ and $p_F$ are the proportions of males and females, respectively, with the A variant. It is perfectly fine to calculate the confidence interval for $p_F - p_M$ instead, which would result in an interval of [−0.149, 0.193] and the same conclusion.)

When giving feedback: If the confidence interval is miscalculated, check that the conclusion is consistent with the interval the student obtained. If the confidence interval includes 0, the conclusion should be that there is no evidence of a difference in the proportions between males and females. If the confidence interval does not include 0, the conclusion should be that there is evidence of a difference in the proportions between males and females.

Calculation (not required to be shown):
From Table 5, the estimate of the proportion of males with the A variant is 15/52 and the estimate of the proportion of females with the variant is 18/58. The confidence interval is:
$$\frac{15}{52} - \frac{18}{58} \pm 1.96\sqrt{\frac{(15/52)(1 - 15/52)}{52} + \frac{(18/58)(1 - 18/58)}{58}} = [-0.193,\ 0.149]$$
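As a check on the Question 3 arithmetic, here is a minimal Python sketch of the large-sample interval for a difference in two independent proportions, with each group's standard error based on its own estimated proportion. The function name `diff_proportions_ci` is a hypothetical helper, and the default multiplier 1.96 matches the formula above (a 95% interval).

```python
from math import sqrt

def diff_proportions_ci(x1, n1, x2, n2, z_crit=1.96):
    """Large-sample confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: each group uses its own estimated proportion.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# Males: 15 of 52 with the A variant; females: 18 of 58 (Table 5).
lower, upper = diff_proportions_ci(15, 52, 18, 58)
print(f"p_M - p_F: [{lower:.3f}, {upper:.3f}]")

# Swapping the groups flips the signs and the order of the endpoints.
lower_fm, upper_fm = diff_proportions_ci(18, 58, 15, 52)
print(f"p_F - p_M: [{lower_fm:.3f}, {upper_fm:.3f}]")
```

It prints [-0.193, 0.149] for $p_M - p_F$ and [-0.149, 0.193] for $p_F - p_M$, matching the two intervals given in the answer.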

4. Answer: The required test is a two-sided test of the null hypothesis that the mean HDL is the same for males and females. The test statistic is 3.75. The p-value is the probability of being greater than 3.75 plus the probability of being less than −3.75 for a t-distribution with 108 (= 58 + 52 − 2) degrees of freedom. Even without using a computer, we know this will be small: with this reasonably large number of degrees of freedom, the t-distribution is close to the standard normal distribution, and 3.75 is far in the tails of a standard normal distribution. So we have strong evidence to reject the null hypothesis, and we conclude that the mean HDL is not the same for men and women. From our data we see that women have a mean of 1.263 while men have a mean of 1.087.

(Note that when calculating the test statistic, the solution above (and the calculation below) subtracts the average for males from the average for females. It is fine to subtract the average for females from the average for males instead, which would result in a test statistic of −3.75 and the same conclusion.)

When giving feedback: If the test statistic is miscalculated, follow the reasoning to see whether the estimate of the p-value and the conclusion make sense. An excellent answer states the conclusion in practical terms; that is, it doesn't just conclude that we should reject the null hypothesis but goes on to conclude that the mean HDL differs between males and females.

Calculation (not required to be shown) using numbers from Table 4:
The pooled estimate of the variance is:
$$s_p^2 = \frac{57(0.2494^2) + 51(0.2411^2)}{57 + 51} = 0.060278$$
Calculation of the test statistic:
$$t = \frac{1.263 - 1.087}{\sqrt{0.060278\,(1/58 + 1/52)}} = 3.75$$
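The Question 4 calculation can be sketched in Python from the summary statistics alone. The helper `pooled_t_statistic` is hypothetical; the group standard deviations 0.2494 (females, n = 58) and 0.2411 (males, n = 52) are read from the pooled-variance formula above. Because the standard library has no t-distribution, the p-value here is only the normal approximation that the answer's reasoning relies on, not the exact t-based value.

```python
from math import erfc, sqrt

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, pooled_var

# Females (n = 58) minus males (n = 52), summary statistics from Table 4.
t, df, pooled_var = pooled_t_statistic(1.263, 0.2494, 58, 1.087, 0.2411, 52)

# With 108 degrees of freedom the t-distribution is close to the standard normal,
# so erfc(|t| / sqrt(2)) = 2 * P(Z > |t|) gives a rough two-sided p-value.
approx_p = erfc(abs(t) / sqrt(2))
print(f"pooled variance = {pooled_var:.6f}, t = {t:.2f}, df = {df}, approx p = {approx_p:.1e}")
```

It reports a pooled variance of 0.060278, t = 3.75 and df = 108, and the approximate p-value is well below 0.001, consistent with the strong evidence described in the answer.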

5. Answer: Looking at all of the data, the correlation between HDL and PUFA is 0.030, which is quite close to 0, so there does not seem to be a relationship between them. This can be seen in the scatterplot (Figure 3(a)), which appears to be just random scatter, and the slope of the regression line is not statistically significantly different from 0 (p-value = 0.755 from Table 7).

If we look at subjects without an A at position 308, there is also no indication of a relationship (correlation = 0.156, Figure 4(a) looks like random scatter, and the p-value for the slope in the regression is 0.175).

However, for subjects with the A variant at position 308, there is some weak evidence of a negative relationship. The correlation is −0.310, the scatterplot (Figure 5(a)) shows some evidence of a downward linear trend, and there is weak evidence (p-value = 0.079) that the slope of the regression is not 0. Note, though, that this relationship is in the opposite direction to what is expected.
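In simple linear regression, the t-test that the slope is 0 is equivalent to testing that the correlation is 0, using $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on $n - 2$ degrees of freedom. The Python sketch below applies that identity to the correlations quoted in Question 5. The group sizes are assumptions not stated in this answer key: 110 subjects overall (52 males + 58 females, as in Table 5), and 33 with the A variant at position 308 (taking the 33 of 110 from Question 1 to refer to that position), leaving 77 without it. The helpers `t_pdf`, `two_sided_t_p_value` and `slope_test_from_r` are hypothetical; the t-distribution tail is obtained by integrating the density with Simpson's rule so the example needs only the standard library.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def two_sided_t_p_value(t, df, steps=2000):
    """Two-sided p-value from Simpson's rule on the density over [0, |t|]."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central

def slope_test_from_r(r, n):
    """t statistic and two-sided p-value for H0: slope = 0 in simple regression."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r * r)
    return t, two_sided_t_p_value(t, df)

# Correlations come from the answer; the group sizes are assumptions (see above).
results = {}
for label, r, n in [("all subjects", 0.030, 110),
                    ("no A at 308", 0.156, 77),
                    ("A at 308", -0.310, 33)]:
    t, p = slope_test_from_r(r, n)
    results[label] = (t, p)
    print(f"{label}: r = {r:+.3f}, t = {t:+.3f}, p = {p:.3f}")
```

Under those assumed group sizes, the p-values it prints should land close to the 0.755, 0.175 and 0.079 quoted in the answer, which illustrates why a correlation of −0.310 in a small group gives only weak evidence while a similar correlation in a larger group would be more convincing.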

Looking at the scatterplots and residual plots, there is no evidence that a model other than a linear one would be appropriate. In particular, all of the residual plots look like random scatter. So we have no concern that another model, such as a curve rather than a line or a model fitted to transformed data, might better describe the relationship.

When giving feedback: A good answer includes all the points in the sample answer. However, since the relationship between HDL and PUFA for subjects with the A variant is quite weak (p-value = 0.079), a good answer might play down this relationship even more than the sample answer does.

6. Answer: There are clearly many possible analyses and many possible statistical tests that could have been carried out when investigating these data. When many tests are carried out, we expect to find some significant results just by chance, even if the null hypothesis is always true. So, when doing exploratory analyses such as this, we must be careful not to overstate the importance of any significant findings, as they may be Type I errors.
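To make "some significant results just by chance" concrete, here is a small Python sketch. The numbers of tests are hypothetical (the answer does not count how many analyses were run), and α = 0.05 is an assumed significance level. The expected number of false positives, m·α, holds whatever the dependence between tests; the probability of at least one false positive, 1 − (1 − α)^m, additionally assumes the tests are independent, which analyses of one data set usually are not, so treat it as a rough guide.

```python
def multiple_testing_summary(num_tests, alpha=0.05):
    """Expected false positives and chance of at least one, if every null is true.

    The expected count num_tests * alpha holds regardless of dependence;
    the probability of at least one assumes the tests are independent.
    """
    expected_false_positives = num_tests * alpha
    prob_at_least_one = 1 - (1 - alpha) ** num_tests
    return expected_false_positives, prob_at_least_one

# Hypothetical numbers of tests, all with true null hypotheses.
for m in (1, 5, 10, 20):
    expected, at_least_one = multiple_testing_summary(m)
    print(f"{m:2d} tests: expected false positives = {expected:.2f}, "
          f"P(at least one) = {at_least_one:.2f}")
```

With 20 independent tests, for example, about one false positive is expected and the chance of at least one is roughly 0.64.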

When giving feedback: A good answer notes the potential for making Type I errors. An excellent answer might also note that this part of the study is observational, so no causal conclusions can be made, and there is a lot of potential for confounding variables. Another good point that could be made for this question: it would be worthwhile carrying out a more complicated analysis that controls for variables such as sex (or BMI, or some of the other dietary components).
