Web Chapter 19
Statistical Aids to Hypothesis Testing and Gross Errors

Experimentalists use statistical calculations to sharpen their judgments

    concerning the quality of experimental measurements. In this web chapter,

    we consider several of the most common applications of statistical tests to

    the treatment of analytical results. These applications include:

    1. Estimating the probability that (a) an experimental mean and a true value or

    (b) two experimental means are different, that is, whether the difference is

    real or simply the result of random error. This test is particularly important for

    discovering systematic errors in a method and for determining whether two

    samples come from the same source.

2. Deciding whether what appears to be an outlier in a set of replicate measurements is, with a certain probability, the result of a gross error and can thus be rejected or whether it is a legitimate result that must be retained in calculating the mean of the set.

    19A STATISTICAL AIDS TO HYPOTHESIS TESTING

    Much of scientific and engineering endeavor is based on hypothesis testing.

    Thus, to explain an observation, a hypothetical model is advanced and tested

    experimentally to determine its validity. If the results from these experiments do

    not support the model, we reject it and seek a new hypothesis. If agreement is

found, the hypothetical model serves as the basis for further experiments. When the hypothesis is supported by sufficient experimental data, it becomes recognized as a useful theory until such time as data are obtained that refute it.

Experimental results seldom agree exactly with those predicted from a theoretical model. Consequently, scientists and engineers frequently must judge whether

    a numerical difference is a manifestation of the random errors inevitable in all

    measurements. Certain statistical tests are useful in sharpening these judgments.

    Tests of this kind make use of a null hypothesis, which assumes that the

    numerical quantities being compared are, in fact, the same. The probability of the

    observed differences appearing as a result of random error is then computed from

    a probability distribution. Usually, if the observed difference is greater than or

    equal to the difference that would occur 5 times in 100 (the 5% probability level),


    the null hypothesis is considered questionable and the difference is judged to

    be significant. Other probability levels, such as 1 in 100 or 10 in 100, may also be

    adopted, depending on the certainty desired in the judgment. These probability

levels are often called significance levels and are given the symbol α in statistics. The confidence level as a percentage is related to α and is given by (1 − α) × 100%.

    The kinds of testing that chemists use most often include the comparison of

(1) the mean x̄ of an experimental data set with what is believed to be the true value xt; (2) the means x̄1 and x̄2 or the standard deviations s1 and s2 from two sets of data; and (3) the mean to a predicted or theoretical value. The sections

    that follow consider some of the methods for making these comparisons.

    19A-1 Comparing an Experimental Mean with the True Value

    A common way of testing for bias in an analytical method is to use the

    method to analyze a sample whose composition is accurately known. Bias in

    an analytical method is illustrated by the two curves shown in Figure 19-1,

    which show the frequency distribution of replicate results in the analysis of

    identical samples by two analytical methods having random errors of exactly

the same size. Method A has no bias, so the population mean μA is the true value xt. Method B has a systematic error, or bias, that is given by

bias = μB − xt = μB − μA     (19-1)

Note that bias affects all the data in the set in the same way and that it can be either positive or negative.

In testing for bias by analyzing a sample whose analyte concentration is known exactly, it is likely that the experimental mean x̄ will differ from the accepted value xt as shown in the figure; the judgment must then be made whether this difference is the consequence of random error or, alternatively, a systematic error.

In treating this type of problem statistically, the difference x̄ − xt is compared with the difference that could be caused by random error. If the observed difference is less than that computed for a chosen probability level, the null hypothesis that x̄ and xt are the same cannot be rejected; that is, no significant systematic error has been demonstrated. It is important to realize, however, that this statement does not say that there is no systematic error; it says only that whatever systematic error is present is so small that it cannot be distinguished from random error. If x̄ − xt is significantly larger than either the expected or the critical value, we may assume that the difference is real and that the systematic error is significant.

The critical value for rejecting the null hypothesis is calculated by rewriting Equation 3-18 (Chapter 3) in the form

x̄ − xt = ± ts/√N     (19-2)

where N is the number of replicate measurements used in the test. If a good estimate of σ is available, Equation 19-2 can be modified by replacing t with z and s with σ. Example 19-1 illustrates the use of a hypothesis test to determine whether there is bias in a method.


[Figure 19-1 Illustration of bias: frequency distributions of replicate results (relative frequency, dN/N, versus analytical result, xi) for methods A and B; method A is centered on the true value (μA = xt), while method B is displaced from it by bias = μB − xt = μB − μA.]

    Example 19-1

    A new procedure for the rapid determination of sulfur in kerosenes was tested

    on a sample known from its method of preparation to contain 0.123% S (xt).

The results were %S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate

    that there is bias in the method?

From Table 3-6 (Chapter 3), we find that at the 95% confidence level, t has a value of 3.18 for three degrees of freedom. Thus, we can calculate a test value of t from our data and compare it to the values given in the tables at the desired confidence level. The values of t from the tables are often called critical values and symbolized tcrit. The test value is calculated from

t = (x̄ − xt)/(s/√N)

If t ≥ tcrit, we reject the null hypothesis at the confidence level chosen. The absolute value of t is used because we are interested in testing only whether there is a difference between our mean and the true value and do not care about the sign of the difference. This type of test is often called a two-tailed test. In our case,

x̄ = (0.112 + 0.118 + 0.115 + 0.119)/4 = 0.464/4 = 0.116% S

x̄ − xt = 0.116 − 0.123 = −0.007% S

Σxi² = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854

s = √[(Σxi² − (Σxi)²/N)/(N − 1)] = √[(0.053854 − (0.464)²/4)/(4 − 1)] = 0.0032

t = |−0.007|/(0.0032/√4) = 4.375

Since 4.375 > 3.18, the critical value of t at the 95% confidence level, we conclude that a difference this large is significant and reject the null hypothesis. At the 99% confidence level, tcrit = 5.84 (Table 3-6). Since 4.375 < 5.84, we would accept the null hypothesis at the 99% confidence level and conclude that there is no difference between the results. Note that the probability (significance) level (0.05 or 0.01) is the probability of making an error by rejecting the null hypothesis.

The probability of a difference this large occurring because of only random errors can be obtained from the Excel function TDIST(x, deg_freedom, tails), where x is the test value of t (4.375), deg_freedom is 3 for our case, and tails = 2. The result is TDIST(4.375,3,2) = 0.022. Hence, it is only 2.2% probable to get a value this large because of random errors. The critical value of t for a given confidence level can be obtained in Excel from TINV(probability, deg_freedom); in our case, TINV(0.05,3) = 3.1825.

If it were confirmed by further experiments that the method always gave low results, we would say that the method had a negative bias.

Even if a mean value is shown to be equal to the true value at a given confidence level, we cannot conclude that there is no systematic error in the data.
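For readers working in Python rather than Excel, the following minimal sketch (our own illustration using NumPy and SciPy; stats.t.sf and stats.t.ppf play the roles of TDIST and TINV) reproduces the quantities of Example 19-1:

```python
import numpy as np
from scipy import stats

# Replicate results from Example 19-1 and the accepted (true) value
results = np.array([0.112, 0.118, 0.115, 0.119])   # % S
x_true = 0.123                                      # % S

N = len(results)
x_bar = results.mean()
s = results.std(ddof=1)        # sample standard deviation (N - 1 in the denominator)

# Test value of t for comparing an experimental mean with the true value
t_test = abs(x_bar - x_true) / (s / np.sqrt(N))

# Two-tailed probability that a difference at least this large arises from
# random error alone (the SciPy counterpart of Excel's TDIST(t, N - 1, 2))
p_two_tailed = 2 * stats.t.sf(t_test, df=N - 1)

# Critical value at the 95% confidence level (counterpart of TINV(0.05, N - 1))
t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - 1)

print(f"t = {t_test:.3f}, p = {p_two_tailed:.3f}, t_crit(95%) = {t_crit:.3f}")
# Prints approximately: t = 4.427, p = 0.021, t_crit(95%) = 3.182
# (The text's value of 4.375 results from rounding s to 0.0032 before dividing.)
```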

    19A-2 Comparing Two Experimental Means

    The results of chemical analyses are frequently used to determine whether two

    materials are identical. Here, the chemist must judge whether a difference in the



means of two sets of identical analyses is real and constitutes evidence that the samples are different or whether the discrepancy is simply a consequence of random errors in the two sets. To illustrate, let us assume that N1 replicate analyses of material 1 yielded a mean value of x̄1 and that N2 analyses of material 2 obtained by the same method gave a mean of x̄2. If the data were collected in an identical way, it is usually safe to assume that the standard deviations of the two sets of measurements are the same. We can then modify Equation 19-2 to take into account that one set of results is being compared with a second rather than with the true mean of the data, xt.

In this case, as with the previous one, we invoke the null hypothesis that the samples are identical and that the observed difference in the results, x̄1 − x̄2, is the result of random errors. To test this hypothesis statistically, we modify Equation 19-2 in the following way. First, we substitute x̄2 for xt, thus making the left side of the equation the numerical difference between the two means, (x̄1 − x̄2). Since we know from Equation 6-5 that the standard deviation of the mean x̄1 is

sm1 = s1/√N1

and likewise for x̄2,

sm2 = s2/√N2

the variance sd² of the difference (d = x̄1 − x̄2) between the means is given by

sd² = sm1² + sm2²

By substituting the values of sm1 and sm2 into this equation, we have

sd² = s1²/N1 + s2²/N2

If we then assume that the pooled standard deviation spooled is a good estimate of both s1 and s2, then

sd² = spooled²/N1 + spooled²/N2 = spooled²(N1 + N2)/(N1N2)

and

sd = spooled √((N1 + N2)/(N1N2))

Substituting this expression into Equation 19-2 in place of s/√N (and x̄2 in place of xt), we find that

x̄1 − x̄2 = ± t spooled √((N1 + N2)/(N1N2))     (19-3)

or the test value of t is given by

t = (x̄1 − x̄2)/(spooled √((N1 + N2)/(N1N2)))     (19-4)

We then compare our test value of t with the critical value obtained from the


table for the particular confidence level desired. The number of degrees of freedom for finding the critical value of t in Table 3-6 is N1 + N2 − 2. If the absolute value of the test statistic is smaller than the critical value, the null hypothesis is accepted, and no significant difference between the means has been demonstrated. A test value of t greater than the critical value of t indicates that there is a significant difference between the means.

If a good estimate of σ is available, Equation 19-3 can be modified by inserting z for t and σ for s.

    Example 19-2

    Two barrels of wine were analyzed for their alcohol content to determine whether

    they were from different sources. On the basis of six analyses, the average content

    of the first barrel was established to be 12.61% ethanol. Four analyses of the

    second barrel gave a mean of 12.53% alcohol. The ten analyses yielded a pooled

value of s = 0.070%. Do the data indicate a difference between the wines?

Here we employ Equation 19-4 to calculate the test statistic t:

t = (x̄1 − x̄2)/(spooled √((N1 + N2)/(N1N2))) = (12.61 − 12.53)/(0.070 √((6 + 4)/(6 × 4))) = 1.771

The critical value of t at the 95% confidence level for 10 − 2 = 8 degrees of freedom is 2.31. Since 1.771 < 2.31, we accept the null hypothesis at the 95% confidence level and conclude that there is no difference in the alcohol content of the wines. The probability of getting a t value of 1.771 may be calculated using the Excel function TDIST() and is TDIST(1.771,8,2) = 0.11. Hence, there is a better than 10% chance that a value this large could occur just because of random error.
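The same arithmetic can be sketched in Python directly from the summary statistics quoted in Example 19-2 (the means, the pooled standard deviation, and the two sample sizes); the SciPy calls below stand in for Table 3-6 and Excel's TDIST:

```python
import numpy as np
from scipy import stats

# Summary statistics from Example 19-2 (individual replicate results are not given)
mean1, N1 = 12.61, 6       # % ethanol, barrel 1
mean2, N2 = 12.53, 4       # % ethanol, barrel 2
s_pooled = 0.070           # pooled standard deviation from all ten analyses

# Test value of t for comparing two experimental means (Equation 19-4)
t_test = (mean1 - mean2) / (s_pooled * np.sqrt((N1 + N2) / (N1 * N2)))

df = N1 + N2 - 2
t_crit = stats.t.ppf(1 - 0.05 / 2, df=df)            # 95% confidence, two-tailed
p_two_tailed = 2 * stats.t.sf(abs(t_test), df=df)    # counterpart of TDIST(1.771, 8, 2)

print(f"t = {t_test:.3f}, t_crit(95%) = {t_crit:.3f}, p = {p_two_tailed:.2f}")
# Prints approximately: t = 1.771, t_crit(95%) = 2.306, p = 0.11
```

If the raw replicate results were available, scipy.stats.ttest_ind (with its default equal_var=True) would carry out the same pooled test directly from the two data sets.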

    In Example 19-2, no significant difference between the alcohol content of the

    two wines was indicated at the 95% confidence level. Note that this statement is

equivalent to saying that x̄1 is equal to x̄2 with a certain probability, but the tests do

    not prove that the wines come from the same source. Indeed, it is conceivable that

    one wine is a red and the other is a white. To establish with a reasonable probability

    that the two wines are from the same source would require extensive testing of other

    characteristics, such as taste, color, odor, and refractive index as well as tartaric acid,

    sugar, and trace element content. If no significant differences are revealed by all

    these tests and by others, it might be possible to judge the two wines as having a

    common origin. In contrast, the finding of one significant difference in any test

    would clearly show that the two wines are different. Thus, the establishment of a

significant difference by a single test is much more revealing than the establishment of an absence of difference.

    19B DETECTING GROSS ERRORS

A data point that differs excessively from the mean in a data set is termed an outlier; outliers are the result of gross errors. When a set of data contains an outlier, the decision must be made whether to

    retain or reject it. The choice of criterion for the rejection of a suspected result

has its perils. If we set a stringent standard that makes the rejection of a questionable measurement difficult, we run the risk of retaining results that are spurious.



    For five measurements, Qcrit at the 90% confidence level is 0.64. Because

0.54 < 0.64, we must retain the outlier at the 90% confidence level.
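For reference, the sketch below applies the Q test as it is conventionally formulated for a single suspect value: the experimental Q is the gap between the suspect value and its nearest neighbor divided by the spread of the entire set, and the suspect value is rejected only if Q exceeds the tabulated Qcrit. The five replicate values in the call are invented for illustration and are not the data of the example above.

```python
def q_test(values, q_crit):
    """Dixon Q test for the most extreme value in a small replicate set.

    Q = (gap between the suspect value and its nearest neighbor) / (spread of the set).
    Returns the experimental Q and True if the suspect value should be rejected.
    """
    data = sorted(values)
    gap_low = data[1] - data[0]       # gap if the smallest value is the suspect
    gap_high = data[-1] - data[-2]    # gap if the largest value is the suspect
    spread = data[-1] - data[0]
    q_exp = max(gap_low, gap_high) / spread
    return q_exp, q_exp > q_crit

# Hypothetical replicate set; Qcrit = 0.64 for five measurements at the 90% confidence level
q_exp, reject = q_test([40.12, 40.15, 40.16, 40.20, 40.55], q_crit=0.64)
print(f"Q = {q_exp:.2f}, reject the suspect value: {reject}")
# Prints: Q = 0.81, reject the suspect value: True
```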

    19B-2 A Word of Caution about Rejecting Outliers

Several other statistical tests have been developed to provide criteria for rejection or retention of outliers. Such tests, like the Q test, assume that the distribution of

    the population data is normal, or Gaussian. Unfortunately, this condition cannot

    be proved or disproved for samples that have many fewer than 50 results.

    Consequently, statistical rules, which are perfectly reliable for normal distribu-

    tions of data, should be used with extreme caution when applied to samples con-

    taining only a few data. J. Mandel, in discussing treatment of small sets of data,

writes, "Those who believe that they can discard observations with statistical sanction by using statistical rules for the rejection of outliers are simply deluding themselves."3 Thus, statistical tests for rejection should be used only as aids to

    common sense when small samples are involved.

    The blind application of statistical tests to retain or reject a suspect measure-

    ment in a small set of data is not likely to be much more fruitful than an arbitrary

    decision. The application of good judgment based on broad experience with an

    analytical method is usually a sounder approach. In the end, the only valid reason

    for rejecting a result from a small set of data is the sure knowledge that a mistake

    was made in the measurement process. Without this knowledge, a cautious

    approach to rejection of an outlier is wise.

    19B-3 How Do We Deal with Outliers?

    Recommendations for the treatment of a small set of results that contains a

    suspect value are:

1. Re-examine carefully all data relating to the outlying result to see if a gross error could have affected its value. This recommendation demands a properly

    kept laboratory notebook containing careful notations of all observations

    (see Section 18I).

    2. If possible, estimate the precision that can be reasonably expected from the

    procedure to be sure that the outlying result actually is questionable.

    3. Repeat the analysis if sufficient sample and time are available. Agreement

    between the newly acquired data and those of the original set that appear to be

    valid will lend weight to the notion that the outlying result should be rejected.

    Furthermore, if retention is still indicated, the questionable result will have a

    smaller effect on the mean of the larger set of data.

    4. If more data cannot be obtained, apply the Q test to the existing set to see if

    the doubtful result should be retained or rejected on statistical grounds.

    5. If the Q test indicates retention, consider reporting the median of the set

    rather than the mean. The median has the great virtue of allowing inclusion

    of all data in a set without undue influence from an outlying value. In addi-

    tion, the median of a normally distributed set containing three measurements

    provides a better estimate of the correct value than the mean of the set after

    the outlying value has been discarded.
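The last point can be checked numerically. The following simulation is a sketch added for illustration (the seed, population parameters, and number of trials are arbitrary); it draws many sets of three values from a normal population and compares the median of each set with the mean of the two values that remain after the most divergent value has been discarded:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
true_value, sigma, n_trials = 0.0, 1.0, 100_000

medians, rejection_means = [], []
for _ in range(n_trials):
    xs = np.sort(rng.normal(true_value, sigma, size=3))
    medians.append(xs[1])                     # the median of the three values

    # Discard whichever extreme value lies farther from the middle value,
    # then average the two values that are retained
    if xs[2] - xs[1] > xs[1] - xs[0]:
        kept = xs[:2]
    else:
        kept = xs[1:]
    rejection_means.append(kept.mean())

def rmse(estimates):
    return np.sqrt(np.mean((np.asarray(estimates) - true_value) ** 2))

print(f"RMSE of the median:               {rmse(medians):.4f}")
print(f"RMSE of the mean after rejection: {rmse(rejection_means):.4f}")
# In runs of this simulation, the median shows the smaller RMSE, consistent
# with the statement in point 5 above.
```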

Use extreme caution when rejecting data for any reason.

    3J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds.,

    Part I, Vol. 1 (New York: Wiley, 1978), p. 282.


    19C QUESTIONS AND PROBLEMS

    19-1. Lord Rayleigh prepared nitrogen samples by

    several different methods. The density of each

sample was measured as the mass of gas required to fill a particular flask at a certain temperature and pressure. Masses of nitrogen samples prepared by decomposition of various nitrogen compounds were 2.29890, 2.29940, 2.29849, and 2.30054 g. Masses of nitrogen prepared by removing oxygen from air in various ways were 2.31001, 2.31163, and 2.31028 g. Is the density of nitrogen prepared from nitrogen compounds significantly different from that prepared from air? What are the chances of the conclusion being in error? (Study of this difference led to the discovery of the inert gases by Sir William Ramsay and Lord Rayleigh.)

    19-2. Apply the Q test to the following data sets to de-

    termine whether the outlying result should be

retained or rejected at the 95% confidence level.
(a) 41.27, 41.61, 41.84, 41.70

    (b) 7.295, 7.284, 7.388, 7.292

    19-3. Apply the Q test to the following data sets to

    determine whether the outlying result should be

    retained or rejected at the 95% confidence level.

    (a) 85.10, 84.62, 84.70

    (b) 85.10, 84.62, 84.65, 84.70