Web Chapter 19
Statistical Aids to Hypothesis Testing and Gross Errors

Experimentalists use statistical calculations to sharpen their judgments

    concerning the quality of experimental measurements. In this web chapter,

    we consider several of the most common applications of statistical tests to

    the treatment of analytical results. These applications include:

    1. Estimating the probability that (a) an experimental mean and a true value or

    (b) two experimental means are different, that is, whether the difference is

    real or simply the result of random error. This test is particularly important for

    discovering systematic errors in a method and for determining whether two

    samples come from the same source.

2. Deciding whether what appears to be an outlier in a set of replicate measurements is, with a certain probability, the result of a gross error and can thus be rejected or whether it is a legitimate result that must be retained in calculating the mean of the set.

    19A STATISTICAL AIDS TO HYPOTHESIS TESTING

    Much of scientific and engineering endeavor is based on hypothesis testing.

    Thus, to explain an observation, a hypothetical model is advanced and tested

    experimentally to determine its validity. If the results from these experiments do

    not support the model, we reject it and seek a new hypothesis. If agreement is

found, the hypothetical model serves as the basis for further experiments. When the hypothesis is supported by sufficient experimental data, it becomes recognized as a useful theory until such time as data are obtained that refute it.

Experimental results seldom agree exactly with those predicted from a theoretical model. Consequently, scientists and engineers frequently must judge whether

    a numerical difference is a manifestation of the random errors inevitable in all

    measurements. Certain statistical tests are useful in sharpening these judgments.

    Tests of this kind make use of a null hypothesis, which assumes that the

    numerical quantities being compared are, in fact, the same. The probability of the

    observed differences appearing as a result of random error is then computed from

    a probability distribution. Usually, if the observed difference is greater than or

    equal to the difference that would occur 5 times in 100 (the 5% probability level),


    the null hypothesis is considered questionable and the difference is judged to

    be significant. Other probability levels, such as 1 in 100 or 10 in 100, may also be

    adopted, depending on the certainty desired in the judgment. These probability

levels are often called significance levels and are given the symbol α in statistics. The confidence level as a percentage is related to α and is given by (1 − α) × 100%.

    The kinds of testing that chemists use most often include the comparison of

(1) the mean x̄ of an experimental data set with what is believed to be the true value xt; (2) the means x̄1 and x̄2 or the standard deviations s1 and s2 from two sets of data; and (3) the mean to a predicted or theoretical value. The sections

    that follow consider some of the methods for making these comparisons.

    19A-1 Comparing an Experimental Mean with the True Value

    A common way of testing for bias in an analytical method is to use the

    method to analyze a sample whose composition is accurately known. Bias in

    an analytical method is illustrated by the two curves shown in Figure 19-1,

    which show the frequency distribution of replicate results in the analysis of

    identical samples by two analytical methods having random errors of exactly

the same size. Method A has no bias, so the population mean μA is the true value xt. Method B has a systematic error, or bias, that is given by

bias = μB − xt = μB − μA     (19-1)

Note that bias affects all the data in the set in the same way and that it can be either positive or negative.

In testing for bias by analyzing a sample whose analyte concentration is known exactly, it is likely that the experimental mean x̄ will differ from the accepted value xt as shown in the figure; the judgment must then be made whether this difference is the consequence of random error or, alternatively, a systematic error.

In treating this type of problem statistically, the difference x̄ − xt is compared with the difference that could be caused by random error. If the observed difference is less than that computed for a chosen probability level, the null hypothesis that x̄ and xt are the same cannot be rejected; that is, no significant systematic error has been demonstrated. It is important to realize, however, that this statement does not say that there is no systematic error; it says only that whatever systematic error is present is so small that it cannot be distinguished from random error. If x̄ − xt is significantly larger than either the expected or the critical value, we may assume that the difference is real and that the systematic error is significant.

The critical value for rejecting the null hypothesis is calculated by rewriting Equation 3-18 (Chapter 3) in the form

x̄ − xt = ± ts/√N     (19-2)

where N is the number of replicate measurements used in the test. If a good estimate of σ is available, Equation 19-2 can be modified by replacing t with z and s with σ. Example 19-1 illustrates the use of a hypothesis test to determine whether there is bias in a method.


[Figure 19-1 Illustration of bias: frequency distributions of replicate results (relative frequency, dN/N, versus analytical result, xi) for methods A and B; method A is centered on the true value (μA = xt), while method B is displaced from it by bias = μB − xt = μB − μA.]

    Example 19-1

    A new procedure for the rapid determination of sulfur in kerosenes was tested

    on a sample known from its method of preparation to contain 0.123% S (xt).

The results were %S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate

    that there is bias in the method?

From Table 3-6 (Chapter 3), we find that at the 95% confidence level, t has a value of 3.18 for three degrees of freedom. Thus, we can calculate a test value of t from our data and compare it to the values given in the tables at the desired confidence level. The values of t from the tables are often called critical values and symbolized tcrit. The test value is calculated from

t = (x̄ − xt)/(s/√N)

If t ≥ tcrit, we reject the null hypothesis at the confidence level chosen. The absolute value of t is used because we are interested in testing only whether there is a difference between our mean and the true value and do not care about the sign of the difference. This type of test is often called a two-tailed test. In our case,

x̄ = (0.112 + 0.118 + 0.115 + 0.119)/4 = 0.464/4 = 0.116% S

x̄ − xt = 0.116 − 0.123 = −0.007% S

Σxi² = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854

s = √[(Σxi² − (Σxi)²/N)/(N − 1)] = √[(0.053854 − (0.464)²/4)/(4 − 1)] = 0.0032

t = |−0.007|/(0.0032/√4) = 4.375

Since 4.375 > 3.18, the critical value of t at the 95% confidence level, we conclude that a difference this large is significant and reject the null hypothesis. At the 99% confidence level, tcrit = 5.84 (Table 3-6). Since 4.375 < 5.84, we would accept the null hypothesis at the 99% confidence level and conclude that there is no difference between the results. Note that the probability (significance) level (0.05 or 0.01) is the probability of making an error by rejecting the null hypothesis.

The probability of a difference this large occurring because of only random errors can be obtained from the Excel function TDIST(x, deg_freedom, tails), where x is the test value of t (4.375), deg_freedom is 3 for our case, and tails = 2. The result is TDIST(4.375,3,2) = 0.022. Hence, it is only 2.2% probable to get a value this large because of random errors. The critical value of t for a given confidence level can be obtained in Excel from TINV(probability, deg_freedom); in our case, TINV(0.05,3) = 3.1825.

If it were confirmed by further experiments that the method always gave low results, we would say that the method had a negative bias.

Even if a mean value is shown to be equal to the true value at a given confidence level, we cannot conclude that there is no systematic error in the data.
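For readers working in Python rather than Excel, the following minimal sketch (our own illustration using NumPy and SciPy; stats.t.sf and stats.t.ppf play the roles of TDIST and TINV) reproduces the quantities of Example 19-1:

```python
import numpy as np
from scipy import stats

# Replicate results from Example 19-1 and the accepted (true) value
results = np.array([0.112, 0.118, 0.115, 0.119])   # % S
x_true = 0.123                                      # % S

N = len(results)
x_bar = results.mean()
s = results.std(ddof=1)        # sample standard deviation (N - 1 in the denominator)

# Test value of t for comparing an experimental mean with the true value
t_test = abs(x_bar - x_true) / (s / np.sqrt(N))

# Two-tailed probability that a difference at least this large arises from
# random error alone (the SciPy counterpart of Excel's TDIST(t, N - 1, 2))
p_two_tailed = 2 * stats.t.sf(t_test, df=N - 1)

# Critical value at the 95% confidence level (counterpart of TINV(0.05, N - 1))
t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - 1)

print(f"t = {t_test:.3f}, p = {p_two_tailed:.3f}, t_crit(95%) = {t_crit:.3f}")
# Prints approximately: t = 4.427, p = 0.021, t_crit(95%) = 3.182
# (The text's value of 4.375 results from rounding s to 0.0032 before dividing.)
```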

    19A-2 Comparing Two Experimental Means

    The results of chemical analyses are frequently used to determine whether two

    materials are identical. Here, the chemist must judge whether a difference in the



means of two sets of identical analyses is real and constitutes evidence that the samples are different or whether the discrepancy is simply a consequence of random errors in the two sets. To illustrate, let us assume that N1 replicate analyses of material 1 yielded a mean value of x̄1 and that N2 analyses of material 2 obtained by the same method gave a mean of x̄2. If the data were collected in an identical way, it is usually safe to assume that the standard deviations of the two sets of measurements are the same. We can then modify Equation 19-2 to take into account that one set of results is being compared with a second rather than with the true mean of the data, xt.

In this case, as with the previous one, we invoke the null hypothesis that the samples are identical and that the observed difference in the results, x̄1 − x̄2, is the result of random errors. To test this hypothesis statistically, we modify Equation 19-2 in the following way. First, we substitute x̄2 for xt, thus making the left side of the equation the numerical difference between the two means, (x̄1 − x̄2). Since we know from Equation 6-5 that the standard deviation of the mean x̄1 is

sm1 = s1/√N1

and likewise for x̄2,

sm2 = s2/√N2

the variance sd² of the difference (d = x̄1 − x̄2) between the means is given by

sd² = sm1² + sm2²

By substituting the values of sm1 and sm2 into this equation, we have

sd² = s1²/N1 + s2²/N2

If we then assume that the pooled standard deviation spooled is a good estimate of both s1 and s2, then

sd² = spooled²/N1 + spooled²/N2 = spooled²(N1 + N2)/(N1N2)

and

sd = spooled √((N1 + N2)/(N1N2))

Substituting this expression into Equation 19-2 in place of s/√N (and x̄2 in place of xt), we find that

x̄1 − x̄2 = ± t spooled √((N1 + N2)/(N1N2))     (19-3)

or the test value of t is given by

t = (x̄1 − x̄2)/(spooled √((N1 + N2)/(N1N2)))     (19-4)

We then compare our test value of t with the critical value obtained from the


table for the particular confidence level desired. The number of degrees of freedom for finding the critical value of t in Table 3-6 is N1 + N2 − 2. If the absolute value of the test statistic is smaller than the critical value, the null hypothesis is accepted, and no significant difference between the means has been demonstrated. A test value of t greater than the critical value of t indicates that there is a significant difference between the means.

If a good estimate of σ is available, Equation 19-3 can be modified by inserting z for t and σ for s.

    Example 19-2

    Two barrels of wine were analyzed for their alcohol content to determine whether

    they were from different sources. On the basis of six analyses, the average content

    of the first barrel was established to be 12.61% ethanol. Four analyses of the

    second barrel gave a mean of 12.53% alcohol. The ten analyses yielded a pooled

value of s = 0.070%. Do the data indicate a difference between the wines?

Here we employ Equation 19-4 to calculate the test statistic t:

t = (x̄1 − x̄2)/(spooled √((N1 + N2)/(N1N2))) = (12.61 − 12.53)/(0.070 √((6 + 4)/(6 × 4))) = 1.771

The critical value of t at the 95% confidence level for 10 − 2 = 8 degrees of freedom is 2.31. Since 1.771 < 2.31, we accept the null hypothesis at the 95% confidence level and conclude that there is no difference in the alcohol content of the wines. The probability of getting a t value of 1.771 may be calculated using the Excel function TDIST() and is TDIST(1.771,8,2) = 0.11. Hence, there is a better than 10% chance that a value this large could occur just because of random error.
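The same arithmetic can be sketched in Python directly from the summary statistics quoted in Example 19-2 (the means, the pooled standard deviation, and the two sample sizes); the SciPy calls below stand in for Table 3-6 and Excel's TDIST:

```python
import numpy as np
from scipy import stats

# Summary statistics from Example 19-2 (individual replicate results are not given)
mean1, N1 = 12.61, 6       # % ethanol, barrel 1
mean2, N2 = 12.53, 4       # % ethanol, barrel 2
s_pooled = 0.070           # pooled standard deviation from all ten analyses

# Test value of t for comparing two experimental means (Equation 19-4)
t_test = (mean1 - mean2) / (s_pooled * np.sqrt((N1 + N2) / (N1 * N2)))

df = N1 + N2 - 2
t_crit = stats.t.ppf(1 - 0.05 / 2, df=df)            # 95% confidence, two-tailed
p_two_tailed = 2 * stats.t.sf(abs(t_test), df=df)    # counterpart of TDIST(1.771, 8, 2)

print(f"t = {t_test:.3f}, t_crit(95%) = {t_crit:.3f}, p = {p_two_tailed:.2f}")
# Prints approximately: t = 1.771, t_crit(95%) = 2.306, p = 0.11
```

If the raw replicate results were available, scipy.stats.ttest_ind (with its default equal_var=True) would carry out the same pooled test directly from the two data sets.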

    In Example 19-2, no significant difference between the alcohol content of the

    two wines was indicated at the 95% confidence level. Note that this statement is

equivalent to saying that x̄1 is equal to x̄2 with a certain probability, but the tests do

    not prove that the wines come from the same source. Indeed, it is conceivable that

    one wine is a red and the other is a white. To establish with a reasonable probability

    that the two wines are from the same source would require extensive testing of other

    characteristics, such as taste, color, odor, and refractive index as well as tartaric acid,

    sugar, and trace element content. If no significant differences are revealed by all

    these tests and by others, it might be possible to judge the two wines as having a

    common origin. In contrast, the finding of one significant difference in any test

    would clearly show that the two wines are different. Thus, the establishment of a

significant difference by a single test is much more revealing than the establishment of an absence of difference.

    19B DETECTING GROSS ERRORS

A data point that differs excessively from the mean in a data set is termed an outlier; outliers are the result of gross errors. When a set of data contains an outlier, the decision must be made whether to

    retain or reject it. The choice of criterion for the rejection of a suspected result

has its perils. If we set a stringent standard that makes the rejection of a questionable measurement difficult, we run the risk of retaining results that are spurious.



    For five measurements, Qcrit at the 90% confidence level is 0.64. Because

0.54 < 0.64, we must retain the outlier at the 90% confidence level.
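For reference, the sketch below applies the Q test as it is conventionally formulated for a single suspect value: the experimental Q is the gap between the suspect value and its nearest neighbor divided by the spread of the entire set, and the suspect value is rejected only if Q exceeds the tabulated Qcrit. The five replicate values in the call are invented for illustration and are not the data of the example above.

```python
def q_test(values, q_crit):
    """Dixon Q test for the most extreme value in a small replicate set.

    Q = (gap between the suspect value and its nearest neighbor) / (spread of the set).
    Returns the experimental Q and True if the suspect value should be rejected.
    """
    data = sorted(values)
    gap_low = data[1] - data[0]       # gap if the smallest value is the suspect
    gap_high = data[-1] - data[-2]    # gap if the largest value is the suspect
    spread = data[-1] - data[0]
    q_exp = max(gap_low, gap_high) / spread
    return q_exp, q_exp > q_crit

# Hypothetical replicate set; Qcrit = 0.64 for five measurements at the 90% confidence level
q_exp, reject = q_test([40.12, 40.15, 40.16, 40.20, 40.55], q_crit=0.64)
print(f"Q = {q_exp:.2f}, reject the suspect value: {reject}")
# Prints: Q = 0.81, reject the suspect value: True
```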

    19B-2 A Word of Caution about Rejecting Outliers

Several other statistical tests have been developed to provide criteria for rejection or retention of outliers. Such tests, like the Q test, assume that the distribution of

    the population data is normal, or Gaussian. Unfortunately, this condition cannot

    be proved or disproved for samples that have many fewer than 50 results.

    Consequently, statistical rules, which are perfectly reliable for normal distribu-

    tions of data, should be used with extreme caution when applied to samples con-

    taining only a few data. J. Mandel, in discussing treatment of small sets of data,

writes, "Those who believe that they can discard observations with statistical sanction by using statistical rules for the rejection of outliers are simply deluding themselves."3 Thus, statistical tests for rejection should be used only as aids to

    common sense when small samples are involved.

    The blind application of statistical tests to retain or reject a suspect measure-

    ment in a small set of data is not likely to be much more fruitful than an arbitrary

    decision. The application of good judgment based on broad experience with an

    analytical method is usually a sounder approach. In the end, the only valid reason

    for rejecting a result from a small set of data is the sure knowledge that a mistake

    was made in the measurement process. Without this knowledge, a cautious

    approach to rejection of an outlier is wise.

    19B-3 How Do We Deal with Outliers?

    Recommendations for the treatment of a small set of results that contains a

    suspect value are:

1. Re-examine carefully all data relating to the outlying result to see if a gross error could have affected its value. This recommendation demands a properly

    kept laboratory notebook containing careful notations of all observations

    (see Section 18I).

    2. If possible, estimate the precision that can be reasonably expected from the

    procedure to be sure that the outlying result actually is questionable.

    3. Repeat the analysis if sufficient sample and time are available. Agreement

    between the newly acquired data and those of the original set that appear to be

    valid will lend weight to the notion that the outlying result should be rejected.

    Furthermore, if retention is still indicated, the questionable result will have a

    smaller effect on the mean of the larger set of data.

    4. If more data cannot be obtained, apply the Q test to the existing set to see if

    the doubtful result should be retained or rejected on statistical grounds.

    5. If the Q test indicates retention, consider reporting the median of the set

    rather than the mean. The median has the great virtue of allowing inclusion

    of all data in a set without undue influence from an outlying value. In addi-

    tion, the median of a normally distributed set containing three measurements

    provides a better estimate of the correct value than the mean of the set after

    the outlying value has been discarded.
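The last point can be checked numerically. The following simulation is a sketch added for illustration (the seed, population parameters, and number of trials are arbitrary); it draws many sets of three values from a normal population and compares the median of each set with the mean of the two values that remain after the most divergent value has been discarded:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
true_value, sigma, n_trials = 0.0, 1.0, 100_000

medians, rejection_means = [], []
for _ in range(n_trials):
    xs = np.sort(rng.normal(true_value, sigma, size=3))
    medians.append(xs[1])                     # the median of the three values

    # Discard whichever extreme value lies farther from the middle value,
    # then average the two values that are retained
    if xs[2] - xs[1] > xs[1] - xs[0]:
        kept = xs[:2]
    else:
        kept = xs[1:]
    rejection_means.append(kept.mean())

def rmse(estimates):
    return np.sqrt(np.mean((np.asarray(estimates) - true_value) ** 2))

print(f"RMSE of the median:               {rmse(medians):.4f}")
print(f"RMSE of the mean after rejection: {rmse(rejection_means):.4f}")
# In runs of this simulation, the median shows the smaller RMSE, consistent
# with the statement in point 5 above.
```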

Use extreme caution when rejecting data for any reason.

    3J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds.,

    Part I, Vol. 1 (New York: Wiley, 1978), p. 282.


    19C QUESTIONS AND PROBLEMS

    19-1. Lord Rayleigh prepared nitrogen samples by

    several different methods. The density of each

sample was measured as the mass of gas required to fill a particular flask at a certain temperature and pressure. Masses of nitrogen samples prepared by decomposition of various nitrogen compounds were 2.29890, 2.29940, 2.29849, and 2.30054 g. Masses of nitrogen prepared by removing oxygen from air in various ways were 2.31001, 2.31163, and 2.31028 g. Is the density of nitrogen prepared from nitrogen compounds significantly different from that prepared from air? What are the chances of the conclusion being in error? (Study of this difference led to the discovery of the inert gases by Sir William Ramsay and Lord Rayleigh.)

    19-2. Apply the Q test to the following data sets to de-

    termine whether the outlying result should be

retained or rejected at the 95% confidence level.
(a) 41.27, 41.61, 41.84, 41.70

    (b) 7.295, 7.284, 7.388, 7.292

    19-3. Apply the Q test to the following data sets to

    determine whether the outlying result should be

    retained or rejected at the 95% confidence level.

    (a) 85.10, 84.62, 84.70

    (b) 85.10, 84.62, 84.65, 84.70