Statistics. Exploring Chemical Analysis, Fourth Edition, Chapter 4. 歐亞書局


4-1 The Gaussian Distribution

The example given in Figure 4-1 introduces the Gaussian distribution.

As shown in Figure 4-1, neurotransmitters bind to membrane proteins of the muscle cell and open up channels that permit cations to diffuse into the cell.

(a) In the absence of neurotransmitter, the ion channel is closed and cations cannot enter the muscle cell.

(b) In the presence of neurotransmitter, the channel opens, cations enter the cell, and muscle action is initiated.

Figure 4-2 Observed cation current passing through individual channels of a frog muscle cell.

P.84

Of the 922 ion-channel responses recorded, 190 are in the narrow range from 2.64 to 2.68 pA.

The smooth, bell-shaped curve superimposed on the data is a Gaussian distribution.

Mean and Standard Deviation

A Gaussian distribution is characterized by a mean and a standard deviation.

The arithmetic mean, $\bar{x}$, also called the average, is the sum of the measured values divided by the number of measurements, n.

mean (average): $\bar{x} = \frac{\sum_i x_i}{n}$

The standard deviation, s, is a measure of the width of the distribution.

standard deviation: $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$

The smaller the standard deviation, the narrower the distribution.

P.84

Degrees of freedom (n - 1)

Initially we have n independent data points, which represent n pieces of information.

After computing the average, there are only n-1 independent pieces of information.

P.85

Figure 4-3 Gaussian curves showing the effect of doubling the standard deviation.

If the standard deviation were doubled, the Gaussian curve for the same number of observations would be shorter and broader, because the total number of observations (the area under the curve) is fixed.

P.85

Relative standard deviation

The relative standard deviation is the standard deviation divided by the average.

relative standard deviation = $\frac{\text{standard deviation}}{\text{average}} \times 100 = \frac{0.090}{2.670} \times 100 = 3.4\%$

Variance is the square of the standard deviation.

Variance = $s^2 = (0.090)^2 = 0.0081$

Example: Mean and Standard Deviation

Find the mean, standard deviation, and relative standard deviation for the set of measurements (7, 18, 10, 15).

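SOLUTION: The mean is (7 + 18 + 10 + 15)/4 = 12.5. The standard deviation is s = 4.9, and the relative standard deviation is (4.9/12.5) × 100 = 39%.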
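The arithmetic for this example can be checked with a few lines of Python. This is only a sketch using the standard-library `statistics` module; the variable names are illustrative and not taken from the slides.

```python
import statistics

# The four measurements from the example
values = [7, 18, 10, 15]

# Mean: sum of the measured values divided by the number of measurements
mean = statistics.mean(values)

# Sample standard deviation: uses n - 1 (the degrees of freedom) in the denominator
s = statistics.stdev(values)

# Relative standard deviation, expressed as a percentage of the mean
rsd_percent = 100 * s / mean

# Variance is the square of the standard deviation
variance = s ** 2

print(f"mean = {mean:.1f}")
print(f"standard deviation = {s:.1f}")
print(f"relative standard deviation = {rsd_percent:.0f}%")
print(f"variance = {variance:.1f}")
```

Note that `statistics.stdev` divides by n - 1, which matches the degrees-of-freedom argument above, whereas `statistics.pstdev` divides by n.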

P.85

Standard Deviation and Probability

In an ideal Gaussian distribution,

68.3% of the measurements lie within μ ± 1σ

95.5% of the measurements lie within μ ± 2σ

99.7% of the measurements lie within μ ± 3σ

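For any normal distribution, the area within ±0.5 standard deviations of the mean is 0.38, within ±1 standard deviation is 0.68, within ±1.645 standard deviations is 0.90, within ±1.96 standard deviations is 0.95, and within ±3 standard deviations is 0.997.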
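These areas follow from the error function: for an ideal Gaussian, the fraction of the population within μ ± zσ is erf(z/√2). The short sketch below reproduces the numbers quoted above; the helper name `fraction_within` is an illustrative choice, not something from the slides.

```python
import math

def fraction_within(z):
    """Fraction of an ideal Gaussian population within mean ± z standard deviations."""
    return math.erf(z / math.sqrt(2))

# Multiples of the standard deviation quoted in the text
for z in (0.5, 1, 1.645, 1.96, 2, 3):
    print(f"within ±{z}σ: {100 * fraction_within(z):.1f}%")
```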

Standard Deviation and Probability

Table 4-1 shows the correspondence between ideal Gaussian behavior and the observations in Figure 4-2.

Ideally, 68.3% of the measurements should fall in the μ ± 1σ region.

In Figure 4-2, 71% of the measurements fall in the μ ± 1σ region.

4-2 Student’s t

t test

Two applications of Student’s t:

(i) It is the statistical tool used to express confidence intervals.

(ii) It can be used to compare results from different experiments.

P.86

Confidence interval

The confidence interval is a range of values within which there is a specified probability of finding the true mean.

P.86

Confidence Intervals

confidence interval = $\bar{x} \pm \frac{ts}{\sqrt{n}}$

where s is the measured standard deviation, n is the number of measurements, and t is Student’s t, taken from Table 4-2.

Table 4-2

P.87

Example: Calculating Confidence Intervals

In replicate analyses, the carbohydrate content of a glycoprotein (a protein with sugars attached to it) is found to be 12.6, 11.9, 13.0, 12.7, and 12.5 g of carbohydrate per 100 g of protein.

Find the 50% and 90% confidence intervals for the carbohydrate content.

P.87

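SOLUTION: For the five measurements, $\bar{x}$ = 12.54 and s = 0.40. The number of degrees of freedom is 5 - 1 = 4.

For 4 degrees of freedom, t = 0.741 at the 50% confidence level, so the 50% confidence interval is

$\bar{x} \pm \frac{ts}{\sqrt{n}} = 12.54 \pm \frac{(0.741)(0.40)}{\sqrt{5}} = 12.54 \pm 0.13$

For 4 degrees of freedom, t = 2.132 at the 90% confidence level, so the 90% confidence interval is

$\bar{x} \pm \frac{ts}{\sqrt{n}} = 12.54 \pm \frac{(2.132)(0.40)}{\sqrt{5}} = 12.54 \pm 0.38$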

There is a 50% chance that the true mean lies in the range 12.54 ± 0.13, i.e., 12.41 to 12.67.

There is a 90% chance that the true mean lies in the range 12.54 ± 0.38, i.e., 12.16 to 12.92.

Better precision gives smaller confidence intervals, so there is a higher chance of finding the true mean within a small interval.

In the formula, use the number of measurements, n.

When looking up t in the table, use the degrees of freedom, n - 1.
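The glycoprotein example can be reproduced as follows. This is a sketch, not the textbook's own procedure: the two t values for 4 degrees of freedom are hard-coded from a standard Student's t table (an assumption, since Table 4-2 is not reproduced in these slides), and the function name is illustrative.

```python
import math
import statistics

# Replicate results, g carbohydrate per 100 g protein
data = [12.6, 11.9, 13.0, 12.7, 12.5]

# Student's t for n - 1 = 4 degrees of freedom.
# Assumed values from a standard t table; keyed by confidence level in percent.
T_FOR_4_DF = {50: 0.741, 90: 2.132}

def confidence_interval(values, t):
    """Return (mean, half_width) where the interval is mean ± t*s/sqrt(n)."""
    n = len(values)                    # n goes into the formula
    mean = statistics.mean(values)
    s = statistics.stdev(values)       # n - 1 in the denominator
    half_width = t * s / math.sqrt(n)
    return mean, half_width

for level, t in T_FOR_4_DF.items():
    mean, half_width = confidence_interval(data, t)
    print(f"{level}% confidence interval: {mean:.2f} ± {half_width:.2f}")
```

The number of measurements n appears in the formula, while the table lookup (here the hard-coded dictionary) is indexed by n - 1.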

Improving the Reliability of Your Measurements

Better precision gives smaller confidence intervals. The confidence interval is ± ts/√n.

The only way to reduce s is to improve your experimental procedure. In the absence of a procedural change, the way to reduce the confidence interval is to increase the number of measurements.

To reduce the size of the confidence interval, we may

(i) make more measurements to increase n

(ii) improve the measurement procedure to decrease s

Comparison of Means with Student’s t

Student’s t can be used to compare two sets of measurements to decide whether they are “statistically different.”

t-test

Figure 4-4 Lord Rayleigh’s measurements of the mass of nitrogen isolated from air or generated by decomposition of nitrogen compounds.

An example comes from the work of Lord Rayleigh (John W. Strutt), who received the Nobel Prize in 1904 for discovering the inert gas argon.

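For two sets of data consisting of $n_1$ and $n_2$ measurements, with averages $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $s_1$ and $s_2$, calculate t with

$t = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$

$s_{pooled} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$

Here $s_{pooled}$ is a pooled standard deviation making use of both sets of data.

If the calculated t is greater than the tabulated t at the 95% confidence level, the two results are considered to be significantly different.

Degrees of freedom for two sets of measurements = $n_1 + n_2 - 2$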

Example: Is Rayleigh’s N2 from Air Denser than N2 from Chemicals?

Are the two masses significantly different?

P.90

SOLUTION: One set contains 7 measurements and the other contains 8 measurements.

Degrees of freedom = 7 + 8 – 2 = 13

For 13 degrees of freedom, t lies between 2.228 and 2.131 at the 95% confidence level.

The calculated t = 20.2 is greater than the tabulated t, so the difference is significant.
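A minimal sketch of the pooled-standard-deviation t calculation is given below. The slides do not list Rayleigh's averages and standard deviations, so the summary statistics passed to the function are assumptions (values commonly quoted for his data); treat them as illustrative. The function name `pooled_t` is also hypothetical.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, s_pooled) for comparing two means with a pooled standard deviation."""
    # Pooled standard deviation uses both data sets, weighted by degrees of freedom
    s_pooled = math.sqrt(
        (s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2)
    )
    t = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t, s_pooled

# ASSUMED summary statistics (not given in the slides), masses in grams:
# nitrogen from air: 7 measurements; nitrogen from chemicals: 8 measurements
t_calc, s_pooled = pooled_t(2.31011, 0.000143, 7, 2.29947, 0.00138, 8)

degrees_of_freedom = 7 + 8 - 2

# The tabulated t for 13 degrees of freedom lies between 2.131 and 2.228 (95% level);
# comparing against the larger bound is the more conservative choice.
T_TABLE_UPPER_BOUND = 2.228
significantly_different = t_calc > T_TABLE_UPPER_BOUND

print(f"s_pooled = {s_pooled:.5f}")
print(f"t calculated = {t_calc:.1f} with {degrees_of_freedom} degrees of freedom")
print(f"significantly different at 95%: {significantly_different}")
```

With these assumed inputs the calculated t comes out close to the 20.2 quoted in the solution.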

4-3 A Spreadsheet for the t Test

skip

P.91

G_calculated = |questionable value - mean|/s. If G_calculated > G_table, the value in question can be rejected with 95% confidence. Values in this table are for a one-sided test, as recommended by ASTM. SOURCE: ASTM E 178-02 Standard Practice for Dealing with Outlying Observations; F. E. Grubbs and G. Beck, Technometrics 1972, 14, 847.

4-4 Grubbs Test for an Outlier

Four students performed the experiment in triplicate and pooled their results:

Sidney: 10.2, 10.8, 11.6
Cheryl: 9.9, 9.4, 7.8
Tien: 10.0, 9.2, 11.3
Dick: 9.5, 10.6, 11.6

It appears that the value of 7.8 by Cheryl is questionable.

Should we retain this measurement?

To answer this question, use Grubbs test.

Compute the Grubbs statistic G, defined as

Grubbs test: $G_{calculated} = \frac{|\text{questionable value} - \bar{x}|}{s}$ (Eq. 4-6)

If G calculated from Equation 4-6 is greater than G in Table 4-4, the questionable point should be discarded.

For the 12 pooled values, the mean is 10.16 and the standard deviation is 1.11.

G_calculated = |7.8 - 10.16| / 1.11 = 2.13

G_table = 2.285 for 12 observations in Table 4-4

G_calculated < G_table

The observation of value 7.8 should be retained.

There is more than a 5% chance that the value 7.8 is a member of the same population as the other measurements.


Common sense must always prevail. If Cheryl knows that her measurement was low because she spilled some of her unknown, then the probability that the result is wrong is 100% and the datum should be discarded.

Any data based on a faulty procedure should be discarded, no matter how well it fits the rest of data.
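The Grubbs calculation for the four students' pooled results can be sketched as below, using only the standard library. The critical value 2.285 is the Table 4-4 entry for 12 observations quoted in the text; the variable names are illustrative.

```python
import statistics

# Pooled results from Sidney, Cheryl, Tien, and Dick (three values each)
data = [10.2, 10.8, 11.6,
        9.9, 9.4, 7.8,
        10.0, 9.2, 11.3,
        9.5, 10.6, 11.6]

questionable = 7.8
G_TABLE_12_OBSERVATIONS = 2.285   # from Table 4-4, as quoted in the text

# Mean and standard deviation are computed from ALL values,
# including the questionable one
mean = statistics.mean(data)
s = statistics.stdev(data)

# Grubbs statistic: absolute deviation of the suspect point, in units of s
g_calculated = abs(questionable - mean) / s

# Discard only if the calculated G exceeds the tabulated G
discard = g_calculated > G_TABLE_12_OBSERVATIONS

print(f"mean = {mean:.2f}, s = {s:.2f}")
print(f"G calculated = {g_calculated:.2f}, G table = {G_TABLE_12_OBSERVATIONS}")
print("discard" if discard else "retain")
```

Carrying full precision gives a G slightly below the 2.13 obtained from the rounded mean and standard deviation, but the conclusion is the same: G calculated is less than G table, so the value is retained.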

4-5 Finding the “Best” Straight Line

y = mx + b, where m is the slope and b is the y-intercept.

The method of least squares finds the “best” straight line through experimental data points.

Figure 4-6

P.93

One example

The method of least squares finds the “best” line by adjusting the line to minimize the vertical deviations between the points and the line.

Figure 4-7

P.93

Figure 4-7 Least-squares curve fitting minimizes the sum of the squares of the vertical deviations of the measured points from the line. The Gaussian curve drawn over the point (3, 3) is a schematic indication of the distribution of measured y values about the straight line. The most probable value of y falls on the line, but there is a finite probability of measuring y some distance from the line.

vertical deviation = $d_i = y_i - y = y_i - (mx_i + b)$, where $y_i$ is the measured value and $y$ is the value predicted by the straight line.

$\sum d_i^2 = \sum (y_i - y)^2 = \sum (y_i - mx_i - b)^2$

Because we minimize the squares of the deviations, this is called the method of least squares.

You may find out how to derive the slope and intercept in your calculus textbook.

Table 4-5 sets out an example in which the four points (n = 4) in Figure 4-7 are treated.

4 data points:

(1,2), (3,3), (4,4), and (6,5)
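Minimizing the sum of squared vertical deviations with respect to m and b leads to closed-form expressions in terms of the sums Σx, Σy, Σxy, and Σx². The sketch below applies those standard expressions to the four points; the function name `least_squares` is an illustrative choice.

```python
def least_squares(points):
    """Return (m, b) of the line y = m*x + b that minimizes the sum of squared vertical deviations."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)

    # Common denominator of the slope and intercept expressions
    d = n * sum_xx - sum_x ** 2

    m = (n * sum_xy - sum_x * sum_y) / d        # slope
    b = (sum_xx * sum_y - sum_xy * sum_x) / d   # y-intercept
    return m, b

points = [(1, 2), (3, 3), (4, 4), (6, 5)]
m, b = least_squares(points)
print(f"slope m = {m:.4f}")
print(f"intercept b = {b:.4f}")
```

For these points the sums are Σx = 14, Σy = 14, Σxy = 57, and Σx² = 62, so the denominator is 4(62) - 14² = 52.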

Figure 4-7

How Reliable Are Least-Squares Parameters?

The uncertainties in m and b are related to the uncertainty in measuring each value of y.

standard deviation of y: $s_y \approx \sqrt{\frac{\sum d_i^2}{n - 2}}$

$y\ (\pm s_y) = [m\ (\pm s_m)]x + [b\ (\pm s_b)]$

The first decimal place of the standard deviation is the last significant figure of the slope or intercept.
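As a sketch of how these uncertainties can be computed, the block below uses the standard least-squares expressions s_m = s_y·sqrt(n/D) and s_b = s_y·sqrt(Σx²/D), where D = nΣx² - (Σx)². These two expressions are not written out in the slides, so take them as an assumption based on the usual textbook treatment; the function name is illustrative. The data are the four points of Figure 4-7.

```python
import math

def fit_with_uncertainties(points):
    """Least-squares line plus standard deviations: returns (m, b, s_y, s_m, s_b)."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    d = n * sum_xx - sum_x ** 2

    m = (n * sum_xy - sum_x * sum_y) / d
    b = (sum_xx * sum_y - sum_xy * sum_x) / d

    # Vertical deviations of each point from the fitted line
    deviations = [y - (m * x + b) for x, y in points]

    # Two degrees of freedom are used up by m and b, hence n - 2
    s_y = math.sqrt(sum(dev ** 2 for dev in deviations) / (n - 2))

    s_m = s_y * math.sqrt(n / d)        # standard deviation of the slope
    s_b = s_y * math.sqrt(sum_xx / d)   # standard deviation of the intercept
    return m, b, s_y, s_m, s_b

points = [(1, 2), (3, 3), (4, 4), (6, 5)]
m, b, s_y, s_m, s_b = fit_with_uncertainties(points)
print(f"m = {m:.4f} ± {s_m:.4f}")
print(f"b = {b:.4f} ± {s_b:.4f}")
print(f"s_y = {s_y:.4f}")
```

Following the significant-figure rule above, the printed values would be rounded so that the last digit kept in m or b matches the first decimal place of its standard deviation.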

4-6 Constructing a Calibration Curve

A calibration curve is a graph showing how the experimentally measured property depends on the known concentrations of the standards.

The detection limit (DL) is the smallest concentration that can be reported with a certain level of confidence (Skoog).

Linear range

Table 4-6 uses data from a spectrophotometric analysis to show how to construct a calibration curve.

A result obtained with zero analyte is called a blank (or a reagent blank), because it measures effects due to the analytical reagents.

A solution containing a known quantity of analyte (or other reagent) is called a standard solution.

Table 4-6

Subtract the average absorbance of the blanks (0.0993) from each measured absorbance.

This data point is omitted because it is bad data.

These three data points are not included because they lie below the straight line.

The linear range is between 0 and 2.0 g but not between 0 and 2.5 g.

Figure 4-8

Once the calibration line is constructed, you may estimate the unknown’s concentration by measuring its absorbance.

method of least squares

Suppose that the measured absorbance of an unknown sample is 0.373.

How many micrograms of protein does it contain? What uncertainty is associated with the answer?

Finding the Protein in an Unknown

corrected absorbance = 0.373 - 0.0993 (blank) = 0.2737

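What uncertainty is associated with the answer?

$s_x = \frac{s_y}{|m|} \sqrt{\frac{1}{k} + \frac{1}{n} + \frac{(y - \bar{y})^2}{m^2 \sum (x_i - \bar{x})^2}}$

$s_x$: the uncertainty in the value of x found for the unknown

y: the (mean) corrected response measured for the unknown

k: the number of replicate measurements of the unknown

n: the number of data points included in the calibration line

$\bar{x}$, $\bar{y}$: the mean values of x and y for the points in the calibration line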
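The sketch below shows how an unknown's x value and its uncertainty could be computed from a calibration line, using the standard expression s_x = (s_y/|m|)·sqrt(1/k + 1/n + (y - ȳ)²/(m²Σ(x_i - x̄)²)). Because the Table 4-6 calibration data are not reproduced in these slides, the example reuses the four points of Figure 4-7 as a stand-in calibration set, and the measured response y = 2.72 is a hypothetical value chosen for illustration. The function name is also illustrative.

```python
import math

def unknown_from_calibration(points, y_unknown, k=1):
    """Return (x, s_x) for an unknown with mean response y_unknown from k replicates."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    d = n * sum_xx - sum_x ** 2

    # Least-squares slope and intercept of the calibration line
    m = (n * sum_xy - sum_x * sum_y) / d
    b = (sum_xx * sum_y - sum_xy * sum_x) / d

    # Standard deviation of y about the line (n - 2 degrees of freedom)
    s_y = math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in points) / (n - 2))

    x_mean = sum_x / n
    y_mean = sum_y / n
    sum_sq_dx = sum((x - x_mean) ** 2 for x, _ in points)

    # Invert the calibration line to find x for the unknown
    x = (y_unknown - b) / m

    # Propagated uncertainty in x
    s_x = (s_y / abs(m)) * math.sqrt(
        1 / k + 1 / n + (y_unknown - y_mean) ** 2 / (m ** 2 * sum_sq_dx)
    )
    return x, s_x

# Stand-in calibration data (the four points of Figure 4-7) and a HYPOTHETICAL response
calibration = [(1, 2), (3, 3), (4, 4), (6, 5)]
x, s_x = unknown_from_calibration(calibration, y_unknown=2.72, k=1)
print(f"x = {x:.2f} ± {s_x:.2f}")
```

Increasing k (more replicate measurements of the unknown) shrinks the 1/k term and therefore reduces s_x, which mirrors the earlier point that more measurements narrow the uncertainty.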

4-7 A Spreadsheet for Least Squares

Figure 4-9 Spreadsheet for least-squares calculations.

Figure 4-9 uses the built-in power of Excel for least-squares calculations of straight lines.

skip

Figure 4-10 Adding error bars corresponding to the 95% confidence interval for each data point.

Figure 4-11 Format Data Series window for adding error bars to a graph.
