8/3/2019 ITK 226 2 Statistics
Dicky Dermawan
www.dickydermawan.net78.net
In statistics we are concerned with methods for designing and evaluating experiments to obtain information about practical problems.
In most cases the inspection of each item of the population would be too expensive, time-consuming, or even impossible. Hence a sample of a few items is drawn at random, and from this inspection conclusions about the population are inferred.
POPULATION | SAMPLE
Size: large number N | Size: small number n
Probability function/density f(x) | Relative frequency function
Distribution function F(x) | Cumulative frequency function
Mean: $\mu = \sum_{j=1}^{N} x_j f(x_j)$ | Average: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
Variance: $\sigma^2 = \sum_{j=1}^{N} (x_j - \mu)^2 f(x_j)$ | Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$
Sample of 100 Values of the Splitting Tensile Strength (lb/in^2)
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
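The tabulated sample can be summarised numerically before it is plotted. The following Python sketch uses only the standard library; the variable names are illustrative and not part of the original slides. It builds the absolute, relative, and cumulative relative frequency tables and evaluates the sample average and the sample variance (with the n - 1 divisor).

```python
from collections import Counter
from statistics import mean, variance

# The 100 splitting tensile strength values (lb/in^2), row by row as tabulated.
rows = """
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
"""
data = [int(v) for v in rows.split()]
n = len(data)

# Absolute frequency: how many times each value occurs in the sample.
abs_freq = Counter(data)

# Relative frequency: absolute frequency divided by the sample size.
rel_freq = {y: abs_freq[y] / n for y in sorted(abs_freq)}

# Cumulative relative frequency: running sum of the relative frequencies.
cum_rel_freq = {}
running = 0.0
for y in sorted(abs_freq):
    running += rel_freq[y]
    cum_rel_freq[y] = running

y_bar = mean(data)    # sample average
s2 = variance(data)   # sample variance, divisor n - 1

for y in sorted(abs_freq):
    print(y, abs_freq[y], round(rel_freq[y], 2), round(cum_rel_freq[y], 2))
print("average:", y_bar, "variance:", round(s2, 1))
```

Values that never occur in the sample (310 and 430) simply do not appear in the tables; they correspond to empty bars in a frequency plot.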
[Figure: absolute frequency versus tensile strength (300 to 440 lb/in^2) for the sample of 100 values]
[Figure: relative frequency versus tensile strength (300 to 440 lb/in^2) for the sample of 100 values]
[Figure: cumulative absolute frequency versus tensile strength (300 to 440 lb/in^2) for the sample of 100 values]
[Figure: cumulative relative frequency versus tensile strength (300 to 440 lb/in^2) for the sample of 100 values]
Min
Lower Quartile
Middle Quartile = Median
Upper Quartile
Interquartile range
Max
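The quantities marked on a box-and-whisker plot can be computed directly from the sample. The sketch below does this for the 100 tensile strength values using the standard library. Note that several quartile conventions exist; `statistics.quantiles` with its default "exclusive" method is used here as an assumption, and other conventions can give slightly different quartiles on other data sets.

```python
from statistics import quantiles

# The 100 splitting tensile strength values (lb/in^2) from the sample table.
rows = """
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
"""
data = sorted(int(v) for v in rows.split())

# Three cut points dividing the ordered sample into four equal parts.
lower_q, median_q, upper_q = quantiles(data, n=4)

summary = {
    "min": data[0],
    "lower quartile": lower_q,
    "median": median_q,
    "upper quartile": upper_q,
    "interquartile range": upper_q - lower_q,
    "max": data[-1],
}
for name, value in summary.items():
    print(f"{name}: {value}")
```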
DOX 6E Montgomery 10
Experimental error
Hypothesis testing: null hypothesis, alternative hypothesis
Type I error: rejecting a true hypothesis
Type II error: accepting a false hypothesis
One-tail test vs. two-tail test
Confidence level = 1 - significance level
P-value
Confidence interval
So far we have regarded the values y1, y2, ..., yn of a sample as n observed values of a single random variable Y. We may equally well regard these n values as single observations of n random variables Y1, Y2, ..., Yn that have the same distribution and are independent.
If Y1, ..., Yn are independent normal random variables, each of which has mean $\mu$ and variance $\sigma^2$, then the random variable
$$\bar{Y} = \frac{1}{n}\left(Y_1 + Y_2 + Y_3 + \dots + Y_n\right)$$
is normal with mean $\mu$ and variance $\sigma^2/n$, and the random variable
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$
is normal with mean 0 and variance 1.
The confidence interval for $\mu$ is
$$\mathrm{CONF}\left\{\bar{y} - c\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{y} + c\,\frac{\sigma}{\sqrt{n}}\right\}$$
A vendor submits lots of fabric to a textile manufacturer.
The manufacturer wants to know if the lot average breaking strength exceeds 200 psi. If so, she wants to accept the lot.
Past experience indicates that a reasonable value for the variance of breaking strength is 100 (psi)^2.
Four specimens are randomly selected, and the average breaking strength observed is $\bar{y} = 214$ psi.
The hypotheses to be tested are:
$$H_0: \mu = 200$$
$$H_1: \mu > 200$$
This is a one-sided alternative hypothesis.
The value of the test statistic is:
$$Z_0 = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} = \frac{214 - 200}{10/\sqrt{4}} = 2.80$$
If the confidence level of 95% is chosen, i.e. type I error = 0.05, we find Z = 1.645.
Since Z0 = 2.80 > 1.645, the difference is significant: H0 is rejected and we conclude that the lot average breaking strength exceeds 200 psi. Thus, we accept the lot.
With Z = 1.645, the confidence interval for $\mu$ is 205.8 ≤ $\mu$ ≤ 222.2. Each bound taken alone is a one-sided 95% bound; as a two-sided interval its coverage is 90%. Clearly, 200 is outside the interval.
The P-value is 0.0026.
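The arithmetic of the fabric example can be checked with a few lines of Python. This is a minimal sketch using `statistics.NormalDist` from the standard library; the variable names are illustrative, and the inputs (mean 214 psi, variance 100 psi^2, n = 4, hypothesised mean 200 psi) are the ones stated in the example.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 200.0          # hypothesised lot mean (psi)
sigma = sqrt(100.0)  # known standard deviation, from variance 100 psi^2
n = 4                # number of specimens
y_bar = 214.0        # observed average breaking strength (psi)
alpha = 0.05         # type I error

std_err = sigma / sqrt(n)                 # standard error of the mean
z0 = (y_bar - mu0) / std_err              # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
p_value = 1 - NormalDist().cdf(z0)        # upper-tail P-value
reject_h0 = z0 > z_crit

# Bounds obtained with the one-sided critical value.
lower = y_bar - z_crit * std_err
upper = y_bar + z_crit * std_err

print(f"Z0 = {z0:.2f}, critical Z = {z_crit:.3f}, P = {p_value:.4f}")
print(f"reject H0: {reject_h0}, bounds: {lower:.1f} to {upper:.1f}")
```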
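The same as previous, but we use:
- the t distribution instead of the normal distribution
- the sample standard deviation S instead of $\sigma$
The test statistic is
$$t_0 = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}}$$
The confidence interval is
$$\mathrm{CONF}\left\{\bar{y} - t\,\frac{S}{\sqrt{n}} \le \mu \le \bar{y} + t\,\frac{S}{\sqrt{n}}\right\}$$
with t taken at (n-1) degrees of freedom.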
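If Variance Known
For a single sample, $\bar{y}$ is normal with mean $\mu$ and variance $\sigma^2/n$. For two independent samples, $\bar{y}_1 - \bar{y}_2$ is normal with mean $\mu_1 - \mu_2$ and variance
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
The test statistic is
$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
The confidence interval is
$$\mathrm{CONF}\left\{\bar{y}_1 - \bar{y}_2 - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right\}$$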
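If Variance Unknown, but $\sigma_1^2 = \sigma_2^2$
The normal distribution with variance $\sigma^2/n$ is replaced by the t distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom and the pooled estimate $S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$.
The test statistic is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where the pooled variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}, \qquad \nu = n_1 + n_2 - 2$$
Choose a confidence level, usually 95%, then find the critical t value at the associated degrees of freedom, i.e. $t_{\alpha/2,\nu}$.
If $|t_0| > t_{\alpha/2,\nu}$, we have enough reason to reject the null hypothesis and conclude that the two methods differ significantly.
Alternatively, calculate the P-value, i.e. the risk of wrongly rejecting the null hypothesis.
Or set a confidence interval and reject the null hypothesis if 0 is not included in the interval.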
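If Variance Unknown, $\sigma_1^2 \ne \sigma_2^2$
The normal distribution with variance $\sigma^2/n$ is replaced by the t distribution with $\nu$ degrees of freedom and the estimate $\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$, where
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$
The test statistic is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$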
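As a rough illustration of the unequal-variance case, the sketch below computes the test statistic and the approximate degrees of freedom in the common Welch-Satterthwaite form with (n - 1) in the denominators. The function name `welch_t` is hypothetical. The two samples are the hardness readings for tips 1 and 2 from the hardness-testing example in these slides, treated here as if they were independent samples purely to have numbers to work with.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample1, sample2):
    """Return (t0, nu) for two independent samples with unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1   # S1^2 / n1
    v2 = variance(sample2) / n2   # S2^2 / n2
    t0 = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Approximate degrees of freedom (generally not an integer).
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, nu

# Hardness readings from the slides, treated as independent for illustration.
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]

t0, nu = welch_t(tip1, tip2)
print(f"t0 = {t0:.3f}, nu = {nu:.2f}")
```

The critical value is then read from a t table at the nearest (usually the next lower) whole number of degrees of freedom.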
Tension bond strength of portland cement mortar is an important characteristic of the product. An engineer is interested in comparing the strength of a modified formulation, in which polymer latex emulsions have been added during mixing, to the strength of the unmodified mortar. He collected 10 observations (Table 2.1).
Plot the dot diagram.
Plot the box-and-whisker plot.
Are the two formulations really different?
Or perhaps the observed difference is the result of sampling fluctuation and the two formulations are really identical?
Blocking is a design technique used to improve the precision with which the comparisons among the factors of interest are made. Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors, i.e. factors that may influence the experimental response but in which we are not interested.
The term block refers to a relatively homogeneous experimental unit, and the block represents a restriction on complete randomization because the treatment combinations are only randomized within the block. Blocking is carried out by making comparisons within matched pairs of experimental material.
The confidence interval based on the paired analysis is usually much narrower than that from the independent analysis. This illustrates the noise reduction property of blocking.
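Statistical model for complete randomization:
$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2;\quad j = 1, 2, \dots, n_i$$
with $2(n_i - 1)$ degrees of freedom.
Statistical model with blocking:
$$y_{ij} = \mu_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2;\quad j = 1, 2, \dots, n_i$$
with only $(n - 1)$ degrees of freedom, where n is the number of pairs.
The paired differences are
$$d_j = y_{1j} - y_{2j}$$
The test statistic:
$$t_0 = \frac{\bar{d}}{S_d/\sqrt{n}}$$
The confidence interval for the 2-sided test:
$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{S_d}{\sqrt{n}}$$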
Consider a hardness testing machine that presses a rod with a pointed tip into a metal specimen with a known force. Two different tips are available for this machine, and it is suspected that one tip produces different hardness readings than the other.
The test could be performed as follows: a number of metal specimens could be randomly selected. Half are tested by tip 1 and the other half by tip 2.
The metal specimens might be cut from different bar stock that were not exactly identical in their hardness. To protect against this possibility, an alternative experimental design should be considered: divide each specimen into two parts and randomly assign each tip to one half of each specimen.
- Use the paired data to determine a 95% confidence interval for the difference.
- What if we use pooled or independent analysis?
Specimen Tip 1 Tip 2
1 7 6
2 3 3
3 3 5
4 4 3
5 8 8
6 3 2
7 2 4
8 9 9
9 5 4
10 4 5
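Both questions can be explored numerically. The sketch below computes the 95% confidence interval for the mean difference from the paired analysis and, for comparison, from the pooled (independent) analysis of the same readings. It uses only the standard library, so the two critical values are entered by hand from a standard t table (2.262 for 9 degrees of freedom, 2.101 for 18); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hardness readings for the ten specimens, in table order.
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]
n = len(tip1)

# Two-sided critical values at alpha = 0.05, taken from a standard t table.
T_CRIT_9 = 2.262    # n - 1 = 9 degrees of freedom (paired)
T_CRIT_18 = 2.101   # 2n - 2 = 18 degrees of freedom (pooled)

# Paired analysis: work with the within-specimen differences.
d = [a - b for a, b in zip(tip1, tip2)]
d_bar = mean(d)
s_d = stdev(d)
t0_paired = d_bar / (s_d / sqrt(n))
half_paired = T_CRIT_9 * s_d / sqrt(n)

# Pooled analysis: treat the two columns as independent samples.
diff = mean(tip1) - mean(tip2)
s_p = sqrt(((n - 1) * variance(tip1) + (n - 1) * variance(tip2)) / (2 * n - 2))
half_pooled = T_CRIT_18 * s_p * sqrt(1 / n + 1 / n)

print(f"paired: {d_bar:.2f} +/- {half_paired:.2f} (t0 = {t0_paired:.2f})")
print(f"pooled: {diff:.2f} +/- {half_pooled:.2f}")
```

Both intervals contain zero, so neither analysis indicates a difference between the tips, but the paired interval is far narrower because the specimen-to-specimen variability cancels in the differences.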
In some experiments it is the comparison of variability in the data that is important.
For example, in chemical laboratories, we may wish to compare the variability of two analytical methods.
Unlike the tests on means, the procedures for tests on variances are rather sensitive to the normality assumption.
Suppose we wish to test the hypothesis of whether or not the variance of a normal population equals a constant, viz. $\sigma_0^2$. The test statistic is:
$$\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma_0^2}$$
The appropriate distribution for $\chi_0^2$ is the chi-square distribution with (n-1) degrees of freedom. The confidence interval for $\sigma^2$ is
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
Suppose we wish to test equality of the variances of two normal populations.
If independent random samples of size n1 and n2 are taken from populations 1 and 2, respectively, the test statistic for
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 \ne \sigma_2^2$$
is the ratio of the sample variances:
$$F_0 = \frac{S_1^2}{S_2^2}$$
The appropriate distribution for F0 is the F distribution with (n1-1) numerator degrees of freedom and (n2-1) denominator degrees of freedom. The null hypothesis would be rejected if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or if $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$.
The confidence interval for $\sigma_1^2/\sigma_2^2$ is
$$\frac{S_1^2}{S_2^2}\,F_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\,F_{\alpha/2,\,n_2-1,\,n_1-1}$$
Note:
$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha/2,\,n_2-1,\,n_1-1}}$$
Probability plotting is a graphical technique for determining whether sample data conform to a hypothesized distribution based on a subjective visual examination of the data.
To construct a probability plot, the observations in the sample are first ranked from smallest to largest. That is, the sample y1, y2, ..., yn is arranged as y(1), y(2), ..., y(n), where y(1) is the smallest observation and y(n) the largest.
The ordered observations y(j) are then plotted against their observed cumulative frequency (j-0.5)/n.
The cumulative frequency scale has been arranged so that if the hypothesized distribution adequately describes the data, the plotted points will fall approximately along a straight line. Judging whether the points are close enough to a line is usually subjective.
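The construction above can be sketched in Python as follows. Instead of special probability paper, this version converts each cumulative frequency (j - 0.5)/n into the corresponding standard normal quantile, so that an ordinary linear plot of the ordered observations against those quantiles should look roughly straight if the data are normal; that substitution is an assumption of this sketch, and the function name is hypothetical. The tip 1 hardness readings from the hardness-testing example serve as sample data.

```python
from statistics import NormalDist

def normal_plot_points(sample):
    """Return (y_(j), cumulative frequency, normal quantile) for each ranked value."""
    ordered = sorted(sample)   # y_(1) <= y_(2) <= ... <= y_(n)
    n = len(ordered)
    points = []
    for j, y in enumerate(ordered, start=1):
        p = (j - 0.5) / n               # observed cumulative frequency
        z = NormalDist().inv_cdf(p)     # matching standard normal quantile
        points.append((y, p, z))
    return points

# Tip 1 hardness readings from the slides.
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
points = normal_plot_points(tip1)

for y, p, z in points:
    print(f"y = {y}, cumulative frequency = {p:.2f}, z = {z:+.3f}")
```

Plotting the first column against the third with any plotting tool gives the normal probability plot.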
An experiment is a test or a series of tests.
Experiments are used widely in the engineering world:
- Process characterization & optimization
- Evaluation of material properties
- Product design & development
- Component & system tolerance determination
All experiments are designed experiments; some are poorly designed, some are well designed.
Randomization
- Running the trials in an experiment in random order
- Notion of balancing out effects of lurking variables
Replication
- Sample size (improving precision of effect estimation, estimation of error or background noise)
- Replication versus repeat measurements? (see page 13)
Blocking
- Dealing with nuisance factors
Best-guess experiments
- Used a lot
- More successful than you might suspect, but there are disadvantages
One-factor-at-a-time (OFAT) experiments
- Sometimes associated with the scientific or engineering method
- Devastated by interaction, also very inefficient
Statistically designed experiments
- Based on Fisher's factorial concept