Statistics for Non-Statisticians
A PRACTICAL APPROACH FOR VALIDATION AND QUALITY ENGINEERING PROFESSIONALS
RAUL SOTO, MSC, CQE
IVT CONFERENCE - JUNE 2016, PHILADELPHIA PA
(c) 2016 / Raul Soto 1
The contents of this presentation represent the opinions of the speaker, and not necessarily those of his present or past employers.
About the Author
• 20+ years of experience in the medical devices, pharmaceutical, biotechnology, and consumer electronics industries
– MS Biotechnology, emphasis in Biomedical Engineering
– BS Mechanical Engineering
– ASQ Certified Quality Engineer (CQE)
• Has led validation / qualification efforts in multiple scenarios:
– High-speed, high-volume automated manufacturing and packaging equipment; machine vision systems
– Laboratory information systems and instruments
– Enterprise resource planning applications (e.g. SAP)
– IT network infrastructure, Cognos & Business Objects reports
– Manufacturing Execution Systems (MES)
– Mobile apps
– Product improvements, material changes, vendor changes
• Contact information: Raul Soto, [email protected]
What this talk is about
• Introduce and describe the main statistical tools used for validation, process development, optimization, and control
• Understand basic concepts, underlying assumptions, and limitations
• Understand why we can't just plug numbers into Minitab without knowing the fundamental assumptions ("the fine print")
• We can't teach two semesters of statistics in 90 minutes …
Some Uses of Statistics in Validation
• Quantify how well a process can meet its specifications (new process, before/after changes)
• Determine if a process change had the intended effect
• Determine if a process change that was not supposed to impact the product actually had no impact
• Quantify the sources of variation in your process
• Compare equivalence of materials from different vendors
• Determine if multiple lines running the same product are equivalent or not
• Model your process outputs in terms of your process inputs
• Find the process input settings that optimize your process outputs
• Make claims about your process average, or about every unit made by your process
Tools that will be presented
• Process Capability Analysis
• Hypothesis Testing
• Simple Linear Regression
• Analysis of Variance (ANOVA)
• Design of Experiments (DoE)
• Confidence / Prediction / Tolerance intervals
Process Capability Analysis
HOW TO DETERMINE IF YOUR PROCESS IS CAPABLE
Process Capability
• Capable Process: we can make product that meets specifications
• Process Capability: Quantifies numerically how capable a process is of meeting its specifications
• Compares the actual process width vs the design width (USL – LSL)
• Assumes NORMALITY of the data
[Minitab Process Capability Report for Line 1 (95.0% confidence): LSL 125, Target 130, USL 135; sample mean 130.5, N = 100; StDev(Within) 0.704, StDev(Overall) 0.676; Cp 2.37, Cpk 2.13 (CI 1.80–2.46); Pp 2.47, Ppk 2.22 (CI 1.91–2.53); Cpm 0.79; expected PPM total 0.00.]
CPK vs Sigma Levels
• Sigma level: how many standard deviations can we fit between the process mean and the closest specification limit
• Cpk = the lower of CPU and CPL:

CPU = (USL − μ) / (3σ)
CPL = (μ − LSL) / (3σ)
http://www.six-sigma-material.com/Tables.html
Other capability indexes:

• Cp: ratio of the design spread to the process spread

Cp = (USL − LSL) / (6σ)

• Cpm: overall capability; penalizes when the process is off-center

Cpm = (USL − LSL) / (6 · √(σ² + (μ − T)²)), where T is the target
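These index formulas can be sketched in a few lines of Python (hypothetical data; note that with a single sample this uses the overall sample standard deviation, whereas Minitab's Cp/Cpk uses a within-subgroup sigma estimate):

```python
import statistics

def capability(data, lsl, usl):
    """Compute Cp and Cpk from one sample.

    Uses the overall sample standard deviation, so on a single
    sample these are really Pp/Ppk-style estimates; Minitab's
    Cp/Cpk uses a within-subgroup sigma instead.
    """
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    cp = (usl - lsl) / (6 * sd)        # design width / process width
    cpu = (usl - mean) / (3 * sd)      # upper one-sided index
    cpl = (mean - lsl) / (3 * sd)      # lower one-sided index
    return cp, min(cpu, cpl)           # Cpk = lower of CPU, CPL

# Hypothetical measurements against LSL = 4, USL = 16
cp, cpk = capability([9.0, 10.0, 11.0], lsl=4.0, usl=16.0)
# mean = 10, sd = 1  =>  Cp = 2.0, Cpk = 2.0
```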
Cpk vs Ppk
• The basic difference between Ppk and Cpk is the standard deviation used for the calculation. The formulas used are basically the same.
• For Cpk we use the within-lot variability.
• For Ppk we use the overall variability (all lots).
• Cpk will typically be higher than Ppk.
• As we reduce lot-to-lot variability, the process becomes stable, and Ppk approaches Cpk.
Short-term vs Long-Term Capability
• Short-term variation: the variation within a subgroup (for example, one shift, one operator, or one material batch)
• Overall (long-term) variation: the variation of all measurements, which is an estimate of the overall process variation
• Long-term capability of a process allows us to view the bigger picture
• LT capability is typically lower than the process capability of individual lots
Short-term vs Long-Term Capability
or "why is my process performing poorly, if my three validation batches were so good?"
• The distribution or the mean of a process may be stable for a single set of raw materials, operator, machine, etc.
• But long term variation may increase because of process shifts and drifts
• Possible causes of process shifts: different operators, raw material batches, machines, changes in the environment
• Possible causes of process drifts: equipment aging, part wear, environmental factors
• Some authors assume a short-term / long-term process shift of 1.5 sigma as a rule of thumb; that is, over the long term the process mean may shift by as much as 1.5 standard deviations from its short-term average.
Process Shifts, Drifts
• Looking only at short term distributions may give the impression that the process is in control and capable
• But if the process mean or the process variability are drifting over time, the long term distribution will show a different picture
• Usually when this happens, the validation of the process is questioned.
• Validation should not be seen as a one-shot deal, but as part of a continuous improvement process.
[Figure: "Process Mean Shifting Over Time" – individual value plot and histograms (normal fits) of Lots 1–7. Lot means shift between roughly 6.6 and 13.3 while within-lot StDevs stay near 1 (0.80–1.23, N = 30 each); overall: mean 9.947, StDev 2.492, N = 210.]
[Figure: "Process Mean Drifting Over Time" – individual value plot and histograms (normal fits) of Lots A–D. Lot means drift upward: Lot A 10.09, Lot B 14.83, Lot C 20.36, Lot D 24.82 (StDevs 0.63–1.14, N = 30 each); overall: mean 17.53, StDev 5.666, N = 120.]
• When you base your validation only on the Cpk of individual lots, you are not taking into account long-term variation
• This method does not provide a realistic view of how the process will perform in the long term.
• Long term process capability should also be calculated, by using the data from all the lots.
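That calculation can be sketched with made-up lot data (all values invented for illustration): pooling the within-lot variances gives the short-term sigma, while treating all lots as one sample gives the long-term sigma.

```python
import math
import statistics

# Hypothetical lots whose means shift over time (values invented)
lots = {
    "A": [29.8, 30.1, 29.9, 30.2, 30.0],
    "B": [31.0, 31.2, 30.9, 31.1, 31.0],
    "C": [28.9, 29.1, 29.0, 28.8, 29.2],
}
lsl, usl = 25.0, 35.0

# Short-term sigma: pool the variance inside each (equal-sized) lot
sigma_within = math.sqrt(statistics.fmean(
    statistics.variance(v) for v in lots.values()))

# Long-term sigma: treat all measurements as one sample
all_data = [x for v in lots.values() for x in v]
mean = statistics.fmean(all_data)
sigma_overall = statistics.stdev(all_data)

cpk = min(usl - mean, mean - lsl) / (3 * sigma_within)
ppk = min(usl - mean, mean - lsl) / (3 * sigma_overall)
# The shifting lot means inflate sigma_overall, so Ppk < Cpk
```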
Short-term vs Long-Term Capability
The 3-Consecutive-Lots Rule of Thumb?
Process Capability Example 1:

Calculate lot-to-lot (short-term) and process (long-term) capability using 4 validation lots
Descriptive Statistics: Lot A, Lot B, Lot C, Lot D

Variable  N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Lot A     60  0   29.748  0.129    0.999  27.271   28.965  29.737  30.331  31.944
Lot B     60  0   30.071  0.131    1.015  27.520   29.351  30.135  30.709  32.050
Lot C     60  0   30.192  0.132    1.022  27.560   29.548  30.203  30.763  33.302
Lot D     60  0   29.671  0.143    1.106  27.045   28.835  29.694  30.541  31.859
[Figure: boxplot of Lots A–D, all centered near 30 with similar spread.]
Consistent process
[Minitab capability report, Lot A (95.0% confidence): LSL 25, Target 30, USL 35; mean 30, N = 60; Cp 1.66, Cpk 1.66; Pp 1.67, Ppk 1.67; Cpm 1.62; expected PPM total 0.56 (overall).]
[Minitab capability report, Lot B (95.0% confidence): LSL 25, Target 30, USL 35; mean 30, N = 60; Cp 1.63, Cpk 1.63; Pp 1.64, Ppk 1.64; Cpm 1.64; expected PPM total 0.84 (overall).]
[Minitab capability report, Lot C (95.0% confidence): LSL 25, Target 30, USL 35; mean 30, N = 60; Cp 1.62, Cpk 1.62; Pp 1.63, Ppk 1.63; Cpm 1.60; expected PPM total 1.00 (overall).]
[Minitab capability report, Lot D (95.0% confidence): LSL 25, Target 30, USL 35; mean 30, N = 60; Cp 1.50, Cpk 1.50; Pp 1.51, Ppk 1.51; Cpm 1.44; expected PPM total 6.16 (overall).]
Short-term (lot-to-lot) results:

Lot A: Cpk = 1.66, expected defects per million = 0.56
Lot B: Cpk = 1.63, expected defects per million = 0.84
Lot C: Cpk = 1.62, expected defects per million = 1.0
Lot D: Cpk = 1.50, expected defects per million = 6.16

If our validation criterion were based solely on each individual lot's Cpk being equal to or greater than 1.0, we would pass.
Let’s look at Long-Term variation:
[Minitab capability report, Long Term – all lots (95.0% confidence): LSL 25, Target 30, USL 35; mean 30, N = 240; StDev(Within) 1.038, StDev(Overall) 1.053; Cp 1.61, Cpk 1.61; Pp 1.58, Ppk 1.58; expected PPM total 2.04 (overall).]
LONG TERM RESULTS
Ppk = 1.58
Expected defects per million units = 2.04
Process Capability Example 2:
Calculate lot-to-lot (short term) and process (long term) capabilityusing 3 validation lots
Descriptive Statistics: Lot 1, Lot 2, Lot 3

Variable  N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Lot 1     60  0   15.500  0.0754   0.584  14.138   15.020  15.513  15.807  16.992
Lot 2     60  0   11.914  0.0605   0.469  10.562   11.633  11.902  12.218  13.114
Lot 3     60  0   17.869  0.0810   0.627  16.299   17.401  17.959  18.232  19.513
[Figure: boxplot of Lot 1, Lot 2, Lot 3 – the lot means differ markedly (≈15.5, 11.9, 17.9).]
USL = 20, LSL = 10
Inconsistent process
[Minitab capability report, Lot 1 (95.0% confidence): LSL 10, Target 15, USL 20; mean 15, N = 60; Cp 2.95, Cpk 2.95; Pp 2.85, Ppk 2.85; Cpm 2.16; expected PPM total 0.00.]
[Minitab capability report, Lot 2 (95.0% confidence): LSL 10, Target 15, USL 20; mean 15, N = 60; Cp 3.33, Cpk 3.33; Pp 3.56, Ppk 3.56; Cpm 0.53; expected PPM total 0.00.]
[Minitab capability report, Lot 3 (95.0% confidence): LSL 10, Target 15, USL 20; mean 15, N = 60; Cp 2.80, Cpk 2.80; Pp 2.66, Ppk 2.66; Cpm 0.56; expected PPM total 0.00.]
Individually, based solely on short-term process variation, each lot looks good:

Lot 1: Cpk = 2.95, expected defects per million = 0
Lot 2: Cpk = 3.33, expected defects per million = 0
Lot 3: Cpk = 2.80, expected defects per million = 0

If our validation criterion were based solely on each individual lot's Cpk being equal to or greater than 1.0, we would pass.
But when we look at the long-term variation of the overall process, the picture changes:
[Minitab capability report, Long Term (95.0% confidence): LSL 10, Target 15, USL 20; mean 15, N = 180; StDev(Within) 0.565, StDev(Overall) 2.518; Cp 2.95, Cpk 2.95; Pp 0.66, Ppk 0.66; expected PPM total 47,091.95 (overall).]
The overall capability of the process is significantly lower than the capability of the individual lots, because the process mean is shifting from lot to lot.
Overall:
Ppk = 0.66expected defects per million = 47,092 (4.7%)
Look at your long term process capability
This will allow you to obtain a better picture of how the process will perform in the long term.
Hypothesis Testing
HOW TO MAKE STATISTICAL COMPARISONS BETWEEN TWO SAMPLES, OR BETWEEN A SAMPLE AND A SET VALUE
Hypothesis Testing
• Compare two samples
– before/after a change in the process– Raw materials from different vendors– Two plants / machines / tool sets / operators / shifts, etc.
• Compare a sample against a specific value

– Determine if a process / product change has an effect on the process mean or standard deviation
– Claim that your mean is equal to, less than, or greater than a specific value
• Hypothesis tests can be used to test claims of:
– Equivalency
– Superiority
– Non-inferiority
• Tests for quantitative data
– Tests for means (Z-test, t-test)
• Paired test for means (paired t-test)
– Tests for variation (F-test, Chi-squared test)
– Tests for proportions (Z-test)
• Tests for qualitative data
– Two-sample test, binary data: Fisher's Exact test
– Two-sample test, paired binary data: McNemar's test
Hypothesis Testing
• A hypothesis is a CLAIM we want to make, and substantiate with evidence

• NULL and ALTERNATE hypotheses

– NULL (Ho): what is assumed to be true
– ALTERNATE (Ha): typically what we want to prove

– You need a high level of certainty (typically 90, 95, or 99%) to prove the alternate hypothesis

– This means that IF the null hypothesis were true, the probability of observing a difference as extreme as (or more extreme than) the one observed would be at most 10% / 5% / 1% (a "frequentist" statement)
What is a Hypothesis?
• [1] Start by formulating your QUESTION OF INTEREST
– What claim do you want to make? What question do you want to answer?
– Be sure everyone on the team is aligned on what the question of interest is
– Select the correct type of hypothesis test to answer your question
• Test for means, for variance, for proportions
• One-sided vs two-sided
• [2] How accurate do you want your hypothesis test to be?
– This is necessary to calculate the sample size you need for your test
• [3] Determine your desired confidence level
– 99%, 95%, and 90% are the most commonly used
What is a Hypothesis?
• Question: Do Manufacturing Lines 1 and 2 make equivalent product with respect to a specific characteristic (weight, length, thickness, hardness, pull strength, etc.)
– NULL (Ho):Lines 1 and 2 make equivalent product with respect to the selected characteristic
– ALT (H1):Lines 1 and 2 do NOT make equivalent product with respect to the selected characteristic
Null vs Alternate Hypothesis
• Normality:– The populations from which you get your samples should follow a normal distribution– Check assumption with a normal probability plot, or a histogram of residuals
• Independence:– Assumes that one data point does not influence the value of the next data point– Randomization of runs helps– Check assumption with a Runs vs Order run chart
• Random:– Your sample units must be selected at random from the population
Hypothesis Testing: Assumptions
Independence
• Independence means that one value or reading from a process is not affected or dependent on the previous value
• Required by many statistical procedures
• If this is violated, results are likely to be severely biased
Using existing / historical data
• Always be careful when using historical data – understand the risks involved
• Look out for confounding or hidden factors (variables)
• Do you have the run order of the data?
• Is the data set complete?
• Were the instruments used to measure characteristics capable?
• When was the data collected? Has the process changed since?
p-values
• Used to determine whether to reject, or not reject, a null hypothesis

• Definition: the p-value is the probability that the test statistic will take on a value that is at least as extreme as the observed value of the statistic when the null hypothesis is true (Montgomery 1997)

• For a 95% confidence level:
– If p-value < 0.05, reject the null hypothesis and accept the alternate
– If p-value > 0.05, do not reject the null hypothesis

• This is the typical way to support your conclusions with statistical evidence
Montgomery, Douglas C. Design and Analysis of Experiments, 4th Edition. New York: Wiley, 1997. Print. [Chapter 2.4, page 37]
p-values
1. The p-value is not the probability that the null hypothesis is true or the probability that the alternative hypothesis is false. It is not connected to either.
2. The p-value is not the probability that a finding is "merely a fluke."
3. The p-value is not the probability of falsely rejecting the null hypothesis.
4. The p-value is not the probability that replicating the experiment would yield the same conclusion.
5. The significance level, such as 0.05, is not determined by the p-value.
6. The p-value does not indicate the size or importance of the observed effect.
Sterne, J. A. C.; Smith, G. Davey (2001). "Sifting the evidence–what's wrong with significance tests?". BMJ (Clinical research ed.) 322 (7280): 226–231.
Schervish, M. J. (1996). "P Values: What They Are and What They Are Not". The American Statistician 50 (3): 203.
Example
• We are testing a new set of tooling for a manufacturing process
• Cheaper, lasts longer
• Claim: no significant change in the product's critical quality characteristics, at 95% confidence
• We need to provide documented evidence that supports that claim

Product characteristic: pull strength (in lb)

State the null and alternate hypotheses:
Ho: μ1 = μ2 (no statistically significant shift in the process mean)
H1: μ1 ≠ μ2 (statistically significant shift in the process mean)

Select the appropriate test: two-sample hypothesis test for means
• Historically the process mean is 4.5 lb, with a standard deviation of approx 1 lb• We want our hypothesis test to be able to detect a difference of 0.5 lb• To calculate the required sample size (Minitab):
• You will need a sample of at least n = 32 units if you want your hypothesis test to be able to detect a difference of 0.5 lbs
What is "Power"?
• Power is the probability of rejecting a false null hypothesis (1 – β)
• Confidence is the probability of not rejecting a true null hypothesis (1 – α)
• Increasing power increases the required sample size
Power | How sample size is calculated
0.5   | Sample size is calculated using only the α error
0.8   | Using 80% probability of rejecting a false null hypothesis AND (1-α)% probability of accepting a true one
0.9   | Using 90% probability of rejecting a false null hypothesis AND (1-α)% probability of accepting a true one
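How power and confidence drive the sample size can be sketched with a normal-approximation formula (Minitab and other t-based tools will give slightly larger answers, and the result depends strongly on the power you choose):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.9):
    """Approximate per-group n for a two-sided, two-sample test for
    means. Normal approximation; exact t-based software gives
    slightly larger values."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # confidence part
    z_beta = NormalDist().inv_cdf(power)           # power part
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detect a 0.5 lb shift when sigma is about 1 lb:
n_80 = n_per_group(0.5, 1.0, power=0.8)   # 63 per group
n_90 = n_per_group(0.5, 1.0, power=0.9)   # 85 per group
```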
Two-Sample T-Test and CI: Old, New

Two-sample T for Old vs New

      N   Mean  StDev  SE Mean
Old   32  4.55   1.24     0.22
New   32  5.72   1.08     0.19

Difference = mu (Old) - mu (New)
Estimate for difference: -1.171
95% CI for difference: (-1.752, -0.590)
T-Test of difference = 0 (vs not =): T-Value = -4.03  P-Value = 0.000  DF = 60
(Minitab)
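The same kind of comparison can be sketched without Minitab, using only the Python standard library (hypothetical readings; the p-value here uses a normal approximation, so it is slightly optimistic versus the exact t-distribution value):

```python
import math
from statistics import NormalDist, fmean, stdev

def two_sample_t(a, b):
    """Welch t statistic with a normal-approximation two-sided
    p-value (reasonable for moderate-to-large samples)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    t = (fmean(a) - fmean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical old vs new tooling pull-strength readings (lb)
old = [4.1, 4.6, 4.3, 4.9, 4.4, 4.7]
new = [5.6, 5.9, 5.4, 6.1, 5.7, 5.5]
t, p = two_sample_t(old, new)   # small p => reject Ho, means differ
```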
Statistical conclusion
p-value = 0.000
We reject Ho, and conclude that the population means are different.
Practical conclusion
• We have obtained documented evidence which provides a high degree of certainty (>95%) that the new tooling causes a statistically significant shift in the process mean.
• We may need to change process parameters if we want to bring the process mean back to its original target (see Design of Experiments).
• What do you call…
… a successful hypothesis test? A Measurement
… an unsuccessful hypothesis test? A Discovery
Hypothesis Testing
Simple Linear Regression
CREATE A MATHEMATICAL MODEL OF YOUR RESPONSES (EX: PRODUCT QUALITY CHARACTERISTICS) AS A FUNCTION OF YOUR FACTORS (EX: PROCESS CRITICAL PARAMETERS)
Simple Linear Regression (SLR)
• Develops a mathematical model that lets us estimate and predict the response as a function of one or more factors.
• Use for quantitative data, not qualitative(use Logistic Regression for qualitative data)
• Correlation does not imply causation!
• Cannot handle interactions easily
• ANOVA can handle interactions and qualitative data
• Normality:
– In the independent variable(s), in the dependent variable, and in the residuals
– Check assumption with a normal probability plot, or a histogram of residuals
• Independence:
– Assumes that one data point does not influence the value of the next data point
– Randomization of runs helps
– Check assumption with a runs-vs-order run chart
– FACTORS must be independent from one another (no confounding)
• Check using Pearson correlation and the Variance Inflation Factor (VIF)
• Homogeneity:
– All residuals follow the same distribution, with mean = 0 and constant variance
– Check assumption with a residuals-vs-fits plot
SLR: Assumptions
Replication vs Subsampling
• NOT the same thing!

• REPLICATION: when each treatment is applied to multiple experimental units (EUs)
• SUBSAMPLING: when a treatment is applied once and multiple observational units (OUs) are measured

• EXAMPLE: You are testing your process at Pressure = 300 psi, Temperature = 100°C
– If you run multiple product units at this setting => this is SUBSAMPLING
– If you run this setting multiple times => this is REPLICATION
• Each allows you to look at different types of variation in the process
Data Collection• When you document the results of your statistical tests:
• Include the RUN ORDER – which should be randomized
• When you document the run order it’s possible to analyze the data and determine if there are any time-related effects; for example if your variability is increasing, or decreasing, with time.
• Decreasing: learning curve effect?
• Increasing: human fatigue? Machine wear? Periodic equipment adjustment needed? Calibration issues?
If these effects are present, then the assumption of independence does NOT hold
• This is used to REDUCE bias – but it does not eliminate it
• The order of the various runs of an experiment or trials should be established at RANDOM
• When there are unknown factors affecting a process, randomization helps block / even out the effects of those factors
– This helps to AVERAGE OUT all unknown / extraneous factors– Also helps to average out time-related effects, such as learning curves or human fatigue
• How ?– Use software that randomizes runs– Use a random numbers table (from a statistics book) and sort
Randomization
Randomization
• EXAMPLE: You want to use regression to model a sealing process defined by three variables:
– Pressure range: 200 – 300 psi
– Temperature range: 100 – 140 °C
– Time: 0.9 – 1.1 sec

• You need to test 8 treatments to challenge the process envelope

• DON'T run them in order. Randomize the order
• Include REPLICATION: run each possible treatment more than once
Runs P (psi) T (°C) t (s)
1 200 100 0.9
2 200 100 1.1
3 200 140 0.9
4 200 140 1.1
5 300 100 0.9
6 300 100 1.1
7 300 140 0.9
8 300 140 1.1
Randomization
• Run order is randomized• 3 replicates per treatment
– This means each treatment is tested 3 times
• If we take multiple sample units at each treatment, that is subsampling.
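The randomized, replicated run plan described above can be generated as a sketch (factor levels taken from the sealing-process example):

```python
import itertools
import random

# Low/high levels from the sealing-process example
pressures = (200, 300)   # psi
temps = (100, 140)       # degrees C
times = (0.9, 1.1)       # seconds

# Full factorial: 2 x 2 x 2 = 8 treatments
treatments = list(itertools.product(pressures, temps, times))

run_order = treatments * 3     # 3 replicates of each treatment
random.shuffle(run_order)      # randomize the run order

for i, (p, t, s) in enumerate(run_order, start=1):
    print(f"Run {i:2d}: P={p} psi, T={t} C, t={s} s")
```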
Experimental vs Observational Units
• Experimental Unit (EU):– the smallest set of entities or units where we apply a treatment– EUs are key to determine sample size
• Observational Unit (OU):– entities or units where we measure one or more characteristics– OUs are important when there is variability in the EU reading
• EXAMPLES– You measure the sealing strength in two sides of a foil or blister package– EU: each individual blister, all sides get the same treatment– OU: 2 per EU
– You have a process where you make 4 packaging blisters at the same time using a 4-cavity forming die– EU: each 4-blister set– OU: each blister, 4 per EU
– You want to test an improved coating process. You take 100 units from each of 6 lots, randomize which of the 6 sets gets the control vs experimental process
– EU: each group of 100 units– OU: each unit, 100 OUs per EU
Experimental vs Observational Units
Residuals
• Residual: the difference between an observation and the predicted value, measured along the y-axis

• Least-squares regression finds the equation of the curve that minimizes the sum of the squared residuals
(a) Unbiased and homoscedastic. The residuals average to zero in each thin vertical strip and the SD is the same all across the plot.
(b) Biased and homoscedastic. The residuals show a linear pattern, probably due to a lurking variable not included in the experiment.
(c) Biased and homoscedastic. The residuals show a quadratic pattern, possibly because of a nonlinear relationship. Sometimes a variable transform will eliminate the bias.
(d) Unbiased, but heteroscedastic. The SD is small to the left of the plot and large to the right: the residuals are heteroscedastic.
(e) Biased and heteroscedastic. The pattern is linear.
(f) Biased and heteroscedastic. The pattern is quadratic.
Confounding / Multicollinearity
• CONFOUNDING is when you cannot distinguish between the effects of two or more factors

• Confounding introduces bias and increases variance in your data

• EXAMPLE:
– You want to test if raw materials from vendors A and B are equivalent. You run lots from vendor A on Line 1, and lots from vendor B on Line 2.
– If you see a difference, is it due to an actual difference in the materials, or a difference in the LINES?
– If you can't tell, then the effects of these two factors, lines and raw materials, are said to be confounded

• Can happen in poorly planned experiments

• Also happens when you analyze historical data, if the data collection run sheets do not capture the appropriate variables
Confounding / Multicollinearity
• Variance Inflation Factor (VIF) is a measure of confounding
• VIF = 1: there is no confounding in your model
• VIF < 3: partial confounding, but we can still trust p-values and conclusions
• VIF > 3: confounding; don't trust p-values or conclusions
[Regression output figures: a "BAD!" model and a "GOOD!" model]

In this example there was major confounding because 3 of the factors were NOT independent (weight, waist, bmi). Look at the VIFs and p-values.

Removing those factors from the model and re-computing using only one factor (weight), a good model was obtained.
Confounding / Multicollinearity

• Principal Components Analysis can be used to visually detect collinearity or correlation among the factors, and to identify factors that are highly collinear

• [Minitab] Stat > Multivariate > Principal Components; enter the variables, select Graphs, and check Loading Plot

• In this example, Principal Components Analysis shows high collinearity among 3 factors: weight, bmi, and waist
[Figure: Loading Plot of waist, height, weight, and bmi – the waist, weight, and bmi vectors point in nearly the same direction, indicating high collinearity.]
Outliers

[Scatterplot with three points flagged "Outlier?"]
Outliers
• Observation that lies an abnormal distance from other values in a random sample
• May indicate the presence of special-cause variation
• May provide important information about the manufacturing process and about the data gathering and recording process
• Outliers can have a big impact on the model equation
Use studentized residuals (a.k.a. standardized residuals) to detect outliers.

If the studentized residual for a specific value falls outside ±3, you are roughly 99.7% confident that the value is a potential outlier.
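That rule can be sketched as a simple screen over the residuals (this divides each raw residual by the residual standard deviation; true studentized residuals also correct for each point's leverage, which Minitab computes for you):

```python
from statistics import stdev

def flag_outliers(residuals, threshold=3.0):
    """Return indices whose standardized residual falls outside
    +/- threshold (a simplified stand-in for studentized residuals)."""
    s = stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r / s) > threshold]

# Hypothetical residuals: one wild value among small ones
res = [0.1, -0.1, 0.1, -0.1] * 5 + [10.0]
flagged = flag_outliers(res)   # flags only the last observation
```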
Regression Example

• Simple linear regression will be used to determine if there is a linear relationship between the time in hours a machine is used (input) and off-target values in millimeters (output), with 95% confidence.

• If there is a relationship, then we will determine what that relationship is. Simple linear regression is appropriate since both input and output are continuous variables.

• Random samples will be taken: 5 parts every 12 hours during a 1-week period, for a total of 60 parts.

• The assumptions of normality and equal variances will be checked by graphical means using residuals.

• Studentized residuals will be used to determine any potential outliers; any studentized residual greater than 3 or lower than −3 will be considered a potential outlier.
With a p-value < 0.0001 we are 95% confident that there is a linear relationship between the time (hrs) a machine is used and the off-target value (mm).

This relationship is modeled by the following equation (with R²adj = 79.96%):

Off-target value = 0.04233 + 0.000805 × HRS

With a p-value of 0.614 we are at least 95% confident that there is no lack of fit.

p-values for the factor and the response are both < 0.001; therefore both terms belong in the model.

Two data points were flagged as possible outliers: 42 and 60. Their studentized residuals are within ±3, therefore they will not be treated as outliers.

VIF = 1, therefore there is no confounding present in the model.

S = root mean squared error = standard deviation of the residuals.
[Residual plots for off-target value: normal probability plot of residuals, residuals vs. fitted values, histogram of residuals, residuals vs. observation order]
The scatterplot of off-center vs. hours shows that the relationship between the response and the factor is linear. The R²adj supports this statement.

Assumptions:
1. Normality: the normal probability plot and the histogram support the assumption of normality.
2. Homogeneity: the residuals vs. fits plot shows that the residuals are random. This supports the assumption of homogeneity.
3. Independence: the residuals vs. order plot does not show obvious trends or patterns; this supports the assumption of independence.
(c) 2016 / Raul Soto 74
Coefficient of Determination (R²)
• R² always increases as we add terms (variables) to the model… not good
• R²adj can be used instead: R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors
• R²adj actually decreases if unnecessary terms are added to the model equation
• If R² and R²adj differ significantly, there is a good chance the model includes non-significant terms
(c) 2016 / Raul Soto 75
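A quick numerical sketch of this behavior with simulated data. The helper name `r2_and_adj` is mine; here p counts every column of the design matrix including the intercept, so the adjustment factor is (n − 1)/(n − p):

```python
import numpy as np

def r2_and_adj(X, y):
    """Return R^2 and adjusted R^2 for a least-squares fit (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    n, p = X.shape                                   # p = number of terms incl. constant
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)        # penalizes extra terms
    return r2, r2_adj

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 40)
y = 1 + 2 * x + rng.normal(0, 1, 40)
X1 = np.column_stack([np.ones(40), x])               # true model: constant + x
X2 = np.column_stack([X1, rng.normal(size=40)])      # plus one unnecessary noise term
r2_1, adj_1 = r2_and_adj(X1, y)
r2_2, adj_2 = r2_and_adj(X2, y)
print(f"R2: {r2_1:.4f} -> {r2_2:.4f}   R2adj: {adj_1:.4f} -> {adj_2:.4f}")
```

R² can only go up when the noise column is added, while R²adj is penalized for the extra term.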
Analysis of Variance (ANOVA)
HOW TO COMPARE MULTIPLE DATA SETS
(c) 2016 / Raul Soto 76
ANOVA
• ANalysis Of VAriance
• Used to compare multiple groups and determine if at least one of them is different from the others
• Compares two types of variability: group-to-group vs. within-group
• How can I use this?
– Compare output from multiple manufacturing lines making the same product; determine if they are all equivalent
– Compare lots from multiple vendors
– Compare 1st, 2nd, and 3rd shifts
(c) 2016 / Raul Soto 77
ANOVA - Assumptions
• Equal variances
– Assumes that “within group” variances are equal
– Use Levene’s Test to check. If variances are unequal, use Welch’s ANOVA
• Normality (means)
– Assumes that the means of the groups follow a normal distribution
– Use a normal probability plot or a normality test
• Independence
– Assumes that one data point does not influence the value of the next data point
– Randomization of runs helps
(c) 2016 / Raul Soto 78
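A sketch of how these checks might look in code, using hypothetical weight data for three lines and SciPy’s `levene` and `f_oneway`:

```python
from scipy import stats

# hypothetical product weights from three manufacturing lines
line1 = [10.2, 10.4, 10.3, 10.5, 10.4, 10.3]
line2 = [10.0, 10.1, 9.9, 10.0, 10.1, 10.0]
line3 = [10.0, 9.9, 10.1, 10.0, 10.0, 10.1]

# Step 1: check the equal-variances assumption with Levene's test
lev_stat, lev_p = stats.levene(line1, line2, line3)

# Step 2: if variances look equal, run the one-way ANOVA
f_stat, anova_p = stats.f_oneway(line1, line2, line3)

print(f"Levene p = {lev_p:.3f}, ANOVA p = {anova_p:.4f}")
```

With this data, Levene’s test is not significant (equal variances are plausible) while the ANOVA p-value is small, suggesting at least one line mean differs.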
ANOVA - Example
• Question of interest:
– Is the mean product weight equivalent for Lines 1, 2, and 3?
(c) 2016 / Raul Soto 79
[Test for Equal Variances: weight vs line. Multiple comparison intervals for the standard deviation, α = 0.05. Multiple comparisons p-value = 0.817; Levene’s Test p-value = 0.991. If intervals do not overlap, the corresponding stdevs are significantly different.]

Levene’s test p-value of 0.991 => we have more than 95% confidence that the equal-variances assumption is valid
With a p-value of <0.001 we are at least 95% confident that at least one of the lines makes product with weight that is not equivalent to the others
(c) 2016 / Raul Soto 80
[Individual value plot of weight vs line]
Tukey’s comparisons show that we are at least 95% confident that:
• Lines 2 and 3 are equivalent
• The mean weight for Line 1 is greater than for Lines 2 and 3
(c) 2016 / Raul Soto 81
Regression vs ANOVA?

Regression:
• Response (dependent variable) is quantitative
• Factors (independent variables) are quantitative
• Goal is to determine if the response and factors are related; if they are, then to determine how they are related

ANOVA:
• Response (dependent variable) is quantitative
• Factors (independent variables) are categorical (qualitative)
• Goal is to determine if multiple factors give (on average) different or similar values of the response

(c) 2016 / Raul Soto 82
Design of Experiments
HOW TO PROPERLY DESIGN EXPERIMENTS, COMBINE REGRESSION AND ANOVA, AND CREATE MATHEMATICAL MODELS THAT CAN HELP YOU OPTIMIZE YOUR PROCESS
(c) 2016 / Raul Soto 83
Design of Experiments (DoE)
• “Design of Experiments is a test, or series of tests, in which purposeful changes are made to input variables of a process or system so that the changes in the output variables can be observed and identified” – Douglas C. Montgomery
• We deliberately make changes to the process Inputs, and observe the results on the process Outputs
(c) 2016 / Raul Soto 84
VERY powerful process optimization tool
DoE & Response Surface Methods
(c) 2016 / Raul Soto 85
[Diagram: PROCESS, with INPUTS (X’s, FACTORS) flowing in and OUTPUTS (Y’s, RESPONSES) flowing out]

Y = a0 + a1X1 + a2X2 + a3X3 + a4X1X2 + …

DOE allows us to characterize our process response as a function of the process inputs
DoE
(c) 2016 / Raul Soto 86
DoE is much more efficient than the typical one-factor-at-a-time (OFAT) approach practiced almost everywhere. It provides more information about the process with fewer runs, and it takes interactions between factors into account; OFAT does not address interactions.

DoE allows us to:
• determine which inputs (factors) have a significant impact on the process output (response)
• discover interactions between significant factors
• create a mathematical model of the response in terms of the significant factors
• include attribute data in the factors
• use this model to optimize the process, by determining the appropriate settings of the factors required to bring the response to a desired target
(c) 2016 / Raul Soto 87
Unbalanced OFAT experimental design

Coating          | Day 0 | Day 3 | Day 7
Collagen 5 mg    | 1     | 0.88  | 2.02
Collagen 10 mg   | 1     | 0.85  | 2.34
Fibrin 5 mg      | 1     | 0.90  | 2.07
Fibrin 10 mg     | 1     | 0.94  | 3.03
No Coat + Cells  | 1     | 1.19  | 2.34
No Coat - Cells  | 1     | 1.12  | 0.99
[Plot of the unbalanced design points on Collagen/Fibrin dose axes (0, 5, 10) for Days 3 and 7]
• Unbalanced design => no orthogonality
• No orthogonality => confounding of the effects of factors
• Leads to:
– unreliable p-values
– low R²adj fitness
– a poor mathematical model
Montgomery, Douglas C. Design and Analysis of Experiments. 4th. New York: Wiley, 1997. Print.
(c) 2016 / Raul Soto 88
Balanced Factorial Experimental design
• Better balance
• Orthogonal
• Less confounding
– Better p-values
– Better R²adj fitness
– Better mathematical model
• FEWER experimental treatments (8 instead of 15)
[Plot of the balanced design points on Collagen/Fibrin dose axes (0, 10) for Days 3 and 7]
Factorial Design
Run | A | B | C
 1  | + | + | +
 2  | + | + | -
 3  | + | - | +
 4  | + | - | -
 5  | - | + | +
 6  | - | + | -
 7  | - | - | +
 8  | - | - | -
(c) 2016 / Raul Soto 89
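The eight treatment combinations in the table above can be generated, and the design’s balance and orthogonality verified, with a few lines of code (a sketch using coded +1/−1 levels):

```python
from itertools import product

# generate all 2^3 = 8 treatment combinations for factors A, B, C
runs = list(product([1, -1], repeat=3))
for i, (a, b, c) in enumerate(runs, start=1):
    print(f"run {i}: A={a:+d} B={b:+d} C={c:+d}")

# balance: each factor is tested at + and - equally often (column sums are zero)
col_sums = [sum(run[j] for run in runs) for j in range(3)]

# orthogonality: any pair of factor columns has zero dot product
ab = sum(run[0] * run[1] for run in runs)
print(col_sums, ab)
```

Zero column sums and zero pairwise dot products are exactly what “balanced” and “orthogonal” mean for a coded two-level design.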
[Cube plot of the 2³ factorial design: factors A, B, C; corners from (-, -, -) to (+, +, +)]
+ = maximum setting of a factor
- = minimum setting of a factor
(c) 2016 / Raul Soto 90
Balanced 3-factor models

[Figures: Central Composite Design (CCD) and Box-Behnken Design]

Source: http://www.mathworks.com/help/toolbox/stats/f56635.html
Many Experimental Design models!
• Full factorial• Fractional Factorials• Central Composite Design (CCD)• Box-Behnken• Taguchi Methods• Latin Squares• Graeco-Latin Squares• Hyper-Graeco-Latin Squares• Optimal Designs (D, A, G, V – Optimals)
(c) 2016 / Raul Soto 91
Blocking / Covariates
• Often you will need to BLOCK a factor that is not controllable to add ROBUSTNESS
• Use for non-controllable factors that add variation to the process
• Typical factors blocked: raw materials variation from lot to lot, operators, shifts, machines.
• For example, to block the effect of raw-material lot-to-lot variation, use multiple lots in your validation, and know which lots are used in which tests
– You can determine statistically what % of your total variation is due to lot-to-lot variation (Variance Components, a.k.a. Random Effects Model)
(c) 2016 / Raul Soto 92
Interactions
• An interaction occurs when the effect of one input factor on the response depends on the level of another input factor
• Correct modeling of interactions between factors is key to building accurate mathematical models with predictive power.
• Interaction terms show up in your model equation as: Response Y = c0 + c1A + c2B + c3AB (where A and B are the factors and the cn are coefficients)
• DOE allows us to take into account interactions between factors. Simple linear regression does not.
(c) 2016 / Raul Soto 93
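A sketch of how an interaction column enters the model fit, using simulated data with arbitrarily chosen coefficients (5, 2, 1, and 4 for the constant, A, B, and A·B):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
A = rng.uniform(-1, 1, n)
B = rng.uniform(-1, 1, n)
# simulated process with a strong A*B interaction plus small noise
y = 5 + 2 * A + 1 * B + 4 * A * B + rng.normal(0, 0.1, n)

# the design matrix carries an explicit interaction column A*B
X = np.column_stack([np.ones(n), A, B, A * B])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 2))   # recovers roughly [5, 2, 1, 4]
```

Leaving the A·B column out of the design matrix would force the interaction effect into the residuals, which is why one-factor-at-a-time fitting misses it.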
Interactions

[Interaction plots: AB interaction, AC interaction, BC interaction]

(c) 2016 / Raul Soto 94
Masking / Blinding
• MASKING or BLINDING is when you don’t let testers know which treatments are being applied.
• If the experimenter does not know either, it’s a DOUBLE-BLIND test
• Used to eliminate human bias and preconceptions.
(c) 2016 / Raul Soto 95
DoE Sequence
• Screening Experiment
– Used to evaluate the effect of many factors on the desired response
– Determine which factor(s) have a significant effect on the response
• Modeling / Optimization Experiment
– Used to develop a mathematical model of the response, using the factors that were found to be significant in the screening experiment
– Determine which settings of the factors allow us to optimize the response
– Some software packages, such as Minitab, allow us to optimize multiple responses simultaneously
(c) 2016 / Raul Soto 96
DoE Example
• Optimize adhesive dispensing process
• Process factors:
– Pressure (P)
– Pump Voltage (V)
– A screening experiment was performed previously; it determined that these two factors (out of 8 total) had significant impact on the response
• Process response:
– Variable: Adhesive weight (W), target = 9 ± 1 mg
• DOE model selected: Central Composite Design
– 2 factors, 3 levels per factor (max, mid, min)
– CCD tests for nonlinear (quadratic) terms in the mathematical model
(c) 2016 / Raul Soto 97
(c) 2016 / Raul Soto 98
Goodness-of-Fit: very good. R-Sq = 98.0%, R-Sq(adj) = 97.6%

Terms with significant effect on the response:
• Voltage
• Pressure
• Voltage × Voltage quadratic term
• Constant

Terms with no significant effect on the response:
• Voltage × Pressure interaction
• Pressure × Pressure quadratic term
Mathematical Model for weight (uncoded units):

W = 2.9 + 2.8·V + 0.03·P + 0.3·V²

(c) 2016 / Raul Soto 99
Response Surface (Optimization):
The contour plot allows us to determine the settings of V and P necessary to dispense 9 ± 1 mg
This allows us to set process parameters based on actual scientific data, not guesswork.
(c) 2016 / Raul Soto 100
Confidence, Tolerance, and Prediction Intervals
HOW TO MAKE CLAIMS ABOUT YOUR DATA’S MEAN, OR ABOUT ALL UNITS
(c) 2016 / Raul Soto 101
Intervals
• Allow us to make statistically based claims or statements with a specified degree of confidence
• Can be used to Accept / Reject lots for quantitative product characteristics
• Easier for non-statisticians to understand intuitively than p-values or mean / stdev
o Weight: x̄ = 8.5 mg, s = 0.02   vs.   Weight: [8.44 – 8.56 mg]
• Different types of intervals allow you to make different claims
(c) 2016 / Raul Soto 102
Types of Intervals
• Confidence Interval
– Makes a claim about the mean, the variance, or a proportion of a sample
• “We are 95% confident that the mean capsule body length falls within this interval: 0.874 ± 0.018 in”
• “We are 99% confident that the mean capsule external diameter falls within this interval: 7.72 – 8.64 mm”
• Prediction Interval
– Range that contains the response value of a single observation, given specified settings of the predictors
• Used often in Regression
• When you use your regression model to predict the value of your response for a specific input value, a prediction interval gives you a range around that value based on a given % confidence
(c) 2016 / Raul Soto 103
Types of Intervals
• Tolerance Interval
– Range that contains a specified proportion (%) of the population
– “There is 95% confidence that 99% of the population meets the following specification: 2.000 ± .016 mm”
• This example is called a 95%/99% tolerance interval
• Can be used as the basis for product release if the interval is within specifications
• You can run your validation lots, compute a tolerance interval, and compare this interval against specifications. If the interval is within the specification, you can make the claim.

• Which one should I use?
http://www.qualitydigest.com/inside/quality-insider-column/when-should-i-use-confidence-intervals-prediction-intervals.html
(c) 2016 / Raul Soto 104
Confidence Interval
• Example:
• Mechanical attachment process; the input factor is Pressure (psi), the response is pull strength (lb)
• You want to know the mean pull strength, with 95% confidence
• Confidence Interval for the mean (Minitab):

One-Sample T: y
Variable   N   Mean   StDev   SE Mean   95% CI
y         12   2.792  0.421   0.122     (2.524, 3.059)

• Claim: We are 95% confident that the true mean pull strength is between 2.524 and 3.059 lb.
(c) 2016 / Raul Soto 105
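The same interval can be reproduced from the summary statistics alone (a sketch using SciPy’s t distribution; it matches the Minitab output up to rounding of the summary values):

```python
import math
from scipy import stats

# summary statistics for pull strength (from the Minitab output above)
n, mean, sd = 12, 2.792, 0.421

t_crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 95%, 11 degrees of freedom
margin = t_crit * sd / math.sqrt(n)          # t * (standard error of the mean)
lo, hi = mean - margin, mean + margin
print(f"95% CI for the mean: ({lo:.3f}, {hi:.3f})")
```

The t critical value (rather than the normal z) is what accounts for the small sample size.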
Confidence Interval for the Mean
(c) 2016 / Raul Soto 106
Prediction Interval for a Forecasted Value
• Example:
• We perform a regression test to model how pull strength changes as a function of pressure
• We want to know what the pull strength will be at a pressure setting of 75 psi, with 95% confidence
• Using Minitab:
(c) 2016 / Raul Soto 107
• Claim: At a pressure setting of 75 psi, we are 95% confident that the pull strength will be between 1.97 and 2.98 lbs
Prediction Interval for a Forecasted Value
(c) 2016 / Raul Soto 108
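A sketch of the underlying computation for a simple-linear-regression prediction interval. The data here is made up (not the slide’s dataset) and the helper name `prediction_interval` is mine:

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, conf=0.95):
    """Prediction interval for a single new observation at x0 (simple linear regression)."""
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)                     # slope, intercept
    resid = y - (b0 + b1 * x)
    s = np.sqrt(resid @ resid / (n - 2))             # residual standard error
    sxx = float(((x - x.mean()) ** 2).sum())
    se = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
    t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    y0 = b0 + b1 * x0                                # point prediction
    return y0 - t * se, y0 + t * se

# made-up pressure / pull-strength data for illustration
rng = np.random.default_rng(5)
pressure = np.linspace(50, 100, 12)
strength = 0.5 + 0.03 * pressure + rng.normal(0, 0.2, 12)
lo, hi = prediction_interval(pressure, strength, 75)
print(f"95% PI at 75 psi: ({lo:.2f}, {hi:.2f})")
```

The leading “1 +” inside the square root is what makes a prediction interval wider than the confidence interval for the mean response at the same x0.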
Tolerance Interval (95/95)
• Claim: We are 95% confident that 95% of the population will have a pull strength between 1.455 and 4.128 lb.
(c) 2016 / Raul Soto 109
(Minitab)

[Tolerance interval plot for y: 95% tolerance interval, at least 95% of population covered.
Normal method: lower 1.455, upper 4.128
Nonparametric method: lower 2.100, upper 3.500 (achieved confidence 11.8%)
Statistics: N = 12, Mean = 2.792, StDev = 0.421. Normality test: AD = 0.148, p-value = 0.950]
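A sketch of how the two-sided normal tolerance factor can be approximated in code (Howe’s approximation, applied to the slide’s summary statistics; Minitab’s exact method gives a slightly different factor):

```python
import math
from scipy import stats

def tolerance_k(n, coverage=0.95, conf=0.95):
    """Two-sided normal tolerance factor via Howe's approximation."""
    z = stats.norm.ppf((1 + coverage) / 2)           # normal quantile for the coverage
    chi2 = stats.chi2.ppf(1 - conf, df=n - 1)        # lower chi-square quantile
    return math.sqrt((n - 1) * (1 + 1 / n) * z ** 2 / chi2)

n, mean, sd = 12, 2.792, 0.421                       # summary stats from the slide
k = tolerance_k(n)                                   # 95%/95% factor
lo, hi = mean - k * sd, mean + k * sd
print(f"95%/95% tolerance interval: ({lo:.3f}, {hi:.3f})")
```

Note that k is well above the naive 1.96: the factor inflates the interval to account for the uncertainty in the estimated mean and standard deviation.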
Tolerance Interval (95/99)
• Claim: We are 95% confident that 99% of the population will have a pull strength between 1.042 and 4.541 lb.
(c) 2016 / Raul Soto 110
(Minitab)

[Tolerance interval plot for y: 95% tolerance interval, at least 99% of population covered.
Normal method: lower 1.042, upper 4.541
Nonparametric method: lower 2.100, upper 3.500 (achieved confidence 0.6%)
Statistics: N = 12, Mean = 2.792, StDev = 0.421. Normality test: AD = 0.148, p-value = 0.950]
Other important statistical tools / techniques
• Variance Components (a.k.a. Random Effects Model)
– Determine the sources of variation in a process, and their relative contributions to the total variation
– Example: Gage R&R / Measurement Systems Capability Analysis
• Statistical Process Control (SPC)
– Monitor a process to detect if it moves out of statistical control
• Acceptance Sampling
– Accept or reject a lot of product based on a sample
• Logistic Regression
– Regression model where the dependent variable is categorical, not quantitative
• Stability Analysis
– Calculate product expiration dates
– Calculate the probability that a given product will survive to a predetermined expiration date
• Fisher’s Exact Test
– Compare two samples of binomial data
• McNemar’s Test
– Compare two paired samples of binomial data
(c) 2016 / Raul Soto 111
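As a sketch of one of these tools, Fisher’s exact test on two hypothetical vendor lots (defect counts invented for illustration), using SciPy:

```python
from scipy import stats

# hypothetical defect counts: [defective, acceptable] for two vendor lots
table = [[3, 47],     # vendor A: 3 defects out of 50
         [12, 38]]    # vendor B: 12 defects out of 50

odds_ratio, p_value = stats.fisher_exact(table)
print(f"Fisher's exact test p-value: {p_value:.4f}")
```

A small p-value here suggests the two vendors’ defect rates differ; unlike the chi-square test, Fisher’s exact test remains valid for small counts.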
Questions
(c) 2016 / Raul Soto 112
Extra Slides
(c) 2016 / Raul Soto 113
Descriptive vs Inferential Statistics
• Descriptive statistics provide information about a population or a sample
– Characterize and summarize the most prominent features of a data set
– Measures of central tendency: Mean, Mode, Median
– Measures of variability: Variance, Standard Deviation, Range
• Inferential statistics use information obtained from a sample to make inferences about the population that sample came from
– Use the sample mean and standard deviation to estimate the population mean and standard deviation
– Draw conclusions, make statements
(c) 2016 / Raul Soto 114
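The descriptive measures listed above are easy to compute with Python’s standard `statistics` module (the fill weights below are hypothetical):

```python
import statistics

# hypothetical fill weights (mg) from one sample
weights = [9.8, 10.0, 10.1, 10.0, 9.9, 10.2, 10.0]

print("mean  :", statistics.mean(weights))             # central tendency
print("median:", statistics.median(weights))
print("mode  :", statistics.mode(weights))
print("stdev :", round(statistics.stdev(weights), 3))  # variability (sample stdev)
print("range :", round(max(weights) - min(weights), 3))
```

`statistics.stdev` is the sample standard deviation (n − 1 denominator), which is the estimator used when inferring about a population from a sample.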
What is “Variability”
THE BASICS
(c) 2016 / Raul Soto 115
RAUL SOTO, MSC, CQEIVT CONFERENCE - MARCH 2016
SAN DIEGO, CA
[Histograms of Day 1 - Day 4 data (N = 30 each; means between 29.79 and 30.28, standard deviations between 0.8977 and 1.092).
Common Cause Variation: process output stable, predictable over time]
(c) 2016 / Raul Soto 116
Variability: Common vs Special Cause• Common cause variability (“natural” variability, general)
– Inherent in the process
– Caused by multiple small factors that act randomly
– Not controllable by operators
– Examples:
• Natural variations in raw materials
• Natural variations in ambient conditions such as temperature and humidity
– Confusing common cause with special cause => Over-adjustment
(c) 2016 / Raul Soto 117
(c) 2016 / Raul Soto 118
(c) 2016 / Raul Soto 119
[Histograms of Day 1 - Day 4 data, N = 60 each:
Day 1: Mean 29.75, StDev 0.9993
Day 2: Mean 30.07, StDev 1.015
Day 3: Mean 30.19, StDev 1.022
Day 4: Mean 30.67, StDev 3.199
Days 1 - 3 look normal; Day 4 shows special cause variation]
Variability: Common vs Special Cause• Special cause variability (“assignable” causes)
– Unusual events that, if detected, can be removed or adjusted
– Caused by one or more specific factors
– Examples:
• Tool wear
• Major changes in raw materials
• Equipment failure
– Confusing special cause with common cause => Under-adjustment
(c) 2016 / Raul Soto 120
Box Plots : Elements– Asterisk : Outlier - an unusually large or small
observation. Values beyond the whiskers are outliers.
– Top of the box : third quartile (Q3) - 75% of the data values are less than or equal to this value
– Upper whisker : the highest data value within the upper limit.
• Upper limit = Q3 + 1.5 (Q3 - Q1)
– Line in the middle of the box : Median, the middle of the data. Half the observations are less than or equal to it.
– Bottom of the box is the first quartile (Q1) - 25% of the data values are less than or equal to this value
– Lower whisker : the lowest value within the lower limit.
• Lower limit = Q1- 1.5 (Q3 - Q1)
(c) 2016 / Raul Soto 121
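The box-plot elements described above can be computed directly (a sketch with made-up data; note that NumPy’s default quartile interpolation can differ slightly from Minitab’s quartile method):

```python
import numpy as np

# hypothetical measurements with one unusually large value
data = np.array([2.1, 2.3, 2.2, 2.5, 2.4, 2.3, 2.2, 2.6, 2.4, 4.0])

q1, median, q3 = np.percentile(data, [25, 50, 75])   # box edges and center line
iqr = q3 - q1                                        # interquartile range
upper_limit = q3 + 1.5 * iqr                         # whisker limits
lower_limit = q1 - 1.5 * iqr
outliers = data[(data > upper_limit) | (data < lower_limit)]
print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")
```

Only the 4.0 value falls beyond the 1.5 × IQR whisker limit, so it is the single point a box plot would draw as an asterisk.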