arifmukhtar
7/29/2019 six sigmaaa
1/45
Measure Phase: Six Sigma Statistics
Six Sigma Statistics
Measure Phase modules:
Welcome to Measure
Process Discovery
Six Sigma Statistics
    Basic Statistics
    Descriptive Statistics
    Normal Distribution
    Assessing Normality
    Special Cause / Common Cause
    Graphing Techniques
Measurement System Analysis
Process Capability
Wrap Up & Action Items
Purpose of Basic Statistics
The purpose of Basic Statistics is to:
Provide a numerical summary of the data being analyzed.
Provide the basis for making inferences about the future.
Provide the foundation for assessing process capability.
Provide a common language to be used throughout an organization to describe processes.

Data (n):
Factual information organized for analysis.
Numerical or other information represented in a form suitable for processing by computer.
Values from scientific experiments.

Relax, it won't be that bad!
Statistical Notation Cheat Sheet
X_i: An individual value, an observation
X_1: A particular (1st) individual value
∀: For each, all, individual values
X-bar: The Mean, average of sample data
X-double-bar: The grand Mean, grand average
μ: The Mean of population data
p: A proportion of sample data
P: A proportion of population data
n: Sample size
N: Population size
Σ: Summation
s: The Standard Deviation of sample data
σ: The Standard Deviation of population data
s²: The variance of sample data
σ²: The variance of population data
R: The range of data
R-bar: The average range of data
k: Multi-purpose notation, i.e. # of subgroups, # of classes
|x|: The absolute value of some term
>, <: Greater than, less than
≥, ≤: Greater than or equal to, less than or equal to
Parameters vs. Statistics
Population Parameters: Arithmetic descriptions of a population
μ, σ, P, σ², N
Sample Statistics: Arithmetic descriptions of a sample
X-bar, s, p, s², n
[Diagram: several Samples drawn from one Population]
Population: All the items that have the property of interest under study.
Frame: An identifiable subset of the population.
Sample: A significantly smaller subset of the population used to make an inference.
Types of Data
Attribute Data (Qualitative)
Is always binary; there are only two possible values (0, 1):
    Yes, No
    Go, No go
    Pass/Fail
Variable Data (Quantitative)
Discrete (Count) Data
Can be categorized in a classification and is based on counts:
    Number of defects
    Number of defective units
    Number of customer returns
Continuous Data
Can be measured on a continuum; it has decimal subdivisions that are meaningful:
    Time
    Money
    Pressure
    Conveyor Speed
    Material feed rate
Discrete Variables
Discrete Variable | Possible values for the variable
The number of defective needles in boxes of 100 diabetic syringes | 0, 1, 2, …, 100
The number of individuals in groups of 30 with a Type A personality | 0, 1, 2, …, 30
The number of surveys returned out of 300 mailed in a customer satisfaction study | 0, 1, 2, …, 300
The number of employees in 100 having finished high school or obtained a GED | 0, 1, 2, …, 100
The number of times you need to flip a coin before a head appears for the first time | 1, 2, 3, …
(Note: there is no upper limit because you might need to flip forever before the first head appears.)
Continuous Variables
Continuous Variable | Possible Values for the Variable
The length of prison time served for individuals convicted of first degree murder | All the real numbers between a and b, where a is the smallest amount of time served and b is the largest.
The household income for households with incomes less than or equal to $30,000 | All the real numbers between a and $30,000, where a is the smallest household income in the population.
The blood glucose reading for those individuals having glucose readings equal to or greater than 200 | All real numbers between 200 and b, where b is the largest glucose reading in all such individuals.
Definitions of Scaled Data
Understanding the nature of data and how to represent it can affect the types of statistical tests possible.
Nominal Scale: data consists of names, labels, or categories. It cannot be arranged in an ordering scheme. No arithmetic operations are performed for nominal data.
Ordinal Scale: data is arranged in some order, but differences between data values either cannot be determined or are meaningless.
Interval Scale: data can be arranged in an ordering scheme, and differences between data values are meaningful and can be interpreted.
Ratio Scale: data can be ranked, and all arithmetic operations including division can be performed (division by zero is of course excluded). Ratio level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.
Nominal Scale
Qualitative Variable | Possible nominal level data values for the variable
Blood Types | A, B, AB, O
State of Residence | Alabama, …, Wyoming
Country of Birth | United States, China, other
Time to weigh in!
Ordinal Scale
Qualitative Variable | Possible Ordinal level data values
Automobile Sizes | Subcompact, compact, intermediate, full size, luxury
Product rating | Poor, good, excellent
Baseball team classification | Class A, Class AA, Class AAA, Major League
Interval Scale
Interval Variable | Possible Scores
IQ scores of students in Black Belt Training | 100
(The difference between scores is measurable and has meaning, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent.)
Ratio Scale
Ratio Variable | Possible Scores
Grams of fat consumed per adult in the United States | 0
(If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If a person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)
Converting Attribute Data to Continuous Data
Continuous Data is always more desirable
In many cases Attribute Data can be converted to Continuous Data.
Which is more useful?
15 scratches or Total scratch length of 9.25
22 foreign materials or 2.5 fm/square inch
200 defects or 25 defects/hour
Descriptive Statistics
Measures of Location (central tendency)
    Mean
    Median
    Mode
Measures of Variation (dispersion)
    Range
    Interquartile Range
    Standard deviation
    Variance
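The measures listed above can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the `describe` helper and the `readings` values are hypothetical, and the quartiles use the module's default "exclusive" method, which may differ slightly from the quartile rule used by other packages.

```python
import statistics

def describe(data):
    """Return the location and variation measures for a list of numbers."""
    xs = sorted(data)
    # Quartiles via the module's default "exclusive" method (an assumption;
    # other statistics packages may interpolate differently).
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "mode": statistics.mode(xs),
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "stdev": statistics.stdev(xs),        # sample Standard Deviation (n - 1)
        "variance": statistics.variance(xs),  # sample variance (n - 1)
    }

# Hypothetical readings, not taken from the MINITAB worksheet.
readings = [4.98, 4.99, 5.00, 5.00, 5.00, 5.01, 5.02]
for name, value in describe(readings).items():
    print(f"{name:>8}: {value:.4f}")
```

Returning a dictionary keeps each measure labeled, which mirrors the way a descriptive statistics table is read.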
Descriptive Statistics
Open the MINITAB Project "Measure Data Sets.mpj" and select the worksheet "basicstatistics.mtw".
Measures of Location
Mean is:
Commonly referred to as the average.
The arithmetic balance point of a distribution of data.
Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Descriptive Statistics: Data
Variable    N  N*    Mean   SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Data      200   0  4.9999  0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200
Stat>Basic Statistics>Display Descriptive Statistics>Graphs>Histogram of data, with normal curve
[Figure: Histogram (with Normal Curve) of Data; x-axis: Data (4.97 to 5.02), y-axis: Frequency (0 to 80); Mean 5.000, StDev 0.01007, N 200]
Measures of Location
Median is:
The mid-point, or 50th percentile, of a distribution of data.
Arrange the data from low to high, or high to low.
It is the single middle value in the ordered list if there is an odd number of observations.
It is the average of the two middle values in the ordered list if there is an even number of observations.
[Figure: Histogram (with Normal Curve) of Data; x-axis: Data (4.97 to 5.02), y-axis: Frequency (0 to 80); Mean 5.000, StDev 0.01007, N 200]
Descriptive Statistics: Data
Variable    N  N*    Mean   SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Data      200   0  4.9999  0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200
Measures of Location
Trimmed Mean is a:
Compromise between the Mean and Median.
The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.
Useful for data with potential extreme values.
Stat>Basic Statistics>Display Descriptive Statistics>Statistics> Trimmed Mean
Descriptive Statistics: Data
Variable    N  N*    Mean   SE Mean  TrMean   StDev  Minimum      Q1  Median      Q3  Maximum
Data      200   0  4.9999  0.000712  4.9999  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200
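The Trimmed Mean calculation can be sketched in a few lines of Python. This is a minimal illustration, not MINITAB's implementation: the `trimmed_mean` helper, the 5% default trim from each end, the truncation used to count trimmed observations, and the `values` list are all assumptions made for the example.

```python
def trimmed_mean(data, trim_fraction=0.05):
    """Average the data after dropping a fraction of observations from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    xs = sorted(data)
    # Number of observations removed from EACH end (truncated, an assumption).
    drop = int(len(xs) * trim_fraction)
    kept = xs[drop:len(xs) - drop]
    return sum(kept) / len(kept)

# Hypothetical data: 1..19 plus one extreme value.
values = list(range(1, 20)) + [1000]
print(sum(values) / len(values))  # plain Mean, pulled up by the extreme value
print(trimmed_mean(values))       # Trimmed Mean, extreme value removed
```

Comparing the two printed numbers shows why the Trimmed Mean is useful when extreme values may be present.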
Measures of Location
Mode is:
The most frequently occurring value in a distribution of data.
[Figure: Histogram (with Normal Curve) of Data; x-axis: Data (4.97 to 5.02), y-axis: Frequency (0 to 80); Mean 5.000, StDev 0.01007, N 200; Mode = 5]
Measures of Variation
Range is the:
Difference between the largest observation and the smallest observation in the data set.
A small range would indicate a small amount of variability and a large range a large amount of variability.
Interquartile Range is the:
Difference between the 75th percentile and the 25th percentile.
Descriptive Statistics: Data
Variable    N  N*    Mean   SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Data      200   0  4.9999  0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200
Use Range or Interquartile Range when the data distribution is Skewed.
Measures of Variation
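Standard Deviation is:
Equivalent of the average deviation of values from the Mean for a distribution of data.
A unit of measure for distances from the Mean.
Use when data are symmetrical.
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$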
Descriptive Statistics: Data
Variable    N  N*    Mean   SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Data      200   0  4.9999  0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200
Cannot calculate population Standard Deviation because this is sample data.
Measures of Variation
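Variance is the:
Average squared deviation of each individual data point from the Mean.
Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$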
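The difference between the sample and population calculations is only the divisor: n - 1 for a sample, N for a population. The following sketch writes both out explicitly; the function names and the `data` values are hypothetical, and the Standard Deviation is taken as the square root of the variance.

```python
import math

def population_variance(data):
    """Average squared deviation from the Mean, dividing by N."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def sample_variance(data):
    """Sum of squared deviations from the sample Mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(population_variance(data), math.sqrt(population_variance(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))
```

The sample version is always a little larger than the population version for the same numbers, because it divides by a smaller quantity.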
Normal Distribution
The Normal Distribution is the most recognized distribution in statistics.
What are the characteristics of a Normal Distribution?
Only random error is present
Process free of assignable cause
Process free of drifts and shifts
So what is present when the data is Non-normal?
The Normal Curve
The normal curve is a smooth, symmetrical, bell-shaped curve, generated by the density function.
It is the most useful continuous probability model, as many naturally occurring measurements such as heights, weights, etc. are approximately Normally Distributed.
Normal Distribution
Each combination of Mean and Standard Deviation generates a unique Normal curve.
Standard Normal Distribution
Has μ = 0 and σ = 1.
Data from any Normal Distribution can be made to fit the standard Normal by converting raw scores to standard scores.
Z-scores measure how many Standard Deviations from the Mean a particular data value lies.
Normal Distribution
The area under the curve between any 2 points represents the proportion of the distribution between those points.
Convert any raw score to a Z-score using the formula:
$Z = \frac{x - \mu}{\sigma}$
Refer to a set of Standard Normal Tables to find the proportion between μ and x.
The area between the Mean and any other point depends upon the Standard Deviation.
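As a sketch of the lookup described above, the Z-score conversion and the Standard Normal table can both be reproduced with `statistics.NormalDist` from the Python standard library. The helper names are hypothetical; the Mean and Standard Deviation in the usage lines are the ones shown on the histogram of Data earlier in this deck, and the raw score 5.02 is chosen only for illustration.

```python
from statistics import NormalDist

def z_score(x, mean, stdev):
    """Number of Standard Deviations that x lies from the Mean."""
    return (x - mean) / stdev

def area_between_mean_and(x, mean, stdev):
    """Proportion of a Normal Distribution lying between the Mean and x."""
    z = z_score(x, mean, stdev)
    # The Standard Normal CDF gives the area to the left of z;
    # subtracting 0.5 leaves the area between the Mean (z = 0) and z.
    return abs(NormalDist().cdf(z) - 0.5)

print(z_score(5.02, 5.000, 0.01007))
print(area_between_mean_and(5.02, 5.000, 0.01007))
```

Because the curve is symmetrical, a point the same distance below the Mean gives the same area.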
The Empirical Rule
[Figure: Normal curve with the x-axis marked from -6 to +6 Standard Deviations]
68.27 % of the data will fall within +/- 1 standard deviation
95.45 % of the data will fall within +/- 2 standard deviations
99.73 % of the data will fall within +/- 3 standard deviations
99.9937 % of the data will fall within +/- 4 standard deviations
99.999943 % of the data will fall within +/- 5 standard deviations
99.9999998 % of the data will fall within +/- 6 standard deviations
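These percentages follow from the Normal Distribution itself: the fraction within +/- k Standard Deviations equals erf(k / sqrt(2)). A short check in Python, where `fraction_within` is a hypothetical helper name:

```python
import math

def fraction_within(k):
    """Fraction of a Normal Distribution within +/- k Standard Deviations."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    print(f"+/- {k} standard deviations: {100 * fraction_within(k):.7f} %")
```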
The Empirical Rule (cont.)
No matter what the shape of your distribution is, as you travel 3 Standard Deviations from the Mean, the probability of occurrence beyond that point begins to converge to a very low number.
Why Assess Normality?
While many processes in nature behave according to the normal distribution, many processes in business, particularly in the areas of service and transactions, do not.
There are many types of distributions.
There are many statistical tools that assume Normal Distribution properties in their calculations.
So understanding just how Normal the data are will impact how we look at the data.
Tools for Assessing Normality
The shape of any Normal curve can be calculated based on the Normal Probability density function.
Tests for Normality basically compare the shape of the calculated curve to the actual distribution of your data points.
For the purposes of this training, we will focus on 2 ways in MINITAB to assess Normality:
The Anderson-Darling test
Normal probability test
Watch that curve!
Goodness-of-Fit
The Anderson-Darling test uses the empirical cumulative distribution function.
[Figure: Cumulative Percent (0 to 100) versus Raw Data Scale (3.0 to 5.5), comparing the curve Expected for Normal Distribution with the Actual Data]
Departure of the actual data from the expected Normal Distribution: the Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
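To make the idea concrete, here is a sketch of one common textbook form of the Anderson-Darling statistic for Normality. It is an illustration under stated assumptions, not MINITAB's code: the Mean and Standard Deviation are estimated from the sample, `anderson_darling_normal` is a hypothetical helper name, and the two test data sets are artificial (evenly spaced quantiles of a Normal and of a skewed exponential shape).

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """A-squared statistic comparing sorted data with a fitted Normal curve."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    total = 0.0
    for i in range(1, n + 1):
        lower = fitted.cdf(xs[i - 1])  # expected cumulative fraction at the i-th smallest value
        upper = fitted.cdf(xs[n - i])  # expected cumulative fraction at the i-th largest value
        total += (2 * i - 1) * (math.log(lower) + math.log(1 - upper))
    return -n - total / n

n = 50
# Artificial data: idealized Normal quantiles and idealized exponential quantiles.
normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
print(anderson_darling_normal(normal_like))  # small departure from Normal
print(anderson_darling_normal(skewed))       # larger departure from Normal
```

The weighting by log terms makes the statistic especially sensitive to departures in the tails of the distribution.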
The Normal Probability Plot
[Figure: Probability Plot of Amount (Normal); x-axis: Amount (60 to 110), y-axis: Percent (0.1 to 99.9); Mean 84.69, StDev 7.913, N 70, AD 0.265, P-Value 0.684]
The Anderson-Darling test is a good litmus test for normality: if the P-value is more than .05, your data are normal enough for most purposes.
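The P-value rule can be made concrete with a commonly cited approximation (attributed to D'Agostino and Stephens) that converts an A-squared statistic and a sample size into a P-value. This is a sketch under that assumption, not a statement of how MINITAB computes its P-value; `ad_p_value` and `looks_normal` are hypothetical helper names. With the AD and N values shown on the probability plots in this deck, it gives P-values close to the ones reported there.

```python
import math

def ad_p_value(a_squared, n):
    """Approximate P-value for an Anderson-Darling Normality statistic."""
    # Small-sample adjustment of the statistic.
    a = a_squared * (1 + 0.75 / n + 2.25 / n ** 2)
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    if a >= 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    if a > 0.2:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)

def looks_normal(a_squared, n, alpha=0.05):
    """Apply the guideline: P-value above alpha means 'normal enough'."""
    return ad_p_value(a_squared, n) > alpha

print(ad_p_value(0.265, 70), looks_normal(0.265, 70))
print(ad_p_value(0.177, 500), looks_normal(0.177, 500))
```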
Descriptive Statistics
The Anderson-Darling test also appears in this output. Again, if the P-value is greater than .05, assume the data are Normal.
The reasoning behind the decision to assume Normality based on the P-value will be covered in the Analyze Phase. For now, just accept this as a general guideline.
Anderson-Darling Caveat
Use the Anderson Darling column to generate these graphs.
In this case, both the Histogram and the Normality Plot look very normal. However, because the sample size is so large, the Anderson-Darling test is very sensitive and any slight deviation from Normal will cause the P-value to be very low. Again, the topic of sensitivity will be covered in greater detail in the Analyze Phase.
For now, just assume that if N > 100 and the data look Normal, then they probably are.
[Figure: Probability Plot of Anderson Darling (Normal); x-axis: Anderson Darling (35 to 65), y-axis: Percent (0.1 to 99.9); Mean 50.03, StDev 4.951, N 500, AD 0.177, P-Value 0.921]
[Figure: Summary for Anderson Darling; histogram (36 to 60) with 95% Confidence Intervals for Mean and Median]
Anderson-Darling Normality Test
A-Squared     0.18
P-Value       0.921
Mean          50.031
StDev         4.951
Variance      24.511
Skewness      -0.061788
Kurtosis      -0.180064
N             500
Minimum       35.727
1st Quartile  46.800
Median        50.006
3rd Quartile  53.218
Maximum       62.823
95% Confidence Interval for Mean:    49.596 to 50.466
95% Confidence Interval for Median:  49.663 to 50.500
95% Confidence Interval for StDev:   4.662 to 5.278
If the Data Are Not Normal, Don't Panic!
Normal Data are not common in the transactional world.
There are lots of meaningful statistical tools you can use to analyze your data (more on that later).
It just means you may have to think about your data in a slightly different way.
Don't touch that button!
Normality Exercise
Exercise objective: To demonstrate how to test for Normality.
1. Generate Normal Probability Plots and the graphical summary using the Descriptive Statistics.MTW file.
2. Use only the columns Dist A and Dist D.
3. Answer the following quiz questions based on your analysis of this data set.
Isolating Special Causes from Common Causes
Special Cause: Variation caused by known factors that result in a non-random distribution of output. Also referred to as Assignable Cause.
Common Cause: Variation caused by unknown factors resulting in a steady but random distribution of output around the average of the data. It is the variation left over after Special Cause variation has been removed and typically (not always) follows a Normal Distribution.
If we know that the basic structure of the data should follow a Normal Distribution, but plots from our data show otherwise, we know the data contain Special Causes.
Special Causes = Opportunity
Introduction to Graphing
The purpose of Graphing is to:
Identify potential relationships between variables.
Identify risk in meeting the critical needs of the Customer, Business and People.
Provide insight into the nature of the Xs which may or may not control Y.
Show the results of passive data collection.
In this section we will cover:
1. Box Plots
2. Scatter Plots
3. Dot Plots
4. Time Series Plots
5. Histograms
Data Sources
Data sources are suggested by many of the tools that have been covered so far:
Process Map
X-Y Matrix
Fishbone Diagrams
FMEA
Examples are:
1. Time: Shift, Day of the week, Week of the month, Season of the year
2. Location/position: Facility, Region, Office
3. Operator: Training, Experience, Skill, Adherence to procedures
4. Any other sources?
Graphical Concepts
The characteristics of a good graph include:
Variety of data
Selection of: Variables, Graph, Range
Information to interpret relationships
Explore quantitative relationships
The Histogram
A Histogram displays data that have been summarized into intervals. It can be used to assess the symmetry or Skewness of the data.
To construct a Histogram, the horizontal axis is divided into equal intervals and a vertical bar is drawn at each interval to represent its frequency (the number of values that fall within the interval).
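The construction just described can be sketched as a small counting routine. The `histogram_counts` helper and the `measurements` values are hypothetical; the sketch assumes equal-width intervals spanning the minimum to the maximum, with the maximum value counted in the last interval.

```python
def histogram_counts(data, bin_count):
    """Split the data range into equal intervals and count values in each."""
    low, high = min(data), max(data)
    if high == low:
        raise ValueError("data must contain at least two distinct values")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for x in data:
        # The maximum value would land one past the end, so clamp it
        # into the last interval.
        index = min(int((x - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts

measurements = [98, 99, 99, 100, 100, 100, 101, 101, 102, 103]  # hypothetical
edges, counts = histogram_counts(measurements, 5)
for left, count in zip(edges, counts):
    print(f"{left:6.1f} | {'#' * count}")
```

Each printed row is one interval, and the length of the bar is its frequency.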
[Figure: Histogram of Histogram; x-axis: Histogram (98 to 103), y-axis: Frequency (0 to 40)]
Histogram Caveat
All the Histograms below were generated using random samples of the data from the worksheet Graphing Data.mtw.
Be careful not to determine Normality simply from a Histogram plot; if the sample size is low, the data may not look very Normal.
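The effect is easy to reproduce with simulated data. The sketch below draws four samples of 20 values from a Normal Distribution and prints a rough text Histogram of each. Everything specific here is an assumption for illustration: the Mean of 100 and Standard Deviation of 1, the 97 to 103 plotting range, the seed, and the `bin_counts` helper. The values are simulated, not read from the worksheet; only the panel labels are borrowed from the slide.

```python
import random

def bin_counts(sample, low, high, bin_count):
    """Count how many values fall in each equal-width interval of [low, high]."""
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for x in sample:
        index = int((x - low) / width)
        index = max(0, min(index, bin_count - 1))  # clamp values outside [low, high]
        counts[index] += 1
    return counts

rng = random.Random(1)  # fixed seed so the run is repeatable
for label in ("H1_20", "H2_20", "H3_20", "H4_20"):
    sample = [rng.gauss(100, 1) for _ in range(20)]
    counts = bin_counts(sample, 97, 103, 6)
    # One group of stars per interval; "." marks an empty interval.
    print(label, " ".join("*" * c or "." for c in counts))
```

Even though every sample comes from the same Normal Distribution, the four rows typically differ in shape, which is why a small-sample Histogram alone is weak evidence about Normality.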
[Figure: Histogram of H1_20, H2_20, H3_20, H4_20; four panels, x-axis: 98 to 102, y-axis: Frequency]
Variation on a Histogram
Using the worksheet Graphing Data.mtw, create a simple Histogram for the data column called granular.
[Figure: Histogram of Granular; x-axis: Granular (44 to 56), y-axis: Frequency (0 to 25)]
Dot Plot
The Dot Plot can be a useful alternative to the Histogram, especially if you want to see individual values or you want to brush the data.
[Figure: Dotplot of Granular; x-axis: Granular (44 to 56)]