six sigmaaa

  • 7/29/2019 six sigmaaa

    Measure Phase: Six Sigma Statistics

    Six Sigma Statistics

    Welcome to Measure
    Process Discovery
    Six Sigma Statistics
      Basic Statistics
      Descriptive Statistics
      Normal Distribution
      Assessing Normality
      Special Cause / Common Cause
      Graphing Techniques
    Measurement System Analysis
    Process Capability
    Wrap Up & Action Items

    Purpose of Basic Statistics

    The purpose of Basic Statistics is to:

    Provide a numerical summary of the data being analyzed.

    Provide the basis for making inferences about the future.

    Provide the foundation for assessing process capability.

    Provide a common language to be used throughout an organization to describe processes.

    Data (n.):

    Factual information organized for analysis.

    Numerical or other information represented in a form suitable for processing by computer.

    Values from scientific experiments.

    Relax, it won't be that bad!

    Statistical Notation Cheat Sheet

    X              An individual value, an observation
    X₁             A particular (1st) individual value
    Xᵢ             For each, all, individual values
    X-bar          The Mean, average of sample data
    X-double-bar   The grand Mean, grand average
    μ              The Mean of population data
    p              A proportion of sample data
    P              A proportion of population data
    n              Sample size
    N              Population size
    Σ              Summation
    s              The Standard Deviation of sample data
    σ              The Standard Deviation of population data
    s²             The variance of sample data
    σ²             The variance of population data
    R              The range of data
    R-bar          The average range of data
    k              Multi-purpose notation, e.g. # of subgroups, # of classes
    |x|            The absolute value of some term
    >, <           Greater than, less than
    ≥, ≤           Greater than or equal to, less than or equal to

    Parameters vs. Statistics

    Population Parameters: Arithmetic descriptions of a population

    μ, σ, P, σ², N

    Sample Statistics: Arithmetic descriptions of a sample

    X-bar, s, p, s², n

    [Figure: several samples drawn from a single population]

    Population: All the items that have the property of interest under study.

    Frame: An identifiable subset of the population.

    Sample: A significantly smaller subset of the population used to make an inference.

    Types of Data

    Attribute Data (Qualitative)

    Is always binary: there are only two possible values (0, 1), such as:
    Yes/No
    Go/No go
    Pass/Fail

    Variable Data (Quantitative)

    Discrete (Count) Data
    Can be categorized in a classification and is based on counts, such as:
    Number of defects
    Number of defective units
    Number of customer returns

    Continuous Data
    Can be measured on a continuum; it has decimal subdivisions that are meaningful, such as:
    Time
    Pressure
    Conveyor speed
    Material feed rate
    Money

    Discrete Variables

    Discrete Variable | Possible Values for the Variable

    The number of defective needles in boxes of 100 diabetic syringes | 0, 1, 2, …, 100

    The number of individuals in groups of 30 with a Type A personality | 0, 1, 2, …, 30

    The number of surveys returned out of 300 mailed in a customer satisfaction study | 0, 1, 2, …, 300

    The number of employees out of 100 having finished high school or obtained a GED | 0, 1, 2, …, 100

    The number of times you need to flip a coin before a head appears for the first time | 1, 2, 3, …

    (Note: there is no upper limit, because you might need to flip forever before the first head appears.)
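    A minimal sketch of the last row, assuming a fair coin and independent flips: the probability that the first head appears on flip k is (1/2)^k. Every positive k has a nonzero probability, so no upper limit exists, yet the probabilities still add up toward 1. The function name is illustrative only.

```python
from fractions import Fraction


def prob_first_head_on_flip(k: int) -> Fraction:
    """P(first head on flip k) for a fair coin with independent flips: (1/2)**k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return Fraction(1, 2) ** k


# Exact probabilities for the first few possible values 1, 2, 3, ...
for k in range(1, 5):
    print(k, prob_first_head_on_flip(k))

# Probability of needing MORE than 20 flips: still positive, so 21, 22, ... remain possible.
beyond_20 = 1 - sum(prob_first_head_on_flip(k) for k in range(1, 21))
print(beyond_20)  # 1/1048576
```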

    Continuous Variables

    Continuous Variable | Possible Values for the Variable

    The length of prison time served for individuals convicted of first degree murder | All the real numbers between a and b, where a is the smallest amount of time served and b is the largest.

    The household income for households with incomes less than or equal to $30,000 | All the real numbers between a and $30,000, where a is the smallest household income in the population.

    The blood glucose reading for those individuals having glucose readings equal to or greater than 200 | All real numbers between 200 and b, where b is the largest glucose reading in all such individuals.

    Definitions of Scaled Data

    Understanding the nature of data and how to represent it affects which statistical tests are possible.

    Nominal Scale data consists of names, labels, or categories. It cannot be arranged in an ordering scheme, and no arithmetic operations are performed on nominal data.

    Ordinal Scale data is arranged in some order, but differences between data values either cannot be determined or are meaningless.

    Interval Scale data can be arranged in an ordering scheme, and differences between data values are meaningful and can be interpreted.

    Ratio Scale data can be ranked, and all arithmetic operations, including division, can be performed (division by zero is, of course, excluded). Ratio level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.

    Nominal Scale

    Qualitative Variable | Possible Nominal Level Data Values for the Variable

    Blood Types | A, B, AB, O

    State of Residence | Alabama, …, Wyoming

    Country of Birth | United States, China, other

    Time to weigh in!

    Ordinal Scale

    Qualitative Variable | Possible Ordinal Level Data Values

    Automobile Sizes | Subcompact, compact, intermediate, full size, luxury

    Product rating | Poor, good, excellent

    Baseball team classification | Class A, Class AA, Class AAA, Major League

    Interval Scale

    Interval Variable | Possible Scores

    IQ scores of students in Black Belt Training | 100 (the difference between scores is measurable and has meaning, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent)

    Ratio Scale

    Ratio Variable | Possible Scores

    Grams of fat consumed per adult in the United States | 0 (If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)

    Converting Attribute Data to Continuous Data

    Continuous Data is always more desirable.

    In many cases Attribute Data can be converted to Continuous Data.

    Which is more useful?

    15 scratches or total scratch length of 9.25

    22 foreign materials or 2.5 fm/square inch

    200 defects or 25 defects/hour
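    Converting a count into a rate divides it by the exposure (time, area, units produced) over which it was observed. The sketch below assumes the 200 defects were counted over an 8-hour period, which the slide does not state; `defect_rate` is a hypothetical helper name.

```python
def defect_rate(count: float, exposure: float) -> float:
    """Turn a raw count into a continuous rate: count per unit of exposure."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return count / exposure


# Assumption: the 200 defects were observed over an 8-hour run.
per_hour = defect_rate(200, 8)
print(per_hour)  # 25.0 defects/hour

# The same idea applies to area, e.g. foreign materials per square inch of inspected surface.
```

    A rate stays comparable across runs or samples of different size, which a raw count does not.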

    Descriptive Statistics

    Measures of Location (central tendency)
    Mean
    Median
    Mode

    Measures of Variation (dispersion)
    Range
    Interquartile Range
    Standard Deviation
    Variance

    Descriptive Statistics

    Open the MINITAB Project "Measure Data Sets.mpj" and select the worksheet "basicstatistics.mtw".

    Measures of Location

    Mean is: Commonly referred to as the average.

    The arithmetic balance point of a distribution of data.

    Population: $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$    Sample: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$

    Descriptive Statistics: Data

    Variable    N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
    Data      200   0  4.9999 0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200

    Stat>Basic Statistics>Display Descriptive Statistics>Graphs>Histogram of data, with normal curve

    [Figure: Histogram (with Normal Curve) of Data; Frequency vs. Data (4.97 to 5.02); Mean 5.000, StDev 0.01007, N 200]
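    A minimal sketch of the Mean as the arithmetic balance point: the deviations from it sum to zero. The readings are hypothetical, not the basicstatistics.mtw data; the same arithmetic gives μ when the list is the whole population and X-bar when it is a sample.

```python
from statistics import fmean

# Hypothetical readings near the 5.00 target (not the worksheet data).
readings = [4.99, 5.00, 5.01, 5.00, 4.98, 5.02, 5.00]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


m = arithmetic_mean(readings)
print(round(m, 4))  # 5.0

# Balance point: positive and negative deviations cancel (up to float rounding).
print(abs(sum(x - m for x in readings)) < 1e-9)  # True

# The standard library gives the same answer.
assert abs(m - fmean(readings)) < 1e-12
```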

    Measures of Location

    Median is:

    The mid-point, or 50th percentile, of a distribution of data. Arrange the data from low to high, or high to low.

    It is the single middle value in the ordered list if there is an odd number of observations.

    It is the average of the two middle values in the ordered list if there is an even number of observations.

    [Figure: Histogram (with Normal Curve) of Data; Frequency vs. Data (4.97 to 5.02); Mean 5.000, StDev 0.01007, N 200]

    Descriptive Statistics: Data

    Variable    N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
    Data      200   0  4.9999 0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200
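    The two rules on this slide (odd versus even number of observations) translate directly into code. A minimal sketch with hypothetical values, cross-checked against Python's `statistics.median`:

```python
import statistics


def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)  # low to high
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values


odd = [5.02, 4.98, 5.00]
even = [5.02, 4.98, 5.00, 4.99]
print(median(odd))   # 5.0
print(median(even))  # average of 4.99 and 5.00
assert median(odd) == statistics.median(odd)
assert median(even) == statistics.median(even)
```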

    Measures of Location

    Trimmed Mean is a:

    Compromise between the Mean and Median. The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.

    Useful for data with potential extreme values.

    Stat>Basic Statistics>Display Descriptive Statistics>Statistics> Trimmed Mean

    Descriptive Statistics: Data

    Variable    N  N*    Mean  SE Mean  TrMean   StDev  Minimum      Q1  Median      Q3  Maximum
    Data      200   0  4.9999 0.000712  4.9999  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200
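    A minimal sketch of a Trimmed Mean. The trimming fraction is a parameter; we understand Minitab's TrMean to trim 5% from each end, but treat that default, and the rounding of the number trimmed, as assumptions here. The data are hypothetical, with one extreme value to show why trimming helps.

```python
def trimmed_mean(values, proportion=0.05):
    """Average after dropping `proportion` of the observations from EACH end.

    The number dropped per end is round(proportion * n); other tools may round differently.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = round(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value (6.50).
data = [4.98, 4.99, 5.00, 5.00, 5.00, 5.00, 5.01, 5.01, 5.02, 6.50]
print(sum(data) / len(data))     # ordinary Mean, pulled up by 6.50
print(trimmed_mean(data, 0.10))  # drops 4.98 and 6.50, stays near 5.00
```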

    Measures of Location

    Mode is: The most frequently occurring value in a distribution of data.

    [Figure: Histogram (with Normal Curve) of Data; Frequency vs. Data (4.97 to 5.02); Mean 5.000, StDev 0.01007, N 200]

    Mode = 5
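    For measurement data, the Mode is only meaningful once values are recorded at a fixed resolution (here, two decimal places); otherwise every value may be unique. A minimal sketch with hypothetical readings using the standard library's `statistics.multimode` (Python 3.8+), which also reports ties:

```python
from statistics import multimode

# Hypothetical readings recorded to two decimal places.
readings = [4.99, 5.00, 5.00, 5.01, 5.00, 4.98, 5.01]
print(multimode(readings))  # [5.0]

# A distribution can have more than one Mode; multimode returns all of them.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```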

    Measures of Variation

    Range is the: Difference between the largest observation and the smallest observation in the data set.

    A small range would indicate a small amount of variability and a large range a large amount of variability.

    Interquartile Range is the: Difference between the 75th percentile and the 25th percentile.

    Descriptive Statistics: Data

    Variable    N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
    Data      200   0  4.9999 0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200

    Use Range or Interquartile Range when the data distribution is Skewed.
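    Reading the summary output on this slide, Range = 5.0200 - 4.9700 = 0.0500 and Interquartile Range = Q3 - Q1 = 5.0100 - 4.9900 = 0.0200. A minimal sketch of both calculations follows. It uses `statistics.quantiles(..., method="exclusive")`, which places quartiles at position p(n + 1) in the ordered data; we believe this matches Minitab's Q1/Q3 convention, but tools differ in their quartile rules, so small differences are possible.

```python
from statistics import quantiles


def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, with quartiles at position p * (n + 1) of the ordered data."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


values = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11
print(data_range(values))           # 10
print(interquartile_range(values))  # 6.0 (Q1 = 3, Q3 = 9)
```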

    Measures of Variation

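    Standard Deviation is:

    Roughly the typical distance of values from the Mean for a distribution of data (strictly, the square root of the Variance).

    A unit of measure for distances from the Mean.

    Use when data are symmetrical.

    Population: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$    Sample: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$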

    Descriptive Statistics: Data

    Variable    N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
    Data      200   0  4.9999 0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200

    Cannot calculate population Standard Deviation because this is sample data.
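    Python's standard library draws the same distinction as this slide: `statistics.stdev` and `statistics.variance` divide by n - 1 (sample), while `pstdev` and `pvariance` divide by N and are only appropriate when the values are the entire population. A minimal sketch with hypothetical values:

```python
import math
from statistics import fmean, pstdev, pvariance, stdev, variance

# Hypothetical sample (not the worksheet data).
sample = [4.98, 4.99, 5.00, 5.00, 5.01, 5.02]
n = len(sample)
x_bar = fmean(sample)

# Sample Standard Deviation by hand: divide the sum of squared deviations by n - 1.
ss = sum((x - x_bar) ** 2 for x in sample)
s_by_hand = math.sqrt(ss / (n - 1))

s = stdev(sample)       # sample Standard Deviation (n - 1)
sigma = pstdev(sample)  # population formula (N); valid only if this list IS the population
print(round(s, 5), round(s_by_hand, 5))  # the two agree
print(round(sigma, 5))                   # slightly smaller than s

# Variance is the square of the Standard Deviation.
assert math.isclose(variance(sample), s ** 2)
assert math.isclose(pvariance(sample), sigma ** 2)
```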

    Measures of Variation

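    Variance is the: Average squared deviation of each individual data point from the Mean (the sample version divides by n - 1 rather than n).

    Sample: $s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$    Population: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$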

    Normal Distribution

    The Normal Distribution is the most recognized distribution in statistics.

    What are the characteristics of a Normal Distribution?

    Only random error is present.

    The process is free of assignable cause.

    The process is free of drifts and shifts.

    So what is present when the data are Non-normal?

    The Normal Curve

    The normal curve is a smooth, symmetrical, bell-shaped curve, generated by the density function:

    $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

    It is the most useful continuous probability model, as many naturally occurring measurements such as heights, weights, etc. are approximately Normally Distributed.

    Normal Distribution

    Each combination of Mean and Standard Deviation generates a unique Normal curve.

    Standard Normal Distribution:

    Has μ = 0 and σ = 1.

    Data from any Normal Distribution can be made to fit the standard Normal by converting raw scores to standard scores.

    Z-scores measure how many Standard Deviations from the Mean a particular data value lies.

    Normal Distribution

    The area under the curve between any 2 points represents the proportion of the distribution between those points.

    Convert any raw score to a Z-score using the formula:

    $Z = \dfrac{x - \mu}{\sigma}$

    Refer to a set of Standard Normal Tables to find the proportion between μ and x.

    The area between the Mean and any other point depends upon the Standard Deviation.
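    A minimal sketch of the Z-score conversion and the table lookup, using `statistics.NormalDist` (Python 3.8+) in place of printed Standard Normal Tables. The process Mean of 5.00 and Standard Deviation of 0.01 are hypothetical round numbers.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """How many Standard Deviations x lies from the Mean."""
    return (x - mu) / sigma


def area_between_mean_and(x, mu, sigma):
    """Proportion of a Normal distribution lying between the Mean and x."""
    z = z_score(x, mu, sigma)
    standard = NormalDist(0, 1)  # the Standard Normal Distribution
    return abs(standard.cdf(z) - 0.5)


mu, sigma = 5.00, 0.01  # hypothetical process Mean and Standard Deviation
x = 5.02
print(round(z_score(x, mu, sigma), 4))                # 2.0
print(round(area_between_mean_and(x, mu, sigma), 4))  # 0.4772, as in a Standard Normal Table
```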

    The Empirical Rule

    [Figure: Normal curve with the horizontal axis marked from -6 to +6 Standard Deviations]

    For Normally distributed data:

    68.27 % of the data will fall within +/- 1 standard deviation
    95.45 % of the data will fall within +/- 2 standard deviations
    99.73 % of the data will fall within +/- 3 standard deviations
    99.9937 % of the data will fall within +/- 4 standard deviations
    99.999943 % of the data will fall within +/- 5 standard deviations
    99.9999998 % of the data will fall within +/- 6 standard deviations
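    These percentages are the areas of the Normal curve within +/- k Standard Deviations of the Mean. A minimal sketch that recomputes them with `statistics.NormalDist`:

```python
from statistics import NormalDist

standard = NormalDist(0, 1)


def coverage(k):
    """Proportion of a Normal distribution within +/- k Standard Deviations of the Mean."""
    return standard.cdf(k) - standard.cdf(-k)


for k in range(1, 7):
    print(f"+/- {k} standard deviation(s): {coverage(k) * 100:.7f} %")
```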

    The Empirical Rule (cont.)

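    No matter what the shape of your distribution is, the probability of an observation falling more than 3 Standard Deviations from the Mean is small: Chebyshev's inequality caps it at 1/9 (about 11%) for any distribution with a finite variance, and for most process distributions it is far lower.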
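    A minimal sketch comparing the probability of landing more than 3 Standard Deviations from the Mean for a Normal distribution, for a strongly skewed exponential distribution (rate 1, so Mean = Standard Deviation = 1), and the general Chebyshev bound 1/k² that holds for any distribution with a finite variance. The exponential case is only an illustration of a non-Normal shape.

```python
import math
from statistics import NormalDist

k = 3  # distance from the Mean in Standard Deviations

# Chebyshev's inequality: P(|X - mean| >= k * sd) <= 1 / k**2 for ANY distribution with finite variance.
chebyshev_bound = 1 / k**2

# Normal: both tails beyond +/- 3 sd.
normal_tail = 2 * (1 - NormalDist().cdf(k))

# Exponential with rate 1 (mean 1, sd 1): values below 1 - 3 = -2 are impossible,
# so the only tail is X > 1 + 3 = 4, with probability exp(-4).
exponential_tail = math.exp(-(1 + k))

print(f"Chebyshev bound : {chebyshev_bound:.4f}")   # 0.1111
print(f"Normal          : {normal_tail:.4f}")       # 0.0027
print(f"Exponential     : {exponential_tail:.4f}")  # 0.0183
```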

    Why Assess Normality?

    While many processes in nature behave according to the Normal Distribution, many processes in business, particularly in the areas of service and transactions, do not.

    There are many types of distributions.

    There are many statistical tools that assume Normal Distribution properties in their calculations.

    So understanding just how Normal the data are will affect how we look at the data.

    Tools for Assessing Normality

    The shape of any Normal curve can be calculated based on the Normal Probability density function.

    Tests for Normality basically compare the shape of the calculated curve to the actual distribution of your data points.

    For the purposes of this training, we will focus on 2 ways in MINITAB to assess Normality:

    The Anderson-Darling test

    Normal probability test

    Watch that curve!

    Goodness-of-Fit

    The Anderson-Darling test uses an empirical cumulative distribution function.

    [Figure: Cumulative Percent vs. Raw Data Scale (3.0 to 5.5), comparing the curve Expected for Normal Distribution with the Actual Data]

    Departure of the actual data from the expected Normal Distribution: the Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
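    A minimal sketch of the Anderson-Darling A² statistic for Normality, with the Mean and Standard Deviation estimated from the data, plus the D'Agostino and Stephens approximation for its P-value. We believe Minitab uses a similar adjustment: plugging in the AD values reported in this module (0.177 with N = 500, 0.265 with N = 70) gives P-values close to the 0.921 and 0.684 shown. Treat it as an illustration, not Minitab's exact code.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(values):
    """Anderson-Darling A-squared for Normality, with Mean and StDev estimated from the data.

    Compares the empirical cumulative distribution of the data with the fitted Normal CDF,
    giving extra weight to disagreements in the tails.
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    fitted = NormalDist(fmean(x), stdev(x))
    f = [fitted.cdf(v) for v in x]  # fitted CDF at each ordered value
    total = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1 - f[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


def anderson_darling_p_value(a2, n):
    """Approximate P-value for A-squared (D'Agostino and Stephens, estimated Mean and StDev)."""
    a = a2 * (1 + 0.75 / n + 2.25 / n**2)  # small-sample adjustment
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a**2)
    if a >= 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a**2)
    if a >= 0.2:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a**2)
    return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a**2)


print(round(anderson_darling_p_value(0.177, 500), 3))  # 0.921
print(round(anderson_darling_p_value(0.265, 70), 3))   # close to the 0.684 reported for AD = 0.265, N = 70
```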

    The Normal Probability Plot

    [Figure: Probability Plot of Amount, Normal; Percent vs. Amount (60 to 110); Mean 84.69, StDev 7.913, N 70, AD 0.265, P-Value 0.684]

    The Anderson-Darling test is a good litmus test for Normality: if the P-value is more than .05, your data are Normal enough for most purposes.
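    A Normal probability plot pairs each ordered observation with the Z-score its rank would have if the data were Normal; points falling on a straight line suggest Normality. A minimal sketch, assuming Benard's median-rank plotting positions (i - 0.3)/(n + 0.4), a common choice (whether it matches Minitab's default here is an assumption), with the correlation of the points as a rough straightness measure. The Amount values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def normal_probability_points(values):
    """Return (theoretical Z-scores, ordered data) for a Normal probability plot."""
    x = sorted(values)
    n = len(x)
    standard = NormalDist(0, 1)
    # Benard's median-rank approximation for the cumulative proportion of the i-th value.
    z = [standard.inv_cdf((i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]
    return z, x


def pearson(a, b):
    """Plain Pearson correlation; values near 1 mean the points lie close to a straight line."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


z, x = normal_probability_points([84.1, 79.5, 90.2, 85.0, 76.8, 88.3])  # hypothetical Amounts
for zi, xi in zip(z, x):
    print(f"{zi:6.3f}  {xi}")
print(round(pearson(z, x), 3))
```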

    Descriptive Statistics

    The Anderson-Darling test also appears in this output. Again, if the P-value is greater than .05, assume the data are Normal.

    The reasoning behind the decision to assume Normality based on the P-value will be covered in the Analyze Phase. For now, just accept this as a general guideline.

    Anderson-Darling Caveat

    Use the Anderson Darling column to generate these graphs.

    In this case, both the Histogram and the Normality Plot look very normal. However, because the sample size is so large, the Anderson-Darling test is very sensitive and any slight deviation from Normal will cause the P-value to be very low. Again, the topic of sensitivity will be covered in greater detail in the Analyze Phase.

    For now, just assume that if N > 100 and the data look Normal, then they probably are.

    [Figure: Probability Plot of Anderson Darling, Normal; Percent vs. Anderson Darling (35 to 65); Mean 50.03, StDev 4.951, N 500, AD 0.177, P-Value 0.921]

    [Figure: Summary for Anderson Darling (histogram and 95% Confidence Intervals for Mean and Median)]
    Anderson-Darling Normality Test: A-Squared 0.18, P-Value 0.921
    Mean 50.031, StDev 4.951, Variance 24.511, Skewness -0.061788, Kurtosis -0.180064, N 500
    Minimum 35.727, 1st Quartile 46.800, Median 50.006, 3rd Quartile 53.218, Maximum 62.823
    95% Confidence Interval for Mean: 49.596 to 50.466
    95% Confidence Interval for Median: 49.663 to 50.500
    95% Confidence Interval for StDev: 4.662 to 5.278

    If the Data Are Not Normal, Don't Panic!

    Normal Data are not common in the transactional world.

    There are lots of meaningful statistical tools you can use to analyze your data (more on that later).

    It just means you may have to think about your data in a slightly different way.

    Don't touch that button!

    Normality Exercise

    Exercise objective: To demonstrate how to test for Normality.

    1. Generate Normal Probability Plots and the graphical summary using the DescriptiveStatistics.MTW file.

    2. Use only the columns Dist A and Dist D.

    3. Answer the following quiz questions based on your analysis of this data set.

    Isolating Special Causes from Common Causes

    Special Cause: Variation caused by known factors that result in a non-random distribution of output. Also referred to as Assignable Cause.

    Common Cause: Variation caused by unknown factors, resulting in a steady but random distribution of output around the average of the data. It is the variation left over after Special Cause variation has been removed, and it typically (not always) follows a Normal Distribution.

    If we know that the basic structure of the data should follow a Normal Distribution, but plots of our data show otherwise, we know the data contain Special Causes.

    Special Causes = Opportunity

    Introduction to Graphing

    The purpose of Graphing is to:

    Identify potential relationships between variables.

    Identify risk in meeting the critical needs of the Customer, Business and People.

    Provide insight into the nature of the Xs which may or may not control Y.

    Show the results of passive data collection.

    In this section we will cover:

    1. Box Plots

    2. Scatter Plots

    3. Dot Plots

    4. Time Series Plots

    5. Histograms

    Data Sources

    Data sources are suggested by many of the tools that have been covered so far:

    Process Map

    X-Y Matrix

    Fishbone Diagrams

    FMEA

    Examples are:

    1. Time: shift, day of the week, week of the month, season of the year

    2. Location/position: facility, region, office

    3. Operator: training, experience, skill, adherence to procedures

    4. Any other sources?

    Graphical Concepts

    The characteristics of a good graph include:

    Variety of data

    Selection of:
      Variables
      Graph
      Range

    Information to interpret relationships

    Explore quantitative relationships

    The Histogram

    A Histogram displays data that have been summarized into intervals. It can be used to assess the symmetry or Skewness of the data.

    To construct a Histogram, the horizontal axis is divided into equal intervals and a vertical bar is drawn at each interval to represent its frequency (the number of values that fall within the interval).

    [Figure: Histogram of Histogram; Frequency vs. Histogram (98 to 103)]
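    The construction described on this slide (equal-width intervals, one bar per interval whose height is the count) can be sketched as follows. The bin count, the half-open interval convention, and the text bars are choices made for this illustration; Minitab chooses its own intervals, and the data are hypothetical.

```python
def histogram_counts(values, bins):
    """Split [min, max] into `bins` equal-width intervals and count the values in each.

    Intervals are [a, b) except the last, which also includes the maximum value.
    """
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum falls in the last bin
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


data = [98, 99, 99, 100, 100, 100, 101, 101, 102]  # hypothetical values
edges, counts = histogram_counts(data, 4)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f} | {'#' * c}")
```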

    Histogram Caveat

    All the Histograms below were generated using random samples of the data from the worksheet Graphing Data.mtw.

    Be careful not to judge Normality from a Histogram alone: if the sample size is small, the data may not look very Normal.

    [Figure: Histogram of H1_20, H2_20, H3_20, H4_20; four panels of Frequency vs. value (98 to 102)]

    Variation on a Histogram

    Using the worksheet Graphing Data.mtw, create a simple Histogram for the data column called granular.

    [Figure: Histogram of Granular; Frequency vs. Granular (44 to 56)]

    Dot Plot

    The Dot Plot can be a useful alternative to the Histogram, especially if you want to see individual values or you want to brush the data.

    [Figure: Dotplot of Granular (44 to 56)]
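    A dot plot keeps every observation visible as its own dot, which is why it suits small data sets. A minimal text-only sketch, turned sideways so each row is a value and each dot an observation; the `resolution` used to group nearby values is an illustrative parameter, and the data are hypothetical.

```python
from collections import Counter


def text_dot_plot(values, resolution=1.0):
    """One row per value (rounded to the nearest multiple of `resolution`), one 'o' per observation."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    stacked = Counter(round(v / resolution) * resolution for v in values)
    return "\n".join(f"{value:7.2f} | {'o' * stacked[value]}" for value in sorted(stacked))


granular_like = [44, 46, 46, 48, 50, 50, 50, 52, 54, 56]  # hypothetical readings
print(text_dot_plot(granular_like))
```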