six sigmaaa

7/29/2019 six sigmaaa


Measure Phase: Six Sigma Statistics



Six Sigma Statistics

Topics in this module:

Basic Statistics

Descriptive Statistics

Normal Distribution

Assessing Normality

Special Cause / Common Cause

Graphing Techniques

Measure Phase roadmap:

Welcome to Measure

Process Discovery

Six Sigma Statistics

Measurement System Analysis

Process Capability

Wrap Up & Action Items



Purpose of Basic Statistics

The purpose of Basic Statistics is to:

Provide a numerical summary of the data being analyzed.

Data (n):

Factual information organized for analysis.

Numerical or other information represented in a form suitable for processing by computer.

Values from scientific experiments.

Provide the basis for making inferences about the future.

Provide the foundation for assessing process capability.

Provide a common language to be used throughout an organization to describe processes.

Relax... it won't be that bad!



Statistical Notation Cheat Sheet

$X$: An individual value, an observation

$X_1$: A particular (1st) individual value

$X_i$: For each, all, individual values

$\bar{X}$: The Mean, average of sample data

$\bar{\bar{X}}$: The grand Mean, grand average

$\mu$: The Mean of population data

$p$: A proportion of sample data

$P$: A proportion of population data

$n$: Sample size

$N$: Population size

$\Sigma$: Summation

$s$: The Standard Deviation of sample data

$\sigma$: The Standard Deviation of population data

$s^2$: The variance of sample data

$\sigma^2$: The variance of population data

$R$: The range of data

$\bar{R}$: The average range of data

$k$: Multi-purpose notation, i.e. # of subgroups, # of classes

$|x|$: The absolute value of some term

$>$, $<$: Greater than, less than

$\geq$, $\leq$: Greater than or equal to, less than or equal to



Parameters vs. Statistics

Population Parameters: Arithmetic descriptions of a population

$\mu$, $\sigma$, $P$, $\sigma^2$, $N$

Sample Statistics: Arithmetic descriptions of a sample

$\bar{X}$, $s$, $p$, $s^2$, $n$

[Figure: a Population from which several Samples are drawn.]

Population: All the items that have the property of interest under study.

Frame: An identifiable subset of the population.

Sample: A significantly smaller subset of the population used to make an inference.
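The distinction between parameters and statistics can be sketched in a few lines of Python. The population below (the integers 1 to 100), the sample size of 10 and the fixed seed are all made up for illustration; they are not data from the course files.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (an assumption, not course data).
population = list(range(1, 101))

# Population parameters are computed from every item in the population.
N = len(population)
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Sample statistics are computed from a much smaller subset.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 10)
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

print(f"Population: N={N}, mu={mu}, sigma={sigma:.2f}")
print(f"Sample:     n={n}, x_bar={x_bar}, s={s:.2f}")
```

The sample values will generally differ from the population values; the sample is only used to make an inference about the population.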



Types of Data

Attribute Data (Qualitative)

Is always binary; there are only two possible values (0, 1):

Yes, No

Go, No go

Pass/Fail

Variable Data (Quantitative)

Discrete (Count) Data

Can be categorized in a classification and is based on counts:

Number of defects

Number of defective units

Number of customer returns

Continuous Data

Can be measured on a continuum; it has decimal subdivisions that are meaningful:

Time

Money

Pressure

Conveyor Speed

Material feed rate



Discrete Variables

Discrete Variable | Possible values for the variable

The number of defective needles in boxes of 100 diabetic syringes | 0, 1, 2, ..., 100

The number of individuals in groups of 30 with a Type A personality | 0, 1, 2, ..., 30

The number of surveys returned out of 300 mailed in a customer satisfaction study | 0, 1, 2, ..., 300

The number of employees in 100 having finished high school or obtained a GED | 0, 1, 2, ..., 100

The number of times you need to flip a coin before a head appears for the first time | 1, 2, 3, ... (note: there is no upper limit because you might need to flip forever before the first head appears)



Continuous Variables

Continuous Variable | Possible values for the variable

The length of prison time served for individuals convicted of first degree murder | All the real numbers between a and b, where a is the smallest amount of time served and b is the largest

The household income for households with incomes less than or equal to $30,000 | All the real numbers between a and $30,000, where a is the smallest household income in the population

The blood glucose reading for those individuals having glucose readings equal to or greater than 200 | All real numbers between 200 and b, where b is the largest glucose reading in all such individuals



Definitions of Scaled Data

Understanding the nature of data and how to represent it can affect the types of statistical tests possible.

Nominal Scale: data that consists of names, labels, or categories. It cannot be arranged in an ordering scheme. No arithmetic operations are performed for nominal data.

Ordinal Scale: data that is arranged in some order, but differences between data values either cannot be determined or are meaningless.

Interval Scale: data that can be arranged in some order and for which differences in data values are meaningful. The data can be arranged in an ordering scheme and differences can be interpreted.

Ratio Scale: data that can be ranked and for which all arithmetic operations including division can be performed (division by zero is of course excluded). Ratio level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.



Nominal Scale

Qualitative Variable | Possible nominal level data values for the variable

Blood Types | A, B, AB, O

State of Residence | Alabama, ..., Wyoming

Country of Birth | United States, China, other

Time to weigh in!



Ordinal Scale

Qualitative Variable | Possible ordinal level data values

Automobile Sizes | Subcompact, compact, intermediate, full size, luxury

Product rating | Poor, good, excellent

Baseball team classification | Class A, Class AA, Class AAA, Major League



Interval Scale

Interval Variable | Possible Scores

IQ scores of students in Black Belt Training | 100

(The difference between scores is measurable and has meaning, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent.)



Ratio Scale

Ratio Variable | Possible Scores

Grams of fat consumed per adult in the United States | 0

(If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If a person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)



Converting Attribute Data to Continuous Data

Continuous Data is always more desirable.

In many cases Attribute Data can be converted to Continuous Data.

Which is more useful?

15 scratches or Total scratch length of 9.25

22 foreign materials or 2.5 fm/square inch

200 defects or 25 defects/hour




Measures of Location (central tendency)

Mean

Median

Mode

Measures of Variation (dispersion)

Range

Interquartile Range

Standard deviation

Variance




Open the MINITAB Project "Measure Data Sets.mpj" and select the worksheet "basicstatistics.mtw".



Measures of Location

Mean is:

Commonly referred to as the average.

The arithmetic balance point of a distribution of data.

Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Descriptive Statistics: Data

Variable    N  N*    Mean   SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Data      200   0  4.9999  0.000712  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200

Stat>Basic Statistics>Display Descriptive Statistics>Graphs>Histogram of data, with normal curve

[Figure: Histogram (with Normal Curve) of Data. X axis: Data (4.97 to 5.02); Y axis: Frequency (0 to 80). Mean 5.000, StDev 0.01007, N 200.]




Median is:

The mid-point, or 50th percentile, of a distribution of data.

Arrange the data from low to high, or high to low.

It is the single middle value in the ordered list if there is an odd number of observations.

It is the average of the two middle values in the ordered list if there is an even number of observations.

[Figure: Histogram (with Normal Curve) of Data. X axis: Data (4.97 to 5.02); Y axis: Frequency (0 to 80). Mean 5.000, StDev 0.01007, N 200.]

Variable    N  N*    Mean   SE Mean   StDev  Minimum      Q1  Median      Q3
Data      200   0  4.9999  0.000712  0.0101   4.9700  4.9900  5.0000  5.0100





Trimmed Mean is a:

Compromise between the Mean and Median.

The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.

Useful for data with potential extreme values.

Stat>Basic Statistics>Display Descriptive Statistics>Statistics> Trimmed Mean


Variable    N  N*    Mean   SE Mean  TrMean   StDev  Minimum      Q1  Median      Q3  Maximum
Data      200   0  4.9999  0.000712  4.9999  0.0101   4.9700  4.9900  5.0000  5.0100   5.0200




Mode is:

The most frequently occurring value in a distribution of data.

[Figure: Histogram (with Normal Curve) of Data. X axis: Data (4.97 to 5.02); Y axis: Frequency (0 to 80). Mean 5.000, StDev 0.01007, N 200.]

Mode = 5
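The measures of location described above can be tried out with Python's standard `statistics` module. This is only a sketch: the data values are made up (they are not the basicstatistics.mtw worksheet), and `trimmed_mean` is a hypothetical helper whose trimming rule may differ from the one MINITAB applies.

```python
import statistics

# Hypothetical measurements with one extreme value (5.30); not the course data.
data = [4.97, 4.99, 4.99, 5.00, 5.00, 5.00, 5.01, 5.01, 5.02, 5.30]

def trimmed_mean(values, percent):
    """Average after dropping `percent` % of the observations from each end.

    Hypothetical helper: the number trimmed per tail is rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # observations removed per tail
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

mean_value = statistics.mean(data)      # arithmetic balance point
median_value = statistics.median(data)  # 50th percentile
mode_value = statistics.mode(data)      # most frequent value
tr_mean = trimmed_mean(data, 10)        # drops 4.97 and 5.30

print(f"Mean {mean_value:.4f}, Median {median_value:.4f}, "
      f"Mode {mode_value:.2f}, Trimmed Mean {tr_mean:.4f}")
```

With the extreme value present, the Mean is pulled upward while the Median, Mode and Trimmed Mean stay near 5.00, which is why the Trimmed Mean is useful for data with potential extreme values.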



Measures of Variation

Range is the:

Difference between the largest observation and the smallest observation in the data set.

A small range would indicate a small amount of variability and a large range a large amount of variability.

Interquartile Range is the:

Difference between the 75th percentile and the 25th percentile.

Use Range or Interquartile Range when the data distribution is Skewed.
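As a small illustration of these two measures, the sketch below computes the Range and the Interquartile Range for a made-up, right-skewed data set. It relies on `statistics.quantiles` with its default "exclusive" method; other tools may compute quartiles with a slightly different rule, so treat the quartile values as one reasonable convention rather than the only one.

```python
import statistics

# Hypothetical skewed data (one large value); not from the course worksheets.
data = [3, 5, 7, 8, 9, 11, 20]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile Range: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(f"Range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The single large value stretches the Range to 17, while the Interquartile Range of 6 reflects the spread of the middle half of the data.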




Standard Deviation is:

Equivalent of the average deviation of values from the Mean for a distribution of data.

A unit of measure for distances from the Mean.

Use when data are symmetrical.

Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Cannot calculate population Standard Deviation because this is sample data.




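Variance is the:

Average squared deviation of each individual data point from the Mean.

Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$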
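The difference between the sample and population versions of the Standard Deviation and variance comes down to the divisor (n - 1 versus N). A minimal sketch with made-up data, using the standard `statistics` module:

```python
import statistics

# Hypothetical data set; its Mean is 5 and the squared deviations sum to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population formulas divide by N (treat the data as the whole population).
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas divide by n - 1 (treat the data as a sample).
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(f"Population: variance {pop_var:.4f}, Standard Deviation {pop_sd:.4f}")
print(f"Sample:     variance {sample_var:.4f}, Standard Deviation {sample_sd:.4f}")
```

For the same data the sample values are slightly larger than the population values, because dividing by n - 1 instead of N inflates the result.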



Normal Distribution

The Normal Distribution is the most recognized distribution in statistics.

What are the characteristics of a Normal Distribution?

Only random error is present

Process free of assignable cause

Process free of drifts and shifts

So what is present when the data is Non-normal?



The Normal Curve

The normal curve is a smooth, symmetrical, bell-shaped curve, generated by the density function.

It is the most useful continuous probability model as many naturally occurring measurements such as heights, weights, etc. are approximately Normally Distributed.
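For reference, the density function that generates the normal curve is usually written as follows, with $\mu$ the Mean and $\sigma$ the Standard Deviation of the distribution. This is the standard textbook form, added here for convenience.

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

The curve is symmetrical about $\mu$, and $\sigma$ controls how widely it spreads.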



Normal Distribution

Each combination of Mean and Standard Deviation generates a unique Normal curve.

Standard Normal Distribution:

Has a $\mu = 0$ and a $\sigma = 1$.

Data from any Normal Distribution can be made to fit the standard Normal by converting raw scores to standard scores.

Z-scores measure how many Standard Deviations from the Mean a particular data value lies.



Normal Distribution

The area under the curve between any 2 points represents the proportion of the distribution between those points.

Convert any raw score to a Z-score using the formula:

$Z = \frac{x - \mu}{\sigma}$

Refer to a set of Standard Normal Tables to find the proportion between $\mu$ and $x$.

The area between the Mean and any other point depends upon the Standard Deviation.
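The table lookup can also be done in code. The sketch below converts a raw score to a Z-score and uses the standard Normal cumulative distribution from Python's `statistics.NormalDist` in place of printed tables. The function names and the example values (Mean 50, Standard Deviation 5, x = 55) are made up for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of Standard Deviations that x lies from the Mean."""
    return (x - mu) / sigma

def proportion_between_mean_and(x, mu, sigma):
    """Proportion of a Normal Distribution lying between the Mean and x."""
    z = z_score(x, mu, sigma)
    # The standard Normal CDF is 0.5 at the Mean; the area is the difference.
    return abs(NormalDist().cdf(z) - 0.5)

# Hypothetical example values.
z = z_score(55, 50, 5)
p = proportion_between_mean_and(55, 50, 5)
print(f"Z = {z:.2f}, proportion between the Mean and x = {p:.4f}")
```

A point one Standard Deviation above the Mean has Z = 1, and about 34% of the distribution lies between the Mean and that point.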



The Empirical Rule

[Figure: Normal curve with the horizontal axis marked from -6 to +6 standard deviations.]

68.27% of the data will fall within +/- 1 standard deviation

95.45% of the data will fall within +/- 2 standard deviations

99.73% of the data will fall within +/- 3 standard deviations

99.9937% of the data will fall within +/- 4 standard deviations

99.999943% of the data will fall within +/- 5 standard deviations

99.9999998% of the data will fall within +/- 6 standard deviations
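These percentages can be reproduced from the Normal Distribution itself. A short sketch, assuming perfectly Normal data: the proportion within +/- k standard deviations equals erf(k / sqrt(2)), which is available in Python's `math` module. The function name `percent_within` is made up for this example.

```python
import math

def percent_within(k):
    """Percent of a Normal Distribution within +/- k standard deviations."""
    return 100.0 * math.erf(k / math.sqrt(2.0))

for k in range(1, 7):
    print(f"+/- {k} standard deviations: {percent_within(k):.7f} %")
```

Rounded to the precision shown on the slide, the printed values agree with the list above.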



The Empirical Rule (cont.)

No matter what the shape of your distribution is, as you travel 3 Standard Deviations from the Mean, the probability of occurrence beyond that point begins to converge to a very low number.



Why Assess Normality?

While many processes in nature behave according to the normal distribution, many processes in business, particularly in the areas of service and transactions, do not.

There are many types of distributions.

There are many statistical tools that assume Normal Distribution properties in their calculations.

So understanding just how Normal the data are will impact how we look at the data.



Tools for Assessing Normality

The shape of any Normal curve can be calculated based on the Normal Probability density function.

Tests for Normality basically compare the shape of the calculated curve to the actual distribution of your data points.

For the purposes of this training, we will focus on 2 ways in MINITAB to assess Normality:

The Anderson-Darling test

Normal probability test

Watch that curve!



Goodness-of-Fit

The Anderson-Darling test uses an empirical cumulative distribution function.

[Figure: Cumulative Percent (0 to 100) versus Raw Data Scale (3.0 to 5.5), comparing the curve Expected for Normal Distribution with the Actual Data.]

The figure highlights the departure of the actual data from the expected Normal Distribution. The Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
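To make the idea of measuring departures concrete, the sketch below computes the Anderson-Darling A-squared statistic by hand, using the commonly published formula with the Mean and Standard Deviation estimated from the data. It is an illustration only: the function name and both data sets are made up, and it does not reproduce the P-value that MINITAB reports alongside the statistic.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(values):
    """A-squared statistic for a Normal fit (parameters estimated from data)."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    cdf = [fitted.cdf(x) for x in ordered]  # expected cumulative proportions
    total = 0.0
    for i in range(1, n + 1):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
    return -n - total / n

# Hypothetical data built from evenly spaced percentiles (no randomness):
n = 50
probs = [(i - 0.5) / n for i in range(1, n + 1)]
normal_like = [NormalDist(50, 5).inv_cdf(p) for p in probs]  # bell shaped
skewed = [-math.log(1.0 - p) for p in probs]                 # long right tail

a2_normal = anderson_darling_normal(normal_like)
a2_skewed = anderson_darling_normal(skewed)
print(f"A-squared, Normal-like data: {a2_normal:.3f}")
print(f"A-squared, skewed data:      {a2_skewed:.3f}")
```

The bell-shaped data give a much smaller A-squared than the skewed data; a larger statistic means a larger departure from the expected Normal Distribution.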



The Normal Probability Plot

[Figure: Probability Plot of Amount (Normal). X axis: Amount (60 to 110); Y axis: Percent (0.1 to 99.9). Mean 84.69, StDev 7.913, N 70, AD 0.265, P-Value 0.684.]

The Anderson-Darling test is a good litmus test for normality: if the P-value is more than .05, your data are normal enough for most purposes.




The Anderson-Darling test also appears in this output. Again, if the P-value is greater than .05, assume the data are Normal.

The reasoning behind the decision to assume Normality based on the P-value will be covered in the Analyze Phase. For now, just accept this as a general guideline.



Anderson-Darling Caveat

Use the Anderson Darling column to generate these graphs.

In this case, both the Histogram and the Normality Plot look very normal. However, because the sample size is so large, the Anderson-Darling test is very sensitive and any slight deviation from Normal will cause the P-value to be very low. Again, the topic of sensitivity will be covered in greater detail in the Analyze Phase.

For now, just assume that if N > 100 and the data look Normal, then they probably are.

[Figure: Probability Plot of Anderson Darling (Normal). X axis: Anderson Darling (35 to 65); Y axis: Percent (0.1 to 99.9). Mean 50.03, StDev 4.951, N 500, AD 0.177, P-Value 0.921.]

[Figure: Summary for Anderson Darling, showing a histogram (36 to 60) and the 95% Confidence Intervals for the Mean and Median.]

Anderson-Darling Normality Test
A-Squared       0.18
P-Value         0.921

Mean            50.031
StDev           4.951
Variance        24.511
Skewness        -0.061788
Kurtosis        -0.180064
N               500

Minimum         35.727
1st Quartile    46.800
Median          50.006
3rd Quartile    53.218
Maximum         62.823

95% Confidence Interval for Mean      49.596   50.466
95% Confidence Interval for Median    49.663   50.500
95% Confidence Interval for StDev     4.662    5.278



If the Data Are Not Normal, Don't Panic!

Normal Data are not common in the transactional world.

There are lots of meaningful statistical tools you can use to analyze your data (more on that later).

It just means you may have to think about your data in a slightly different way.

Don't touch that button!



Normality Exercise

Exercise objective: To demonstrate how to test for Normality.

1. Generate Normal Probability Plots and the graphical summary using the "Descriptive Statistics.MTW" file.

2. Use only the columns Dist A and Dist D.

3. Answer the following quiz questions based on your analysis of this data set.



Isolating Special Causes from Common Causes

Special Cause: Variation is caused by known factors that result in a non-random distribution of output. Also referred to as Assignable Cause.

Common Cause: Variation caused by unknown factors resulting in a steady but random distribution of output around the average of the data. It is the variation left over after Special Cause variation has been removed and typically (not always) follows a Normal Distribution.

If we know that the basic structure of the data should follow a Normal Distribution, but plots from our data show otherwise, we know the data contain Special Causes.

Special Causes = Opportunity



Introduction to Graphing

The purpose of Graphing is to:

Identify potential relationships between variables.

Identify risk in meeting the critical needs of the Customer, Business and People.

Provide insight into the nature of the Xs which may or may not control Y.

Show the results of passive data collection.

In this section we will cover:

1. Box Plots

2. Scatter Plots

3. Dot Plots

4. Time Series Plots

5. Histograms



Data Sources

Data sources are suggested by many of the tools that have been covered so far:

Process Map

X-Y Matrix

Fishbone Diagrams

FMEA

Examples are:

1. Time: Shift, Day of the week, Week of the month, Season of the year

2. Location/position: Facility, Region, Office

3. Operator: Training, Experience, Skill, Adherence to procedures

4. Any other sources?



Graphical Concepts

The characteristics of a good graph include:

Variety of data

Selection of: Variables, Graph, Range

Information to interpret relationships

Explore quantitative relationships



The Histogram

A Histogram displays data that have been summarized into intervals. It can be used to assess the symmetry or Skewness of the data.

To construct a Histogram, the horizontal axis is divided into equal intervals and a vertical bar is drawn at each interval to represent its frequency (the number of values that fall within the interval).

[Figure: Histogram of Histogram. X axis: Histogram (98 to 103); Y axis: Frequency (0 to 40).]
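The construction described above (equal intervals, one bar per interval, bar height equal to the count) can be sketched as follows. The function name, the data and the choice of 5 intervals are assumptions made for illustration; MINITAB chooses its own intervals.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes in the first bin
        else:
            # The largest value would land one past the end, so clamp it.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical data, not the Graphing Data.mtw worksheet.
data = [98, 99, 99, 100, 100, 100, 101, 101, 102, 103]
counts = histogram_counts(data, 5)

# Print a simple text Histogram: one '#' per observation in the interval.
for i, c in enumerate(counts):
    left = min(data) + i * (max(data) - min(data)) / 5
    print(f"{left:6.1f} | {'#' * c}")
```

Each row of the printout corresponds to one interval, and the length of the bar is the frequency for that interval.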



Histogram Caveat

All the Histograms below were generated using random samples of the data from the worksheet "Graphing Data.mtw".

Be careful not to determine Normality simply from a Histogram plot; if the sample size is low, the data may not look very Normal.

[Figure: Histogram of H1_20, H2_20, H3_20, H4_20 (four panels). X axis: 98 to 102; Y axis: Frequency.]



Variation on a Histogram

Using the worksheet "Graphing Data.mtw", create a simple Histogram for the data column called granular.

[Figure: Histogram of Granular. X axis: Granular (44 to 56); Y axis: Frequency (0 to 25).]



Dot Plot

The Dot Plot can be a useful alternative to the Histogram, especially if you want to see individual values or you want to brush the data.

[Figure: Dotplot of Granular. X axis: Granular (44 to 56).]
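A Dot Plot is simple enough to sketch as text: one dot per observation, stacked at its value. The helper name and the data below are made up for illustration and are not the granular column from the worksheet.

```python
from collections import Counter

def dot_plot_rows(values):
    """Return one text row per distinct value, with one dot per observation."""
    counts = Counter(values)
    return [f"{v:>6} | {'.' * counts[v]}" for v in sorted(counts)]

# Hypothetical observations.
rows = dot_plot_rows([44, 46, 46, 48, 50, 50, 50, 52])
for row in rows:
    print(row)
```

Unlike a Histogram, every individual value stays visible, which is the property the slide points to.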
