Tableau Live Online Training - Statistical Analysis (Statistics)


Advanced Training: Statistics

Agenda

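• Quick statistics
– Histograms
– Summary cards
– Trend lines
– Regression analysis
• Statistical visualization
– Box plots
– Control charts
• Be a Detective, Find Fraud!
– Benford's Law
• Translating mathematical formulas into Tableau calculations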

Histogram

[Figure: Basic histogram (left) and cumulative histogram (right) of SAT Math Scores (bin), x-axis 350 to 750. Basic histogram y-axis: % of Total Count of SAT Math Scores. Cumulative histogram y-axis: % of Total Running Sum of Count of SAT Math Scores, reaching 100%.]

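• A histogram shows the distribution of the data.
• The length of each bar represents the number of observations that fall into a given interval.
• Each interval is called a bin.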
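As a rough sketch of what the two histograms above compute, the Python below counts observations per bin and expresses each count as a percent of the total, along with a running (cumulative) percent. It assumes the values have already been assigned to bins; `histogram_percentages` is a hypothetical name, not a Tableau function.

```python
from collections import Counter


def histogram_percentages(binned_values):
    """Return (bin, % of total count, running % of total) tuples, ordered by bin.

    binned_values: iterable of bin labels (for example, bin lower bounds).
    """
    counts = Counter(binned_values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one value")

    rows = []
    running = 0
    for bin_label in sorted(counts):
        running += counts[bin_label]
        # Percent of total for this bin, and cumulative percent up to and including it
        rows.append((bin_label, 100.0 * counts[bin_label] / total, 100.0 * running / total))
    return rows


print(histogram_percentages([400, 400, 450, 500]))
# [(400, 50.0, 50.0), (450, 25.0, 75.0), (500, 25.0, 100.0)]
```

The last running percent is always 100%, which is why the cumulative histogram tops out at 100%.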

Creating a Bin Dimension

©2012 Tableau Software Inc. All rights reserved.

A bin dimension sorts the values of a measure into equal-sized intervals.
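A minimal sketch of equal-size binning in Python, assuming bins are anchored at 0 and each bin is labeled by its lower edge. That labeling is our assumption about how Tableau names bins, so check it against your own workbook; `bin_lower_bound` is a hypothetical helper.

```python
import math


def bin_lower_bound(value, bin_size):
    """Return the lower edge of the equal-width bin that contains value.

    Bins are [k * bin_size, (k + 1) * bin_size) for integer k, anchored at 0.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return math.floor(value / bin_size) * bin_size


scores = [385, 512, 549, 550, 731]
print([bin_lower_bound(s, 50) for s in scores])  # [350, 500, 500, 550, 700]
```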

Histogram


A histogram shows the distribution of the data.

Unequal-Width (Custom) Bins


To define your own intervals, first create a calculated field, then create a bin dimension from it.
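Unequal bins are essentially a lookup from cut points to labels, which is the job the calculated field does. The Python sketch below uses `bisect` for that lookup; the cut points and labels are made-up examples, not values from the training data. A comparable Tableau calculated field would be an IF ... ELSEIF ... END chain over the same cut points.

```python
import bisect

# Hypothetical, unevenly spaced cut points and their labels (one more label than cuts)
CUT_POINTS = [400, 500, 650]
LABELS = ["Below 400", "400-499", "500-649", "650 and above"]


def custom_bin(value, cut_points=CUT_POINTS, labels=LABELS):
    """Map value to the label of the interval it falls in.

    Each cut point is the inclusive lower edge of the next interval.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect.bisect_right(cut_points, value)]


print([custom_bin(v) for v in [380, 400, 640, 700]])
# ['Below 400', '400-499', '500-649', '650 and above']
```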

Summary Card

Skewness (skew, asymmetry)

Excess Kurtosis

[Figure: Two example distributions plotted by ID (1 to 11), y-axis 0 to 10. Positive excess kurtosis on top, negative excess kurtosis on bottom.]
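To make the two summary-card statistics concrete, the sketch below computes skewness and excess kurtosis from population moments. Tableau may use a different (for example, sample-adjusted) formula, so treat this as an approximation for intuition, not a reproduction of the summary card; the function name is hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moment formulas.

    skewness        = m3 / m2**1.5
    excess kurtosis = m4 / m2**2 - 3   (0 for a normal distribution)
    where mk is the k-th central moment.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("values must not all be equal")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


skew, kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
print(round(skew, 6), round(kurt, 6))  # 0.0 -1.3
```

The evenly spread sample is symmetric (skewness 0) and has lighter tails than a normal distribution, so its excess kurtosis is negative.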

Trend Lines

ANOVA and Summary Tables
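The core of a linear trend line is an ordinary least-squares fit. This Python sketch computes the slope, intercept, and R² for a single predictor; it does not reproduce the rest of Tableau's trend model description (such as p-values and the ANOVA table), and `linear_trend` is a hypothetical name.

```python
def linear_trend(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares) / (total sum of squares); undefined if y is constant
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


print(linear_trend([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 1.0)
```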


Residual: the difference between a model's prediction and the actual observation
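Residuals follow directly from that definition. The sketch below uses the common convention residual = observed − predicted (the slide says only "difference", so the sign convention is our assumption) and returns (prediction, residual) pairs, which are the points of a predictions-vs-residuals scatterplot. The function name and sample numbers are hypothetical.

```python
def residual_plot_points(observed, predicted):
    """Return (prediction, residual) pairs, with residual = observed - predicted."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [(p, o - p) for o, p in zip(observed, predicted)]


# Hypothetical observations and model predictions
points = residual_plot_points([3.0, 5.5, 6.0], [3.0, 5.0, 7.0])
print(points)  # [(3.0, 0.0), (5.0, 0.5), (7.0, -1.0)]
```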

Understanding Residuals

Standard scatterplot: X and Y. Residual plot: predictions on X, residuals on Y.

The pattern of the residuals tells you how well the model fits.

[Figure: Residual plots for a model of Avg. power output (MW) vs. Avg. wind speed (m/s). Great residuals: residuals plotted against predictions. Poor residuals: residuals plotted against Avg. power output (MW) (SUM).]

The better the model fits, the less pattern (the more random) the predictions-vs-residuals scatterplot should show.

Residual Analysis in Tableau (Windows Only)

Box Plots

Traditional vs. Tableau

• Tableau's Show Me box plot displays every observation and draws the whiskers using the 1.5 × IQR rule.
• A simple option setting changes it into a traditional box plot.
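The whisker rule can be sketched in Python as below. Quartile definitions differ between tools; this uses `statistics.quantiles` with `method="inclusive"` (Python 3.8+), which is an assumption and may not match Tableau's quartile calculation exactly. Whiskers extend to the most extreme observations within 1.5 × IQR of the box, and anything beyond is treated as an outlier.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, 1.5*IQR whiskers, and outliers for a box-and-whisker plot."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences
        "lower_whisker": min(x for x in data if x >= lower_fence),
        "upper_whisker": max(x for x in data if x <= upper_fence),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Here the value 100 falls outside the upper fence, so the upper whisker stops at 8 and 100 is reported as an outlier.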

[Figure: “Show Me”… a box-and-whisker plot of Calories (0 to 1000) by Primary Category: Beverages; Dairy; Fats, Sweets..; Fruits; Grain; Proteins; Starchy Vegetables; Two or More Cat..; Vegetables.]

Changing the Reference Line Options

[Figure: The same box plot of Calories (0 to 1050) by Primary Category after changing the reference line options.]

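Standard Score (z-score)

Translating the mathematical formula into a Tableau calculation

$z = \frac{x - \mu}{\sigma}$

x = each data point = AVG([Calories]) for each Display_Name

μ = the average of all points = WINDOW_AVG(AVG([Calories]))

σ = the standard deviation of all points = WINDOW_STDEV(AVG([Calories]))

Combined: (AVG([Calories]) - WINDOW_AVG(AVG([Calories]))) / WINDOW_STDEV(AVG([Calories]))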
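The same table calculation can be mirrored in Python as a quick sanity check. Tableau documents WINDOW_STDEV as a sample standard deviation, so the sketch uses `statistics.stdev` (n − 1 denominator); the list of numbers stands in for AVG([Calories]) per Display_Name and is made up.

```python
import statistics


def z_scores(values):
    """Standard score of each value: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)   # like WINDOW_AVG(AVG([Calories]))
    sd = statistics.stdev(values)    # like WINDOW_STDEV(AVG([Calories]))
    if sd == 0:
        raise ValueError("standard deviation is zero")
    return [(x - mean) / sd for x in values]


print([round(z, 3) for z in z_scores([2, 4, 4, 4, 5, 5, 7, 9])])
```

Because every point is centered on the same average, the z-scores always sum to (approximately) zero.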

Control Chart

[Figure: NP control chart of Defectives (0 to 20) by Day (1 to 25), with Average NP-bar, UCL, and LCL lines drawn for two periods, annotated “Problem!” and “Problem corrected”.]

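NP Control Chart Formulas

$\text{UCL} = n\bar{p} + 3\sqrt{n\bar{p}\,(1 - \bar{p})}$

$\text{Center line} = n\bar{p}$

$\text{LCL} = n\bar{p} - 3\sqrt{n\bar{p}\,(1 - \bar{p})}$

Where:
n = sample size
p̄ (P-bar) = proportion of defective units = (number of defectives) / (total sample size)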

Explanation of translating the NP calculation into Tableau

P-bar: WINDOW_SUM(SUM([Defectives])) / WINDOW_SUM(SUM([Sample Size]))

N: SUM([Sample Size])

Write any expression that appears more than once as its own calculated field. Here n*p-bar is used several times, so it is created as a calculated field:

Np-bar: [N]*[P-bar]

A single calculated field cannot express both the + and − results, so the Upper Control Limit (UCL) and the Lower Control Limit (LCL) are written as separate calculations:

UCL:[np-bar]+3*SQRT( [np-bar]*(1-[P-bar]) )

LCL:[np-bar]-3*SQRT( [np-bar]*(1-[P-bar]) )

Color the marks that fall outside the UCL and LCL:

IF SUM([Defectives]) > [UCL] OR SUM([Defectives]) < [LCL]
THEN 'Out of Control'
ELSE 'In Control'
END
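To check the Tableau fields outside Tableau, the whole NP chart can be sketched in Python. It follows the calculations above: P-bar is pooled over all days (the WINDOW_SUM ratio), N is each day's sample size, and the limits are NP-bar ± 3·√(NP-bar·(1 − P-bar)). Like the Tableau LCL field, it does not clamp the LCL at zero. The function name and input lists are hypothetical.

```python
import math


def np_chart(defectives, sample_sizes):
    """Return per-day control limits and status for an NP control chart."""
    if len(defectives) != len(sample_sizes) or not defectives:
        raise ValueError("need equal-length, non-empty inputs")
    # P-bar: WINDOW_SUM(SUM([Defectives])) / WINDOW_SUM(SUM([Sample Size]))
    p_bar = sum(defectives) / sum(sample_sizes)
    rows = []
    for d, n in zip(defectives, sample_sizes):
        np_bar = n * p_bar                              # [N]*[P-bar]
        spread = 3 * math.sqrt(np_bar * (1 - p_bar))
        ucl, lcl = np_bar + spread, np_bar - spread
        # Same rule as the IF ... THEN 'Out of Control' ELSE 'In Control' END field
        status = "Out of Control" if d > ucl or d < lcl else "In Control"
        rows.append({"defectives": d, "np_bar": np_bar, "ucl": ucl, "lcl": lcl, "status": status})
    return rows


for row in np_chart([5, 5, 5, 20], [100, 100, 100, 100]):
    print(row["defectives"], round(row["ucl"], 2), round(row["lcl"], 2), row["status"])
```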

Benford's Law


[Figure: % of Total Number of Records by leading digit, Left(Sales,1), digits 1 to 9; bar labels range from 3.99% to 32.68%.]
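Benford's law predicts that the leading digit d appears with probability log10(1 + 1/d), so about 30.1% of values start with 1 and about 4.6% start with 9. The Python sketch below computes those expected shares alongside the observed leading-digit shares, approximating the Left(Sales,1) dimension by taking the first non-zero digit of each value's string form. The helper names and sample values are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First non-zero digit of x, or None for zero (similar to LEFT(STR(x), 1) for x >= 1)."""
    s = str(abs(x)).lstrip("0.")
    return int(s[0]) if s and s[0].isdigit() else None


def leading_digit_shares(values):
    """Observed share of each leading digit 1-9 among non-zero values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


expected = benford_expected()
observed = leading_digit_shares([1200, 187, 19.5, 2300, 310, 0.045])  # hypothetical sales
for d in range(1, 10):
    print(d, f"{observed[d]:.1%}", f"{expected[d]:.1%}")
```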

END
