Chapter 12 Nonparametric Statistics (nonparametric analysis) 2017/6/5

Source: hosting03.snu.ac.kr/~hokim/int/2017/chap_12.pdf · 2017-06-02


12.1 Introduction

• Parametric approach
  – Assumes a distribution for the population.
  – The distribution is a function of its parameters.
  – Once the parameters are known, the distribution is completely determined.
  – Estimation and testing of the parameters are the main problems.
  → If the distributional assumption is not valid, all results of the analysis are questionable.

• Nonparametric approach
  – Does not assume a distribution for the population (distribution-free method).
  – Uses the order (ranks) of the data.
  – If the parametric assumptions are valid, the parametric method is considerably more efficient (smaller variance, typically smaller p-values).

data              mean     median
1,2,3,4,5         3        3
1,2,3,4,5,100     19.17    3.5

The median is robust to outliers, whereas the mean is sensitive to them.

The median stays the same if 100 is replaced by 10000000.

Nonparametric methods typically use the order of the data, not the values themselves.
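The comparison of mean and median above can be checked with a few lines of R. This is only a sketch; the variable names (`x`, `x_out`, and so on) are illustrative and do not come from the slides.

```r
# Data from the mean/median comparison
x <- c(1, 2, 3, 4, 5)
x_out <- c(x, 100)            # same data with one outlier added

mean_out <- mean(x_out)       # pulled upward by the outlier
median_out <- median(x_out)   # barely moves

# Making the outlier far more extreme leaves the median unchanged
median_big <- median(c(x, 10000000))

print(c(mean = mean_out, median = median_out, median_big = median_big))
```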

Parametric vs. nonparametric methods

• Nonparametric methods do not rely on a distributional assumption (e.g., normality) for the data.

• They typically use ranks rather than the mean and variance.

• If the distributional assumptions (e.g., normality) are valid, nonparametric methods are less efficient (larger variance).

• They give robust results (not sensitive to outliers).

12.2 Measurement scales

• Nominal scale: male, female; Seoul, Busan (NY, LA)

• Ordinal scale: high, medium, low

• Interval scale: order is meaningful, and absolute differences are meaningful too

• Ratio scale: ratios are meaningful as well

12.3 Sign test

Ex 12.3.1

• Hypotheses  H0: median = 102,  HA: median ≠ 102

No   Score     No   Score
1    75        9    82
2    90        10   103
3    86        11   88
4    110       12   124
5    115       13   110
6    94        14   77
7    132       15   99
8    74

• Decision rule (H0: P(+) = P(−), i.e. median = 102)
  HA: P(+) > P(−) (median > 102): reject H0 if there are enough +'s
  HA: P(+) < P(−) (median < 102): reject H0 if there are enough −'s
  HA: P(+) ≠ P(−) (median ≠ 102): reject H0 if there are enough +'s or enough −'s

In Ex 12.3.1, H0: median = 102 vs. HA: P(+) ≠ P(−). Under H0, the number of +'s out of 15 follows Bin(15, 1/2).

Scores above (+) or below (−) the hypothesized median (102)

No                  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Observation − 102   −  −  −  +  +  −  +  −  −  +  −  +  +  −  −

• Test statistic: k = number of +'s = 6

$$P(k \le 6 \mid 15, 0.5) = \binom{15}{0}0.5^{0}\,0.5^{15} + \binom{15}{1}0.5^{1}\,0.5^{14} + \cdots + \binom{15}{6}0.5^{6}\,0.5^{9} = 0.3036$$

We cannot reject H0.

[Sign test for paired comparisons]

The test uses whether each difference between paired observations is + or −, so the sign test can be applied to paired observations (like the paired t-test).

data sign;
  input score @@;
  datalines;
75 90 86 110 115 94 132 74 82 103 88 124 110 77 99
;
run;

/* mu0=102 sets the hypothesized location; the output includes the sign test */
proc univariate data=sign mu0=102;
  var score;
run;

SAS output (sign test): 2-sided p = .6072; 1-sided p = .6072/2 = .3036
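The same sign test can be reproduced in R from the binomial distribution. The sketch below assumes the 15 scores of Ex 12.3.1 and the hypothesized median 102; the variable names are illustrative. `pbinom` gives the lower-tail probability and `binom.test` the two-sided p-value.

```r
# Scores from Ex 12.3.1 and the hypothesized median
score <- c(75, 90, 86, 110, 115, 94, 132, 74, 82, 103, 88, 124, 110, 77, 99)
m0 <- 102

n_plus <- sum(score > m0)    # number of + signs
n_used <- sum(score != m0)   # observations equal to m0 would be dropped

# Lower-tail probability P(k <= n_plus) under Bin(n_used, 1/2)
p_lower <- pbinom(n_plus, n_used, 0.5)

# Two-sided p-value from the exact binomial test
p_two <- binom.test(n_plus, n_used, p = 0.5)$p.value

print(c(n_plus = n_plus, p_lower = p_lower, p_two = p_two))
```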

• Ex 12.3.2 (paired data: comparing two matched groups)

• Hypotheses
  H0: the median of the differences is 0, P(+) = P(−)
  HA: the median of the differences is negative, P(+) < P(−)

Dental hygiene score

id   Instructed (X_i)   Not instructed (Y_i)
1    1.6                2.0
2    2.0                2.0
3    3.7                4.1
4    3.5                2.4
5    3.3                4.2
6    2.4                3.6
7    2.0                3.5
8    1.5                3.0
9    1.5                2.5
10   2.1                2.5
11   3.6                2.5
12   2.3                2.5

• Test statistic: number of (+)'s

id          1  2  3  4  5  6  7  8  9  10 11 12
X_i − Y_i   −  0  −  +  −  −  −  −  −  −  +  −

The zero difference (id 2) is dropped, which leaves n = 11 differences with 2 +'s.

$$P(k \le 2 \mid 11, p = 0.5) = \sum_{r=0}^{2} \binom{11}{r}\, 0.5^{r}\, 0.5^{11-r}$$

pbinom(2,11,0.5) = 0.0327 < 0.05

→ Reject H0 at α = 0.05.

[Sign test using the right tail]

[Sample size]

data pair;
  input edu noedu;
  diff = noedu - edu;   /* not instructed minus instructed */
  datalines;
1.5 2.0
2.0 2.0
3.5 4.0
3.0 2.5
3.5 4.0
2.5 3.0
2.0 3.5
1.5 3.0
1.5 2.5
2.0 2.5
3.0 2.5
2.0 2.5
;
run;

proc univariate data=pair;
  var diff;
run;

SAS output (sign test): 2-sided p = .0654; 1-sided p = .0654/2 = .0327
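For the paired example, the sign test reduces to counting the signs of the differences and dropping zeros. The R sketch below uses the dental hygiene scores as tabulated in Ex 12.3.2 (not the rounded values in the SAS step); variable names are illustrative.

```r
# Dental hygiene scores: instructed (x) and not instructed (y)
x <- c(1.6, 2, 3.7, 3.5, 3.3, 2.4, 2, 1.5, 1.5, 2.1, 3.6, 2.3)
y <- c(2, 2, 4.1, 2.4, 4.2, 3.6, 3.5, 3, 2.5, 2.5, 2.5, 2.5)

d <- x - y
d <- d[d != 0]        # zero differences carry no sign and are dropped

n <- length(d)        # effective sample size
k <- sum(d > 0)       # number of + signs

# One-sided p-value for HA: P(+) < P(-)
p <- pbinom(k, n, 0.5)
print(c(n = n, k = k, p = p))
```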

12.4 Wilcoxon's signed rank test for location

obs    d_i = x_i − 5.05   rank of |d_i|   signed rank
4.90   −0.15              1               −1
4.10   −0.95              7               −7
6.73    1.68              9                9
7.27    2.22              13               13
7.42    2.37              14               14
7.50    2.45              15               15
6.76    1.71              10               10
4.64   −0.41              3               −3
5.98    0.93              6                6
3.14   −1.91              12              −12
3.24   −1.81              11              −11
5.80    0.75              5                5
6.17    1.12              8                8
5.39    0.34              2                2
5.78    0.73              4                4

W+ = 86, W− = 34, W = W+ − W− = 52

H0: mean = 5.05, HA: mean ≠ 5.05
Test statistic: W = W+ − W− = 52 (the sum of the signed ranks)
Reject H0 if W is too large or too small.

> wilcox.test(c(4.90,4.1,6.73,7.27,7.42,7.5,6.76,4.64,5.98,3.14,3.24,5.8,6.17,5.39,5.78), mu=5.05)

p-value = 0.1514
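The signed-rank sums can be rebuilt step by step in R, which makes the link between the table and the test statistic explicit. This is a sketch with illustrative variable names; it only recomputes W+, W− and W, and leaves the p-value to `wilcox.test`.

```r
# Observations and hypothesized location from the signed rank example
x <- c(4.90, 4.1, 6.73, 7.27, 7.42, 7.5, 6.76, 4.64,
       5.98, 3.14, 3.24, 5.8, 6.17, 5.39, 5.78)
mu0 <- 5.05

d <- x - mu0               # differences from the hypothesized value
r <- rank(abs(d))          # ranks of the absolute differences

w_plus  <- sum(r[d > 0])   # sum of ranks with positive sign
w_minus <- sum(r[d < 0])   # sum of ranks with negative sign
w <- w_plus - w_minus      # sum of the signed ranks

print(c(w_plus = w_plus, w_minus = w_minus, w = w))
```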

12.5 Median test

• H0: Median(rural) = Median(urban)

Mental health score

urban   rural   urban   rural
35      29      25      50
26      50      27      37
27      43      45      34
21      22      46      31
27      42      33
38      47      26
23      42      46
25      32      41

                                 urban   rural   total
# above the common median        6       8       14
# below the common median        10      4       14
total                            16      12      28

• Under H0, the rows and columns of the 2×2 table are independent.

$$\chi^2 = \frac{n(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{28\,(6\cdot 4 - 10\cdot 8)^2}{(16)(12)(14)(14)} = 2.33 < 3.841 = \chi^2_{1,\,0.05}$$

Since 2.33 < 2.706, p > 0.10.

∴ Do not reject H0: there is no evidence that the medians of the two groups differ.
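The median test can be carried out in R by classifying each score against the common median and applying the uncorrected chi-square test to the resulting 2×2 table. The sketch below assumes the 16 urban and 12 rural scores from the data table; variable names are illustrative.

```r
# Mental health scores
urban <- c(35, 26, 27, 21, 27, 38, 23, 25, 25, 27, 45, 46, 33, 26, 46, 41)
rural <- c(29, 50, 43, 22, 42, 47, 42, 32, 50, 37, 34, 31)

score <- c(urban, rural)
group <- rep(c("urban", "rural"), c(length(urban), length(rural)))

# Classify each score as above or below the common median
common_median <- median(score)
above <- score > common_median
tab <- table(above, group)

# Chi-square statistic without continuity correction, as in the hand calculation
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
print(tab)
print(chi2)
```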

12.6 Mann-Whitney test

• Assumptions (the two samples have sizes n and m, respectively)
  ① The samples are drawn independently and at random.
  ② The measurement scale is at least ordinal.
  ③ The two populations differ only in their medians; the shapes of the distributions are exactly the same.

• Ex 12.6.1

Weight

Group 1 (X)     Group 2 (Y)
252   254       185   280
240   164       310   264
205   288       212   270
200   138       238   210
170   240       184   192
170   217       136   126
320   240       200   220
148   302       270   295
214   312

Can we say that the population median of group 1 is smaller than that of group 2?

H0: M_X ≥ M_Y vs HA: M_X < M_Y

Ranks in the combined sample (tied values receive the average rank)

Group 1 (X)   rank      Group 2 (Y)   rank
138           3         126           1
148           4         136           2
164           5         184           8
170           6.5       185           9
170           6.5       192           10
200           11.5      200           11.5
205           13        210           14
214           16        212           15
217           17        220           18
240           21        238           19
240           21        264           25
240           21        270           26.5
252           23        270           26.5
254           24        280           28
288           29        295           30
302           31        310           32
312           33
320           34
Total         319.5     (rank sum of X)

$$U = W - \frac{m(m+1)}{2} = 319.5 - \frac{18(18+1)}{2} = 148.5$$

where W is the rank sum of X and m = 18 is the size of group 1.

Rule: reject H0 if U is small enough.

p-value = 0.14. The evidence is not enough to reject H0.
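The rank sum W and the statistic U can be recomputed in R from the raw weights. The sketch below assumes that group 1 consists of the 18 values in the first two columns of the data table and group 2 of the remaining 16, which is the assignment that reproduces the rank sum 319.5; variable names are illustrative.

```r
# Weights: group 1 (x, 18 values) and group 2 (y, 16 values)
x <- c(252, 240, 205, 200, 170, 170, 320, 148, 214,
       254, 164, 288, 138, 240, 217, 240, 302, 312)
y <- c(185, 310, 212, 238, 184, 136, 200, 270,
       280, 264, 270, 210, 192, 126, 220, 295)

# Rank the combined sample; ties get the average rank
r <- rank(c(x, y))

m <- length(x)
w <- sum(r[seq_len(m)])      # rank sum of group 1
u <- w - m * (m + 1) / 2     # Mann-Whitney U for group 1

print(c(W = w, U = u))
```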

> install.packages('coin')
> library(coin)
> xx<-c(252,240,205,200,170,170,320,148,214,185,310,212,238,184,136,200,270)
> yy<-c(254,164,288,138,240,217,240,302,312,254,164,288,138,240,217,240,302,312)
> dat<-data.frame(val=c(xx,yy),group=factor(rep(1:2,c(17,18))))
> wilcox_test(val~group,data=dat,distribution = 'exact')

        Exact Wilcoxon-Mann-Whitney Test

data:  val by group (1, 2)
Z = -1.4882, p-value = 0.1404
alternative hypothesis: true mu is not equal to 0

12.7 Kolmogorov-Smirnov (K-S) goodness-of-fit test

• Are the cumulative distributions the same? ⇔ Are the distributions of the two populations the same?

• Test statistic

$\hat{F}_S(x)$: sample cumulative distribution function (the proportion of sample values ≤ x), which estimates the cumulative distribution function $F_S(x)$ of the sampled population
$F_T(x) = \Pr(X \le x)$: (population) cumulative distribution function under the hypothesized distribution

$$H_0: F_S(x) = F_T(x) \quad \text{vs} \quad H_A: F_S(x) \ne F_T(x)$$

$$D = \sup_x \left| \hat{F}_S(x) - F_T(x) \right|$$

• Calculation, Example 12.7.1: Does the fasting blood glucose level follow a normal distribution?

75 92 80 80 83 72
83 77 81 77 75 81
80 92 72 77 78 76
77 86 77 92 80 78
67 78 92 67 80 81
87 76 80 87 77 86

x       frequency   cumulative frequency   F_S(x)
67      2           2                      0.0556
72      2           4                      0.1111
75      2           6                      0.1667
76      2           8                      0.2222
77      6           14                     0.3889
78      3           17                     0.4722
80      6           23                     0.6389
81      3           26                     0.7222
84      2           28                     0.7778
86      2           30                     0.8333
87      2           32                     0.8889
92      4           36                     1.0000
total   36

x          z = (x − 80)/6   F_T(x)
[67,72)    −2.00            0.0228
[72,75)    −1.33            0.0918
[75,76)    −0.83            0.2033
[76,77)    −0.67            0.2514
[77,78)    −0.50            0.3085
[78,80)    −0.33            0.3707
[80,81)     0.00            0.5000
[81,84)     0.17            0.5675
[84,86)     0.67            0.7486
[86,87)     1.00            0.8413
[87,92)     1.17            0.8790
[92,∞)      2.00            0.9772

x    F_S(x)   F_T(x)   |F_S(x) − F_T(x)|
67   0.0556   0.0228   0.0328
72   0.1111   0.0918   0.0193
75   0.1667   0.2033   0.0366
76   0.2222   0.2514   0.0292
77   0.3889   0.3085   0.0804
78   0.4722   0.3707   0.1015
80   0.6389   0.5000   0.1389
81   0.7222   0.5675   0.1547
84   0.7778   0.7486   0.0292
86   0.8333   0.8413   0.0080
87   0.8889   0.8790   0.0099
92   1.0000   0.9772   0.0228

D = 0.1547 < 0.221 (tabled critical value), so H0 is not rejected.
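The statistic D can also be computed directly in R by comparing the empirical distribution function with the N(80, 6²) distribution function at each observed value. The sketch below uses the 36 glucose values as entered in the R session on the slides; variable names are illustrative. Because z is not rounded to two decimals here, D comes out near 0.156 rather than 0.1547, in line with the `ks.test` result D = 0.15604.

```r
# Fasting glucose values (as entered in the ks.test call on the slides)
xx <- c(75, 92, 80, 80, 83, 72, 83, 77, 81, 77, 75, 81,
        80, 92, 72, 77, 78, 76, 77, 86, 77, 92, 80, 78,
        67, 78, 92, 67, 80, 81, 87, 76, 80, 87, 77, 86)

vals <- sort(unique(xx))
Fs <- ecdf(xx)(vals)                  # sample cdf at each distinct value
Ft <- pnorm(vals, mean = 80, sd = 6)  # hypothesized normal cdf

# The supremum can occur just before or at a jump of the sample cdf
Fs_left <- c(0, head(Fs, -1))
D <- max(abs(Fs - Ft), abs(Fs_left - Ft))

print(round(D, 5))
```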

Table of K-S critical values: http://www.mathematik.uni-kl.de/~schwaar/Exercises/Tabellen/table_kolmogorov.pdf

> xx<-c(75,92,80,80,83,72,83,77,81,77,75,81,80,92,72,77,78,76,77,86,77,92,80,78,
+ 67,78,92,67,80,81,87,76,80,87,77,86)
> ks.test(xx,'pnorm',mean=80,sd=6)

        One-sample Kolmogorov-Smirnov test

data:  xx
D = 0.15604, p-value = 0.3447
alternative hypothesis: two-sided

Warning message:
In ks.test(xx, "pnorm", mean = 80, sd = 6) :
  ties should not be present for the Kolmogorov-Smirnov test

→ An approximate p-value is used.

12.8 Kruskal-Wallis one-way ANOVA

• Hypotheses
  H0: the k samples come from the same distribution.
  HA: at least one sample comes from a distribution with a larger or smaller location parameter than the others.

• Under H0, the rank sums $R_1, R_2, \ldots, R_k$ of the groups are similar.

• The statistic is essentially of the form $\sum_i (R_i - \bar{R})^2$. If the $R_i$'s are similar, the terms $(R_i - \bar{R})^2$ are small, so H is small and we cannot reject H0.

• Example 12.8.1

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) \sim \chi^2_{k-1}$$

Original values              Ranks
A       B       C            A    B    C
12.01   3.67    55.63        5    2    13
29.44   4.05    27.88        9    3    7
28.02   6.49    66.81        8    4    15
38.33   21.12   46.27        11   6    12
55.91   1.11    31.19        14   1    10
                    R_j      47   16   57

$$H = \frac{12}{15(16)}\left(\frac{47^2}{5} + \frac{16^2}{5} + \frac{57^2}{5}\right) - 3(15+1) = 9.14$$

p < 0.009 (table, page 486)
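The hand calculation of H can be mirrored in R: rank all observations together, sum the ranks per group, and plug the rank sums into the formula. The sketch below uses the 15 values of Example 12.8.1; variable names are illustrative, and the chi-square p-value is the large-sample approximation rather than the exact table value.

```r
# Responses for groups A, B, C (five observations each)
val <- c(12.01, 29.44, 28.02, 38.33, 55.91,   # A
         3.67, 4.05, 6.49, 21.12, 1.11,       # B
         55.63, 27.88, 66.81, 46.27, 31.19)   # C
group <- rep(c("A", "B", "C"), each = 5)

r <- rank(val)                    # ranks in the combined sample
R <- tapply(r, group, sum)        # rank sum per group
n_j <- tapply(r, group, length)   # group sizes
n <- length(val)

# Kruskal-Wallis statistic and its chi-square approximation
H <- 12 / (n * (n + 1)) * sum(R^2 / n_j) - 3 * (n + 1)
p <- pchisq(H, df = length(R) - 1, lower.tail = FALSE)

print(R)
print(c(H = H, p = p))
```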

> xx<-c(12.01,3.67,55.63,29.44,4.05,27.88,28.02,6.49,66.81,38.33,21.12,46.27,55.91,1.11,31.19)
> dat<-data.frame(val=xx,group=factor(rep(1:3,5)))
> kruskal.test(val~group,data=dat)

        Asymptotic Kruskal-Wallis Test

data:  val by group (1, 2, 3)
chi-squared = 9.14, df = 2, p-value = 0.01036

• Ex 12.8.2

Treatment cost by drug type per bed by hospital type (ranks in parentheses)

Drug type
A            B            C            D            E
17.38 (11)   52.59 (35)   27.87 (20)   34.55 (26)   60.77 (40)
15.20 (2)    44.55 (28)   24.00 (12)   31.15 (22)   59.99 (38)
14.76 (1)    44.80 (29)   26.55 (16)   30.50 (21)   58.94 (37)
16.88 (7)    43.25 (27)   25.00 (13)   31.25 (23)   57.05 (36)
17.02 (10)   50.75 (32)   27.55 (19)   32.75 (24)   60.50 (39)
26.67 (17)   52.25 (34)   25.92 (14)   33.00 (25)   61.50 (41)
15.75 (4)    46.13 (30)   26.01 (15)   27.30 (18)   51.10 (33)
16.02 (5)    48.87 (31)   16.48 (6)
15.30 (3)                 17.00 (9)
16.98 (8)
R_1 = 68     R_2 = 246    R_3 = 124    R_4 = 159    R_5 = 264

$$H = \frac{12}{41(41+1)}\left(\frac{68^2}{10} + \frac{246^2}{8} + \frac{124^2}{9} + \frac{159^2}{7} + \frac{264^2}{7}\right) - 3(41+1) = 36.39$$

pchisq(36.39,4,lower=F) = 2.4 × 10^−7

> val<-c(17.38,15.20,14.76,16.88,17.02,26.67,15.75,16.02,15.30,16.98,
+ 52.59,44.55,44.80,43.25,50.75,52.25,46.13,48.87,
+ 27.87,24.00,26.55,25.00,27.55,25.92,26.01,16.48,17.00,
+ 34.55,31.15,30.50,31.25,32.75,33.00,27.30,
+ 60.77,59.99,58.94,57.05,60.50,61.50,51.10)
> group<-factor(rep(c('A','B','C','D','E'),c(10,8,9,7,7)))
> dat<-data.frame(val,group)
> kruskal.test(val~group,data=dat)

        Kruskal-Wallis rank sum test

data:  val by group
Kruskal-Wallis chi-squared = 36.394, df = 4, p-value = 2.401e-07

12.9 Friedman's two-way ANOVA

• Ex 12.9.1

Physical therapists' ranks of three low-volt electrical stimulators

            Medical device
Therapist   A    B    C
1           2    3    1
2           2    3    1
3           2    3    1
4           1    3    2
5           3    2    1
6           1    2    3
7           2    3    1
8           1    3    2
9           1    3    2
R_j         15   25   14

H0: the three devices perform equally (the three devices are equivalent).

HA: at least one device performs differently (they are not equivalent).

$$\chi_r^2 = \frac{12}{nk(k+1)}\sum_{j=1}^{k} R_j^2 - 3n(k+1) = \frac{12}{9(3)(3+1)}\left[15^2 + 25^2 + 14^2\right] - 3(9)(3+1) = 8.222$$

[Table B(a)] → p = 0.016. Reject H0 at significance level 0.05.

> val<-c(2,3,1,2,3,1,2,3,1,1,3,2,3,2,1,1,2,3,2,3,1,1,3,2,1,3,2)
> group<-factor(rep(1:3,9))
> id<-factor(rep(1:9,each=3))
> friedman.test(val,group,id)

        Friedman rank sum test

data:  val, group and id
Friedman chi-squared = 8.2222, df = 2, p-value = 0.01639
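To see where the Friedman statistic comes from, the column rank sums and the formula can be evaluated by hand in R. The sketch below stores the nine therapists' rankings as a matrix (rows are therapists, columns are devices A, B, C); the names `ranks`, `R` and `chi_r` are illustrative.

```r
# Rankings of devices A, B, C by nine therapists (one row per therapist)
ranks <- matrix(c(2, 3, 1,
                  2, 3, 1,
                  2, 3, 1,
                  1, 3, 2,
                  3, 2, 1,
                  1, 2, 3,
                  2, 3, 1,
                  1, 3, 2,
                  1, 3, 2),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))

n <- nrow(ranks)      # number of blocks (therapists)
k <- ncol(ranks)      # number of treatments (devices)
R <- colSums(ranks)   # rank sum per device

# Friedman statistic and its chi-square approximation
chi_r <- 12 / (n * k * (k + 1)) * sum(R^2) - 3 * n * (k + 1)
p <- pchisq(chi_r, df = k - 1, lower.tail = FALSE)

print(R)
print(c(chi_r = chi_r, p = p))
```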

12.10 Spearman rank correlation coefficient

• Two-sided test
  H0: X and Y are independent.  HA: X and Y are not independent.

• One-sided tests
  H0: X and Y are independent.  HA: X and Y are positively associated.
  H0: X and Y are independent.  HA: X and Y are negatively associated.

• Ex 12.10.1

id   X     Y       id   X     Y
1    500   525     10   50    60
2    475   130     11   175   105
3    390   325     12   130   148
4    325   190     13   76    75
5    325   90      14   200   250
6    205   295     15   174   102
7    200   180     16   201   151
8    75    74      17   125   130
9    230   420

id   rank (X)   rank (Y)   d_i     d_i^2
1    17         17          0.0    0.00
2    16         7.5         8.5    72.25
3    15         15          0.0    0.00
4    13.5       12          1.5    2.25
5    13.5       4           9.5    90.25
6    11         14         −3.0    9.00
7    8.5        11         −2.5    6.25
8    2          2           0.0    0.00
9    12         16         −4.0    16.00
10   1          1           0.0    0.00
11   7          6           1.0    1.00
12   5          9          −4.0    16.00
13   3          3           0.0    0.00
14   8.5        13         −4.5    20.25
15   6          5           1.0    1.00
16   10         10          0.0    0.00
17   4          7.5        −3.5    12.25

Σ d_i^2 = 246.5

• Steps of the test
  ① Rank X and Y separately.
  ② d_i = rank(x_i) − rank(y_i)
  ③ Calculate $\sum d_i^2$.

• Negative association → large $\sum d_i^2$ → small r_s

• Positive association → small $\sum d_i^2$ → large r_s

• r_s large enough → reject H0 (independence)   (Table C)

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6(246.5)}{17(17^2-1)} = 0.698 > 0.4853$$

∴ We conclude that there is a positive association between X and Y.
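The three steps translate directly into R. The sketch below uses the 17 pairs of the first Spearman example and the formula based on the sum of squared rank differences (with ties this can differ slightly from a Pearson correlation of the ranks); variable names are illustrative.

```r
# The 17 (X, Y) pairs from the first Spearman example
x <- c(500, 475, 390, 325, 325, 205, 200, 75, 230,
       50, 175, 130, 76, 200, 174, 201, 125)
y <- c(525, 130, 325, 190, 90, 295, 180, 74, 420,
       60, 105, 148, 75, 250, 102, 151, 130)

# Steps 1 and 2: rank separately, then take rank differences
d <- rank(x) - rank(y)

# Step 3: sum of squared differences and the rank correlation
n <- length(x)
sum_d2 <- sum(d^2)
rs <- 1 - 6 * sum_d2 / (n * (n^2 - 1))

print(c(sum_d2 = sum_d2, rs = rs))
```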

• Ex 12.10.2 (when n > 30)

id   age (X)   mineral conc. (Y)     id   age (X)   mineral conc. (Y)
1    82        169.62                19   50        4.48
2    85        48.94                 20   71        46.93
3    83        41.16                 21   54        30.91
4    64        63.95                 22   62        34.27
5    82        21.09                 23   47        41.44
6    53        5.40                  24   66        109.88
7    26        6.33                  25   34        2.78
8    47        4.26                  26   46        4.17
9    37        3.62                  27   27        6.57
10   49        4.82                  28   54        61.73
11   65        108.22                29   72        47.59
12   40        10.20                 30   41        10.46
13   32        2.69                  31   35        3.06
14   50        6.16                  32   75        49.57
15   62        23.87                 33   50        5.55
16   33        2.70                  34   76        50.23
17   36        3.15                  35   28        6.81
18   53        60.59

• Ex 12.10.2 (when n > 30)

$$z = r_s\sqrt{n-1} = 0.75\sqrt{35-1} = 4.37 > 1.96 \;\Rightarrow\; \text{reject } H_0$$

Reject H0 if $|z| > z_{1-\alpha/2}$.

• A large negative z (negative association) goes with a large $\sum d_i^2$; a large positive z (positive association) goes with a small $\sum d_i^2$.

id   rank(X)   rank(Y)   d_i      d_i^2       id   rank(X)   rank(Y)   d_i      d_i^2
1    32.5      35        −2.5     6.25        19   17        9          8.0     64.00
2    35        27         8.0     64.00       20   28        25         3.0     9.00
3    34        23         11.0    121.00      21   21.5      21         0.5     0.25
4    25        32        −7.0     49.00       22   23.5      22         1.5     2.25
5    32.5      19         13.5    182.25      23   13.5      24        −10.5    110.25
6    19.5      11         8.5     72.25       24   27        34        −7.0     49.00
7    1         14        −13.0    169.00      25   6         3          3.0     9.00
8    13.5      8          5.5     30.25       26   12        7          5.0     25.00
9    9         6          3.0     9.00        27   2         15        −13.0    169.00
10   15        10         5.0     25.00       28   21.5      31        −9.5     90.25
11   26        33        −7.0     49.00       29   29        26         3.0     9.00
12   10        17        −7.0     49.00       30   11        18        −7.0     49.00
13   4         1          3.0     9.00        31   7         4          3.0     9.00
14   17        13         4.0     16.00       32   30        28         2.0     4.00
15   23.5      20         3.5     12.25       33   17        12         5.0     25.00
16   5         2          3.0     9.00        34   31        29         2.0     4.00
17   8         5          3.0     9.00        35   3         16        −13.0    169.00
18   19.5      30        −10.5    110.25

Σ d_i^2 = 1788.5
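For n > 30 the test only needs the sum of squared rank differences and the sample size. The R sketch below starts from the tabulated total 1788.5 with n = 35 and evaluates the normal approximation; the two-sided 5% cut-off via `qnorm(0.975)` is an assumption about the intended significance level.

```r
# Large-sample test for the Spearman rank correlation
n <- 35
sum_d2 <- 1788.5

rs <- 1 - 6 * sum_d2 / (n * (n^2 - 1))   # rank correlation
z <- rs * sqrt(n - 1)                    # approximately N(0, 1) under H0

# Two-sided test at the 5% level (assumed level)
reject <- abs(z) > qnorm(0.975)

print(c(rs = rs, z = z))
print(reject)
```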

12.11 Nonparametric regression

• Ex. 12.11.1 [Theil's method]

$$\hat\beta_1 = \operatorname{median}\{S_{12}, \ldots, S_{n-1,n}\}, \qquad S_{ij} = \frac{y_j - y_i}{x_j - x_i}, \qquad S_{12} = \frac{164-163}{57.4-53.9} = 0.285$$

Testosterone (Y)   163    164    156    151    152    167    165    153    155
Citric acid (X)    53.9   57.4   41.0   40.0   42.0   64.4   59.1   49.9   43.2

Pairwise slopes S_ij

 0.285    0.470    0.202
 0.643    0.655    0.126
 0.487    0.669    0.965
 0.863    0.384    1.304
 0.747    0.588    0.747
 5.000    0.497    0.633
 0.924    0.732   −0.454
 0.779    0.760    1.250
−4.00     0.377    2.500
 0.500    2.500    0.566
 0.380    1.466    0.628
 0.428   −0.337   −0.298

Estimating the intercept

$$\hat\beta_0 = \operatorname{median}\{y_1 - \hat\beta_1 x_1, \ldots, y_n - \hat\beta_1 x_n\}$$

Alternatively, using the means of all pairs:

$$\hat\beta_0 = \operatorname{median}\{\operatorname{mean}(y_1 - \hat\beta_1 x_1,\, y_2 - \hat\beta_1 x_2),\ \operatorname{mean}(y_1 - \hat\beta_1 x_1,\, y_3 - \hat\beta_1 x_3),\ \ldots,\ \operatorname{mean}(y_{n-1} - \hat\beta_1 x_{n-1},\, y_n - \hat\beta_1 x_n)\}$$
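Theil's slope and the first intercept estimate can be sketched in R with `combn`, which enumerates all pairs of observations. The code below uses the nine (citric acid, testosterone) pairs from the example; variable names are illustrative, and no claim is made here about the resulting numerical estimates.

```r
# Testosterone (y) and citric acid (x) for nine subjects
y <- c(163, 164, 156, 151, 152, 167, 165, 153, 155)
x <- c(53.9, 57.4, 41.0, 40.0, 42.0, 64.4, 59.1, 49.9, 43.2)

# All pairs (i, j) with i < j, one pair per column
pairs <- combn(length(x), 2)
i <- pairs[1, ]
j <- pairs[2, ]

# Pairwise slopes S_ij = (y_j - y_i) / (x_j - x_i)
slopes <- (y[j] - y[i]) / (x[j] - x[i])

b1 <- median(slopes)        # Theil's slope estimate
b0 <- median(y - b1 * x)    # intercept: median of y_i - b1 * x_i

print(length(slopes))
print(c(b0 = b0, b1 = b1))
```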

• mod20.sas

/* File name : mod20.sas
   Nonparametric One-Way Anova */
options pageno=1 nodate ls=130 ps=60 nocenter;
filename inbrakes 'd:\myweb\intro\taillite.dat';

data one;
  infile inbrakes;
  input id vehtype group positn speedzn resptime follotme folltmec;
  if group=1;   /* keep only the light-on group */
  label vehtype='Vehicle Type'
        group='Group - Light On=1 Light Off=2'
        positn='Light Position'
        speedzn='Speed Zone'
        resptime='Response Time'
        follotme='Following Time in Video Frames'
        folltmec='Following Time in Categories';
run;

proc sort; by vehtype;

/* Let's do one-way ANOVA to see the effect of vehicle type */
proc anova;
  class vehtype;
  model resptime=vehtype;
  title 'Parametric ANOVA analysis';
run;

/* What's wrong with this ?
   We didn't check the normality assumption.
   Let's do proc univariate to check the normality */
proc univariate normal plot;
  var resptime;
  by vehtype;
  title 'Normality Check';
run;

/* NOT NORMALLY DISTRIBUTED >> NONPARAMETRIC ANOVA */
proc npar1way wilcoxon;
  class vehtype;
  var resptime;
  title 'Nonpara One-Way ANOVA for Tail Light Study';
run;

/* The other way is transformation.
   Let's take log transformation so that we have normal distribution. */
data t;
  set one;
  t=log(resptime);
  label t='ln (response time)';
run;

proc sort; by vehtype;

proc univariate normal plot;
  var t;
  by vehtype;
  title 'Normality Check for transformed variable';
run;

/* The transformed variable seems to be normally distributed.
   Then we can do parametric ANOVA with normality assumption */
proc anova;
  class vehtype;
  model t=vehtype;
  title 'ANOVA for the log transformed response time';
run;

Nonparametric Smoothing (1)

Smoothing

• Consider an X-Y scatter plot.

• Draw a regression curve that requires no parametric assumptions.

• The regression curve need not be linear.

• The regression curve is determined entirely by the data.

Two components of smoothing

• Kernel function: how the weighted mean is calculated

• Bandwidth: the width of the window (span); it determines the smoothness of the regression curve (wider → smoother)
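The two components can be illustrated with a small kernel-weighted mean smoother in R. This is a minimal sketch, not the method behind the figures: the function `kernel_smooth`, the kernel definitions, and the bandwidth values are assumptions made for illustration, and the eight data points are borrowed from the SAS spline example in these slides.

```r
# Eight (x, y) points borrowed from the SAS spline example
x <- 1:8
y <- c(4, 9, 20, 25, 1, 5, -4, 12)

# Three common kernels, as functions of the scaled distance u = (x - x0) / h
kernels <- list(
  uniform    = function(u) 0.5 * (abs(u) <= 1),
  triangular = function(u) pmax(1 - abs(u), 0),
  normal     = function(u) dnorm(u)
)

# Kernel-weighted mean of y around the point x0 with bandwidth h
kernel_smooth <- function(x0, x, y, h, kernel) {
  w <- kernel((x - x0) / h)
  sum(w * y) / sum(w)
}

# Narrow window: each fitted value averages only the nearest neighbours
fit_narrow <- vapply(x, function(x0) kernel_smooth(x0, x, y, h = 1, kernels$uniform),
                     numeric(1))

# Wide window: every point gets the same weight, so the fit is flat
fit_wide <- vapply(x, function(x0) kernel_smooth(x0, x, y, h = 8, kernels$uniform),
                   numeric(1))

print(round(fit_narrow, 2))
print(round(fit_wide, 2))
```

With the uniform kernel and h = 1, the fitted value at x = 2 is the plain mean of the three neighbouring responses; with h = 8 every fitted value equals the overall mean, which shows how a wider window gives a smoother curve.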

Nonparametric Smoothing (2): Uniform kernel [figure]

Nonparametric Smoothing (3): Triangular kernel [figure]

Nonparametric Smoothing (4): Normal kernel [figure]

Nonparametric Smoothing (5): Default lowess line, span = 0.5 [figure]

Nonparametric Smoothing (6): Lowess line, span = 0.2 [figure]

Nonparametric Smoothing (7): Lowess line, span = 0.1 [figure]

data A;
  input x y @@;
  datalines;
1 4 2 9 3 20 4 25 5 1 6 5 7 -4 8 12
;

title "sm45 spline smoother";
proc gplot data=A;
  plot y*x;
  symbol1 interpol=sm45 value=circle height=2;  /* note that x is sorted */
run;

title "sm70 spline smoother";
proc gplot data=A;
  plot y*x;
  symbol1 interpol=sm70 value=circle height=2;  /* note that x is sorted */
run;

title "sm20 spline smoother";
proc gplot data=A;
  plot y*x;
  symbol1 interpol=sm20 value=circle height=2;  /* note that x is sorted */
run;

require(graphics)
plot(cars, main = "lowess(cars)")
lines(lowess(cars), col = 2)
lines(lowess(cars, f = .2), col = 3)
legend(5, 120, c(paste("f = ", c("2/3", ".2"))), lty = 1, col = 2:3)

# Read the data and convert the date column
data <- read.csv("http://hosting03.snu.ac.kr/~hokim/isee2010/data2010.csv", sep=",")
head(data)
data$date <- as.Date(data$date)
sl <- subset(data, ccode==11)

# Exploratory plots of PM10 by year and over time
boxplot(meanpm10~yy, ylab=expression(PM[10]), axes=T, data=sl)
plot(sl$date, sl$meanpm10, ylab=expression(PM[10]), xaxt='n', cex=0.6)
x.at <- seq(as.Date("2000-01-01"), as.Date("2007-12-31"), "year")
xname <- c("'00-01-01", "'01-01-01", "'02-01-01", "'03-01-01",
           "'04-01-01", "'05-01-01", "'06-01-01", "'07-01-01")
axis(side=1, at=x.at, labels=xname)

# Locate the missing value and fill row 829 with the mean of its neighbours
table(is.na(sl$meanpm10))
which(is.na(sl$meanpm10))
sl[829,"meanpm10"] <- (sl[828,"meanpm10"] + sl[830,"meanpm10"])/2
sl[829,"meanpm10"]

# Lowess fits with three spans, stacked in one figure
# (the layout must be set before the plots are drawn)
par(mfrow=c(3,1))

plot(sl$date, sl$meanpm10, ylab=expression(PM[10]), xlab="date", main="(a)f=.1", xaxt='n', cex=0.6)
lines(lowess(sl$date, sl$meanpm10, f=0.1), col="red", lwd=2)
axis(side=1, at=x.at, labels=xname)

plot(sl$date, sl$meanpm10, ylab=expression(PM[10]), xlab="date", main="(b)f=.05", xaxt='n', cex=0.6)
lines(lowess(sl$date, sl$meanpm10, f=0.05), col="red", lwd=2)
axis(side=1, at=x.at, labels=xname)

plot(sl$date, sl$meanpm10, ylab=expression(PM[10]), xlab="date", main="(c)f=.5", xaxt='n', cex=0.6)
lines(lowess(sl$date, sl$meanpm10, f=0.5), col="red", lwd=2)
axis(side=1, at=x.at, labels=xname)