7/30/2019 ams572_ch14
1/133
What is NONPARAMETRIC Statistics?
Normality does not hold for all data. Similarly, some data may not follow any particular fixed distribution, such as the binomial or Poisson. Such data are called nonparametric or distribution-free.
We use nonparametric tests for these populations.
When do we use NONPARAMETRIC Statistics?
- The population distribution is highly skewed or very heavy-tailed. The median is then a better measure of the center than the mean.
- The sample size is small (usually less than 30) and the data are not normal (which we can check using SAS or other statistical programs).
14.1.1 Sign Test and Confidence Interval
Sign Test for a Single Sample
We want to test, at a given significance level, whether the true median differs from a certain known value.
14.1.1 Sign Test and Confidence Interval
Example: THERMOSTAT DATA
Perform the sign test to determine if the median setting is different from the design setting of 200°F.
202.2  203.4
200.5  202.5
206.3  198.0
203.7  200.8
201.3  199.0
14.1.1 Sign Test and Confidence Interval
STEP 1: Find the sign of each observation by comparing it with 200:
202.2 > 200   203.4 > 200
200.5 > 200   202.5 > 200
206.3 > 200   198.0 < 200
203.7 > 200   200.8 > 200
201.3 > 200   199.0 < 200
which gives s+ = 8 (observations above 200) and s− = 2 (observations below 200).
STEP 2: Set up the hypotheses:
H0: μ = 200 vs. Ha: μ ≠ 200  (μ = population median)
14.1.1 Sign Test and Confidence Interval
What do we do if there is a tie (xi = μ0)?
1) We can break the tie at random, counting it toward either s+ or s−. For a large sample this may not make much difference, but the result may vary significantly for a small sample.
2) We can count 1/2 toward each of s+ and s−. However, we cannot calculate the p-value using fractional counts, so we should not do this.
3) We can exclude the ties. This reduces the sample size and hence the power of the test, but for a large sample it is not a big deal.
14.1.1 Sign Test and Confidence Interval
STEP 3: Why binomial?
S+ and S− simply count how many of the n observations fall above and below the hypothesized median, so
P(S+ = s) = C(n, s) p^s (1 − p)^(n−s),  s = 0, 1, …, n.
Both S+ and S− are binomially distributed, with success probabilities p and 1 − p respectively:
S+ ~ Bin(n, p)  AND  S− ~ Bin(n, 1 − p).
In our example (n = 10), S+ ~ Bin(10, p) with observed s+ = 8, and S− ~ Bin(10, 1 − p) with observed s− = 2.
14.1.1 Sign Test and Confidence Interval
STEP 4: Since S+ and S− have the same form of binomial distribution, we can denote a common r.v. S ~ Bin(n, p).
When H0 is true, μ0 is the true median, so the number of observations above the median matches the number below it in expectation and p = 1/2 (consequently 1 − p = 1/2 too). Therefore
S+ ~ Bin(n, 1/2)  and  S− ~ Bin(n, 1/2);
in our example, S ~ Bin(10, 1/2).
14.1.1 Sign Test and Confidence Interval
Now we can calculate the p-value using the binomial distribution:
P-value = P(S ≥ s+) = Σ_{i=8}^{10} C(10, i) (1/2)^10 = .055
or, alternatively,
P-value = P(S ≤ s−) = Σ_{i=0}^{2} C(10, i) (1/2)^10 = .055
14.1.1 Sign Test and Confidence Interval
STEP 5: We compare our p-value with the significance level:
P-value = .055
At α = .05, P-value = .055 > .05, so we fail to reject the null hypothesis.
14.1.1 Sign Test and Confidence Interval
For a large sample (n ≥ 20) we can also use the normal approximation. Under H0,
E(S+) = E(S−) = n/2  and  Var(S+) = Var(S−) = n/4,
so the test statistic is
z = (s+ − n/2 − 1/2) / sqrt(n/4),
where 1/2 is the continuity correction.
We reject the null hypothesis if z ≥ z_α; equivalently, after rearranging, if
s+ ≥ n/2 + 1/2 + z_α sqrt(n/4) = b.
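The slides use SAS; the exact binomial p-value for the thermostat data can also be sketched in Python (the helper name `sign_test` and the use of `scipy.stats.binom` are our own choices, not from the slides):

```python
from scipy.stats import binom

def sign_test(sample, mu0):
    """One-sample sign test (illustrative helper, not from the slides).
    Drops ties (option 3 above) and returns s+, s-, and the one-sided
    p-value P(S >= s+) under S ~ Bin(n, 1/2)."""
    diffs = [x - mu0 for x in sample if x != mu0]  # exclude ties
    n = len(diffs)
    s_plus = sum(d > 0 for d in diffs)
    s_minus = n - s_plus
    p_one_sided = binom.sf(s_plus - 1, n, 0.5)     # P(S >= s_plus)
    return s_plus, s_minus, p_one_sided

thermostat = [202.2, 203.4, 200.5, 202.5, 206.3,
              198.0, 203.7, 200.8, 201.3, 199.0]
s_plus, s_minus, p = sign_test(thermostat, 200)
print(s_plus, s_minus, round(p, 3))   # 8 2 0.055
```

This reproduces s+ = 8, s− = 2, and the p-value .055 computed above.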
14.1.1 Sign Test for Matched Pairs
When observations come in matched pairs (xi, yi), apply the sign test to the differences di = xi − yi:
- S+ = the number of positive differences
- S− = the number of negative differences
Note: the magnitudes of the differences are not used, only their signs.
When the pairs are matched and H0 is true, the two treatments are exchangeable: P(A > B) = P(B > A).
14.1.1 Sign Test for Matched Pairs
No. Method A Method B Diff | No. Method A Method B Diff
i    xi     yi      di     | i    xi     yi      di
1    6.3    5.2     1.1    | 14   7.7    7.4     0.3
2    6.3    6.6    -0.3    | 15   7.4    7.4     0
3    3.5    2.3     1.2    | 16   5.6    4.9     0.7
4    5.1    4.4     0.7    | 17   6.3    5.4     0.9
5    5.5    4.1     1.4    | 18   8.4    8.4     0
6    7.7    6.4     1.3    | 19   5.6    5.1     0.5
7    6.3    5.7     0.6    | 20   4.8    4.4     0.4
8    2.8    2.3     0.5    | 21   4.3    4.3     0
9    3.4    3.2     0.2    | 22   4.2    4.1     0.1
10   5.7    5.2     0.5    | 23   3.3    2.2     1.1
11   5.6    4.9     0.7    | 24   3.8    4      -0.2
12   6.2    6.1     0.1    | 25   5.7    5.8    -0.1
13   6.6    6.3     0.3    | 26   4.1    4       0.1
14.1.1 Sign Test for Matched Pairs
Note that for the matched-pairs test all tied entries (xi = yi) are disregarded. Then n = 23, since xi = yi for i = 15, 18, 21.
S+ = 20, S− = 3
Using
z = (s+ − n/2 − 1/2) / sqrt(n/4)
14.1.1 Sign Test for Matched Pairs
z = (20 − 23/2 − 1/2) / sqrt(23/4) = 3.336
Two-sided p-value: 2(1 − Φ(3.336)) = 0.0008
This indicates a significant difference between Method A and Method B.
14.1.2 Wilcoxon Signed Rank Test
Who is Frank Wilcoxon?
Born September 2, 1892, to American parents in County Cork, Ireland, Frank Wilcoxon grew up in Catskill, New York, although he received part of his education in England. In 1917 he graduated from Pennsylvania Military College with a B.S. He received his M.S. in chemistry from Rutgers University in 1921 and his Ph.D. in physical chemistry from Cornell University in 1924.
In 1945 he published a paper containing the two tests he is most remembered for: the Wilcoxon signed rank test and the Wilcoxon rank sum test. His interest in statistics can be credited to R.A. Fisher's text, Statistical Methods for Research Workers (1925).
Over the course of his career Wilcoxon published 70 papers.
14.1.2 Wilcoxon Signed Rank Test
An alternative method to the sign test
The Wilcoxon signed rank test improves on the sign test: unlike the sign test, it looks not only at whether each xi is above or below the hypothesized median, but also at the magnitudes of the differences through their ranks.
14.1.2 Wilcoxon Signed Rank Test
Note: the Wilcoxon signed rank test assumes that the population distribution is symmetric.
(Symmetry is not required for the sign test.)
14.1.2 Wilcoxon Signed Rank Test
Step 1: Rank-order the differences di in terms of their absolute values.
Step 2: Let w+ = the sum of the ranks ri of the positive differences, and w− = the sum of the ranks ri of the negative differences. If we assume no ties, then
w+ + w− = r1 + r2 + … + rn = 1 + 2 + 3 + … + n = n(n + 1)/2.
14.1.2 Wilcoxon Signed Rank Test
Step 3: Reject H0 if w+ is large or, equivalently, if w− is small!
14.1.2 Wilcoxon Signed Rank Test
The size of w+ or w− needed to reject H0 at level α is determined from the distributions of the corresponding r.v.'s W+ and W− when H0 is true. Since these null distributions are identical and symmetric, the common r.v. is denoted by W.
p-value = P(W ≥ w+) = P(W ≤ w−)
Reject H0 if the p-value is ≤ α.
14.1.2 Wilcoxon Signed Rank Test
Write W+ = Σ_{i=1}^{n} i·Zi, where
Zi = 1 if the ith rank corresponds to a positive sign,
Zi = 0 if the ith rank corresponds to a negative sign.
The Zi are i.i.d. Bernoulli(p) with p = P(di > 0). Then
E(W+) = E(1·Z1 + 2·Z2 + … + n·Zn)
      = 1·E(Z1) + 2·E(Z2) + … + n·E(Zn)   [E(Z1) = E(Z2) = … = E(Zn) = p]
      = (1 + 2 + 3 + … + n)·E(Z1)
      = n(n + 1)/2 · p.
14.1.2 Wilcoxon Signed Rank Test
Similarly,
Var(W+) = Var(1·Z1 + 2·Z2 + … + n·Zn)
        = 1²·Var(Z1) + 2²·Var(Z2) + … + n²·Var(Zn)   [since Var(i·Zi) = i²·Var(Zi)]
        = (1² + 2² + … + n²)·Var(Z1)
        = n(n + 1)(2n + 1)/6 · p(1 − p).
14.1.2 Wilcoxon Signed Rank Test
Under H0, p = 1/2, so E(W+) = n(n + 1)/4 and Var(W+) = n(n + 1)(2n + 1)/24. A z-test is then based on the statistic
z = (w+ − n(n + 1)/4 − 1/2) / sqrt(n(n + 1)(2n + 1)/24).
H0: μ = μ0 vs. Ha: μ > μ0: reject H0 if z ≥ z_α.
14.1.2 Wilcoxon Signed Rank Test
H0: μ = μ0 vs. Ha: μ < μ0: reject H0 if z ≤ −z_α.
H0: μ = μ0 vs. Ha: μ ≠ μ0: reject H0 if (1) z ≥ z_{α/2} or (2) z ≤ −z_{α/2}.
The two-sided p-value is 2P(W ≥ w_max) = 2P(W ≤ w_min).
14.1.2 Summary
Signed Rank Test vs. Sign Test
The signed rank test weighs each signed difference by its rank: if the positive differences are greater in magnitude than the negative differences, they get higher ranks, resulting in a larger value of w+. This improves the power of the signed rank test, but it also affects the type I error if the population distribution is NOT symmetric.
The sign test, in contrast, only counts the number of positive and negative differences.
YOU WOULDN'T WANT THIS TO HAPPEN!
14.1.2 Summary
Signed Rank Test vs. Sign Test
And the winner (the preferred test) is… the Wilcoxon Signed Rank Test.
"I pity the fool that messes with the Wilcoxon Signed Rank Test!!!"
14.1.2 Wilcoxon Signed Rank Test
No. Method A Method B Diff Rank | No. Method A Method B Diff Rank
i    xi     yi      di   ri     | i    xi     yi      di   ri
1    6.3    5.2     1.1  19.5   | 14   7.7    7.4     0.3  8
2    6.3    6.6    -0.3  8      | 15   7.4    7.4     0    -
3    3.5    2.3     1.2  21     | 16   5.6    4.9     0.7  16
4    5.1    4.4     0.7  16     | 17   6.3    5.4     0.9  18
5    5.5    4.1     1.4  23     | 18   8.4    8.4     0    -
6    7.7    6.4     1.3  22     | 19   5.6    5.1     0.5  12
7    6.3    5.7     0.6  14     | 20   4.8    4.4     0.4  10
8    2.8    2.3     0.5  12     | 21   4.3    4.3     0    -
9    3.4    3.2     0.2  5.5    | 22   4.2    4.1     0.1  2.5
10   5.7    5.2     0.5  12     | 23   3.3    2.2     1.1  19.5
11   5.6    4.9     0.7  16     | 24   3.8    4      -0.2  5.5
12   6.2    6.1     0.1  2.5    | 25   5.7    5.8    -0.1  2.5
13   6.6    6.3     0.3  8      | 26   4.1    4       0.1  2.5
14.1.2 Wilcoxon Signed Rank Test
w− = 8 + 5.5 + 2.5 = 16, then
w+ = 23(24)/2 − 16 = 260.
z = (260 − 23(24)/4 − 1/2) / sqrt(23(24)(47)/24) = 3.695
Two-sided p-value: 2(1 − Φ(3.695)) = 0.0002
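The whole calculation above can be reproduced numerically; a minimal sketch using `scipy.stats.rankdata` for the midranks (variable names are ours):

```python
import numpy as np
from scipy.stats import rankdata, norm

# Method A minus Method B differences from the table, in order i = 1..26
d = np.array([1.1, -0.3, 1.2, 0.7, 1.4, 1.3, 0.6, 0.5, 0.2, 0.5, 0.7, 0.1,
              0.3, 0.3, 0.0, 0.7, 0.9, 0.0, 0.5, 0.4, 0.0, 0.1, 1.1, -0.2,
              -0.1, 0.1])
d = d[d != 0]                     # drop the three zero differences
n = len(d)                        # n = 23
r = rankdata(np.abs(d))           # midranks of |d_i|
w_plus = r[d > 0].sum()           # 260.0
w_minus = r[d < 0].sum()          # 16.0
z = (w_plus - n*(n + 1)/4 - 0.5) / np.sqrt(n*(n + 1)*(2*n + 1)/24)
p = 2*(1 - norm.cdf(z))           # two-sided normal approximation
print(w_plus, w_minus, round(z, 3), round(p, 4))
```

This matches w+ = 260, w− = 16, z = 3.695, and p ≈ 0.0002 from the slide.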
14.1.2 Wilcoxon Signed Rank Test
If di = 0, the observation is dropped and only the nonzero differences are retained.
If several |di| are tied for the same rank, each is assigned the average of the tied ranks, called the midrank.
14.1.2 Wilcoxon Signed Rank Test
No.  A    B    Diff  Rank | No.  A    B    Diff  Rank
i    xi   yi   di    ri   | i    xi   yi   di    ri
15   7.4  7.4  0     -    | 8    2.8  2.3  0.5   12
18   8.4  8.4  0     -    | 10   5.7  5.2  0.5   12
21   4.3  4.3  0     -    | 19   5.6  5.1  0.5   12
12   6.2  6.1  0.1   2.5  | 7    6.3  5.7  0.6   14
22   4.2  4.1  0.1   2.5  | 4    5.1  4.4  0.7   16
25   5.7  5.8  -0.1  2.5  | 11   5.6  4.9  0.7   16
26   4.1  4    0.1   2.5  | 16   5.6  4.9  0.7   16
9    3.4  3.2  0.2   5.5  | 17   6.3  5.4  0.9   18
24   3.8  4    -0.2  5.5  | 1    6.3  5.2  1.1   19.5
2    6.3  6.6  -0.3  8    | 23   3.3  2.2  1.1   19.5
13   6.6  6.3  0.3   8    | 3    3.5  2.3  1.2   21
14   7.7  7.4  0.3   8    | 6    7.7  6.4  1.3   22
20   4.8  4.4  0.4   10   | 5    5.5  4.1  1.4   23
14.1.2 Wilcoxon Signed Rank Test
In the sorted table we see that |di| = 0.1 for i = 12, 22, 25, 26, i.e., four differences are tied for ranks 1 through 4. Their midrank is
(1 + 2 + 3 + 4)/4 = 10/4 = 2.5.
Therefore the ranks of these differences are not 1, 2, 3, 4 but rather 2.5 each.
14.2 Inferences for Independent Samples
1. Wilcoxon rank sum test
Assumption: there are no ties in the two samples.
Hypotheses: H0: θ1 = θ2 vs. Ha: θ1 > θ2
Step 1: Rank all N = n1 + n2 observations together.
Step 2: Sum the ranks of the two samples separately (w1 = sum of the ranks of the x's, w2 = sum of the ranks of the y's).
Step 3: Reject the null hypothesis if w1 is large or if w2 is small.
Problem: the distributions of W1 and W2 are not the same when n1 ≠ n2.
14.2.1 Wilcoxon-Mann-Whitney Test
2. Mann-Whitney test
Step 1: Compare each xi with each yj (u1 = # of pairs in which xi > yj, u2 = # of pairs in which xi < yj).
Step 2: Reject H0 if u1 is large or u2 is small.
Relation to the rank sum statistics:
u1 = w1 − n1(n1 + 1)/2,  u2 = w2 − n2(n2 + 1)/2
P-value: P(U ≥ u1) = P(U ≤ u2)
For large samples we approximate U by a normal distribution. When H0 is true,
E(U) = n1 n2 / 2  and  Var(U) = n1 n2 (N + 1)/12.
Rejection rule: reject H0 if
z = (u1 − E(U) − 1/2) / sqrt(Var(U)) ≥ z_α.
14.2.1 Wilcoxon-Mann-Whitney Test
Example: Failure Times of Capacitors
Table 1: Times to Failure
Control Group:  5.2  8.5  9.8  12.3  17.1  17.9  23.7  29.8
Stressed Group: 1.1  2.3  3.2  6.3  7.0  7.2  9.1  15.2  18.3  21.1
Table 2: Ranks of Times to Failure
Control Group:  4  8  10  11  13  14  17  18
Stressed Group: 1  2  3  5  6  7  9  12  15  16
Let F1 be the c.d.f. of the control group and F2 the c.d.f. of the stressed group.
H0: F1 = F2 vs. Ha: F1 < F2
Test statistics: w1 = 95, w2 = 76, u1 = 59, u2 = 21. P-value = .051 from Table A.11.
Compare with the large-sample normal approximation:
z = (59 − (8)(10)/2 − 1/2) / sqrt((8)(10)(19)/12) = 1.643
P-value = .052
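For the capacitor data, the same u1 and exact one-sided p-value can be obtained with `scipy.stats.mannwhitneyu` (a sketch; scipy's statistic for the first argument is the number of (x, y) pairs with x > y, i.e. u1 from the slides):

```python
from scipy.stats import mannwhitneyu

control  = [5.2, 8.5, 9.8, 12.3, 17.1, 17.9, 23.7, 29.8]
stressed = [1.1, 2.3, 3.2, 6.3, 7.0, 7.2, 9.1, 15.2, 18.3, 21.1]
# alternative="greater": control group tends to have larger failure times
res = mannwhitneyu(control, stressed, alternative="greater", method="exact")
print(res.statistic, round(res.pvalue, 3))   # u1 = 59
```

The exact p-value agrees with the .051 read from Table A.11 up to rounding.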
14.2.1 Wilcoxon-Mann-Whitney Test
Null Distribution of the Wilcoxon-Mann-Whitney Test Statistic
Assumption: under H0, all N = n1 + n2 observations come from the common distribution F1 = F2, and all possible orderings of these observations, with n1 coming from F1 and n2 coming from F2, are equally likely.
14.2.1 Wilcoxon-Mann-Whitney Test
Example: Find the null distribution of W1 and U1 when n1 = 2 and n2 = 2.
Ranks 1 2 3 4 → (w1, u1):
x x y y: w1 = 3, u1 = 0
x y x y: w1 = 4, u1 = 1
x y y x: w1 = 5, u1 = 2
y x x y: w1 = 5, u1 = 2
y x y x: w1 = 6, u1 = 3
y y x x: w1 = 7, u1 = 4
Null distribution of W1 and U1:
w1: 3  4  5  6  7
u1: 0  1  2  3  4
p:  1/6  1/6  2/6  1/6  1/6
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval
Assume the distributions F1 and F2 belong to a location parameter family with location parameters θ1 and θ2, respectively:
F1(x) = F(x − θ1)  and  F2(y) = F(y − θ2),
where F is a common unknown distribution function and θ1, θ2 are the respective population medians.
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval
A CI for θ1 − θ2 can be obtained by inverting the Mann-Whitney test. The procedure is as follows:
Step 1: Calculate all N = n1 n2 differences dij = xi − yj (1 ≤ i ≤ n1, 1 ≤ j ≤ n2) and rank them:
d(1) ≤ d(2) ≤ … ≤ d(N),
where d(i) is the ith ordered value of the differences.
Step 2: Let u_{α/2} be the upper α/2 critical point of the null distribution of the U statistic (so N − u_{α/2} is the lower α/2 critical point). Then a 100(1 − α)% CI for θ1 − θ2 is given by
[d(N − u_{α/2} + 1), d(u_{α/2})].
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval
Example: Find a 95% CI for the difference between the median failure times of the control group and the thermally stressed group of capacitors, using the data from Ex. 14.7.
n1 = 8, n2 = 10, N = n1 n2 = (8)(10) = 80.
The lower 2.2% critical point of the distribution of U is 17; by symmetry, the upper 2.2% critical point is 80 − 17 = 63.
Setting α/2 = 0.022, 1 − α = 1 − 0.044 = 0.956, so we get a 95.6% CI for the difference between the median failure times:
[d(18), d(63)],
where the d(i) are the ordered values of the differences dij = xi − yj. The differences are calculated in an array form in Table 14.7; counting the 18th ordered difference from the lower and upper ends gives the 95.6% CI for the difference of the two medians:
[d(18), d(63)] = [-1.1, 14.7]
14.2.2 Wilcoxon-Mann-Whitney Confidence Interval
Table A.11 (p. 684): upper tail probabilities, n1 = 8, n2 = 10
u1 (upper critical point) | 80 − u1 (lower critical point) | P(U ≥ u1)
59 | 21 | 0.051
62 | 18 | 0.027
63 | 17 | 0.022
66 | 14 | 0.010
68 | 12 | 0.006
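Steps 1 and 2 of the CI procedure can be sketched directly. (Caveat: recomputing from the raw data, the 18th smallest difference comes out as −1.2 rather than the −1.1 quoted above, so the test below checks the recomputed endpoints.)

```python
import numpy as np

control  = [5.2, 8.5, 9.8, 12.3, 17.1, 17.9, 23.7, 29.8]
stressed = [1.1, 2.3, 3.2, 6.3, 7.0, 7.2, 9.1, 15.2, 18.3, 21.1]
# Step 1: all n1*n2 = 80 pairwise differences x_i - y_j, ordered
d = np.sort([x - y for x in control for y in stressed])
# Step 2: with upper 2.2% critical point u = 63 from Table A.11, the
# 95.6% CI is [d_(80-63+1), d_(63)] = [d_(18), d_(63)] (1-based)
lo, hi = d[17], d[62]
print(round(lo, 1), round(hi, 1))
```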
14.3 Inferences for Several Independent Samples
One-way layout experiment (Completely Randomized Design):
- Compares a ≥ 2 treatments.
- The available experimental units are randomly assigned to the treatments.
- The numbers of experimental units in the different treatment groups do not have to be the same.
- The data are classified according to the levels of a single treatment factor.
14.3 Inferences for Several Independent Samples
Data layout:
Treatment 1: x11, x12, …, x1n1
Treatment 2: x21, x22, …, x2n2
…
Treatment a: xa1, xa2, …, xana
with sample medians θ~1, …, θ~a and sample SDs s1, …, sa.
Examples of one-way layout experiments:
- Comparing the effectiveness of different pills on migraine.
- Comparing the durability of different tires.
- etc.
14.3 Inferences for Several Independent Samples
Assumptions:
1. The data on each treatment form a random sample from a continuous c.d.f. Fi.
2. The random samples are independent.
3. Fi(y) = F(y − θi), where θi is the location parameter of Fi; θi = median of Fi.
14.3 Inferences for Several Independent Samples
Hypotheses:
H0: F1 = F2 = … = Fa vs. H1: Fi < Fj for some i ≠ j.
Equivalently, in terms of the location parameters,
H0: θ1 = θ2 = … = θa vs. H1: θi > θj for some i ≠ j.
Can we say that all the Fi's are the same?
14.3.1 Kruskal-Wallis Test
STEP 1: Rank all N = Σ ni observations in ascending order, assigning midranks in case of ties. Let rij = rank(yij). Note
Σ rij = 1 + 2 + … + N = N(N + 1)/2  and  E[r] = (N + 1)/2.
STEP 2: Calculate the rank sums ri = Σ_{j=1}^{ni} rij and the averages r̄i = ri/ni, i = 1, 2, …, a.
14.3.1 Kruskal-Wallis Test
STEP 3: Calculate the Kruskal-Wallis test statistic
kw = 12/(N(N + 1)) Σ_{i=1}^{a} ni (r̄i − (N + 1)/2)²
   = 12/(N(N + 1)) Σ_{i=1}^{a} ri²/ni − 3(N + 1).
STEP 4: Reject H0 for large values of kw. If the ni are large, kw follows a chi-square distribution with a − 1 degrees of freedom under H0.
14.3.1 Kruskal-Wallis Test
Example: NRMA, the world's biggest car insurance company, has decided to test the durability of tires from 4 major companies.
14.3.1 Kruskal-Wallis Test
Example: average test scores of tires from the 4 major companies (7 tires each):
Company 1: 14.59  23.44  25.43  18.15  20.82  14.06  14.26
Company 2: 20.27  26.84  14.71  22.34  19.49  24.92  20.20
Company 3: 27.82  24.92  28.68  23.32  32.85  33.90  23.42
Company 4: 33.16  26.93  30.43  36.43  37.04  29.76  33.88
Ranks of the average test scores (midrank 14.5 for the two tied 24.92s):
Company 1: 3  13  16  5  9  1  2          (rank sum 49)
Company 2: 8  17  4  10  6  14.5  7       (rank sum 66.5)
Company 3: 19  14.5  20  11  23  26  12   (rank sum 125.5)
Company 4: 24  18  22  27  28  21  25     (rank sum 165)
14.3.1 Kruskal-Wallis Test
Example:
kw = 12/(N(N + 1)) Σ ri²/ni − 3(N + 1)
   = 12/(28·29) [49²/7 + 66.5²/7 + 125.5²/7 + 165²/7] − 3(29)
   = 18.134
14.3.1 Kruskal-Wallis Test
Example: kw = 18.134 > χ²_{3,.005} = 12.837, so the differences among the four companies are highly significant.
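Note that `scipy.stats.kruskal` applies a tie correction that shifts the statistic slightly (≈18.14 here instead of 18.134), so the sketch below follows the slides' uncorrected formula instead:

```python
import numpy as np
from scipy.stats import rankdata, chi2

groups = [
    [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],   # company 1
    [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],   # company 2
    [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],   # company 3
    [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],   # company 4
]
sizes = [len(g) for g in groups]
N = sum(sizes)                              # 28
ranks = rankdata(np.concatenate(groups))    # midranks for the tied 24.92s
bounds = np.cumsum([0] + sizes)
r = [ranks[bounds[i]:bounds[i + 1]].sum() for i in range(len(groups))]
kw = 12/(N*(N + 1)) * sum(ri**2/ni for ri, ni in zip(r, sizes)) - 3*(N + 1)
p = chi2.sf(kw, df=len(groups) - 1)         # chi-square with a-1 = 3 df
print(r, round(kw, 3), round(p, 5))         # rank sums 49, 66.5, 125.5, 165
```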
14.3.2 Pairwise Comparisons
Comparing two treatment groups i and j: under H0,
E(R̄i − R̄j) = 0  and  Var(R̄i − R̄j) = N(N + 1)/12 (1/ni + 1/nj).
For large ni, R̄i − R̄j is approximately normally distributed, so
zij = (r̄i − r̄j) / sqrt(N(N + 1)/12 (1/ni + 1/nj)).
14.3.2 Pairwise Comparisons
To control the familywise type I error rate at level α, the |zij| statistics should be referred to an appropriate Studentized range distribution, as in the Tukey method (Chapter 12).
14.3.2 Pairwise Comparisons
Declare treatments i and j different if
|zij| > q_{a,∞,α}/√2, or equivalently
|r̄i − r̄j| > (q_{a,∞,α}/√2) sqrt(N(N + 1)/12 (1/ni + 1/nj)),
where q_{a,∞,α} is the critical constant of the Studentized range distribution with a (= number of treatment groups compared) and degrees of freedom ∞ (assumption: the samples are large).
14.3.2 Pairwise Comparisons
Example: Let α be .05. Then q_{4,∞,.05} = 3.63 and the critical difference for the rank averages is
(3.63/√2) sqrt((28)(29)/12 (1/7 + 1/7)) = 11.29.
With r̄1 = 7, r̄2 = 9.5, r̄3 = 17.93, r̄4 = 23.57:
|r̄1 − r̄4| = 16.57 > 11.29 and |r̄2 − r̄4| = 14.07 > 11.29.
We differ from GOODYEAR!!!
14.4 Inferences for Several Matched Samples
Randomized block design: a ≥ 2 treatment groups and b ≥ 2 blocks.
The Friedman test is a distribution-free rank-based test for comparing the treatments in the randomized block design.
Hypotheses:
H0: F1j = F2j = … = Faj vs. H1: Fij < Fkj for some i ≠ k,
which can be restated as
H0: θ1 = θ2 = … = θa vs. H1: θi > θk for some i ≠ k.
14.4.1 Friedman Test
STEP 1: Rank the observations within each block separately in ascending order: rij = rank of yij among y1j, …, yaj. Assign midranks in case of ties.
STEP 2: Calculate the rank sums ri = Σ_{j=1}^{b} rij, i = 1, 2, …, a.
14.4.1 Friedman Test
STEP 3: Calculate the Friedman test statistic
fr = 12/(ab(a + 1)) Σ_{i=1}^{a} (ri − b(a + 1)/2)²
   = 12/(ab(a + 1)) Σ_{i=1}^{a} ri² − 3b(a + 1).
STEP 4: Reject H0 for large values of fr. If b is large, fr follows a chi-square distribution with a − 1 degrees of freedom under H0.
14.4.1 Friedman Test
Example: drip loss in meat loaves (8 oven positions, 3 batches), with within-batch ranks in parentheses:
Oven Position | Batch 1     | Batch 2     | Batch 3     | Rank Sum
1             | 7.33 (8)    | 8.11 (8)    | 8.06 (7)    | 23
2             | 3.22 (1)    | 3.72 (1)    | 4.28 (1)    | 3
3             | 3.28 (2.5)  | 5.11 (4)    | 4.56 (2)    | 8.5
4             | 6.44 (7)    | 5.78 (6)    | 8.61 (8)    | 21
5             | 3.83 (4)    | 6.50 (7)    | 7.72 (5)    | 16
6             | 3.28 (2.5)  | 5.11 (4)    | 5.56 (3)    | 9.5
7             | 5.06 (6)    | 5.11 (4)    | 7.83 (6)    | 16
8             | 4.44 (5)    | 4.28 (2)    | 6.33 (4)    | 11
14.4.1 Friedman Test
Example: the Friedman test statistic equals
fr = 12/(ab(a + 1)) Σ ri² − 3b(a + 1)
   = 12/(8·3·9) [23² + 3² + 8.5² + 21² + 16² + 9.5² + 16² + 11²] − 3·3·9
   = 17.583 > χ²_{7,.025} = 16.012,
indicating significant differences between the oven positions.
However, the number of blocks is only 3; the large-sample chi-square approximation may not be accurate.
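As with the Kruskal-Wallis example, `scipy.stats.friedmanchisquare` applies a tie correction, so a sketch following the slides' uncorrected formula is shown instead (layout and names are ours):

```python
import numpy as np
from scipy.stats import rankdata, chi2

# drip loss for 8 oven positions (rows) across 3 batches (columns)
data = np.array([
    [7.33, 8.11, 8.06],
    [3.22, 3.72, 4.28],
    [3.28, 5.11, 4.56],
    [6.44, 5.78, 8.61],
    [3.83, 6.50, 7.72],
    [3.28, 5.11, 5.56],
    [5.06, 5.11, 7.83],
    [4.44, 4.28, 6.33],
])
a, b = data.shape                      # a = 8 treatments, b = 3 blocks
# rank within each block (column), with midranks for ties
ranks = np.column_stack([rankdata(data[:, j]) for j in range(b)])
r = ranks.sum(axis=1)                  # rank sums 23, 3, 8.5, ...
fr = 12/(a*b*(a + 1)) * (r**2).sum() - 3*b*(a + 1)
p = chi2.sf(fr, df=a - 1)              # chi-square with a-1 = 7 df
print(r.tolist(), round(fr, 3), round(p, 3))
```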
14.4.2 Pairwise Comparisons
Comparing two treatment groups i and j: under H0,
E(R̄i − R̄j) = 0  and  Var(R̄i − R̄j) = a(a + 1)/(6b).
As in the case of the Kruskal-Wallis test, treatments i and j can be declared different at significance level α if
|r̄i − r̄j| > (q_{a,∞,α}/√2) sqrt(a(a + 1)/(6b)).
14.5 Rank Correlation Methods
What is correlation?
Correlation indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation refers to the departure of two variables from independence.
Correlation does not imply causation.
14.5.1 Spearman's Rank Correlation Coefficient
Charles Edward Spearman
Born September 10, 1863; died September 7, 1945 (82 years old).
An English psychologist known for his work in statistics, as a pioneer of factor analysis, and for Spearman's rank correlation coefficient.
BTW, he looks like Sean Connery.
14.5.1 Spearman's Rank Correlation Coefficient
What are we correlating? DATA from a 19-country study: yearly alcohol consumption from wine (Xi) and yearly heart disease deaths per 100,000 (Yi), with Ui = rank of Xi, Vi = rank of Yi, and Di = Ui − Vi.
No. | Country | Alcohol from Wine (Xi) | Heart Disease Deaths (Yi) | Ui | Vi | Di
2 Austria 3.9 167 15 6.5 8.5
3 Belgium 2.9 131 13.5 5 8.5
4 Canada 2.4 191 10 9 1
5 Denmark 2.9 220 13.5 14 -0.5
6 Finland 0.8 297 3 18 -15
7 France 9.1 71 19 1 18
8 Iceland 0.8 211 3 12.5 -9.5
9 Ireland 0.7 300 1 19 -18
10 Italy 7.9 107 18 3 15
11 Netherlands 1.8 167 8 6.5 1.5
12 New Zealand 1.9 266 9 16 -7
13 Norway 0.8 227 3 15 -12
14 Spain 6.5 86 17 2 15
15 Sweden 1.6 207 7 11 -4
16 Switzerland 5.8 115 16 4 12
17 UK 1.3 285 6 17 -11
18 US 1.2 199 5 10 -5
19 W. Germany 2.7 172 12 8 4
14.5.1 Spearman's Rank Correlation Coefficient
Spearman's rank correlation coefficient is a nonparametric (distribution-free) rank statistic proposed in 1904 as a measure of the strength of the association between two variables. It gives a measure of monotone association and is used when the distribution of the data makes Pearson's correlation coefficient undesirable.
14.5.1 Spearman's Rank Correlation Coefficient
Relevant formulas: with ui = rank(xi) and vi = rank(yi),
rs = Σ_{i=1}^{n} (ui − ū)(vi − v̄) / sqrt( Σ (ui − ū)² Σ (vi − v̄)² ).
If there are no ties (so each di = ui − vi is an integer), this simplifies to
rs = 1 − 6 Σ_{i=1}^{n} di² / (n(n² − 1)).
14.5.1 Spearman's Rank Correlation Coefficient
Example: from the previous data we calculate
rs = 1 − 6 Σ di² / (n(n² − 1)) = 1 − (6)(2081.5)/((19)(360)) = −0.826.
(The minus sign reflects the negative association: higher wine consumption goes with fewer heart disease deaths.)
14.5.1 Spearman's Rank Correlation Coefficient
Hypothesis testing using Spearman's coefficient:
H0: X and Y are independent vs.
Ha: X and Y are associated (here, negatively).
14.5.1 Spearman's Rank Correlation Coefficient
For large n (> 10), Rs is approximately normally distributed under H0, with
E(Rs) = 0  and  Var(Rs) = 1/(n − 1).
This gives the test statistic z = rs sqrt(n − 1).
14.5.1 Spearman's Rank Correlation Coefficient
Example: from the previous data we calculate
z = rs sqrt(n − 1) = −0.826 sqrt(18) = −3.504,
P-value = 0.0004 (two-sided).
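A sketch of the same calculation from the raw data. (Caveat: recomputing Σdi² from the table gives 2078.5 rather than the 2081.5 quoted above, so rs comes out ≈ −0.823 instead of −0.826; the test below checks the recomputed values.)

```python
import numpy as np
from scipy.stats import rankdata, norm

wine  = [2.5, 3.9, 2.9, 2.4, 2.9, 0.8, 9.1, 0.8, 0.7, 7.9,
         1.8, 1.9, 0.8, 6.5, 1.6, 5.8, 1.3, 1.2, 2.7]
heart = [211, 167, 131, 191, 220, 297, 71, 211, 300, 107,
         167, 266, 227, 86, 207, 115, 285, 199, 172]
u, v = rankdata(wine), rankdata(heart)   # midranks, as in the table
d = u - v
n = len(wine)
rs = 1 - 6*(d**2).sum()/(n*(n**2 - 1))   # Spearman's coefficient
z = rs*np.sqrt(n - 1)                    # large-sample test statistic
p = 2*norm.sf(abs(z))                    # two-sided p-value
print(round(rs, 3), round(z, 3), round(p, 4))
```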
14.5.2 Kendall's Rank Correlation Coefficient
Maurice Kendall
Born September 6, 1907; died March 29, 1983 (76 years old).
Maurice Kendall was born in Kettering, Northamptonshire. He studied mathematics at St. John's College, Cambridge, where he played cricket and chess. After graduating as a Mathematics Wrangler in 1929, he joined the British Civil Service in the Ministry of Agriculture; in this position he became increasingly interested in using statistics.
He developed the rank correlation coefficient in 1948.
14.5.2 Kendall's Rank Correlation Coefficient
Consider a pair of bivariate random variables (Xi, Yi) and (Xj, Yj).
Concordant: (Xi − Xj)(Yi − Yj) > 0, which implies
Xi > Xj and Yi > Yj, or Xi < Xj and Yi < Yj.
14.5.2 Kendall's Rank Correlation Coefficient
Discordant: (Xi − Xj)(Yi − Yj) < 0, which implies
Xi > Xj and Yi < Yj, or Xi < Xj and Yi > Yj.
14.5.2 Kendall's Rank Correlation Coefficient
Tied pair: (Xi − Xj)(Yi − Yj) = 0, which implies
Xi = Xj, or Yi = Yj, or both.
14.5.2 Kendall's Rank Correlation Coefficient
Relevant formulas: let
πc = P(concordant) = P((Xi − Xj)(Yi − Yj) > 0),
πd = P(discordant) = P((Xi − Xj)(Yi − Yj) < 0).
Kendall's tau is τ = πc − πd, with −1 ≤ τ ≤ 1.
14.5.2 Kendall's Rank Correlation Coefficient
Sample estimate: with Nc = # of concordant pairs and Nd = # of discordant pairs out of the N = n(n − 1)/2 pairs,
τ̂ = (Nc − Nd)/N.
14.5.2 Kendall's Rank Correlation Coefficient
If there are ties, the formula is modified. Suppose there are g groups of tied Xi's with aj tied observations in the jth group, and h groups of tied Yi's with bj tied observations in the jth group. Let
Tx = Σ_{j=1}^{g} aj(aj − 1)/2  and  Ty = Σ_{j=1}^{h} bj(bj − 1)/2.
Then
τ̂ = (Nc − Nd) / sqrt((N − Tx)(N − Ty)).
14.5.2 Kendall's Rank Correlation Coefficient
Formula explanation: consider five pairs of observations
(x, y) = (1,3), (1,4), (1,5), (2,5), (3,4).
There is g = 1 group of a1 = 3 tied x's equal to 1, and there are h = 2 groups of tied y's: group 1 has b1 = 2 tied y's equal to 4 and group 2 has b2 = 2 tied y's equal to 5. Hence
Tx = 3(2)/2 = 3  and  Ty = 2(1)/2 + 2(1)/2 = 2.
14.5.2 Kendall's Rank Correlation Coefficient
Data:
i | Country | Xi | Yi | Nci | Ndi | Nti
1 Ireland 0.7 300 | 0 18 0
2 Iceland 0.8 211 | 3 11 3
3 Norway 0.8 227 | 2 13 1
4 Finland 0.8 297 | 0 15 0
5 US 1.2 199 | 5 9 0
6 UK 1.3 285 | 0 13 0
7 Sweden 1.6 207 | 3 9 0
8 Netherlands 1.8 167 | 5 5 1
9 New Zealand 1.9 266 | 0 10 0
10 Canada 2.4 191 | 2 7 0
11 Australia 2.5 211 | 1 7 0
12 West Germany 2.7 172 | 1 6 0
13 Belgium 2.9 131 | 2 4 0
14 Denmark 2.9 220 | 0 5 0
15 Austria 3.9 167 | 0 4 0
16 Switzerland 5.8 115 | 0 3 0
17 Spain 6.5 86 | 1 1 0
18 Italy 7.9 107 | 0 1 0
19 France 9.1 71 | 0 0 0
Totals: Nc = 25, Nd = 141, Nt = 5
14.5.2 Kendall's Rank Correlation Coefficient
Testing example:
τ̂ = (Nc − Nd) / sqrt((N − Tx)(N − Ty)) = (25 − 141)/sqrt((171 − 4)(171 − 2)) = −0.690.
14.5.2 Kendall's Rank Correlation Coefficient
Hypothesis testing:
H0: τ = 0 vs. Ha: τ ≠ 0.
Under H0, E(τ̂) = 0 and Var(τ̂) = 2(2n + 5)/(9n(n − 1)), so the test statistic is
z = τ̂ sqrt(9n(n − 1)/(2(2n + 5))).
14.5.2 Kendall's Rank Correlation Coefficient
Testing example:
τ̂ = (25 − 141)/sqrt((171 − 4)(171 − 2)) = −0.690,
z = −0.690 sqrt((9)(19)(18)/(2(43))) = −4.128,
P-value < 0.0001.
14.5.3 Kendall's Coefficient of Concordance
Q: Why do we need Kendall's coefficient of concordance?
A: It is a measure of association between several matched samples.
Q: Why not use Kendall's rank correlation coefficient instead?
A: Because it only works for two samples.
14.5.3 Kendall's Coefficient of Concordance
How can you apply this to real life? A common and interesting example: a taste-testing experiment used four tasters to rank eight recipes, with the following results. Are the tasters in agreement? Hmm… let's find out!
Recipe | Taster 1  Taster 2  Taster 3  Taster 4 | Rank Sum
1      | 5  4  5  4 | 18
2      | 7  5  7  5 | 24
3      | 1  2  1  3 | 7
4      | 3  3  2  1 | 9
5      | 4  6  4  6 | 20
6      | 2  1  3  2 | 8
7      | 8  7  8  8 | 31
8      | 6  8  6  7 | 27
14.5.3 Kendall's Coefficient of Concordance
How does it work? It is closely related to Friedman's test statistic (Section 14.4):
- The a treatments are the candidates (recipes).
- The b blocks are the judges (tasters).
- Each judge ranks the a candidates.
14.5.3 Kendall's Coefficient of Concordance
The discrepancy of the actual rank sums from their common expected value b(a + 1)/2 under random ranking, defined by
d = Σ_{i=1}^{a} (ri − b(a + 1)/2)²,
is a measure of agreement between the judges.
14.5.3 Kendall's Coefficient of Concordance
The maximum value of this measure is attained when there is perfect agreement (every judge ranks the candidates in the same order, so the rank sums are ib for i = 1, …, a). It is given by
d_max = Σ_{i=1}^{a} (ib − b(a + 1)/2)² = b²a(a² − 1)/12.
14.5.3 Kendall's Coefficient of Concordance
Kendall's w statistic is the discrepancy of the rank sums Ri divided by the maximum possible value it can take, which occurs when all judges are in agreement:
w = d/d_max = 12 Σ_{i=1}^{a} (ri − b(a + 1)/2)² / (b²a(a² − 1)).
Hence 0 ≤ w ≤ 1.
14.5.3 Kendall's Coefficient of Concordance
What relationship do w and fr, Friedman's statistic, have?
fr = b(a − 1)w.
Does Kendall's w statistic relate to Spearman's rank correlation coefficient? Only for two judges (b = 2):
rs = 2w − 1.
14.5.3 Kendall's Coefficient of Concordance
Q: How can we perform statistical tests? What distribution does it follow?
To test w for statistical significance, refer fr = b(a − 1)w to the chi-square distribution with a − 1 degrees of freedom.
14.5.3 Kendall's Coefficient of Concordance
To find out whether or not the tasters are in agreement, we calculate Kendall's coefficient of concordance. Friedman's statistic is fr = 24.667, therefore
w = fr/(b(a − 1)) = 24.667/((4)(7)) = 0.881.
Comparing fr = 24.667 with χ²_{7,.05} = 14.067: since fr exceeds this critical value, we conclude that the tasters agree.
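The taster example above can be sketched directly from the definitions (names are ours):

```python
import numpy as np

# rank matrix: 8 recipes (rows) by 4 tasters (columns)
ranks = np.array([
    [5, 4, 5, 4],
    [7, 5, 7, 5],
    [1, 2, 1, 3],
    [3, 3, 2, 1],
    [4, 6, 4, 6],
    [2, 1, 3, 2],
    [8, 7, 8, 8],
    [6, 8, 6, 7],
])
a, b = ranks.shape                   # a = 8 recipes, b = 4 tasters
r = ranks.sum(axis=1)                # row rank sums 18, 24, 7, ...
d = ((r - b*(a + 1)/2)**2).sum()     # discrepancy measure
d_max = b**2 * a * (a**2 - 1) / 12   # value under perfect agreement
w = d / d_max                        # Kendall's coefficient of concordance
fr = b*(a - 1)*w                     # equivalent Friedman statistic
print(round(w, 3), round(fr, 3))
```

This reproduces w = 0.881 and fr = 24.667 from the slide.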
14.6.1 Permutation Tests
1) General idea
A permutation test is a type of statistical significance test in which a reference distribution is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. Confidence intervals can also be derived from these tests.
2) Inventor
The theory evolved from the works of R.A. Fisher and E.J.G. Pitman in the 1930s.
14.6.1 Permutation Tests
Major theory and derivation: the permutation test finds a p-value as the proportion of regroupings that would lead to a test statistic as extreme as the one observed. We consider the permutation test based on sample averages, although one could compute and compare other test statistics.
We have two samples that we wish to compare. Hypotheses:
H0: the differences between the two samples are due to chance.
Ha: sample 2 tends to have higher values than sample 1, not due simply to chance; or
Ha: sample 2 tends to have smaller values than sample 1, not due simply to chance; or
Ha: there are differences between the two samples, not due simply to chance.
To see whether the observed difference d̄ from our data supports H0 or one of the selected alternatives, carry out the steps of a permutation test.
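Those steps can be sketched as follows (a minimal Monte Carlo version; the helper name and the small illustrative data sets are our own, not from the slides):

```python
import numpy as np

def perm_test_mean_diff(sample1, sample2, n_perm=10000, seed=0):
    """Two-sample permutation test on the difference of sample means
    (hypothetical helper; the slides do not give code)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([sample1, sample2])
    n1 = len(sample1)
    observed = np.mean(sample2) - np.mean(sample1)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)             # random relabeling of the data
        diff = pooled[n1:].mean() - pooled[:n1].mean()
        if abs(diff) >= abs(observed):  # two-sided alternative
            count += 1
    return observed, count / n_perm     # p = proportion as extreme

obs, p = perm_test_mean_diff([1.1, 2.3, 3.2, 6.3], [5.2, 8.5, 9.8, 12.3])
print(round(obs, 3), p)
```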
Ms. Merry Huilin Ma ~ ^^*
14.6.2 Bootstrap Method
Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient.
1) General Idea
2) Inventor
Homepage: http://stat.stanford.edu/~brad/ E-mail: [email protected]
Bradley Efron (1938-present)'s work has spanned both theoretical and applied topics, including empirical Bayes analysis, applications of differential geometry to statistical inference, the analysis of survival data, and inference for microarray gene expression data.
14.6.2 Bootstrap Method
3) Major Theory and Derivation

Consider the case where a random sample of size n is drawn from an unspecified probability distribution. The basic steps in the bootstrap procedure are the following:
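The individual steps appear as figures in the original; in outline they are: (1) draw B resamples of size n with replacement from the data, (2) recompute the statistic on each resample, (3) use the spread of the B replicates as the standard-error estimate, and their percentiles for a confidence interval. A Python sketch (the helper names, B, and the seed are illustrative choices, not from the slides):

```python
import random
from statistics import mean, stdev

def bootstrap_se(data, stat=mean, B=2000, seed=0):
    """Bootstrap standard error of `stat`: resample the n observations
    with replacement B times; the standard deviation of the B
    replicated statistics estimates the standard error."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(B)]
    return stdev(reps)

def bootstrap_ci(data, stat=mean, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(B))
    lo = reps[int(B * alpha / 2)]
    hi = reps[int(B * (1 - alpha / 2)) - 1]
    return lo, hi

data = [2, 4, 4, 4, 5, 5, 7, 9]
se = bootstrap_se(data)        # close to the classic s/sqrt(n) for the mean
lo, hi = bootstrap_ci(data)    # percentile interval bracketing the mean
```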
14.6.3 Jackknife Method
The jackknife is a statistical method for estimating and compensating for bias and for deriving robust estimates of standard errors and confidence intervals. Jackknifed statistics are created by systematically dropping out subsets of data one at a time and assessing the resulting variation in the studied parameter.
1) General Idea
2) Inventor
The jackknife was introduced by Maurice Quenouille (1949) as a method for bias reduction; John W. Tukey (1958) extended it to variance estimation and gave the method its name.
14.6.3 Jackknife Method
3) Major Theory and Derivation

Now we briefly describe how to obtain the standard deviation of a generic estimator using the jackknife method. For simplicity we consider the average estimator. Let us consider the variables

X̄(i) = (1/(n - 1)) Σ_{j ≠ i} Xj,

where X̄ is the sample average and X̄(i) is the sample average of the data set with the i-th point deleted. Then we can define the average of the X̄(i):

X̄(.) = (1/n) Σ_{i=1}^{n} X̄(i).

The jackknife estimate of the standard deviation is then defined as:

σ̂_jack = sqrt( ((n - 1)/n) Σ_{i=1}^{n} (X̄(i) - X̄(.))² ).
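A Python sketch of this recipe for a generic estimator (the function name is mine; for the sample mean the jackknife standard error reduces exactly to s/sqrt(n), which makes a handy sanity check):

```python
from math import sqrt
from statistics import mean, stdev

def jackknife_se(data, stat=mean):
    """Jackknife standard error of `stat`: recompute the estimator with
    each point deleted in turn (the x_(i)), average those leave-one-out
    values (x_(.)), and combine them with the (n-1)/n factor."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]   # x_(i)
    loo_mean = mean(loo)                                      # x_(.)
    return sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in loo))

data = [2, 4, 4, 4, 5, 5, 7, 9]
jk = jackknife_se(data)   # equals stdev(data)/sqrt(len(data)) for the mean
```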
SAS program

%macro _SASTASK_DROPDS(dsname);
%IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
  DROP TABLE &dsname;
%END;
%IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
  DROP VIEW &dsname;
%END;
%mend _SASTASK_DROPDS;

%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;

PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
  SELECT ScoreChange FROM MIHIR.AMS572;
QUIT;

TITLE;
TITLE1 "Distribution analysis of: ScoreChange";
TITLE2 "Wilcoxon Signed Rank Test";

ODS EXCLUDE CIBASIC BASICMEASURES EXTREMEOBS MODES MOMENTS QUANTILES;
PROC UNIVARIATE DATA=WORK.SORTTempTableSorted MU0=0;
  VAR ScoreChange;
  HISTOGRAM / NOPLOT;
RUN; QUIT;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
SAS program
Distribution analysis of: ScoreChange
Wilcoxon Signed Rank Test

The UNIVARIATE Procedure
Variable: ScoreChange (Change in Test Scores)
Tests for Location: Mu0=0

Test           Statistic      p Value
Student's t    t = -0.80079   Pr > |t|    0.4402
Sign           M = -1         Pr >= |M|   0.7744
Signed Rank    S = -8.5       Pr >= |S|   0.5278
/** Kruskal-Wallis Test and Wilcoxon-Mann-Whitney Test **/
%macro _SASTASK_DROPDS(dsname);
SAS program
%IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
  DROP TABLE &dsname;
%END;
%IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
  DROP VIEW &dsname;
%END;
%mend _SASTASK_DROPDS;

%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;

PROC SQL;
%_SASTASK_DROPDS(WORK.TMP0TempTableInput);
QUIT;

PROC SQL;
CREATE VIEW WORK.TMP0TempTableInput AS
  SELECT PreTest, Gender FROM MIHIR.AMS572;
QUIT;

TITLE;
TITLE1 "Nonparametric One-Way ANOVA";

PROC NPAR1WAY DATA=WORK.TMP0TempTableInput WILCOXON;
  VAR PreTest;
  CLASS Gender;
RUN; QUIT;

PROC SQL;
%_SASTASK_DROPDS(WORK.TMP0TempTableInput);
QUIT;
Nonparametric One-Way ANOVA
The NPAR1WAY Procedure
Wilcoxon Scores (Rank Sums) for Variable PreTest
Classified by Variable Gender

Gender   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
F        7   40.0            45.50               6.146877           5.714286
M        5   38.0            32.50               6.146877           7.600000

Average scores were used for ties.
Wilcoxon Two-Sample Test
Statistic             38.0000

Normal Approximation
Z                     0.8134
One-Sided Pr > Z      0.2080
Two-Sided Pr > |Z|    0.4160

t Approximation
One-Sided Pr > Z      0.2166
Two-Sided Pr > |Z|    0.4332

Z includes a continuity correction of 0.5.

Kruskal-Wallis Test
Chi-Square            0.8006
DF                    1
Pr > Chi-Square       0.3709
/** Wilcoxon Signed Rank Test **/
%macro _SASTASK_DROPDS(dsname);
%IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
SAS program
  DROP TABLE &dsname;
%END;
%IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
  DROP VIEW &dsname;
%END;
%mend _SASTASK_DROPDS;

%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;

PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
  SELECT ScoreChange FROM MIHIR.AMS572;
QUIT;

TITLE;
TITLE1 "Distribution analysis of: ScoreChange";
TITLE2 "Wilcoxon Signed Rank Test";

ODS EXCLUDE CIBASIC BASICMEASURES EXTREMEOBS MODES MOMENTS QUANTILES;
PROC UNIVARIATE DATA=WORK.SORTTempTableSorted MU0=0;
  VAR ScoreChange;
  HISTOGRAM / NOPLOT;
RUN; QUIT;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
/** Friedman Test **/
%macro _SASTASK_DROPDS(dsname);
%IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
  DROP TABLE &dsname;
SAS program
%END;
%IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
  DROP VIEW &dsname;
%END;
%mend _SASTASK_DROPDS;

%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;

PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
  SELECT Emotion, Subject, SkinResponse FROM WORK.HYPNOSIS1493;
QUIT;

TITLE;
TITLE1 "Table Analysis";
TITLE2 "Results";

PROC FREQ DATA=WORK.SORTTempTableSorted ORDER=INTERNAL;
  TABLES Subject * Emotion * SkinResponse /
    NOROW NOPERCENT NOCUM CMH SCORES=RANK ALPHA=0.05;
RUN; QUIT;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
SAS program
Table Analysis Results

The FREQ Procedure
Summary Statistics for Emotion by SkinResponse
Controlling for Subject

Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)

Statistic   Alternative Hypothesis    DF   Value    Prob
1           Nonzero Correlation       1    0.2400   0.6242
2           Row Mean Scores Differ    3    6.4500   0.0917
3           General Association       84   .        .

At least 1 statistic not computed--singular covariance matrix.
Total Sample Size = 32
/* Spearman correlation */
%macro _SASTASK_DROPDS(dsname);
%IF %SYSFUNC(EXIST(&dsname)) %THEN %DO;
  DROP TABLE &dsname;
SAS program
%END;
%IF %SYSFUNC(EXIST(&dsname, VIEW)) %THEN %DO;
  DROP VIEW &dsname;
%END;
%mend _SASTASK_DROPDS;

%LET _EGCHARTWIDTH=0;
%LET _EGCHARTHEIGHT=0;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;

PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
  SELECT Arts, Economics FROM WORK.WESTERNRATES5171;
QUIT;

TITLE1 "Correlation Analysis";

/* Spearman method */
PROC CORR DATA=WORK.SORTTempTableSorted SPEARMAN VARDEF=DF NOSIMPLE NOPROB;
  VAR Arts;
  WITH Economics;
RUN;

/* Kendall method */

SAS program
PROC CORR DATA=WORK.SORTTempTableSorted KENDALL VARDEF=DF NOSIMPLE NOPROB;
  VAR Arts;
  WITH Economics;
RUN; QUIT;

PROC SQL;
%_SASTASK_DROPDS(WORK.SORTTempTableSorted);
QUIT;
SAS program
Correlation Analysis
The CORR Procedure

1 With Variables: Economics
1 Variables:      Arts

Spearman Correlation Coefficients, N = 52

              Arts
Economics     0.27926

Kendall Tau-b Correlation Coefficients, N = 52

              Arts
Economics     0.18854
What happened to his eyes!!!!!
I don't really believe in peace.
buddies

Statistics is funny!
How?
They are going to kill me. HELP!
Are you still taking the picture?
Is it safe to look at the camera?
I don't know but I am looking.
We love statistics
Losers!