Resampling Applications & Permutation Tests · statlearn.uos.ac.kr/.../Ch9_10_slide.pdf · 2020-05-13

Resampling Applications & Permutation Tests

박창이

Department of Statistics, University of Seoul


Contents

Resampling Applications

Jackknife-after-Bootstrap

Resampling for Regression Models

Resampling cases

Resampling errors

Permutation Tests

Permutation distribution

Approximate permutation test

Two-sample Tests for Equal Distributions


Jackknife-after-Bootstrap

A bootstrap estimate, such as the bootstrap standard error, is itself subject to sampling variability. To estimate that variability, one may apply the jackknife to the bootstrap replicates.

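Let $J(i)$ be the set of indices of the bootstrap samples that do not contain $x_i$, and let $B(i)$ denote the number of bootstrap samples that do not contain $x_i$. The jackknife estimate of the s.e. of the bootstrap s.e. estimate is

$$\widehat{se}\big(\widehat{se}_B(\hat\theta)\big) = \widehat{se}_{jack}\big(\widehat{se}_{B(1)}, \dots, \widehat{se}_{B(n)}\big),$$

where

$$\widehat{se}_{B(i)} = \sqrt{\frac{1}{B(i)} \sum_{j \in J(i)} \big[\hat\theta_{(j)} - \bar\theta_{(J(i))}\big]^2} \quad \text{and} \quad \bar\theta_{(J(i))} = \frac{1}{B(i)} \sum_{j \in J(i)} \hat\theta_{(j)}$$

is the sample mean of the estimates from the leave-$x_i$-out jackknife samples.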
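As a rough illustration, the following Python sketch applies the formulas above to the bootstrap standard error of a statistic. The function name `jackknife_after_bootstrap`, the use of NumPy, and the default `B` are assumptions made for this example and do not come from the slides. The last step assumes the usual jackknife formula, $\sqrt{\frac{n-1}{n} \sum_i (\widehat{se}_{B(i)} - \overline{se})^2}$, for $\widehat{se}_{jack}$.

```python
import numpy as np

def jackknife_after_bootstrap(x, stat, B=2000, seed=0):
    """Return (bootstrap s.e. of stat, jackknife-after-bootstrap s.e. of that s.e.)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    idx = rng.integers(0, n, size=(B, n))            # row j = indices used by bootstrap sample j
    theta = np.array([stat(x[row]) for row in idx])  # bootstrap replicates theta_(j)
    se_boot = theta.std(ddof=1)                      # ordinary bootstrap s.e. (B - 1 divisor assumed)

    se_leave = np.empty(n)
    for i in range(n):
        in_J = ~(idx == i).any(axis=1)               # True for samples that do not contain x_i
        se_leave[i] = theta[in_J].std(ddof=0)        # se_B(i), with divisor B(i) as in the formula

    # Usual jackknife s.e. applied to the n leave-one-out values.
    se_jack = np.sqrt((n - 1) / n * np.sum((se_leave - se_leave.mean()) ** 2))
    return se_boot, se_jack
```

Keeping the resampled indices, rather than only the replicates, is what makes it possible to identify $J(i)$ after the bootstrap has run, so no additional resampling is needed.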


Resampling Cases for Regression Models

Regression model:

$$Y_j = \beta_0 + \beta_1 x_j + \varepsilon_j, \quad j = 1, \dots, n,$$

where the $\varepsilon_j$ are iid with $E(\varepsilon_j) = 0$ and $\mathrm{Var}(\varepsilon_j) = \sigma^2$.

Observed data: $(x_i, y_i)$, $i = 1, \dots, n$.

Algorithm

For $r = 1, \dots, R$:

1. Randomly draw $\{i^*_1, \dots, i^*_n\}$ from $\{1, \dots, n\}$ with replacement.

2. Set $x^*_j = x_{i^*_j}$ and $y^*_j = y_{i^*_j}$, $j = 1, \dots, n$.

3. Fit OLS to $(x^*_j, y^*_j)$, $j = 1, \dots, n$, to get the estimates $\hat\beta_0^{(r)*}$ and $\hat\beta_1^{(r)*}$ and the residual s.e. $s^*_r$.
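The algorithm above can be sketched in Python roughly as follows. The helper names `fit_ols` and `resample_cases`, the simulated data, and the choice of NumPy's `lstsq` for the OLS fit are assumptions made for illustration only.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, residual s.e.)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (len(y) - 2))    # n - 2 degrees of freedom
    return beta[0], beta[1], s

def resample_cases(x, y, R=1000, seed=0):
    """Bootstrap (x_i, y_i) pairs; row r holds (b0*, b1*, s*) for replicate r."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    out = np.empty((R, 3))
    for r in range(R):
        i_star = rng.integers(0, n, size=n)      # step 1: indices drawn with replacement
        out[r] = fit_ols(x[i_star], y[i_star])   # steps 2-3: resampled pairs, then OLS
    return out

# Usage on simulated data (true intercept 1, true slope 2; values are made up).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 1 + 2 * x + rng.normal(size=50)
reps = resample_cases(x, y, R=500)
se_b0 = reps[:, 0].std(ddof=1)   # bootstrap s.e. of the intercept
se_b1 = reps[:, 1].std(ddof=1)   # bootstrap s.e. of the slope
```

Because whole pairs are resampled, the design points $x^*_j$ change from replicate to replicate, which is the main practical difference from resampling errors.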


Resampling Errors for Regression Models I

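1. Fit OLS to the data to obtain $\hat\beta = (\hat\beta_0, \dots, \hat\beta_p)^T$, $\hat\sigma^2_\varepsilon$, and the fitted values $\hat y_1, \dots, \hat y_n$.

2. Compute the raw residuals $e_i = y_i - \hat y_i$ and the modified residuals $e'_i = e_i / \sqrt{1 - h_{ii}}$, $i = 1, \dots, n$, where, for simple linear regression,

$$h_{jk} = \frac{1}{n} + \frac{(x_j - \bar x)(x_k - \bar x)}{SS_x}, \qquad SS_x = \sum_{i=1}^{n} (x_i - \bar x)^2,$$

   is the $(j, k)$ element of the hat matrix.

3. Center the modified residuals: $r_i = e'_i - \bar e'$.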


Resampling Errors for Regression Models II

4. For $k = 1, \dots, R$:

   1. For $j = 1, \dots, n$:

      1. Set $x^*_j = x_j$.

      2. Randomly sample $\varepsilon^{*(k)}_j$ with replacement from $\{r_1, \dots, r_n\}$.

      3. Set $y^{*(k)}_j = \hat y_j + \varepsilon^{*(k)}_j$.

   2. Fit OLS to $(x^*_j, y^{*(k)}_j)$, $j = 1, \dots, n$, to get the estimates $\hat\beta^{(k)*}$ and the residual s.e. $s^{(k)*}$.

   3. Return $(\hat\beta^{(k)*}, s^{(k)*})$.
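Steps 1 to 4 can be sketched in Python for the simple linear regression case. The function name `resample_errors`, the simulated data, and the use of NumPy's `lstsq` are assumptions for this example. The leverages use the $h_{ii}$ formula given in step 2.

```python
import numpy as np

def resample_errors(x, y, R=1000, seed=0):
    """Residual bootstrap for y = b0 + b1*x; returns (R x 2 coefficients, R residual s.e.)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones(n), x])

    # Step 1: OLS fit and fitted values.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat

    # Step 2: raw residuals, leverages h_ii, modified residuals.
    e = y - y_hat
    h = 1.0 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    e_mod = e / np.sqrt(1.0 - h)

    # Step 3: center the modified residuals.
    r = e_mod - e_mod.mean()

    # Step 4: keep x fixed, add resampled errors to the fitted values, refit.
    betas = np.empty((R, 2))
    ses = np.empty(R)
    for k in range(R):
        y_star = y_hat + rng.choice(r, size=n, replace=True)
        b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        resid = y_star - X @ b
        betas[k] = b
        ses[k] = np.sqrt(resid @ resid / (n - 2))
    return betas, ses

# Usage on simulated data (true intercept 1, true slope 2; values are made up).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 1 + 2 * x + rng.normal(size=50)
betas, ses = resample_errors(x, y, R=500)
se_slope = betas[:, 1].std(ddof=1)   # bootstrap s.e. of the slope
```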


Permutation Tests

Permutation tests are based on resampling, but the samples are

drawn without replacement.

Used for nonparametric tests for equal distributions, independence,

association, location, common scale, etc.
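To make the contrast with the bootstrap concrete, here is a minimal NumPy sketch. The data values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
z = np.array([3.1, 4.7, 5.2, 6.0, 8.4])

# Bootstrap resample: drawn WITH replacement, so values may repeat or be missing.
boot = rng.choice(z, size=len(z), replace=True)

# Permutation resample: drawn WITHOUT replacement, so it is a reordering of z.
perm = rng.permutation(z)
```

Every permutation resample contains exactly the original values, which is why only the assignment of observations to groups changes in a permutation test.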


Permutation Distribution I

$X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ are independent random samples from $F_X$ and $F_Y$.

$Z = X \cup Y$: the pooled sample, with $Z_i = X_i$ if $1 \le i \le n$ and $Z_i = Y_{i-n}$ if $n + 1 \le i \le n + m = N$.

$Z^* = (X^*, Y^*)$: a partition of $Z$, where $X^*$ has $n$ elements and $Y^*$ has $m$ elements. There is a permutation $\pi$ of $\nu$ such that $Z^*_i = Z_{\pi(i)}$, where $\nu = \{1, \dots, n, n + 1, \dots, n + m\} = \{1, \dots, N\}$.

Under $H_0 : F_X = F_Y$, a randomly selected $Z^*$ has probability

$$\frac{1}{\binom{N}{n}} = \frac{n!\,m!}{N!}.$$
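The counting argument can be checked on a small case. The sketch below uses hypothetical sizes $n = 2$ and $m = 3$ and enumerates every way of choosing which pooled indices form $X^*$.

```python
from itertools import combinations
from math import comb, factorial, isclose

n, m = 2, 3          # hypothetical sample sizes
N = n + m

# Each partition is (indices assigned to X*, indices assigned to Y*).
partitions = [
    (x_idx, tuple(i for i in range(N) if i not in x_idx))
    for x_idx in combinations(range(N), n)
]

n_partitions = len(partitions)                       # number of distinct partitions
prob = 1 / comb(N, n)                                # probability of each one under H0
prob_factorial = factorial(n) * factorial(m) / factorial(N)
```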


Permutation Distribution II

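For $\hat\theta(X, Y) = \hat\theta(Z, \nu)$, the permutation distribution of $\hat\theta^*$ is the distribution of the replicates

$$\{\hat\theta^*\} = \left\{\hat\theta(Z, \pi_j(\nu)),\ j = 1, \dots, \binom{N}{n}\right\} = \left\{\hat\theta^{(j)} : \pi_j(\nu) \text{ is a permutation of } \nu\right\}.$$

The permutation test rejects the null hypothesis if $\hat\theta$ is large relative to the distribution

$$F_{\theta^*}(t) = P(\hat\theta^* \le t) = \binom{N}{n}^{-1} \sum_{j=1}^{\binom{N}{n}} I(\hat\theta^{(j)} \le t).$$

The achieved significance level is

$$P(\hat\theta^* \ge \hat\theta) = \binom{N}{n}^{-1} \sum_{j=1}^{\binom{N}{n}} I(\hat\theta^{(j)} \ge \hat\theta).$$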

If the sample sizes are large, the exact permutation test is computationally too expensive, because the statistic must be evaluated on all $\binom{N}{n}$ partitions. In that case, we may use an approximate permutation test.
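For very small samples the achieved significance level can be computed exactly by enumerating every partition. The sketch below is one way to do that in Python. The names `exact_permutation_test` and `abs_mean_diff`, the toy data, and the choice of statistic are assumptions for illustration.

```python
from itertools import combinations
from math import comb
import numpy as np

def exact_permutation_test(x, y, stat):
    """Exact ASL: proportion of partitions with stat(X*, Y*) >= the observed stat(x, y)."""
    x, y = np.asarray(x), np.asarray(y)
    z = np.concatenate([x, y])
    n, N = len(x), len(x) + len(y)
    theta_hat = stat(x, y)
    count = 0
    for x_idx in combinations(range(N), n):   # every choice of which indices form X*
        mask = np.zeros(N, dtype=bool)
        mask[list(x_idx)] = True
        count += stat(z[mask], z[~mask]) >= theta_hat
    return count / comb(N, n)

def abs_mean_diff(a, b):
    """Hypothetical two-sample statistic: absolute difference in means."""
    return abs(np.mean(a) - np.mean(b))

# Toy data: only 2 of the 20 partitions separate the groups as much as observed.
asl = exact_permutation_test([1, 2, 3], [4, 5, 6], abs_mean_diff)
```

The loop runs $\binom{N}{n}$ times, which already exceeds $10^{11}$ for $n = m = 20$, so this approach is only practical for small samples.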


Approximate Permutation Test

1. Compute the observed test statistic $\hat\theta(X, Y) = \hat\theta(Z, \nu)$.

2. For $b = 1, \dots, B$:

   1. Generate a random permutation $\pi_b = \pi(\nu)$.

   2. Compute $\hat\theta^{(b)} = \hat\theta^*(Z, \pi_b)$.

3. If large values of $\hat\theta$ support $H_1$, compute the empirical p-value by

$$\hat p = \frac{1 + \sum_{b=1}^{B} I(\hat\theta^{(b)} \ge \hat\theta)}{B + 1}.$$

   For a lower-tail or two-tail test, $\hat p$ is computed similarly.

4. Reject $H_0$ at significance level $\alpha$ if $\hat p \le \alpha$.
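A minimal Python sketch of these four steps for an upper-tail test follows. The function names `approx_permutation_test` and `abs_mean_diff`, the default `B = 999`, and the simulated data are assumptions for this example. Any two-sample statistic for which large values support $H_1$ could be passed as `stat`.

```python
import numpy as np

def approx_permutation_test(x, y, stat, B=999, seed=0):
    """Upper-tail approximate permutation test; returns the empirical p-value."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    z = np.concatenate([x, y])                 # pooled sample Z
    n = len(x)
    theta_hat = stat(x, y)                     # step 1: observed statistic
    exceed = 0
    for _ in range(B):                         # step 2: B random permutations
        z_star = rng.permutation(z)
        exceed += stat(z_star[:n], z_star[n:]) >= theta_hat
    return (exceed + 1) / (B + 1)              # step 3: empirical p-value

def abs_mean_diff(a, b):
    """Hypothetical two-sample statistic: absolute difference in means."""
    return abs(np.mean(a) - np.mean(b))

# Usage on simulated data with clearly different locations (values are made up).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(3.0, 1.0, size=30)
p_hat = approx_permutation_test(x, y, abs_mean_diff)
reject = p_hat <= 0.05                         # step 4 with alpha = 0.05
```

The added 1 in the numerator and denominator counts the observed statistic as one of the permutation replicates, so the p-value can never be exactly zero.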


Two-sample Tests for Equal Distributions I

$X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_m)$ are independent random samples from $F$ and $G$.

$H_0 : F = G$ vs $H_1 : F \ne G$.

Under $H_0$, $X$, $Y$, and $Z = X \cup Y$ are random samples from $F$. Also, under $H_0$, any subset $X^*$ of size $n$ from $Z$ and its complement $Y^*$ are random samples from $F$.

$\hat\theta$: a two-sample statistic measuring the distance between $F$ and $G$. Large values of $\hat\theta$ support $H_1$.

Under $H_0$, all values of $\hat\theta^* = \hat\theta(X^*, Y^*)$ are equally likely.


Two-sample Tests for Equal Distributions II

Examples of test statistics

Kolmogorov-Smirnov statistic

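$$D = \sup_{1 \le i \le n+m} |F_n(z_i) - G_m(z_i)|,$$

where $F_n$ and $G_m$ are the ecdfs of $x_1, \dots, x_n$ and $y_1, \dots, y_m$. Note that $0 \le D \le 1$.

Cramér-von Mises statistic

$$W_2 = \frac{mn}{(m+n)^2} \left[ \sum_{i=1}^{n} \big(F_n(x_i) - G_m(x_i)\big)^2 + \sum_{j=1}^{m} \big(F_n(y_j) - G_m(y_j)\big)^2 \right].$$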
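Both statistics can be computed directly from the two ecdfs evaluated at the pooled observations. The following Python sketch assumes NumPy. The helper names `ecdf`, `ks_statistic`, and `cvm_statistic` and the toy data are made up for illustration.

```python
import numpy as np

def ecdf(sample, t):
    """Empirical cdf of `sample` at the points t: proportion of sample values <= t."""
    return np.searchsorted(np.sort(sample), t, side="right") / len(sample)

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D, evaluated at the pooled sample."""
    z = np.concatenate([x, y])
    return np.max(np.abs(ecdf(x, z) - ecdf(y, z)))

def cvm_statistic(x, y):
    """Two-sample Cramér-von Mises statistic W2 in the form given above."""
    x, y = np.asarray(x), np.asarray(y)
    n, m = len(x), len(y)
    sum_x = np.sum((ecdf(x, x) - ecdf(y, x)) ** 2)   # sum over x_1, ..., x_n
    sum_y = np.sum((ecdf(x, y) - ecdf(y, y)) ** 2)   # sum over y_1, ..., y_m
    return m * n / (m + n) ** 2 * (sum_x + sum_y)

# Toy data: fully separated samples give the largest possible D.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
D = ks_statistic(x, y)
W2 = cvm_statistic(x, y)
```

Either function takes two samples and returns a scalar for which large values support $H_1$, so it can serve as the statistic $\hat\theta$ in a permutation test.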

