Resampling Applications & Permutation Tests
박창이
Department of Statistics, University of Seoul
박창이 (Department of Statistics, University of Seoul) Resampling Applications & Permutation Tests
Contents
Resampling Applications
Jackknife-after-Bootstrap
Resampling for Regression Models
Resampling cases
Resampling errors
Permutation Tests
Permutation distribution
Approximate permutation test
Two-sample Tests for Equal Distributions
Jackknife-after-Bootstrap
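To assess the variability of a bootstrap estimate itself (for example, the bootstrap standard error $\widehat{se}_B(\hat\theta)$), one may apply the jackknife to the bootstrap replicates that have already been generated, so no second level of resampling is needed.
Let $J(i)$ be the set of indices of the bootstrap samples that do not contain $x_i$, and let $B(i)$ denote the number of such samples. The jackknife estimate of the standard error of $\widehat{se}_B(\hat\theta)$ is
$$\widehat{se}_{jack}\big(\widehat{se}_{B(1)}, \ldots, \widehat{se}_{B(n)}\big) = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\big(\widehat{se}_{B(i)} - \overline{se}_{B(\cdot)}\big)^2}, \qquad \overline{se}_{B(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\widehat{se}_{B(i)},$$
where
$$\widehat{se}_{B(i)} = \sqrt{\frac{1}{B(i)}\sum_{j \in J(i)}\big[\hat\theta^{(j)} - \bar\theta_{(J(i))}\big]^2}, \qquad \bar\theta_{(J(i))} = \frac{1}{B(i)}\sum_{j \in J(i)}\hat\theta^{(j)},$$
and $\bar\theta_{(J(i))}$ is the sample mean of the replicates from the bootstrap samples that leave $x_i$ out.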
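A minimal sketch of jackknife-after-bootstrap in Python with NumPy, assuming the statistic is passed as a function `stat` of one sample; the helper name `jackknife_after_bootstrap` is hypothetical. The key point is that the indices drawn for each bootstrap sample are stored, so each set $J(i)$ can be recovered without resampling again. It assumes $B$ is large enough that every $J(i)$ is nonempty.

```python
import numpy as np

def jackknife_after_bootstrap(x, stat, B=2000, rng=None):
    """Return (bootstrap s.e. of stat(x), jackknife-after-bootstrap s.e. of that s.e.)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)

    # indices[b] holds the observation indices drawn for bootstrap sample b
    indices = rng.integers(0, n, size=(B, n))
    theta_b = np.array([stat(x[idx]) for idx in indices])
    se_boot = theta_b.std(ddof=1)

    se_b_i = np.empty(n)
    for i in range(n):
        # J(i): bootstrap samples that do not contain observation i
        keep = ~(indices == i).any(axis=1)
        # divisor B(i), matching the formula for se_B(i)
        se_b_i[i] = theta_b[keep].std(ddof=0)

    # ordinary jackknife s.e. applied to se_B(1), ..., se_B(n)
    se_jack = np.sqrt((n - 1) / n * np.sum((se_b_i - se_b_i.mean()) ** 2))
    return se_boot, se_jack
```

For example, `jackknife_after_bootstrap(x, np.mean, B=2000, rng=1)` returns the bootstrap s.e. of the sample mean together with a jackknife estimate of how variable that bootstrap s.e. is. Each observation is absent from a bootstrap sample with probability $(1 - 1/n)^n \approx e^{-1}$, so roughly $0.37B$ replicates fall into each $J(i)$.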
Resampling Cases for Regression Models
Regression model:
$Y_j = \beta_0 + \beta_1 x_j + \varepsilon_j, \quad j = 1, \ldots, n,$
where the $\varepsilon_j$ are iid with $E(\varepsilon_j) = 0$ and $\mathrm{Var}(\varepsilon_j) = \sigma^2$.
Observed data: $(x_j, y_j)$, $j = 1, \ldots, n$.
Algorithm
For r = 1, ..., R:
1. Randomly draw indices $\{i^*_1, \ldots, i^*_n\}$ from $\{1, \ldots, n\}$ with replacement.
2. Set $x^*_j = x_{i^*_j}$ and $y^*_j = y_{i^*_j}$, $j = 1, \ldots, n$.
3. Fit OLS to $(x^*_j, y^*_j)$, $j = 1, \ldots, n$, to get the estimates $\hat\beta_0^{*(r)}$ and $\hat\beta_1^{*(r)}$ and the residual s.e. $s^*_r$.
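The case-resampling algorithm can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions: `bootstrap_cases` is a hypothetical name, OLS is computed with `numpy.linalg.lstsq`, and the residual s.e. uses $n - 2$ degrees of freedom for the two-parameter model.

```python
import numpy as np

def bootstrap_cases(x, y, R=1000, rng=None):
    """Bootstrap (x, y) pairs; return an (R, 3) array of (beta0*, beta1*, s*)."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    out = np.empty((R, 3))
    for r in range(R):
        # Step 1: draw i*_1, ..., i*_n from {0, ..., n-1} with replacement
        idx = rng.integers(0, n, size=n)
        # Step 2: resample whole cases (x_j, y_j) together
        xs, ys = x[idx], y[idx]
        # Step 3: OLS fit with an intercept column
        X = np.column_stack([np.ones(n), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        resid = ys - X @ beta
        s = np.sqrt(resid @ resid / (n - 2))  # residual s.e., n - 2 df
        out[r] = beta[0], beta[1], s
    return out
```

The column standard deviations of the returned array, e.g. `out[:, 1].std(ddof=1)`, give bootstrap standard errors of $\hat\beta_1$. Because $x$ is resampled along with $y$, this scheme does not assume the errors have constant variance across $x$.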
Resampling Errors for Regression Models I
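1. Fit OLS to the data to obtain $\hat\beta = (\hat\beta_0, \ldots, \hat\beta_p)^T$, $\hat\sigma^2_\varepsilon$, and the fitted values $\hat y_1, \ldots, \hat y_n$.
2. Compute the raw residuals $e_i = y_i - \hat y_i$ and the modified residuals $e'_i = e_i/\sqrt{1 - h_{ii}}$, $i = 1, \ldots, n$, where, for simple linear regression, $h_{jk} = \frac{1}{n} + \frac{(x_j - \bar x)(x_k - \bar x)}{SS_x}$, with $SS_x = \sum_{i=1}^{n}(x_i - \bar x)^2$, is the $(j, k)$ element of the hat matrix.
3. Center the modified residuals: $r_i = e'_i - \bar e'$, where $\bar e' = \frac{1}{n}\sum_{i=1}^{n} e'_i$.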
Resampling Errors for Regression Models II
4. For k = 1, ..., R:
   1. For j = 1, ..., n:
      1. Set $x^*_j = x_j$.
      2. Randomly sample $\varepsilon^{*(k)}_j$ with replacement from $\{r_1, \ldots, r_n\}$.
      3. Set $y^{*(k)}_j = \hat y_j + \varepsilon^{*(k)}_j$.
   2. Fit OLS to $(x^*_j, y^{*(k)}_j)$, $j = 1, \ldots, n$, to get the estimates $\hat\beta^{*(k)}$ and the residual s.e. $s^{*(k)}$.
   3. Return $(\hat\beta^{*(k)}, s^{*(k)})$.
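Steps 1 to 4 of the error-resampling scheme can be sketched in Python with NumPy for the simple linear regression case, where the leverage formula $h_{jj} = 1/n + (x_j - \bar x)^2/SS_x$ applies. The name `bootstrap_residuals` is hypothetical, and the residual s.e. is assumed to use $n - 2$ degrees of freedom.

```python
import numpy as np

def bootstrap_residuals(x, y, R=1000, rng=None):
    """Resample centered modified residuals; return an (R, 3) array of (beta0*, beta1*, s*)."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    X = np.column_stack([np.ones(n), x])

    # Step 1: OLS fit to the original data
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat

    # Step 2: raw residuals and leverage-adjusted (modified) residuals
    e = y - fitted
    ssx = np.sum((x - x.mean()) ** 2)
    h = 1.0 / n + (x - x.mean()) ** 2 / ssx  # diagonal h_jj of the hat matrix
    e_mod = e / np.sqrt(1.0 - h)

    # Step 3: center the modified residuals
    r = e_mod - e_mod.mean()

    # Step 4: keep x fixed, add resampled errors to the fitted values, refit
    out = np.empty((R, 3))
    for k in range(R):
        eps = rng.choice(r, size=n, replace=True)
        y_star = fitted + eps
        beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        resid = y_star - X @ beta_star
        out[k] = beta_star[0], beta_star[1], np.sqrt(resid @ resid / (n - 2))
    return out
```

Unlike resampling cases, this scheme treats the design $x_1, \ldots, x_n$ as fixed and relies on the errors being exchangeable, so the bootstrap s.e. of $\hat\beta_1$ should be close to the classical OLS value when the model holds.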
Permutation Tests
Permutation tests are based on resampling, but the samples are
drawn without replacement.
They are used as nonparametric tests of equal distributions, independence, association, location, common scale, etc.
Permutation Distribution I
$X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are independent random samples from $F_X$ and $F_Y$.
$Z = X \cup Y$: the pooled sample, with $Z_i = X_i$ if $1 \le i \le n$ and $Z_i = Y_{i-n}$ if $n + 1 \le i \le n + m = N$.
$Z^* = (X^*, Y^*)$: a partition of $Z$, where $X^*$ has $n$ elements and $Y^*$ has $m$ elements. There is a permutation $\pi$ of $\nu$ such that $Z^*_i = Z_{\pi(i)}$, where $\nu = \{1, \ldots, n, n+1, \ldots, n+m\} = \{1, \ldots, N\}$.
Under $H_0: F_X = F_Y$, a randomly selected $Z^*$ has probability $1/\binom{N}{n} = n!\,m!/N!$.
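The probability $n!\,m!/N!$ can be checked by brute force for tiny samples. This sketch assumes hypothetical sizes $n = 2$, $m = 3$ and uses 0-based indices; each permutation $\pi$ of $\nu$ induces the split whose first $n$ positions form $X^*$, and the code counts how many of the $N!$ permutations induce each distinct split.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

n, m = 2, 3
N = n + m
nu = tuple(range(N))  # index set nu (0-based here)

# Map each permutation pi to the set of indices it assigns to X*;
# the order within X* and within Y* does not change the partition.
counts = Counter(frozenset(pi[:n]) for pi in permutations(nu))

print(len(counts), comb(N, n))                            # prints: 10 10
print(set(counts.values()), factorial(n) * factorial(m))  # prints: {12} 12
```

Every one of the $\binom{5}{2} = 10$ partitions is induced by exactly $2!\,3! = 12$ of the $5! = 120$ permutations, so under $H_0$ each partition has probability $12/120 = 1/10$.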
Permutation Distribution II
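For $\hat\theta(X, Y) = \hat\theta(Z, \nu)$, the permutation distribution of $\hat\theta^*$ is the distribution of the replicates
$\{\hat\theta^*\} = \{\hat\theta(Z, \pi_j(\nu)),\ j = 1, \ldots, \binom{N}{n}\} = \{\hat\theta^{(j)} : \pi_j(\nu) \text{ is a permutation of } \nu\}.$
The permutation test rejects the null hypothesis if $\hat\theta$ is large relative to the distribution
$F_{\hat\theta^*}(t) = P(\hat\theta^* \le t) = \binom{N}{n}^{-1} \sum_{j=1}^{\binom{N}{n}} I(\hat\theta^{(j)} \le t).$
The achieved significance level (ASL) is
$P(\hat\theta^* \ge \hat\theta) = \binom{N}{n}^{-1} \sum_{j=1}^{\binom{N}{n}} I(\hat\theta^{(j)} \ge \hat\theta).$
Because the number of partitions $\binom{N}{n}$ grows combinatorially with the sample sizes, enumerating all of them is computationally too expensive for large samples. In that case, we may use an approximate permutation test based on a random subset of the permutations.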
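For small samples the exact ASL can be computed by enumerating all $\binom{N}{n}$ partitions. A minimal Python sketch, assuming a user-supplied two-sample statistic `stat(x, y)` for which large values support $H_1$; `exact_permutation_asl` is a hypothetical name, and the absolute difference of means in the example is only an illustrative choice of $\hat\theta$.

```python
from itertools import combinations
import numpy as np

def exact_permutation_asl(x, y, stat):
    """Return (observed statistic, exact ASL = P(theta* >= theta_hat))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = np.concatenate([x, y])  # pooled sample Z
    N, n = len(z), len(x)
    theta_obs = stat(x, y)
    reps = []
    # one replicate per distinct partition (X*, Y*) with |X*| = n
    for idx in combinations(range(N), n):
        mask = np.zeros(N, dtype=bool)
        mask[list(idx)] = True
        reps.append(stat(z[mask], z[~mask]))
    reps = np.array(reps)
    return theta_obs, np.mean(reps >= theta_obs)

# illustrative statistic: absolute difference of sample means
abs_mean_diff = lambda a, b: abs(a.mean() - b.mean())
theta, asl = exact_permutation_asl([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], abs_mean_diff)
```

Here there are $\binom{6}{3} = 20$ partitions, and only the observed split and its mirror image reach $|\bar x^* - \bar y^*| = 3$, so the exact ASL is $2/20 = 0.1$.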
Approximate Permutation Test
1. Compute the observed test statistic $\hat\theta(X, Y) = \hat\theta(Z, \nu)$.
2. For b = 1, ..., B:
   1. Generate a random permutation $\pi_b = \pi(\nu)$.
   2. Compute $\hat\theta^{(b)} = \hat\theta^*(Z, \pi_b)$.
3. If large values of $\hat\theta$ support $H_1$, compute the empirical p-value
   $\hat p = \dfrac{1 + \sum_{b=1}^{B} I(\hat\theta^{(b)} \ge \hat\theta)}{B + 1}.$
   For a lower-tail or two-tail test, $\hat p$ is computed similarly.
4. Reject $H_0$ at significance level $\alpha$ if $\hat p \le \alpha$.
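The four steps above can be sketched in Python with NumPy for an upper-tail test. The function name `approx_permutation_test`, the default $B = 999$, and the use of `Generator.permutation` to draw $\pi_b$ are assumptions for illustration; `stat(x, y)` is any two-sample statistic for which large values support $H_1$.

```python
import numpy as np

def approx_permutation_test(x, y, stat, B=999, rng=None):
    """Return (observed statistic, empirical upper-tail p-value)."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = np.concatenate([x, y])       # pooled sample Z
    n = len(x)

    theta_obs = stat(x, y)           # step 1: observed statistic

    reps = np.empty(B)
    for b in range(B):               # step 2: B random permutations of nu
        perm = rng.permutation(len(z))
        reps[b] = stat(z[perm[:n]], z[perm[n:]])

    # step 3: count the observed statistic as one of the B + 1 values
    p = (np.sum(reps >= theta_obs) + 1) / (B + 1)
    return theta_obs, p
```

Adding 1 to the numerator and denominator treats the observed statistic as one of the $B + 1$ equally likely values under $H_0$, so $\hat p$ is never 0 and its smallest possible value is $1/(B + 1)$. Step 4 is then simply `p <= alpha`.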
Two-sample Tests for Equal Distributions I
$X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_m)$ are independent random samples from $F$ and $G$.
$H_0: F = G$ vs $H_1: F \ne G$.
Under $H_0$, $X$, $Y$, and $Z = X \cup Y$ are random samples from $F$. Also, under $H_0$, any subset $X^*$ of size $n$ from $Z$ and its complement $Y^*$ are random samples from $F$.
$\hat\theta$: a two-sample statistic measuring the distance between $F$ and $G$. Large values of $\hat\theta$ support $H_1$.
Under $H_0$, all $\binom{N}{n}$ partitions $(X^*, Y^*)$, and hence all replicates $\hat\theta^* = \hat\theta(X^*, Y^*)$, are equally likely.
Two-sample Tests for Equal Distributions II
Examples of test statistics
Kolmogorov-Smirnov statistic
$D = \sup_{1 \le i \le n+m} |F_n(z_i) - G_m(z_i)|,$
where $F_n$ and $G_m$ are the ecdfs of $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$. Note that $0 \le D \le 1$.
Cramér-von Mises statistic
$W_2 = \frac{mn}{(m+n)^2}\left[\sum_{i=1}^{n}\big(F_n(x_i) - G_m(x_i)\big)^2 + \sum_{j=1}^{m}\big(F_n(y_j) - G_m(y_j)\big)^2\right].$
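Both statistics can be computed directly from the ecdfs. A minimal NumPy sketch; the helper names `ecdf`, `ks_stat`, and `cvm_stat` are hypothetical, and the ecdf is evaluated with `numpy.searchsorted(..., side="right")` so that $F_n(t)$ is the fraction of observations $\le t$.

```python
import numpy as np

def ecdf(sample, t):
    """Empirical cdf of `sample` evaluated at the point(s) t."""
    s = np.sort(np.asarray(sample, float))
    return np.searchsorted(s, t, side="right") / len(s)

def ks_stat(x, y):
    """Two-sample Kolmogorov-Smirnov D, taking the sup over the pooled points."""
    z = np.concatenate([x, y])
    return np.max(np.abs(ecdf(x, z) - ecdf(y, z)))

def cvm_stat(x, y):
    """Two-sample Cramer-von Mises W_2 as defined above."""
    n, m = len(x), len(y)
    dx = ecdf(x, x) - ecdf(y, x)  # F_n(x_i) - G_m(x_i)
    dy = ecdf(x, y) - ecdf(y, y)  # F_n(y_j) - G_m(y_j)
    return m * n / (m + n) ** 2 * (np.sum(dx ** 2) + np.sum(dy ** 2))
```

Since $F_n - G_m$ is a right-continuous step function that only jumps at the pooled points, evaluating it at $z_1, \ldots, z_{n+m}$ is enough to obtain the supremum. Either function can be used as $\hat\theta$ in the exact or approximate permutation test, since large values of both indicate $F \ne G$.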