Three 2nd Thoughts on Empirical Bayes Inference
— Conversations with Carl Morris —
Bradley Efron
Stanford University
Empirical Bayes (Robbins, 1950s)
Unknown prior density $g(\theta)$ gives $\theta_1, \theta_2, \ldots, \theta_N$ (unobserved)

Observations $x_1, x_2, \ldots, x_N$ with $x_i \overset{\text{ind}}{\sim} p(x_i \mid \theta_i)$ (e.g., $x_i \sim \text{Poi}(\theta_i)$)

The Goal: to estimate the parameters $\theta_i$

Amazing Fact: for large $N$ we can nearly achieve the Bayes risk
“Large parallel studies contain their own Bayesian information.”
The “Normal–Normal Case”:

$$\theta_i \sim N(M, A) \quad\text{and}\quad x_i \mid \theta_i \sim N(\theta_i, 1)$$

Bayes Rule: $\hat\theta_i^{\text{Bayes}} = E\{\theta_i \mid x_i\} = M + (1 - B)(x_i - M)$,

$B = 1/(A + 1)$ [shrinkage factor $A/(A+1)$]

Bayes Risk: $R^{\text{Bayes}} = E\left\{\sum_{i=1}^{N}\left(\theta_i - \hat\theta_i^{\text{Bayes}}\right)^2\right\} = N(1 - B)$

MLE: $\hat\theta_i^{\text{MLE}} = x_i$, with $R^{\text{MLE}} = N$

$R^{\text{Bayes}}/R^{\text{MLE}} = 1 - B$: Bayes saves proportion $B$ of the risk
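A quick numerical check of these risks: a minimal simulation sketch of the normal–normal model, using the slide's $N = 18$ and $A = 1$ ($M = 0$ and the replication count are my illustrative choices, not from the talk).

```python
# Simulation sketch of the normal-normal model: Bayes rule vs MLE risk.
import numpy as np

rng = np.random.default_rng(0)
M, A, N, reps = 0.0, 1.0, 18, 10_000   # M = 0 is an illustrative choice
B = 1.0 / (A + 1.0)

theta = rng.normal(M, np.sqrt(A), size=(reps, N))   # theta_i ~ N(M, A)
x = theta + rng.normal(size=(reps, N))              # x_i | theta_i ~ N(theta_i, 1)

theta_bayes = M + (1 - B) * (x - M)                 # Bayes rule, M and A known
risk_bayes = np.mean(np.sum((theta_bayes - theta) ** 2, axis=1))
risk_mle = np.mean(np.sum((x - theta) ** 2, axis=1))

print(risk_bayes, N * (1 - B))   # empirical risk vs theoretical N(1 - B) = 9
print(risk_bayes / risk_mle)     # approaches 1 - B = 0.5
```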
The James–Stein Estimator (1961)
James–Stein: $\hat\theta_i^{\text{JS}} = \hat M + \left(1 - \hat B\right)\left(x_i - \hat M\right)$, where $\hat M = \bar x$ and $\hat B = (N - 3)\big/\sum_{i=1}^{N}\left(x_i - \hat M\right)^2$ (unbiased estimates)

Risk: $R^{\text{JS}}/R^{\text{Bayes}} = 1 + 3/(N \cdot A)$

For $N = 18$, $A = 1$, JS loses $1/6$ of the Bayes savings

“Shrinkage estimation”

Theorem

$E_\theta\left\{\sum\left(\theta_i - \hat\theta_i^{\text{JS}}\right)^2\right\} < E_\theta\left\{\sum\left(\theta_i - \hat\theta_i^{\text{MLE}}\right)^2\right\}$ always!
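A direct coding of the estimator exactly as written above (a sketch, not Efron's own code):

```python
# James-Stein as defined on this slide: shrink each x_i toward the grand
# mean, with B-hat = (N - 3) / sum (x_i - xbar)^2, the unbiased estimate.
import numpy as np

def james_stein(x):
    x = np.asarray(x, dtype=float)
    M_hat = x.mean()
    B_hat = (len(x) - 3) / np.sum((x - M_hat) ** 2)
    return M_hat + (1 - B_hat) * (x - M_hat)
```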
The 18 baseball players
                MLE    Truth    JS
Clemente       .400    .346    .294
F. Robinson    .378    .298    .289
F. Howard      .356    .276    .285
Johnstone      .333    .222    .280
...             ...     ...     ...
E. Rodriguez   .222    .226    .256
Campaneris     .200    .285    .252
Munson         .178    .316    .247
Alvis          .156    .200    .242

Total squared error:   .075 (MLE)   .021 (JS)
Learning From the Experience of Others
Why does Clemente’s good performance increase Munson’s estimate?
Parallel data sets let you “learn from the experience of others”
Which others?
Relevance (Efron–Morris 1972)
$\hat\theta_i^{\text{rel}} = \hat M + \left[1 - \rho(x_i)\hat B\right]\left(x_i - \hat M\right)$

The relevance function $\rho(\cdot)$ decreases with $|x_i - \hat M|$

Limited Translation: “never shrink more than one unit away from the MLE”; if $x_i \sim N(\theta_i, \sigma^2)$ then “unit” $= \sigma$

Clemente: $\hat\theta_i^{\text{rel}} = 0.334$ (cf. $\hat\theta_i^{\text{JS}} = 0.294$)

Loses about 10% of the JS savings
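One way to realize the limited-translation rule in code is to cap the James–Stein shift directly rather than write out $\rho(\cdot)$; a sketch of that cap-form reading, assuming $\sigma = 1$:

```python
# Limited-translation sketch: James-Stein shrinkage, but never move an
# estimate more than max_shift (one sigma "unit") away from its MLE x_i.
import numpy as np

def limited_translation(x, max_shift=1.0):
    x = np.asarray(x, dtype=float)
    M_hat = x.mean()
    B_hat = (len(x) - 3) / np.sum((x - M_hat) ** 2)
    shift = -B_hat * (x - M_hat)                       # full James-Stein shift
    return x + np.clip(shift, -max_shift, max_shift)   # capped at one unit
```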
False Discovery Rates (Benjamini–Hochberg 1995)
EB Testing Which of N cases are “significant”?
DTI Study: 6 dyslexics vs 6 controls, N = 15,443 voxels
$z_i \overset{\text{null}}{\sim} N(0,1)$ for the $i$th voxel

Fdr: all the $z$’s determine each $z_i$’s significance

“locfdr”: 174 voxels with $z_i \geq 3.10$ have $\widehat{\text{Fdr}} \leq 0.10$
[Figure: Histogram of z-scores for the 15,443 DTI voxels; the 174 voxels with $z \geq 3.10$, marked in the right tail, have $\widehat{\text{Fdr}} \leq .10$.]
[Figure: DTI z-values plotted versus distance $x$ from the back of the brain, with running 16th percentile, median, and 84th percentile curves.]
Relevance (Efron 2010, §10.3)
xi = distance of voxel i from back of brain
Target voxel i0
Voxel $i$ counts amount $\rho(x_i - x_{i_0})$ toward $\text{Fdr}_{i_0}$

Example: $\rho = 1$ if $|x_i - x_{i_0}| \leq 10$, $\rho = 0$ otherwise; or $\rho = \exp(-|x_i - x_{i_0}|/10)$ (a sketch of both follows)
Kicking the can down the street . . .
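The two example relevance functions above, written out as weights (a sketch; the window width and decay scale of 10 are the slide's numbers):

```python
# The slide's two example relevance functions, as weights giving voxel i's
# contribution toward the Fdr of a target voxel at distance x0.
import numpy as np

def rho_window(x, x0, width=10.0):
    return (np.abs(x - x0) <= width).astype(float)   # 1 within 10 units, else 0

def rho_exp(x, x0, scale=10.0):
    return np.exp(-np.abs(x - x0) / scale)           # smooth decay with distance
```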
Second 2nd Thought: Ridge Regression (Hoerl–Kennard 1970)

OLS: $y = X\beta + \varepsilon$, with $X$ an $n \times p$ matrix and $\varepsilon_i \overset{\text{ind}}{\sim} (0, 1)$

$\hat\beta = S^{-1}X'y$ [where $S = X'X$]

Ridge Estimate: $\hat\beta(\lambda) = (S + \lambda I)^{-1}S\hat\beta$, which shrinks the estimate $\hat\beta$ toward zero

Empirical Bayes: data-based choice of $\lambda$
Example Diabetes Study: n = 442, p = 10 predictors
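In code, the ridge estimate is one linear solve per $\lambda$; a sketch (mine, not from the talk):

```python
# Ridge path per the slide's formula: beta(lambda) = (S + lambda I)^{-1} S beta-hat.
import numpy as np

def ridge_path(X, y, lambdas):
    S = X.T @ X
    beta_ols = np.linalg.solve(S, X.T @ y)           # OLS fit beta-hat
    p = X.shape[1]
    return np.array([np.linalg.solve(S + lam * np.eye(p), S @ beta_ols)
                     for lam in lambdas])            # one row per lambda
```

For standardized data, plotting each coefficient column against $\lambda$ traces out a ridge plot like the one shown below.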
Diabetes study: First 7 of n = 442 patients
(predict prog from the 10 covariates)
age  sex  bmi   map  tc   ldl    hdl  tch  ltg   glu  prog
 59   1   32.1  101  157   93.2  38   4    2.11  87   151
 48   0   21.6   87  183  103.2  70   3    1.69  69    75
 72   1   30.5   93  156   93.6  41   4    2.03  85   141
 24   0   25.3   84  198  131.4  40   5    2.12  89   206
 50   0   23.0  101  192  125.4  52   4    1.86  80   135
 23   0   22.6   89  139   64.8  61   2    1.82  68    97
 36   1   22.0   90  160   99.6  50   3    1.72  82   138
[Figure: Ridge trace for the standardized diabetes data: coefficient estimates for the 10 predictors as functions of $\lambda$ from 0 to 0.25.]
Comparison with James–Stein
$\hat\beta^{\text{JS}} = \left(1 - \dfrac{p - 2}{\hat\beta' S \hat\beta}\right)\hat\beta = 0.97\,\hat\beta$ for the diabetes data

$\hat\mu^{\text{JS}} = X\hat\beta^{\text{JS}}$ is guaranteed to beat $\hat\mu = X\hat\beta$
No such result for ridge regression and no automatic choice of λ
But. . . there’s more than one way to shrink a cat
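The corresponding one-liner for the slide's $\hat\beta^{\text{JS}}$ (a sketch assuming the standardized, unit-error-variance setup above):

```python
# James-Stein shrinkage of the whole OLS coefficient vector, per the formula above.
import numpy as np

def beta_james_stein(X, y):
    S = X.T @ X
    beta = np.linalg.solve(S, X.T @ y)                 # OLS beta-hat
    shrink = 1 - (len(beta) - 2) / (beta @ S @ beta)   # = 0.97 for diabetes
    return shrink * beta
```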
Estimates and standard deviations for the components of $\hat\beta$

       betamle   betaridge   sdmle   sdridge
age     −10.0        1.3      59.7     52.7
sex    −239.8     −207.2      61.2     53.2
bmi     519.8      489.7      66.5     56.3
map     324.4      301.8      65.3     55.7
tc     −792.2      −83.5     416.2     43.6
ldl     476.7      −70.8     338.6     52.4
hdl     101.0     −188.7     212.3     58.4
tch     177.1      115.7     161.3     70.8
ltg     751.3      443.8     171.7     58.4
glu      67.6       86.7      65.9     56.6
Three Uses of Regression Rules
Given any vector $v$ of covariates, $\hat y = r(v)$; e.g., $\hat y = v'\hat\beta^{\text{JS}}$ or $v'\hat\beta(\lambda)$

1. Prediction [JS]: want small prediction error $E(y - \hat y)^2$ ($y$ the response at $v$)

2. Response Surface Estimation: want $\hat y$ accurate for $E\{y \mid v\}$

3. Explanation [Ridge]: the form of $r(v)$ shows the importance of the individual covariates
Ridge Regression and Regularization
Alternate Form: $\hat\beta(\lambda) = \arg\min_\beta\left\{\|y - X\beta\|^2 + \lambda\|\beta\|^2\right\}$ (the OLS criterion plus a penalty)

“Regularizes” OLS by penalizing large values of $\|\beta\|$

Equivalently: the Bayes estimate with prior $\beta \sim N_p(0, I/\lambda)$

Lasso: $\hat\beta(\lambda) = \arg\min_\beta\left\{\|y - X\beta\|^2 + \lambda\sum_{j=1}^{p}|\beta_j|\right\}$

Now the shrinkage sometimes goes all the way to zero (“sparsity”); a sketch of both penalized fits follows
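Both penalized objectives can be sketched with one generic minimizer. This is illustrative only: ridge has the closed form given earlier, and a dedicated lasso solver returns exact zeros, which this numerical sketch can only approximate.

```python
# Penalized least squares for both penalties, by direct numerical minimization.
import numpy as np
from scipy.optimize import minimize

def penalized_fit(X, y, lam, penalty="ridge"):
    def objective(beta):
        resid = y - X @ beta
        pen = np.sum(beta**2) if penalty == "ridge" else np.sum(np.abs(beta))
        return resid @ resid + lam * pen
    beta0 = np.zeros(X.shape[1])
    # Powell is derivative-free, so the non-smooth lasso penalty is tolerated.
    return minimize(objective, beta0, method="Powell").x
```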
[Figure: Lasso coefficients for the diabetes data as a function of the regularizer $\lambda$, from $\infty$ down to 0 (OLS); $C_p$ calculations say $\lambda = 20$ is best.]
Comparison of β estimates
        mle      ridge    lasso      JS
age    −10.0       1.3      0.0     −9.7
sex   −239.8    −207.2   −197.8   −232.6
bmi    519.8     489.7    522.3    504.2
map    324.4     301.8    297.2    314.7
tc    −792.2     −83.5   −103.9   −768.4
ldl    476.7     −70.8      0.0    462.4
hdl    101.0    −188.7   −223.9     98.0
tch    177.1     115.7      0.0    171.8
ltg    751.3     443.8    514.7    728.7
glu     67.6      86.7     54.8     65.6
Robbins versus Stein
1956: $\theta_i \sim g(\theta)$ and $x_i \sim \text{Poi}(\theta_i)$, independently for $i = 1, \ldots, N$

Marginal Density: $f(x) = \int_0^\infty g(\theta)\, e^{-\theta}\theta^x/x!\; d\theta$ for $x = 0, 1, 2, \ldots$

Robbins’ Formula: $E\{\theta_i \mid x_i = x\} = (x + 1)f(x + 1)/f(x)$

Empirical Bayes Estimate: $\hat E\{\theta_i \mid x_i = x\} = (x + 1)\hat f(x + 1)/\hat f(x)$, with $\hat f(x) = \#\{x_i = x\}/N$
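Robbins' estimate needs nothing but the counts; a sketch:

```python
# Robbins' formula from raw counts:
# E-hat{theta | x} = (x + 1) f-hat(x + 1) / f-hat(x), f-hat(x) = #{x_i = x}/N.
import numpy as np

def robbins(counts):
    f = np.asarray(counts, dtype=float)
    f /= f.sum()                          # f-hat(x) for x = 0, 1, 2, ...
    x = np.arange(len(f) - 1)
    return (x + 1) * f[x + 1] / f[x]      # estimates for x = 0, ..., len - 2
```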
Comparison (biased)

James–Stein
- Elegant, airtight theorem
- Works with or without a prior $g(\theta)$
- Small $N$

Robbins-type Empirical Bayes
- Asymptotic justification
- Nonparametric (slow)
- “Need $N$ in the thousands!”
Auto insurance example: N = 9461
Claims x            0      1      2      3     4     5     6     7
#{x_i = x}       7840   1317    239     42    14     4     4     1
E{θ|x} (Robbins)  .168   .363   .527   1.33  1.43  6.00  1.25
Gamma*            .164   .398   .633    .87  1.10  1.34  1.57

* Assumes $g(\theta)$ is a Gamma density (“parametric EB”)
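Feeding the claims counts above into the robbins sketch from the previous slide reproduces the empirical Bayes row:

```python
counts = [7840, 1317, 239, 42, 14, 4, 4, 1]   # #{x_i = x} for x = 0..7; N = 9461
print(robbins(counts).round(3))               # 0.168, 0.363, 0.527, ... per the table
```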
Tweedie’s Formula (1956?)
$\theta_i \sim g(\theta)$ and $x_i \sim N(\theta_i, 1)$, independently for $i = 1, 2, \ldots, N$

Marginal Density: $f(x) = \int_{-\infty}^{\infty} g(\theta)\, e^{-(x-\theta)^2/2}\big/\sqrt{2\pi}\; d\theta$

$E\{\theta_i \mid x_i = x\} = x + \frac{d}{dx}\log f(x)$

Empirical Bayes: fit a smooth $\hat f(x)$ to the histogram of $(x_1, x_2, \ldots, x_N)$;

$\hat E\{\theta_i \mid x_i = x\} = x + \frac{d}{dx}\log \hat f(x)$

(gives James–Stein if $\log g(\theta)$ is assumed quadratic, i.e., $g$ normal)
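A sketch of the empirical-Bayes recipe, substituting a kernel density estimate for the talk's 5th-degree log-polynomial fit (an assumption made here for brevity) and differentiating $\log \hat f$ numerically:

```python
# Tweedie's formula with a smoothed marginal:
# E-hat{theta | z} = z + d/dz log f-hat(z).
import numpy as np
from scipy.stats import gaussian_kde

def tweedie_estimate(z, grid):
    f_hat = gaussian_kde(z)(grid)               # smooth f-hat (KDE, not log-poly)
    dlogf = np.gradient(np.log(f_hat), grid)    # numerical d/dz log f-hat
    return grid + dlogf

z = np.random.default_rng(1).normal(size=6033)       # illustrative pure-null z's
grid = np.linspace(-3, 3, 61)
print(tweedie_estimate(z, grid)[::10])               # near 0 throughout, as it should be
```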
Prostate Cancer Study
102 men: 52 prostate cancer patients and 50 healthy controls

Each assessed on $N = 6033$ genes

$z_i$ = two-sample test statistic, patients vs controls ($z$ plays the role of “$x$”)

$z_i \sim N(\theta_i, 1)$, $i = 1, 2, \ldots, N$

$\theta_i$ is the “effect size” for the $i$th gene

Null genes: $\theta_i = 0$
[Figure: Histogram of the $N = 6033$ z-values from the prostate cancer study, with a 5th-degree log-polynomial fit.]
[Figure: Tweedie’s estimate of $E\{\theta \mid z\}$ for the prostate cancer study, using the 5th-degree polynomial model for $f(z)$. The curve is nearly 0 for $-2 < z < 2$; $\hat E\{\theta \mid z = 3\} = 1.05$.]
Empirical Bayes Hypothesis Testing (Fdr)
Prior $g(\theta) \to \theta_i \to z_i \sim N(\theta_i, 1)$, $i = 1, 2, \ldots, N$

$g(\theta)$ has an atom of probability $\pi_0$ at $\theta = 0$ (the null cases)

False Discovery Rate: $\text{Fdr}(z_0) = \Pr\{\theta_i = 0 \mid z_i \geq z_0\} = \pi_0\,\bar\Phi(z_0)\big/\bar F(z_0)$,

where $\bar\Phi(z_0) = \Pr\{N(0,1) \geq z_0\}$ and $\bar F(z_0) = \Pr_g\{z_i \geq z_0\}$
Fdr Control Algorithm (1995)
Estimate $\text{Fdr}(z)$ with $\widehat{\text{Fdr}}(z) = \bar\Phi(z)\big/\hat{\bar F}(z)$, where $\hat{\bar F}(z) = \#\{z_i \geq z\}/N$

Reject $\theta_i = 0$ if $\widehat{\text{Fdr}}(z_i) \leq q$

Frequentist Theorem: the expected proportion of false discoveries is $\leq q$

Empirical Bayes: $\widehat{\text{Fdr}}(z_i)$ decreases if more of the other $z_j$’s exceed $z_i$; a two-line sketch follows
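The estimate itself is essentially two lines of code; a sketch (mine, not the locfdr package):

```python
# Tail-area Fdr estimate: Fdr-hat(z) = Phi-bar(z) / F-bar-hat(z).
import numpy as np
from scipy.stats import norm

def fdr_hat(z_values, z):
    phi_bar = norm.sf(z)                          # Pr{N(0,1) >= z}
    F_bar = np.mean(np.asarray(z_values) >= z)    # #{z_i >= z} / N
    return phi_bar / F_bar

# Reject theta_i = 0 for every z_i with fdr_hat(z_values, z_i) <= q, e.g. q = 0.10.
```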
[Figure: Estimated false discovery rates $\widehat{\text{Fdr}}(z)$ for the prostate cancer data; $\widehat{\text{Fdr}}(3.57) = .10$, with thirteen genes having $z > 3.57$.]
“There’s nobody less Bayesian than an Empirical Bayesian” (Dennis Lindley, 1969)
Early skepticism of empirical Bayes (relevance)

Bayes: prior experience influences current inference

Empirical Bayes: the current experience (of others) influences inference

The crucial idea is not “estimating the prior $g(\theta)$” but applying an estimated prior to individual cases
References
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate ... JRSS-B 57: 289–300. The original Fdr paper.

Copas, J. B. (1969). Compound decisions and empirical Bayes (with discussion). JRSS-B 31: 397–425. Lindley and other skeptics.

Efron, B. (2010). IMS Monographs 1. Fdr as empirical Bayes.

Efron, B. (2011). JASA 106: 1602–1614. Tweedie’s estimate.

Efron, B. and Morris, C. (1972). JASA 67: 130–139. Limited translation and relevance.

Efron, B. and Morris, C. (1973). JASA 68: 117–130. The baseball players and other examples.

Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression ... Technometrics 12: 69–82.

James, W. and Stein, C. (1961). Estimation with quadratic loss. Proceedings of the 4th Berkeley Symposium, 361–379. JS estimates.

Morris, C. N. (1983). Parametric empirical Bayes inference ... JASA 78: 47–65. Parametric EB and EB confidence intervals.

Robbins, H. (1956). An empirical Bayes approach to statistics. Proceedings of the 3rd Berkeley Symposium, 157–163.