Upload
others
View
12
Download
0
Embed Size (px)
Citation preview
Unbiased Risk Estimation for Sparse Analysis RegularizationCharles Deledalle1, Samuel Vaiter1, Gabriel Peyre1, Jalal Fadili3 and Charles Dossal2
1CEREMADE, Universite Paris–Dauphine — 2GREY’C, ENSICAEN — 3IMB, Universite Bordeaux I
Problem statement
Consider the convex but non-smooth Analysis Sparsity Regularization problem
x?(y, λ) ∈ argminx∈RN
1
2||y − Φx||2 + λ||D∗x||1 (Pλ(y))
which aims at inverting
y = Φx0 + w
by promoting sparsity and with
I x0 ∈ RN the unknown image of interest,
I y ∈ RQ the low-dimensional noisy observation of x0,
I Φ ∈ RQ×N a linear operator that models the acquisition process,
I w ∼ N (0, σ2IdQ) the noise component,
I D ∈ RN×P an analysis dictionary, and
I λ > 0 a regularization parameter.
How to choose the value of the parameter λ?
Risk-based selection of λ
I Risk associated to λ: measure of the expected quality of x?(y, λ) wrt x0,
R(λ) = Ew||x?(y, λ)− x0||2 .I The optimal (theoretical) λ minimizes the risk.
The risk is unknown since it depends on x0.
Can we estimate the risk solely from x?(y,λ)?
Risk estimation
I Assume y 7→ Φx?(y, λ) is weakly differentiable (a fortiori uniquely defined).
Prediction risk estimation via SURE
I The Stein Unbiased Risk Estimator (SURE):
SURE(y, λ) =||y − Φx?(y, λ)||2 − σ2Q + 2σ2 tr
(∂Φx?(y, λ)
∂y
)︸ ︷︷ ︸Estimator of the DOF
is an unbiased estimator of the prediction risk [Stein, 1981]:
Ew(SURE(y, λ)) = Ew(||Φx0 − Φx?(y, λ)||2) .
Projection risk estimation via GSURE
I Let Π = Φ∗(ΦΦ∗)+Φ be the orthogonal projector on ker(Φ)⊥ = Im(Φ∗),
I Denote xML(y) = Φ∗(ΦΦ∗)+y,
I The Generalized Stein Unbiased Risk Estimator (GSURE):
GSURE(y, λ) =||xML(y)− Πx?(y, λ)||2 − σ2 tr((ΦΦ∗)+) + 2σ2 tr
((ΦΦ∗)+∂Φx?(y, λ)
∂y
)is an unbiased estimator of the projection risk [Vaiter et al., 2012]
Ew(GSURE(y, λ)) = Ew(||Πx0 − Πx?(y, λ)||2)
(see also [Eldar, 2009, Pesquet et al., 2009, Vonesch et al., 2008] for similar results).
Illustration of risk estimation
Unkn
own
Known
SURE GSURE
(here, x? denotes x?(y, λ) for an arbitrary value of λ)
How to estimate the quantity tr((ΦΦ∗)+∂x
?(y,λ)∂y
)?
Main notations and assumptions
I Let I = supp(D∗x?(y, λ)) be the support of D∗x?(y, λ),
I Let J = Ic be the co-support of D∗x?(y, λ),
I Let DI be the submatrix of D whose columns are indexed by I ,
I Let sI = sign(D∗x?(y, λ))I be the subvector of D∗x?(y, λ) whose entries are indexed by I ,
I Let GJ = KerD∗J be the “cospace” associated to x?(y, λ) ,
I To study the local behaviour of x?(y, λ), we impose Φ to be “invertible” on GJ :
GJ ∩ Ker Φ = {0},I It allows us to define the matrix
A[J ] = U(U∗Φ∗ΦU)−1U∗,where U is a matrix whose columns form a basis of GJ ,
I In this case, we obtain an implicit equation:
x?(y, λ) solution of Pλ(y)⇔ x?(y, λ) = x(y, λ) , A[J ]Φ∗y − λA[J ]DIsI .
Is this relation true in a neighbourhood of (y,λ)?
Theorem (Local Parameterization)
I Even if the solutions x?(y, λ) of Pλ(y) might benot unique, Φx?(y, λ) is uniquely defined.
I If (y, λ) 6∈ H, for (y, λ) close to (y, λ), x(y, λ)is a solution of P(y, λ) where
x(y, λ) = A[J ]Φ∗y − λA[J ]DIsI .
I Hence, it allows us writing
∂Φx?(y, λ)
∂y= ΦA[J ]Φ∗ ,
I Moreover, the DOF can be estimated by
tr
(∂Φx?(y, λ)
∂y
)= dim(GJ) .
Can we compute this quantity efficiently?
�
x1
x2
�0 = 0 �k
x�k= 0
x�0
P0(y)
Monday, September 24, 12
Computation of GSURE
I One has for Z ∼ N (0, IdP ),
tr
((ΦΦ∗)+∂Φx?(y, λ)
∂y
)= EZ(〈ν(Z), Φ∗(ΦΦ∗)+Z〉)
where, for any z ∈ RP , ν = ν(z) solves the following linear system(Φ∗Φ DJD∗J 0
)(νν
)=
(Φ∗z
0
).
I In practice, with law of large number, the empirical mean is replaced for the expectation.
I The computation of ν(z) is achieved by solving the linear system with a conjugate gradient solver.
Numerical example
Super-resolution using (anisotropic) Total-Variation
(a) y
(b) x?(y, λ) at the optimal λ 2 4 6 8 10 12
1
1.5
2
2.5x 10
6
Regularization parameter λ
Qu
ad
ratic lo
ss
Projection Risk
GSURE
True Risk
Compressed-sensing using multi-scale wavelet thresholding
(c) xML
(d) x?(y, λ) at the optimal λ2 4 6 8 10 12
1
1.5
2
2.5x 10
6
Regularization parameter λ
Qu
ad
ratic lo
ss
Projection Risk
GSURE
True Risk
Perspectives: How to efficiently minimizes GSURE(y,λ) wrt λ?
References
Eldar, Y. C. (2009).Generalized SURE for exponential families: Applications to regularization.IEEE Transactions on Signal Processing, 57(2):471–481.
Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009).A SURE approach for digital signal/image deconvolution problems.IEEE Transactions on Signal Processing, 57(12):4616–4632.
Stein, C. (1981).Estimation of the mean of a multivariate normal distribution.The Annals of Statistics, 9(6):1135–1151.
Vaiter, S., Deledalle, C., Peyre, G., Dossal, C., and Fadili, J. (2012).Local behavior of sparse analysis regularization: Applications to risk estimation.Arxiv preprint arXiv:1204.3212.
Vonesch, C., Ramani, S., and Unser, M. (2008).Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint.In ICIP, pages 665–668. IEEE.
http://www.ceremade.dauphine.fr/~deledall/ [email protected]