Sparse modeling 2
Shiro Ikeda
The Institute of Statistical Mathematics
3 July 2015
Some applications
Sparsity and Information Processing
\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]

▶ Estimation, model selection: reflection seismology
▶ Estimation, prediction, model selection: fMRI and EMG
▶ Inverse problem, imaging: imaging of a black hole

X ≃ L + M, L: low-rank, M: sparse

▶ Sparse matrix: movie analysis
Some applications
  Reflection seismology
  fMRI and EMG
  Imaging of black hole
  Sparse matrix
How to solve the problems
Conclusion
Reflection seismology

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]

▶ Investigate the structure underground.
▶ Emit a sound with a loud-speaker and record the reflections.
▶ Reflections occur only at the boundaries.
Taylor, Banks, & McCoy (1979). “Deconvolution with the ℓ1 norm,” Geophysics, 44(1), 39-52.
Santosa & Symes (1986). “Linear inversion of band-limited reflection seismograms,” SIAM J. Sci. Stat. Comp., 7(4), 1307-1330.
Reflection seismology
g(t): Sound recorded by the microphone
f(t): Sound from loud-speaker
h(t): Impulse response
[Figure: impulse response h(t) over time]

h(t) is non-zero only when there is a reflection. The number of boundaries is not large, so h(t) is sparse.
Reflection seismology
g(t) = \int_0^t f(t - \tau)\, h(\tau)\, d\tau
Discretize as follows:

g = (g_1, \cdots, g_n)^T, \quad g_i = g(i\Delta t)
F = (F_{ij}), \quad F_{ij} = f((i - j + 1)\Delta t)
h = (h_1, \cdots, h_n)^T, \quad h_i = h(i\Delta t)

then,

g(i\Delta t) = \sum_j f((i - j + 1)\Delta t)\, h(j\Delta t).
Reflection seismology
g(i\Delta t) = \sum_j f((i - j + 1)\Delta t)\, h(j\Delta t)

g = Fh
h is sparse, and we can use the LASSO framework.
\min_h \left[ \| g - Fh \|_{\ell_2}^2 + \lambda \| h \|_{\ell_1} \right].
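As an illustration, here is a minimal sketch of this deconvolution in Python (not from the original slides; the wavelet f, problem size, and λ are assumed): build the lower-triangular Toeplitz matrix F, synthesize g = Fh from a sparse h, and recover h with an off-the-shelf LASSO solver.

```python
# Minimal sketch: LASSO deconvolution of g = F h with sparse h.
# The wavelet f, sizes, and regularization weight are assumed for illustration.
import numpy as np
from scipy.linalg import toeplitz
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
f = np.exp(-t / 5.0) * np.cos(t / 2.0)       # an assumed source wavelet f(t)

# F_ij = f((i - j + 1) * dt): lower-triangular Toeplitz convolution matrix.
r = np.zeros(n)
r[0] = f[0]
F = toeplitz(f, r)

# Sparse impulse response h: a few reflecting boundaries.
h_true = np.zeros(n)
h_true[[30, 80, 140]] = [1.0, -0.6, 0.8]
g = F @ h_true + 0.01 * rng.standard_normal(n)

# min_h ||g - F h||^2 + lambda ||h||_1
# (scikit-learn scales the squared error by 1/(2n), so alpha plays the
# role of lambda/(2n)).
lasso = Lasso(alpha=0.001, fit_intercept=False, max_iter=100000)
lasso.fit(F, g)
print("recovered boundaries:", np.flatnonzero(np.abs(lasso.coef_) > 1e-3))
```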
fMRI and EMG

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]

Ask the subject to control the sum of the EMG, and record the EMG of agonist and antagonist muscles.
Ganesh, Burdet, Haruno, & Kawato (2008). “Sparse linear regression for reconstructing muscle activity from human cortical fMRI,” NeuroImage, 42(4), 1463-1472.
fMRI and EMG
Torque corresponding to EMG
fMRI and EMG
Voxels of the brain related to the EMG
fMRI and EMG
Estimated EMG
Black hole (Joint work with Mareki Honma at NAOJ)
VLBI: Very Long Baseline Interferometer
▶ The resolution Θ of a telescope depends on the aperture D and the wavelength λ as

\Theta \propto \frac{\lambda}{D}

For example, for λ ≃ 1 mm and D ≃ 8000 km, Θ ≃ 25 µarcseconds.
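As a rough check of this number: λ/D = 10⁻³ m / (8 × 10⁶ m) ≈ 1.25 × 10⁻¹⁰ rad, and 1 rad ≈ 2.06 × 10¹¹ µarcseconds, so Θ is on the order of 26 µarcseconds up to a proportionality constant of order one.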
Interferometer
Target of VLBI
EHT: Event Horizon Telescope
Imaging of black hole
VLBI: Very Long Baseline Interferometer
▶ Our goal is to take an image of a black hole with the interferometer.
▶ A black hole is “black,” but the surrounding area emits light.
Size of black holes
Resolution of a telescope
Interferometer
Basic equation
The relation between the image I(x, y) and the observation S(u, v) at position (u, v) is

I(x, y) = \int\!\!\int S(u, v)\, e^{-2\pi i (ux + vy)}\, du\, dv

S(u, v) = \int\!\!\int I(x, y)\, e^{2\pi i (ux + vy)}\, dx\, dy.

The observation and the image are related by the Fourier transform.
Interferometer
Measurements and S(u, v)
Interferometer
Problem
Ideally, we would compute the image I(x, y) by applying the inverse Fourier transform to S(u, v); but since S(u, v) is only partially measured, there is a difficulty.

S(u, v) ↔ I(x, y)   (2-dimensional Fourier transform)
Interferometer
Problem
Applying LASSO.
Inverse problem
Problem
Simulate a black hole image and compute the observation, where the observed points are 2/5 of all points. Then reconstruct the image with LASSO.
[Figures: the simulated black hole image and the image reconstructed by LASSO]
Real observation data will be provided in 2015–2016. We are preparing for the data analysis.
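A minimal sketch of such a simulation in Python (an assumed setup, not the authors' code): sample 2/5 of the 2-D Fourier plane of a small sparse test image and recover the image with LASSO, stacking real and imaginary parts so a real-valued solver can be used. The image size, sparsity, and alpha are illustrative choices.

```python
# Minimal sketch: compressed-sensing imaging from 2/5 of the Fourier plane.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 16                          # image is n x n; small enough for a dense matrix
N = n * n

# Sparse test image: a few bright pixels on a dark background.
x_true = np.zeros(N)
x_true[rng.choice(N, size=8, replace=False)] = rng.uniform(0.5, 1.0, size=8)

# Dense 2-D DFT matrix as a Kronecker product of 1-D DFT matrices.
F1 = np.fft.fft(np.eye(n)) / np.sqrt(n)
F2d = np.kron(F1, F1)                                     # (N, N), complex
keep = rng.choice(N, size=int(0.4 * N), replace=False)    # observe 2/5 of the points
A_c = F2d[keep, :]
y_c = A_c @ x_true

# Stack real and imaginary parts so a real-valued LASSO solver applies.
A = np.vstack([A_c.real, A_c.imag])
y = np.concatenate([y_c.real, y_c.imag])

# positive=True encodes the extra assumption that image intensities are nonnegative.
lasso = Lasso(alpha=1e-4, fit_intercept=False, positive=True, max_iter=200000)
lasso.fit(A, y)
print("recovered support:", np.flatnonzero(lasso.coef_ > 1e-3))
```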
Separation of a matrix

M = L + S

\min_{L, S} \left[ \| L \|_* + \lambda \| S \|_{\ell_1} \right] \quad \text{subject to } M = L + S.

\| L \|_* = \mathrm{tr}(\sqrt{L^T L}) = \sum \sigma(L) is the nuclear norm, the sum of the singular values. \| S \|_{\ell_1} = \sum_{ij} |S_{ij}|.
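For reference, here is a minimal sketch of solving this problem with an inexact augmented-Lagrangian iteration: singular-value thresholding for L and entrywise soft thresholding for S. The fixed iteration count and parameter defaults are common choices and are assumptions, not taken from the slides.

```python
# Minimal sketch: robust PCA, M = L + S, via inexact augmented Lagrangian.
import numpy as np

def soft(X, t):
    """Entrywise soft threshold, the prox operator of t * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def rpca(M, lam=None, mu=None, iters=300):
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # standard choice in Candes et al.
    if mu is None:
        mu = 0.25 * m * n / np.sum(np.abs(M))
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        # L-step: singular value thresholding of M - S + Y/mu.
        U, sv, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft(sv, 1.0 / mu)) @ Vt
        # S-step: entrywise soft thresholding.
        S = soft(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint M = L + S.
        Y += mu * (M - L - S)
    return L, S

# Example: a rank-5 matrix plus sparse large-amplitude corruptions.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 50))
S0 = np.zeros((50, 50))
S0.flat[rng.choice(2500, size=100, replace=False)] = 10 * rng.standard_normal(100)
L_hat, S_hat = rpca(L0 + S0)
print("rank of L_hat:", np.linalg.matrix_rank(L_hat, tol=1e-3))
```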
Application
Candes, Li, Ma, & Wright (2011). “Robust principal component analysis?,” J. ACM, 58(3).
Some applications
How to solve the problems
  Compressed Sensing: Linear Programming
  LASSO: LARS
  LASSO: Quadratic Programming
  LASSO: Iterative Shrinking Algorithm
Conclusion
Linear Programming
Let u, c ∈ \Re^n be n-dimensional real vectors, A ∈ \Re^{m \times n}, and b ∈ \Re^m.

Standard form

\min_u c^t u \quad \text{subject to } Au \le b.

▶ Basic problem in optimization theory.
▶ Many numerical packages exist.
▶ Solved by simplex or interior-point methods.
▶ Large-scale problems can be solved efficiently.

Solve compressed sensing with linear programming.
Linear Programming

Compressed Sensing

\min_x \| x \|_{\ell_1} \quad \text{subject to } Ax = y

Let x_j = x_j^+ - x_j^-, \; x_j^+ \ge 0, \; x_j^- \ge 0, then

\min_{x^+, x^-} 1_n^t (x^+ + x^-) \quad \text{subject to } A(x^+ - x^-) = y
Linear Programming

Let \gamma = (x_1^+, \cdots, x_n^+, x_1^-, \cdots, x_n^-)^t. For the optimal \gamma, at least one of x_j^+ and x_j^- becomes 0 for every j.

\min_\gamma 1_{2n}^t \gamma \quad \text{subject to } \begin{pmatrix} A & -A \\ -A & A \end{pmatrix} \gamma \le \begin{pmatrix} y \\ -y \end{pmatrix}, \quad -E_{2n} \gamma \le 0_{2n}
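A minimal sketch of this LP in Python with scipy's generic linprog solver (the data are assumed). Here the equality Ax = y is passed directly as an equality constraint instead of the pair of inequalities above; the two formulations are equivalent.

```python
# Minimal sketch: basis pursuit as a linear program, gamma = (x+, x-).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 25, 60
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=4, replace=False)] = rng.standard_normal(4)
y = A @ x_true

# min 1^t gamma  subject to  [A  -A] gamma = y,  gamma >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```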
Solving LASSO with LARS

LASSO problem

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right], \quad \lambda > 0

LARS-LASSO

Solve the above problem for many values of λ.
Efron, Hastie, Johnstone, & Tibshirani (2004). “Least angle regression,” The Annals of Statistics, 32(2), 407-499.
Rosset & Zhu (2007). “Piecewise linear regularized solution paths,” The Annals of Statistics, 35(3), 1012-1030.
Solving LASSO with LARS

LASSO

Let x_j = x_j^+ - x_j^-, \; x_j^+ \ge 0, \; x_j^- \ge 0, and

\min_{x^+, x^-} \left[ \| y - A(x^+ - x^-) \|_{\ell_2}^2 + \lambda \sum_{j=1}^n (x_j^+ + x_j^-) \right]
\quad \text{subject to } x_j^+, x_j^- \ge 0, \; j = 1, \cdots, n.

Lagrange function

\min_{x^+, x^-} \left[ L(x) + \lambda \sum_{j=1}^n (x_j^+ + x_j^-) - \sum_{j=1}^n \lambda_j^+ x_j^+ - \sum_{j=1}^n \lambda_j^- x_j^- \right]

where L(x) = \| y - A(x^+ - x^-) \|_{\ell_2}^2. Let x(λ) be the optimal x for a given λ.
Solving LASSO with LARS

Karush-Kuhn-Tucker conditions

\partial_{x_j} L(x(\lambda)) + \lambda - \lambda_j^+ = 0
-\partial_{x_j} L(x(\lambda)) + \lambda - \lambda_j^- = 0
\lambda_j^+ x_j^+ = 0, \quad \lambda_j^+ \ge 0
\lambda_j^- x_j^- = 0, \quad \lambda_j^- \ge 0, \quad j = 1, \cdots, n.

Study the above conditions, where

\partial_{x_j} L(x) = \frac{\partial L(x)}{\partial x_j} = -2 (y - Ax)^t a_j

and a_j = (a_{1j}, \cdots, a_{mj})^t is the j-th column vector of A.
Solving LASSO with LARS

The optimal x_j(λ) falls into one of the following cases (λ > 0).

(i) x_j^+ = 0, \; x_j^- = 0:
|\partial_{x_j} L(x)| \le \lambda

(ii) x_j^+ > 0:
\partial_{x_j} L(x) = -\lambda, \; x_j^- = 0 \quad (\lambda_j^+ = 0, \; \lambda_j^- > 0)

(iii) x_j^- > 0:
\partial_{x_j} L(x) = \lambda, \; x_j^+ = 0 \quad (\lambda_j^+ > 0, \; \lambda_j^- = 0)
Solving LASSO with LARS
Depending on λ, the set of “Active” component, which are non-zero xjchanges. Let us define this set as A. A = {j : xj(λ) = 0}
A member of Active set
j ∈ A ⇒ ∂xjL(x(λ)) = −sgn(xj(λ))λ
A component which is “not” Active
j /∈ A ⇒ |∂xjL(x(λ))| ≤ λ
Solving LASSO with LARS

Let x_A be x restricted to the active components j ∈ A. From the fact that \partial_{x_A} \partial_{x_A} L(x(\lambda)) = 2 A_A^t A_A, we can derive the following relation.

\partial_{x_A} \partial_{x_A} L(x(\lambda)) \, \frac{\partial x_A(\lambda)}{\partial \lambda} = -\mathrm{sgn}(x_A(\lambda))

\frac{\partial x_A(\lambda)}{\partial \lambda} = -\left( \partial_{x_A} \partial_{x_A} L(x(\lambda)) \right)^{-1} \mathrm{sgn}(x_A(\lambda))

\frac{\partial x_A(\lambda)}{\partial \lambda} = -\frac{1}{2} \left( A_A^t A_A \right)^{-1} \mathrm{sgn}(x_A(\lambda))

The derivative is constant while A is fixed, so the solution path x(λ) is piecewise linear in λ.
Solving LASSO with LARS

Algorithm

1. x = 0, \; A = \{ \arg\max_j |y^t a_j| \}, \; d_A = -\mathrm{sgn}(-y^t a_A), \; d_{A^c} = 0.
2. While \max_j |\partial_{x_j} L(x)| > 0, repeat the following.
   i. s_1 = \min\{ s > 0 : |\partial_{x_j} L(x + sd)| = |\partial_{x_A} L(x + sd)|, \; j \notin A \},
      s_2 = \min\{ s > 0 : (x + sd)_j = 0, \; j \in A \},
      s = \min(s_1, s_2).
   ii. x \leftarrow x + sd. If s = s_1, add the j that achieves s_1 to A; if s = s_2, remove the j that achieves s_2 from A.
   iii. Update d_A = -(A_A^t A_A)^{-1} \mathrm{sgn}(\partial_{x_A} L(x)), \; d_{A^c} = 0.
   iv. Go to step 2.
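scikit-learn ships a LARS implementation; below is a minimal sketch of tracing the solution path on assumed data (not the slides' example). Note that sklearn scales the ℓ1 penalty by 1/n_samples, so the returned alphas correspond to λ up to that scaling.

```python
# Minimal sketch: trace the piecewise-linear LASSO path with LARS.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = A @ x_true + 0.1 * rng.standard_normal(m)

# method='lasso' returns the LASSO path: alphas are the breakpoints
# (lambda up to sklearn's 1/m scaling), coefs[:, k] is x at breakpoint k.
alphas, active, coefs = lars_path(A, y, method="lasso")
for lam, col in zip(alphas, coefs.T):
    print(f"lambda = {lam:8.4f}   active set = {np.flatnonzero(col)}")
```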
An example of LARS
[Figure: LASSO coefficient paths, β_j plotted against Σ|β_j|]

(Σ|β_j| is equivalent to ‖x‖_{ℓ1}.)
An example of LARS
[Figure: Σ|β_j| plotted against log λ along the path]

(Σ|β_j| is equivalent to ‖x‖_{ℓ1}.)
Quadratic Programming (QP)

Let u, f ∈ \Re^n be n-dimensional real vectors, H ∈ \Re^{n \times n} a positive semidefinite matrix, A ∈ \Re^{m \times n}, and b ∈ \Re^m.

Standard form

\min_u \frac{1}{2} u^t H u + f^t u \quad \text{subject to } Au \le b.

▶ Basic problem in optimization theory.
▶ Many numerical packages exist.
▶ Large-scale problems can be solved.

Solve LASSO with quadratic programming.
Solving LASSO with Quadratic Programming

LASSO

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]
= \min_x \left[ (y - Ax)^t (y - Ax) + \lambda \sum_j |x_j| \right]
= y^t y + \min_x \left[ x^t A^t A x - 2 y^t A x + \lambda \sum_j |x_j| \right]
Solving LASSO with Quadratic Programming

Let x_j = x_j^+ - x_j^-, \; x_j^+ \ge 0, \; x_j^- \ge 0.

LASSO

\min_{x^+, x^-} \left[ (x^+ - x^-)^t A^t A (x^+ - x^-) - 2 y^t A (x^+ - x^-) + \lambda\, 1_n^t (x^+ + x^-) \right]
\quad \text{subject to } x_j^+, x_j^- \ge 0, \; j = 1, \cdots, n.

At the optimum, one of x_j^+ and x_j^- becomes 0 because of the last term, so x_j = x_j^+ - x_j^- is satisfied.
Solving LASSO with Quadratic Programming

Let \gamma = (x_1^+, \cdots, x_n^+, x_1^-, \cdots, x_n^-)^t.

LASSO

\min_\gamma \left[ \gamma^t \begin{pmatrix} A^t A & -A^t A \\ -A^t A & A^t A \end{pmatrix} \gamma + \begin{pmatrix} -2 A^t y + \lambda 1_n \\ 2 A^t y + \lambda 1_n \end{pmatrix}^t \gamma \right]
\quad \text{subject to } -E_{2n} \gamma \le 0_{2n}.
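A minimal sketch of this formulation in Python (assumed data and λ). In place of a dedicated QP package it minimizes the same objective over γ = (x⁺, x⁻) ≥ 0 with scipy's bound-constrained L-BFGS-B solver; a real QP solver would accept the block-matrix form directly.

```python
# Minimal sketch: LASSO as a nonnegatively constrained quadratic problem.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n, lam = 40, 15, 0.5
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[1, 6]] = [2.0, -1.5]
y = A @ x_true + 0.05 * rng.standard_normal(m)

G = A.T @ A
q = A.T @ y

def objective(gamma):
    # gamma = (x+, x-) >= 0; objective matches the block form in the text.
    x = gamma[:n] - gamma[n:]
    return x @ G @ x - 2 * q @ x + lam * gamma.sum()

def gradient(gamma):
    g = 2 * G @ (gamma[:n] - gamma[n:]) - 2 * q
    return np.concatenate([g, -g]) + lam

res = minimize(objective, np.zeros(2 * n), jac=gradient,
               bounds=[(0, None)] * (2 * n), method="L-BFGS-B")
x_hat = res.x[:n] - res.x[n:]
print("nonzero coefficients:", np.flatnonzero(np.abs(x_hat) > 1e-4))
```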
LASSO: Iterative Shrinking Algorithm

Quadratic programming cannot solve very large problems. We need a different approach.

Surrogate function

f(x) = \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1}

d(x, x_0) = c \| x - x_0 \|_{\ell_2}^2 - \| Ax - Ax_0 \|_{\ell_2}^2

Set c larger than the largest eigenvalue of A^t A; then \partial^2 d(x, x_0) / \partial x \partial x is a positive definite matrix.

New cost function

Q(x, x_0) = f(x) + d(x, x_0)

x = \arg\min_x Q(x, x_0)
LASSO: Iterative Shrinking Algorithm

Q(x, x_0) = \mathrm{Const} - 2 x^t [ A^t (y - A x_0) + c x_0 ] + \lambda \| x \|_{\ell_1} + c \| x \|_{\ell_2}^2

Minimize Q(x, x_0) with respect to x. Let v = \frac{1}{c} A^t (y - A x_0) + x_0, and

x = \arg\min_x Q(x, x_0)
  = \arg\min_x \left[ -2 c\, x^t v + \lambda \| x \|_{\ell_1} + c \| x \|_{\ell_2}^2 \right]
  = \arg\min_x \left[ \lambda \| x \|_{\ell_1} + c \| x - v \|_{\ell_2}^2 \right]
  = \arg\min_x \left[ \frac{\lambda}{c} \| x \|_{\ell_1} + \| x - v \|_{\ell_2}^2 \right]
LASSO: Iterative Shrinking Algorithm

x = \arg\min_x \left[ \frac{\lambda}{c} \sum_i |x_i| + \sum_i (x_i - v_i)^2 \right]

Each component becomes independent:

x_i = \arg\min_{x_i} \left[ \frac{\lambda}{c} |x_i| + (x_i - v_i)^2 \right]
LASSO: Iterative Shrinking Algorithm

Define S_\omega(v) as follows,

S_\omega(v) = \begin{cases} 0 & |v| < \omega/2 \\ v - \mathrm{sgn}(v)\, \omega/2 & |v| \ge \omega/2 \end{cases}

then

x_i = S_{\lambda/c}(v_i),

and thus x = \arg\min_x Q(x, x_0) can be computed efficiently.
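As a quick numerical sanity check (with assumed values of λ/c and v), the closed form S_{λ/c}(v) matches a brute-force minimization of the scalar objective:

```python
# Verify S_{lambda/c}(v) against grid search on (lambda/c)|x| + (x - v)^2.
import numpy as np

lam_over_c, v = 0.8, 1.0
xs = np.linspace(-3.0, 3.0, 200001)
obj = lam_over_c * np.abs(xs) + (xs - v) ** 2
x_grid = xs[np.argmin(obj)]
x_closed = np.sign(v) * max(abs(v) - lam_over_c / 2.0, 0.0)
print(x_grid, x_closed)   # both are 0.6 = v - (lambda/c)/2
```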
LASSO: Iterative Shrinking Algorithm

Consider the following update rule.

x_{k+1} = \arg\min_x Q(x, x_k) = S_{\lambda/c}\!\left( \frac{1}{c} A^t (y - A x_k) + x_k \right)

Because Q(x, x_k) = f(x) + d(x, x_k) and d(x, x_k) \ge 0,

f(x_{k+1}) \le Q(x_{k+1}, x_k) = \min_x Q(x, x_k) \le Q(x_k, x_k) = f(x_k)

Therefore f(x_k) decreases monotonically, and repeating the update minimizes f(x).
Daubechies, Defrise, & De Mol (2004). “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. on Pure and Applied Math., LVII, 1413-1457.
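A minimal sketch of the full iteration in Python (assumed data and λ; c is set just above the largest eigenvalue of A^tA, as required above):

```python
# Minimal sketch: ISTA for min_x ||y - Ax||^2 + lambda ||x||_1.
import numpy as np

def soft_threshold(v, omega):
    """S_omega(v): zero where |v| < omega/2, otherwise shrink by omega/2."""
    return np.sign(v) * np.maximum(np.abs(v) - omega / 2.0, 0.0)

def ista(A, y, lam, iters=1000):
    # c must exceed the largest eigenvalue of A^t A so that d(x, x0) >= 0.
    c = 1.01 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        v = x + A.T @ (y - A @ x) / c
        x = soft_threshold(v, lam / c)
    return x

# Example: recover a sparse x from noisy random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
y = A @ x_true + 0.05 * rng.standard_normal(40)
x_hat = ista(A, y, lam=0.5)
print("nonzero components:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```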
Conclusion
Solving Compressed Sensing and LASSO
▶ Introduced strategies for solving Compressed Sensing and LASSO.
▶ Compressed sensing can be solved with Linear Programming.
▶ LASSO can be solved with LARS, Quadratic Programming, or the Iterative Shrinking Algorithm, depending on the size of the problem.