Sparse modeling 2
Shiro Ikeda
The Institute of Statistical Mathematics
3 July 2015
Some applications
Sparsity and Information Processing
\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]

▶ Estimation, model selection: reflection seismology
▶ Estimation, prediction, model selection: fMRI and EMG
▶ Inverse problem, imaging: imaging of a black hole

X ≃ L + M, L: low-rank, M: sparse

▶ Sparse matrix: movie analysis
Some applications
  Reflection seismology
  fMRI and EMG
  Imaging of black hole
  Sparse matrix
How to solve the problems
Conclusion
Reflection seismology

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]

▶ Investigate the structure underground.
▶ Emit a sound with a loud-speaker and record the reflections.
▶ Reflections occur only at the boundaries.
Taylor, Banks, & McCoy (1979). “Deconvolution with the ℓ1 norm,” Geophysics, 44(1), 39-52.
Santosa & Symes (1986). “Linear inversion of band-limited reflection seismograms,” SIAM J. Sci. Stat. Comp., 7(4), 1307-1330.
Reflection seismology
g(t): Sound recorded by the microphone
f(t): Sound from loud-speaker
h(t): Impulse response
[Figure: impulse response h(t) over time]

h(t) is non-zero only when there is a reflection. The number of boundaries is not large, so h(t) is sparse.
Reflection seismology
g(t) = \int_0^t f(t - \tau)\, h(\tau)\, d\tau
Discretize as follows:

g = (g_1, \cdots, g_n)^T, \quad g_i = g(i\Delta t)
F = (F_{ij}), \quad F_{ij} = f((i - j + 1)\Delta t)
h = (h_1, \cdots, h_n)^T, \quad h_i = h(i\Delta t)

then,

g(i\Delta t) = \sum_j f((i - j + 1)\Delta t)\, h(j\Delta t).
Reflection seismology
g(i\Delta t) = \sum_j f((i - j + 1)\Delta t)\, h(j\Delta t)

g = Fh
h is sparse, and we can use the LASSO framework.
\min_h \left[ \| g - Fh \|_{\ell_2}^2 + \lambda \| h \|_{\ell_1} \right].
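As an illustration, here is a minimal sketch of this deconvolution in Python (not from the original slides; the wavelet f, problem size, and λ are assumed): build the lower-triangular Toeplitz matrix F, synthesize g = Fh from a sparse h, and recover h with an off-the-shelf LASSO solver.

```python
# Minimal sketch: LASSO deconvolution of g = F h with sparse h.
# The wavelet f, sizes, and regularization weight are assumed for illustration.
import numpy as np
from scipy.linalg import toeplitz
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
f = np.exp(-t / 5.0) * np.cos(t / 2.0)       # an assumed source wavelet f(t)

# F_ij = f((i - j + 1) * dt): lower-triangular Toeplitz convolution matrix.
r = np.zeros(n)
r[0] = f[0]
F = toeplitz(f, r)

# Sparse impulse response h: a few reflecting boundaries.
h_true = np.zeros(n)
h_true[[30, 80, 140]] = [1.0, -0.6, 0.8]
g = F @ h_true + 0.01 * rng.standard_normal(n)

# min_h ||g - F h||^2 + lambda ||h||_1
# (scikit-learn scales the squared error by 1/(2n), so alpha plays the
# role of lambda/(2n)).
lasso = Lasso(alpha=0.001, fit_intercept=False, max_iter=100000)
lasso.fit(F, g)
print("recovered boundaries:", np.flatnonzero(np.abs(lasso.coef_) > 1e-3))
```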
fMRI and EMG

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]

Ask the subject to control the sum of the EMG, and record the EMG of agonist and antagonist muscles.
Ganesh, Burdet, Haruno, & Kawato (2008). “Sparse linear regression for reconstructing muscle activity from human cortical fMRI,” NeuroImage, 42(4), 1463-1472.
fMRI and EMG
Torque corresponding to EMG
fMRI and EMG
Voxels of the brain related to the EMG
fMRI and EMG
Estimated EMG
Black hole (Joint work with Mareki Honma at NAOJ)
VLBI: Very Long Baseline Interferometer
▶ The resolution Θ of a telescope depends on the aperture D and the wavelength λ as

\Theta \propto \frac{\lambda}{D}

For example, for λ ≃ 1 mm and D ≃ 8000 km, Θ ≃ 25 µarcseconds.
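As a rough check of this number: λ/D = 10⁻³ m / (8 × 10⁶ m) ≈ 1.25 × 10⁻¹⁰ rad, and 1 rad ≈ 2.06 × 10¹¹ µarcseconds, so Θ is on the order of 26 µarcseconds up to a proportionality constant of order one.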
Interferometer
Target of VLBI
EHT: Event Horizon Telescope
Imaging of black hole
VLBI: Very Long Baseline Interferometer
▶ Our goal is to take an image of a black hole with the interferometer.
▶ A black hole is “black,” but the surrounding area emits light.
Size of black holes
Resolution of a telescope
Interferometer
Basic equation
The relation between the image I(x, y) and the observation S(u, v) at position (u, v) is

I(x, y) = \int\!\!\int S(u, v)\, e^{-2\pi i (ux + vy)}\, du\, dv

S(u, v) = \int\!\!\int I(x, y)\, e^{2\pi i (ux + vy)}\, dx\, dy.

The observation and the image are related by the Fourier transform.
Interferometer
Measurements and S(u, v)
Interferometer
Problem
Ideally, we would compute the image I(x, y) by applying the inverse Fourier transform to S(u, v); but since S(u, v) is only partially measured, there is a difficulty.

S(u, v) ↔ I(x, y)   (2-dimensional Fourier transform)
Interferometer
Problem
Applying LASSO.
Inverse problem
Problem
Simulate a black hole image and compute the observation, where the observed points are 2/5 of all points. Then reconstruct the image with LASSO.
[Figures: the simulated black hole image and the image reconstructed by LASSO]
Real observation data will be provided in 2015–2016. We are preparing for the data analysis.
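A minimal sketch of such a simulation in Python (an assumed setup, not the authors' code): sample 2/5 of the 2-D Fourier plane of a small sparse test image and recover the image with LASSO, stacking real and imaginary parts so a real-valued solver can be used. The image size, sparsity, and alpha are illustrative choices.

```python
# Minimal sketch: compressed-sensing imaging from 2/5 of the Fourier plane.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 16                          # image is n x n; small enough for a dense matrix
N = n * n

# Sparse test image: a few bright pixels on a dark background.
x_true = np.zeros(N)
x_true[rng.choice(N, size=8, replace=False)] = rng.uniform(0.5, 1.0, size=8)

# Dense 2-D DFT matrix as a Kronecker product of 1-D DFT matrices.
F1 = np.fft.fft(np.eye(n)) / np.sqrt(n)
F2d = np.kron(F1, F1)                                     # (N, N), complex
keep = rng.choice(N, size=int(0.4 * N), replace=False)    # observe 2/5 of the points
A_c = F2d[keep, :]
y_c = A_c @ x_true

# Stack real and imaginary parts so a real-valued LASSO solver applies.
A = np.vstack([A_c.real, A_c.imag])
y = np.concatenate([y_c.real, y_c.imag])

# positive=True encodes the extra assumption that image intensities are nonnegative.
lasso = Lasso(alpha=1e-4, fit_intercept=False, positive=True, max_iter=200000)
lasso.fit(A, y)
print("recovered support:", np.flatnonzero(lasso.coef_ > 1e-3))
```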
Separation of a matrix

M = L + S

\min_{L, S} \left[ \| L \|_* + \lambda \| S \|_{\ell_1} \right] \quad \text{subject to } M = L + S.

\| L \|_* = \mathrm{tr}(\sqrt{L^T L}) = \sum \sigma(L) is the nuclear norm, the sum of the singular values. \| S \|_{\ell_1} = \sum_{ij} |S_{ij}|.
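For reference, here is a minimal sketch of solving this problem with an inexact augmented-Lagrangian iteration: singular-value thresholding for L and entrywise soft thresholding for S. The fixed iteration count and parameter defaults are common choices and are assumptions, not taken from the slides.

```python
# Minimal sketch: robust PCA, M = L + S, via inexact augmented Lagrangian.
import numpy as np

def soft(X, t):
    """Entrywise soft threshold, the prox operator of t * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def rpca(M, lam=None, mu=None, iters=300):
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # standard choice in Candes et al.
    if mu is None:
        mu = 0.25 * m * n / np.sum(np.abs(M))
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        # L-step: singular value thresholding of M - S + Y/mu.
        U, sv, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft(sv, 1.0 / mu)) @ Vt
        # S-step: entrywise soft thresholding.
        S = soft(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint M = L + S.
        Y += mu * (M - L - S)
    return L, S

# Example: a rank-5 matrix plus sparse large-amplitude corruptions.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 50))
S0 = np.zeros((50, 50))
S0.flat[rng.choice(2500, size=100, replace=False)] = 10 * rng.standard_normal(100)
L_hat, S_hat = rpca(L0 + S0)
print("rank of L_hat:", np.linalg.matrix_rank(L_hat, tol=1e-3))
```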
Application
Candes, Li, Ma, & Wright (2011). “Robust principal component analysis?,” J. ACM, 58(3).
Some applications
How to solve the problems
  Compressed Sensing: Linear Programming
  LASSO: LARS
  LASSO: Quadratic Programming
  LASSO: Iterative Shrinking Algorithm
Conclusion
Linear Programming
Let u, c ∈ \Re^n be n-dimensional real vectors, A ∈ \Re^{m \times n}, and b ∈ \Re^m.

Standard form

\min_u c^t u \quad \text{subject to } Au \le b.

▶ Basic problem in optimization theory.
▶ Many numerical packages exist.
▶ Solved by simplex or interior-point methods.
▶ Large-scale problems can be solved efficiently.

Solve compressed sensing with linear programming.
Linear Programming

Compressed Sensing

\min_x \| x \|_{\ell_1} \quad \text{subject to } Ax = y

Let x_j = x_j^+ - x_j^-, \; x_j^+ \ge 0, \; x_j^- \ge 0, then

\min_{x^+, x^-} 1_n^t (x^+ + x^-) \quad \text{subject to } A(x^+ - x^-) = y
Linear Programming

Let \gamma = (x_1^+, \cdots, x_n^+, x_1^-, \cdots, x_n^-)^t. For the optimal \gamma, at least one of x_j^+ and x_j^- becomes 0 for every j.

\min_\gamma 1_{2n}^t \gamma \quad \text{subject to } \begin{pmatrix} A & -A \\ -A & A \end{pmatrix} \gamma \le \begin{pmatrix} y \\ -y \end{pmatrix}, \quad -E_{2n} \gamma \le 0_{2n}
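A minimal sketch of this LP in Python with scipy's generic linprog solver (the data are assumed). Here the equality Ax = y is passed directly as an equality constraint instead of the pair of inequalities above; the two formulations are equivalent.

```python
# Minimal sketch: basis pursuit as a linear program, gamma = (x+, x-).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 25, 60
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=4, replace=False)] = rng.standard_normal(4)
y = A @ x_true

# min 1^t gamma  subject to  [A  -A] gamma = y,  gamma >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```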
Solving LASSO with LARS

LASSO problem

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right], \quad \lambda > 0

LARS-LASSO

Solve the above problem for many values of λ.
Efron, Hastie, Johnstone, & Tibshirani (2004). “Least angle regression,” The Annals of Statistics, 32(2), 407-499.
Rosset & Zhu (2007). “Piecewise linear regularized solution paths,” The Annals of Statistics, 35(3), 1012-1030.
Solving LASSO with LARS

LASSO

Let x_j = x_j^+ - x_j^-, \; x_j^+ \ge 0, \; x_j^- \ge 0, and

\min_{x^+, x^-} \left[ \| y - A(x^+ - x^-) \|_{\ell_2}^2 + \lambda \sum_{j=1}^n (x_j^+ + x_j^-) \right]
\quad \text{subject to } x_j^+, x_j^- \ge 0, \; j = 1, \cdots, n.

Lagrange function

\min_{x^+, x^-} \left[ L(x) + \lambda \sum_{j=1}^n (x_j^+ + x_j^-) - \sum_{j=1}^n \lambda_j^+ x_j^+ - \sum_{j=1}^n \lambda_j^- x_j^- \right]

where L(x) = \| y - A(x^+ - x^-) \|_{\ell_2}^2. Let x(λ) be the optimal x for a given λ.
Solving LASSO with LARS

Karush-Kuhn-Tucker conditions

\partial_{x_j} L(x(\lambda)) + \lambda - \lambda_j^+ = 0
-\partial_{x_j} L(x(\lambda)) + \lambda - \lambda_j^- = 0
\lambda_j^+ x_j^+ = 0, \quad \lambda_j^+ \ge 0
\lambda_j^- x_j^- = 0, \quad \lambda_j^- \ge 0, \quad j = 1, \cdots, n.

Study the above conditions, where

\partial_{x_j} L(x) = \frac{\partial L(x)}{\partial x_j} = -2 (y - Ax)^t a_j

and a_j = (a_{1j}, \cdots, a_{mj})^t is the j-th column vector of A.
Solving LASSO with LARS

The optimal x_j(λ) falls into one of the following cases (λ > 0).

(i) x_j^+ = 0, \; x_j^- = 0:
|\partial_{x_j} L(x)| \le \lambda

(ii) x_j^+ > 0:
\partial_{x_j} L(x) = -\lambda, \; x_j^- = 0 \quad (\lambda_j^+ = 0, \; \lambda_j^- > 0)

(iii) x_j^- > 0:
\partial_{x_j} L(x) = \lambda, \; x_j^+ = 0 \quad (\lambda_j^+ > 0, \; \lambda_j^- = 0)
Solving LASSO with LARS
Depending on λ, the set of “Active” component, which are non-zero xjchanges. Let us define this set as A. A = {j : xj(λ) = 0}
A member of Active set
j ∈ A ⇒ ∂xjL(x(λ)) = −sgn(xj(λ))λ
A component which is “not” Active
j /∈ A ⇒ |∂xjL(x(λ))| ≤ λ
Solving LASSO with LARS

Let x_A be x restricted to the active components j ∈ A. From the fact that \partial_{x_A} \partial_{x_A} L(x(\lambda)) = 2 A_A^t A_A, we can derive the following relation.

\partial_{x_A} \partial_{x_A} L(x(\lambda)) \, \frac{\partial x_A(\lambda)}{\partial \lambda} = -\mathrm{sgn}(x_A(\lambda))

\frac{\partial x_A(\lambda)}{\partial \lambda} = -\left( \partial_{x_A} \partial_{x_A} L(x(\lambda)) \right)^{-1} \mathrm{sgn}(x_A(\lambda))

\frac{\partial x_A(\lambda)}{\partial \lambda} = -\frac{1}{2} \left( A_A^t A_A \right)^{-1} \mathrm{sgn}(x_A(\lambda))

The derivative is constant while A is fixed, so the solution path x(λ) is piecewise linear in λ.
Solving LASSO with LARS

Algorithm

1. x = 0, \; A = \{ \arg\max_j |y^t a_j| \}, \; d_A = -\mathrm{sgn}(-y^t a_A), \; d_{A^c} = 0.
2. While \max_j |\partial_{x_j} L(x)| > 0, repeat the following.
   i. s_1 = \min\{ s > 0 : |\partial_{x_j} L(x + sd)| = |\partial_{x_A} L(x + sd)|, \; j \notin A \},
      s_2 = \min\{ s > 0 : (x + sd)_j = 0, \; j \in A \},
      s = \min(s_1, s_2).
   ii. x \leftarrow x + sd. If s = s_1, add the j that achieves s_1 to A; if s = s_2, remove the j that achieves s_2 from A.
   iii. Update d_A = -(A_A^t A_A)^{-1} \mathrm{sgn}(\partial_{x_A} L(x)), \; d_{A^c} = 0.
   iv. Go to step 2.
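scikit-learn ships a LARS implementation; below is a minimal sketch of tracing the solution path on assumed data (not the slides' example). Note that sklearn scales the ℓ1 penalty by 1/n_samples, so the returned alphas correspond to λ up to that scaling.

```python
# Minimal sketch: trace the piecewise-linear LASSO path with LARS.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = A @ x_true + 0.1 * rng.standard_normal(m)

# method='lasso' returns the LASSO path: alphas are the breakpoints
# (lambda up to sklearn's 1/m scaling), coefs[:, k] is x at breakpoint k.
alphas, active, coefs = lars_path(A, y, method="lasso")
for lam, col in zip(alphas, coefs.T):
    print(f"lambda = {lam:8.4f}   active set = {np.flatnonzero(col)}")
```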
An example of LARS
[Figure: LASSO coefficient paths, β_j plotted against Σ|β_j|]

(Σ|β_j| is equivalent to ‖x‖_{ℓ1}.)
An example of LARS
[Figure: Σ|β_j| plotted against log λ along the path]

(Σ|β_j| is equivalent to ‖x‖_{ℓ1}.)
Quadratic Programming (QP)

Let u, f ∈ \Re^n be n-dimensional real vectors, H ∈ \Re^{n \times n} a positive semidefinite matrix, A ∈ \Re^{m \times n}, and b ∈ \Re^m.

Standard form

\min_u \frac{1}{2} u^t H u + f^t u \quad \text{subject to } Au \le b.

▶ Basic problem in optimization theory.
▶ Many numerical packages exist.
▶ Large-scale problems can be solved.

Solve LASSO with quadratic programming.
Solving LASSO with Quadratic Programming

LASSO

\min_x \left[ \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1} \right]
= \min_x \left[ (y - Ax)^t (y - Ax) + \lambda \sum_j |x_j| \right]
= y^t y + \min_x \left[ x^t A^t A x - 2 y^t A x + \lambda \sum_j |x_j| \right]
Solving LASSO with Quadratic Programming

Let x_j = x_j^+ - x_j^-, \; x_j^+ \ge 0, \; x_j^- \ge 0.

LASSO

\min_{x^+, x^-} \left[ (x^+ - x^-)^t A^t A (x^+ - x^-) - 2 y^t A (x^+ - x^-) + \lambda\, 1_n^t (x^+ + x^-) \right]
\quad \text{subject to } x_j^+, x_j^- \ge 0, \; j = 1, \cdots, n.

At the optimum, one of x_j^+ and x_j^- becomes 0 because of the last term, so x_j = x_j^+ - x_j^- is satisfied.
Solving LASSO with Quadratic Programming

Let \gamma = (x_1^+, \cdots, x_n^+, x_1^-, \cdots, x_n^-)^t.

LASSO

\min_\gamma \left[ \gamma^t \begin{pmatrix} A^t A & -A^t A \\ -A^t A & A^t A \end{pmatrix} \gamma + \begin{pmatrix} -2 A^t y + \lambda 1_n \\ 2 A^t y + \lambda 1_n \end{pmatrix}^t \gamma \right]
\quad \text{subject to } -E_{2n} \gamma \le 0_{2n}.
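A minimal sketch of this formulation in Python (assumed data and λ). In place of a dedicated QP package it minimizes the same objective over γ = (x⁺, x⁻) ≥ 0 with scipy's bound-constrained L-BFGS-B solver; a real QP solver would accept the block-matrix form directly.

```python
# Minimal sketch: LASSO as a nonnegatively constrained quadratic problem.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n, lam = 40, 15, 0.5
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[1, 6]] = [2.0, -1.5]
y = A @ x_true + 0.05 * rng.standard_normal(m)

G = A.T @ A
q = A.T @ y

def objective(gamma):
    # gamma = (x+, x-) >= 0; objective matches the block form in the text.
    x = gamma[:n] - gamma[n:]
    return x @ G @ x - 2 * q @ x + lam * gamma.sum()

def gradient(gamma):
    g = 2 * G @ (gamma[:n] - gamma[n:]) - 2 * q
    return np.concatenate([g, -g]) + lam

res = minimize(objective, np.zeros(2 * n), jac=gradient,
               bounds=[(0, None)] * (2 * n), method="L-BFGS-B")
x_hat = res.x[:n] - res.x[n:]
print("nonzero coefficients:", np.flatnonzero(np.abs(x_hat) > 1e-4))
```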
LASSO: Iterative Shrinking Algorithm

Quadratic programming cannot solve very large problems. We need a different approach.

Surrogate function

f(x) = \| y - Ax \|_{\ell_2}^2 + \lambda \| x \|_{\ell_1}

d(x, x_0) = c \| x - x_0 \|_{\ell_2}^2 - \| Ax - Ax_0 \|_{\ell_2}^2

Set c larger than the largest eigenvalue of A^t A; then \partial^2 d(x, x_0) / \partial x \partial x is a positive definite matrix.

New cost function

Q(x, x_0) = f(x) + d(x, x_0)

x = \arg\min_x Q(x, x_0)
LASSO: Iterative Shrinking Algorithm

Q(x, x_0) = \mathrm{Const} - 2 x^t [ A^t (y - A x_0) + c x_0 ] + \lambda \| x \|_{\ell_1} + c \| x \|_{\ell_2}^2

Minimize Q(x, x_0) with respect to x. Let v = \frac{1}{c} A^t (y - A x_0) + x_0, and

x = \arg\min_x Q(x, x_0)
  = \arg\min_x \left[ -2 c\, x^t v + \lambda \| x \|_{\ell_1} + c \| x \|_{\ell_2}^2 \right]
  = \arg\min_x \left[ \lambda \| x \|_{\ell_1} + c \| x - v \|_{\ell_2}^2 \right]
  = \arg\min_x \left[ \frac{\lambda}{c} \| x \|_{\ell_1} + \| x - v \|_{\ell_2}^2 \right]
LASSO: Iterative Shrinking Algorithm

x = \arg\min_x \left[ \frac{\lambda}{c} \sum_i |x_i| + \sum_i (x_i - v_i)^2 \right]

Each component becomes independent:

x_i = \arg\min_{x_i} \left[ \frac{\lambda}{c} |x_i| + (x_i - v_i)^2 \right]
LASSO: Iterative Shrinking Algorithm

Define S_\omega(v) as follows,

S_\omega(v) = \begin{cases} 0 & |v| < \omega/2 \\ v - \mathrm{sgn}(v)\, \omega/2 & |v| \ge \omega/2 \end{cases}

then

x_i = S_{\lambda/c}(v_i),

and thus x = \arg\min_x Q(x, x_0) can be computed efficiently.
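As a quick numerical sanity check (with assumed values of λ/c and v), the closed form S_{λ/c}(v) matches a brute-force minimization of the scalar objective:

```python
# Verify S_{lambda/c}(v) against grid search on (lambda/c)|x| + (x - v)^2.
import numpy as np

lam_over_c, v = 0.8, 1.0
xs = np.linspace(-3.0, 3.0, 200001)
obj = lam_over_c * np.abs(xs) + (xs - v) ** 2
x_grid = xs[np.argmin(obj)]
x_closed = np.sign(v) * max(abs(v) - lam_over_c / 2.0, 0.0)
print(x_grid, x_closed)   # both are 0.6 = v - (lambda/c)/2
```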
LASSO: Iterative Shrinking Algorithm

Consider the following update rule.

x_{k+1} = \arg\min_x Q(x, x_k) = S_{\lambda/c}\!\left( \frac{1}{c} A^t (y - A x_k) + x_k \right)

Because Q(x, x_k) = f(x) + d(x, x_k) and d(x, x_k) \ge 0,

f(x_{k+1}) \le Q(x_{k+1}, x_k) = \min_x Q(x, x_k) \le Q(x_k, x_k) = f(x_k)

Therefore f(x_k) decreases monotonically, and repeating the update minimizes f(x).
Daubechies, Defrise, & De Mol (2004). “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. on Pure and Applied Math., LVII, 1413-1457.
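A minimal sketch of the full iteration in Python (assumed data and λ; c is set just above the largest eigenvalue of A^tA, as required above):

```python
# Minimal sketch: ISTA for min_x ||y - Ax||^2 + lambda ||x||_1.
import numpy as np

def soft_threshold(v, omega):
    """S_omega(v): zero where |v| < omega/2, otherwise shrink by omega/2."""
    return np.sign(v) * np.maximum(np.abs(v) - omega / 2.0, 0.0)

def ista(A, y, lam, iters=1000):
    # c must exceed the largest eigenvalue of A^t A so that d(x, x0) >= 0.
    c = 1.01 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        v = x + A.T @ (y - A @ x) / c
        x = soft_threshold(v, lam / c)
    return x

# Example: recover a sparse x from noisy random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
y = A @ x_true + 0.05 * rng.standard_normal(40)
x_hat = ista(A, y, lam=0.5)
print("nonzero components:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```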
Conclusion
Solving Compressed Sensing and LASSO
▶ Introduced strategies for solving Compressed Sensing and LASSO.
▶ Compressed sensing can be solved with Linear Programming.
▶ LASSO can be solved with LARS, Quadratic Programming, or the Iterative Shrinking Algorithm, depending on the size of the problem.