© Lars Ruthotto, PDE-Constrained Optimization
Doktorandenkolleg, Weißensee 2016

Numerical Methods for PDE-Constrained Optimization
Lars Ruthotto
Department of Mathematics and Computer Science
Emory University
Thank You
- Monika Wolfmayr and Marie-Therese Wolfram for the kind invitation and for covering travel expenses
- Doktorandenkolleg for picking this great place and organizing shuttle, accommodation, ...
- Dr. Ettinger and Dr. Carr for teaching my classes
- Participants
Motivation and Introduction
Outline and Assumptions
About me
- 2012: PhD in Münster (advisors: Martin Burger, Jan Modersitzki)
- 2013-14: PostDoc at UBC, Vancouver (advisor: Eldad Haber)
- since 2014: Assistant Professor at Emory University, Atlanta, GA

My assumptions about you
- solid background in PDEs and variational calculus
- interest and experience in PDE modeling and applications
- basic knowledge of numerical linear algebra and optimization
- some programming skills

Learning objectives
- introduction to optimization with PDE constraints
- overview of some terminology and fundamental ideas
- shed light on the "black box" optimization approach
- give pointers to software and literature for future study
- set up and solve a PDE parameter estimation problem
course website: http://tinyurl.com/PDECO
PDE Constrained Optimization Problem
aka Optimal Control

$$\min_{u,m} \; \Phi(u,m) \quad \text{subject to} \quad C(u,m) = 0,$$

where
- m: control (what we can modify to minimize the objective)
- u: state (the response of the PDE system)
- Φ: objective function
- C: PDE constraint. Examples:
  - Poisson's equation: $C(u,m) = \nabla \cdot (\sigma(m) \nabla u) - q = 0$
  - Advection: $C(u,m) = \partial_t u(x,t) + m(x,t) \nabla u(x,t) = 0$ for all $t > 0$ and $u(x,0) = u_0$
  - Eikonal equation: $C(u,m) = \|\nabla u(x)\|_2^2 - m(x) = 0$ with $u(x_0) = 0$ for some $x_0$

(Some) challenges in PDE-constrained optimization:
- ensure a sound interplay of numerical PDEs, linear algebra, and optimization
- problem is infinite dimensional both in m and C
Reduced Formulation
aka "black box" or simulation-based approach

Assume u is uniquely defined through the constraint (and boundary conditions). Use the Implicit Function Theorem to define u(m) such that

$$C(u(m), m) = 0.$$

Use this to eliminate the PDE constraint and solve

$$\min_m \; \Phi(u(m), m).$$

- unconstrained problem
- PDE solved in each iteration
- need to compute the sensitivity matrix $J = \nabla_m u(m)$
- no need to store fields and Lagrange multipliers
Course Outline

Goal: Given receivers $p_i$, sources $q_j$ and data $d_{ij}$, find the model by solving the reduced formulation of a PDE-constrained problem

$$\min_m \; \sum_{i,j} \| p_i^\top u_j(m) - d_{ij} \|^2 + \alpha \| L m \|^2,$$

where α > 0 and L are given and $u_j$ solves Poisson's equation with coefficients σ(m) = exp(m),

$$\nabla \cdot (\sigma(m) \nabla u_j) = q_j \text{ on } \Omega \quad \text{with} \quad \nabla u_j \cdot n = 0 \text{ on } \partial\Omega.$$

Outline:
- Discretizing the Forward Problem
- Crash Course on Numerical Linear Algebra
- Crash Course on Numerical Optimization
- Sensitivity Computation
- Implementation and Testing
- Outlook and Perspectives
Direct Current Resistivity (Geophysics)
Goal: Estimate conductivity of subsurface.
[Figure: fields and sources u_1, q_1; u_2, q_2; u_3, q_3]

Let Ω ⊂ R³. For a given source $q_j$ the potential field $u_j$ can be simulated via

$$\nabla \cdot (\sigma(m) \nabla u_j(x)) = q_j(x) \text{ for } x \in \Omega, \qquad \vec{n} \cdot \nabla u_j(x) = 0 \text{ for } x \in \partial\Omega, \qquad u_j(0) = 1.$$

References:
- E. Haber. Computational Methods in Geophysical Electromagnetics. SIAM, Philadelphia, 2014.
- L. Ruthotto, E. Treister, E. Haber. jInv - A Flexible Julia Package for PDE Parameter Estimation. Preprint available at arXiv, 2016.
Some References for This Course
PDE-Constrained Optimization
- E. Haber. Computational Methods in Geophysical Electromagnetics. SIAM, 2014.
- A. Borzì and V. Schulz. Computational Optimization of Systems Governed by Partial Differential Equations. SIAM, 2012.
- R. Herzog and K. Kunisch. Algorithms for PDE-Constrained Optimization. GAMM-Mitteilungen, 2010.
- M. Hinze, R. Pinnau, M. Ulbrich, S. Ulbrich. Optimization with PDE Constraints. Springer, 2009.

Numerical Optimization
- A. Beck. Introduction to Nonlinear Optimization. SIAM, 2014.
- J. Nocedal and S. Wright. Numerical Optimization. Springer, 2009.

Numerical Linear Algebra
- C. Greif and U. Ascher. A First Course on Numerical Methods. SIAM, 2011.
- Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.

Note: this is a small, subjective selection; ask me for more reading!
Discretizing the Forward Problem
Discretizing Poisson’s Equation
[Figure: regular mesh with nodal values u(x_i, y_j), u(x_{i+1}, y_j), u(x_i, y_{j+1}), u(x_{i+1}, y_{j+1}) and partial derivatives ∂_x^h u(x_i + h/2, y_j), ∂_x^h u(x_i + h/2, y_{j+1}), ∂_y^h u(x_i, y_j + h/2), ∂_y^h u(x_{i+1}, y_j + h/2) on the edges]

- m, σ: model and conductivity, discretized in cell centers
- u_j: fields, discretized in nodes
- approximate partial derivatives on edges:

$$\partial_x^h u\left(x_i + \tfrac{h}{2}, y_j\right) \approx \frac{1}{h}\left(u(x_{i+1}, y_j) - u(x_i, y_j)\right)$$
Julia: Building the Gradient
Represent ∇ = [∂x, ∂y, ∂z] as sparse matrix.
The 1D finite difference operator is
    ddx(n,h) = spdiagm((-1/h*ones(n),1/h*ones(n)), (0,1), n, n+1)

Testing it:

    n  = 512
    D  = ddx(n,1/n)
    x  = linspace(0,1,n+1)          # nodes
    xc = x[1:end-1]+diff(x)/2       # cell centers
    plot(xc,2*pi*cos(2*pi*xc),xc,D*sin(2*pi*x))
Homework: change n and see how accurate you get
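One way to approach the homework is to measure the error for several n. The following is a minimal sketch, written in Julia 1.x syntax (the slides use Julia 0.4, where `spdiagm` and `linspace` have a different interface). The helper name `derivative_error` is an assumption introduced here for illustration; the operator is assembled with `sparse` rather than `spdiagm`.

```julia
using LinearAlgebra, SparseArrays

# 1D difference operator from n+1 nodes to n cell centers (Julia 1.x syntax)
ddx(n, h) = sparse([1:n; 1:n], [1:n; 2:n+1], [fill(-1 / h, n); fill(1 / h, n)], n, n + 1)

# maximum error of D*sin(2πx) against the exact derivative at the cell centers
function derivative_error(n)
    h = 1 / n
    x = range(0, stop = 1, length = n + 1)   # nodes
    xc = x[1:end-1] .+ h / 2                 # cell centers
    D = ddx(n, h)
    return maximum(abs.(D * sin.(2π .* x) .- 2π .* cos.(2π .* xc)))
end

errs = [derivative_error(n) for n in (64, 128, 256, 512)]
rates = errs[1:end-1] ./ errs[2:end]   # ratio of errors when n is doubled
```

For a second-order accurate approximation the entries of `rates` should be close to 4, since halving h divides the error by about four.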
Coding ∇u = (∂xu, ∂yu) in 2D
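Example: Regular 2 × 2 grid. D is the 1D derivative matrix, U is the 2D field reshaped into a matrix.

$$U_x \approx \frac{1}{h} \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} U = D\,U\,I$$

Similarly

$$U_y \approx \frac{1}{h}\, U \begin{pmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix} = I\,U\,D^\top$$

Use the Kronecker product rule

$$\operatorname{vec}(A U B^\top) = (B \otimes A) \operatorname{vec}(U)$$

to get $\operatorname{vec}(U_x) \approx (I \otimes D) \operatorname{vec}(U)$ and $\operatorname{vec}(U_y) \approx (D \otimes I) \operatorname{vec}(U)$.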
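The Kronecker rule is easy to verify numerically. The sketch below, in Julia 1.x syntax, applies the 1D operators to a small nodal field stored as a matrix and compares the result with the Kronecker form acting on `vec(U)`. The grid sizes and the test field are arbitrary choices made for this illustration.

```julia
using LinearAlgebra, SparseArrays

# 1D difference operator from n+1 nodes to n cell centers
ddx(n, h) = sparse([1:n; 1:n], [1:n; 2:n+1], [fill(-1 / h, n); fill(1 / h, n)], n, n + 1)

n1, n2, h = 3, 2, 0.5
U = reshape(collect(1.0:(n1+1)*(n2+1)), n1 + 1, n2 + 1) .^ 2   # nodal field as a matrix
D1, D2 = ddx(n1, h), ddx(n2, h)

Ux = D1 * U            # differences along the first dimension
Uy = U * D2'           # differences along the second dimension

Gx = kron(sparse(I, n2 + 1, n2 + 1), D1)   # (I ⊗ D1)
Gy = kron(D2, sparse(I, n1 + 1, n1 + 1))   # (D2 ⊗ I)
Grad = [Gx; Gy]                            # stacked gradient acting on vec(U)
```

Julia stores matrices in column-major order, so `vec(U)` stacks the columns of U, which is the ordering the rule assumes.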
Coding ∇u = [∂xu, ∂yu] in 2/3D
Use it to code the gradient
    function getNodalGradient(n,h)
        # set up the operator in 1D
        ddx(n,h) = spdiagm((-1/h*ones(n),1/h*ones(n)), (0,1), n, n+1)

        # compute 1D matrices
        d1 = ddx(n[1],h[1]); d2 = ddx(n[2],h[2])
        Grad = [kron(speye(n[2]+1),d1); kron(d2,speye(n[1]+1))]

        return Grad
    end

Since $\nabla\cdot = (\partial_x \; \partial_y) = \nabla^\top$ we use

    Div = Grad'
    Lap = Grad'*Grad
Discretizing the PDE in 2D

Goal: Discretize Poisson's equation in 2D

$$\nabla \cdot (\sigma(m) \nabla u) = q.$$

Writing the equation in weak form gives

$$-(\sigma(m) \nabla u, \nabla p)_{L^2} = (q, p)_{L^2} \quad \text{for all } p.$$

Terms in $(\cdot,\cdot)_{L^2}$ are discretized on a staggered grid.
Example: $\partial_x u$ is cell-centered in the x direction and nodal in the y direction.

Consider the 1D inner product $(\sigma f, e)_{L^2}$ where σ is discretized in cell centers and f, e are discretized in nodes.

Two options to discretize the inner product:
1. $(\sigma f, e)_{L^2} \approx \frac{h}{4} \sum_{k=1}^n \sigma_{k+1/2} (f_k + f_{k+1})(e_k + e_{k+1})$
2. $(\sigma f, e)_{L^2} \approx \frac{h}{2} \sum_{k=1}^n \sigma_{k+1/2} (f_k e_k + f_{k+1} e_{k+1})$

The first option introduces an artificial nullspace! Take $f = (1, -1, 1, -1, \ldots, 1, -1)^\top$.
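The nullspace of the first option can be seen directly in a few lines. This is a minimal sketch in Julia 1.x; the function names `ip1` and `ip2` are hypothetical, and the second option is normalized with h/2 (an assumption made here so that it integrates constants exactly).

```julia
using LinearAlgebra

# Option 1: average f and e to the cell centers first, then multiply
ip1(sigma, f, e, h) = h / 4 * sum(sigma .* (f[1:end-1] .+ f[2:end]) .* (e[1:end-1] .+ e[2:end]))

# Option 2: multiply at the nodes first, then average to the cell centers
ip2(sigma, f, e, h) = h / 2 * sum(sigma .* (f[1:end-1] .* e[1:end-1] .+ f[2:end] .* e[2:end]))

n = 8; h = 1 / n
sigma = ones(n)                 # cell-centered coefficient
f = [(-1.0)^k for k in 0:n]     # oscillating nodal vector (1, -1, 1, ...)

q1 = ip1(sigma, f, f, h)   # neighbouring values cancel in every cell
q2 = ip2(sigma, f, f, h)   # every cell contributes h
```

With option 1 the oscillating vector has zero "energy" although it is not zero, which is exactly the artificial nullspace; option 2 assigns it a positive value.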
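Discretizing Inner Product in 1D

Goal: Discretize $(\sigma f, e)_{L^2}$ for σ discretized in cell centers and f, e discretized in nodes.

$$(\sigma f, e)_{L^2} \approx \frac{h}{2} \sum_{k=1}^n \sigma_{k+1/2}(f_k e_k + f_{k+1} e_{k+1}) = (h\sigma)^\top A (f \odot e) = f^\top \operatorname{diag}\left(A^\top (h\sigma)\right) e.$$

Here,
- ⊙ is the (component-wise) Hadamard product
- A is a node-to-center average matrix

$$A = \frac12 \begin{pmatrix} 1 & 1 & & & \\ & 1 & 1 & & \\ & & \ddots & \ddots & \\ & & & 1 & 1 \end{pmatrix} \in \mathbb{R}^{n \times (n+1)}$$

In Julia, build this matrix by

    av(n) = spdiagm((fill(.5,n),fill(.5,n)),(0,1),n,n+1)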
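The two matrix forms of the inner product can be checked against each other numerically. The following sketch uses Julia 1.x syntax and builds the averaging matrix with `sparse`; the coefficient and the nodal test vectors are arbitrary values chosen for illustration.

```julia
using LinearAlgebra, SparseArrays

# node-to-center averaging matrix for n cells and n+1 nodes (Julia 1.x syntax)
av(n) = sparse([1:n; 1:n], [1:n; 2:n+1], fill(0.5, 2n), n, n + 1)

n = 16; h = 1 / n
A = av(n)
sigma = exp.(sin.(1:n))                    # arbitrary positive cell-centered values
x = range(0, stop = 1, length = n + 1)     # nodes
f = cos.(2π .* x)
e = x .^ 2 .+ 1

lhs = dot(h .* sigma, A * (f .* e))             # (hσ)ᵀ A (f ⊙ e)
rhs = dot(f, Diagonal(A' * (h .* sigma)) * e)   # fᵀ diag(Aᵀ(hσ)) e
```

The second form is the useful one in practice: it exposes a diagonal mass matrix that depends linearly on σ, which is what makes differentiation with respect to the model straightforward later on.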
Discretizing Poisson’s Equation in 3D
Recall: $\nabla \cdot (\exp(m) \nabla u) = q \;\longrightarrow\; G^\top \operatorname{diag}\left(A_e^\top (v \odot \exp(m))\right) G\, u = q$

STEP 1: Build average matrices for each component using Kronecker products and stack them

    I(i) = speye(Mesh.n[i])
    A1 = kron(I(3),kron(I(2),av(Mesh.n[1])))
    A2 = kron(I(3),kron(av(Mesh.n[2]),I(1)))
    A3 = kron(av(Mesh.n[3]),kron(I(2),I(1)))
    Ae = [A1 A2 A3]

STEP 2: Combine with the gradient approximation to get the discretized PDE

    G = getNodalGradient(Mesh)
    V = getVolume(Mesh)
    A = G'*sdiag(Ae'*(V*sigma))*G
    A[1,1] += h[1]   # fix one point
2D sketch:
Example: Discretization of Differential Operators

Julia's multiple dispatch simplifies comparison of different discretizations.

[Figure: regular mesh (left) and tensor mesh (right)]

For regular mesh

    using jInv.Mesh
    M = getRegularMesh(domain,n)
    Div = getDivergenceMatrix(M)
    Grad = getNodalGradientMatrix(M)
    Curl = getCurlMatrix(M)
    Ae = getEdgeAverageMatrix(M)

For tensor mesh

    using jInv.Mesh
    M = getTensorMesh3D(h1,h2,h3)
    Div = getDivergenceMatrix(M)
    Grad = getNodalGradientMatrix(M)
    Curl = getCurlMatrix(M)
    Ae = getEdgeAverageMatrix(M)
Σ : Discretizing the Forward Problem
Finite volume discretization:
$$\nabla \cdot (\sigma(m)\nabla u) = q \;\longrightarrow\; G^\top \operatorname{diag}\left(A_e^\top (v \odot \exp(m))\right) G\,u = q \;\Leftrightarrow\; A(\sigma(m))\,u = q$$

- u: field, discretized on nodes
- m: model, discretized on cell centers (could be a different mesh)
- A: discretized PDE
- solving the PDE ⟹ solving a sparse linear system
- size of the linear system depends on the number of nodes in the mesh
- modularity w.r.t. the mesh
- next: PDE solvers
Crash Course: Numerical Linear Algebra
NLA Challenges in PDE Constrained Optimization
Goal: Given discrete receivers $p_k$, sources $q_j$ and data $d_{jk}$, find the model by solving the reduced formulation of a PDE-constrained problem

$$\min_m \; \sum_{j,k} \| p_k^\top A(\sigma(m))^{-1} q_j - d_{jk} \|^2 + \alpha \| L m \|^2,$$

where α > 0 and L are given and A is our discretized PDE.

- Linear PDE solvers
  - linear PDEs ⟹ large sparse linear systems
  - potentially many different linear systems involved (frequencies, meshes, ...)
  - coefficients can be very inhomogeneous (e.g. water, air, salt, ...)
  - depending on size / computing power, use an iterative or a direct solver
- Numerical optimization (will be discussed later)
  - Gauss-Newton requires solving a real symmetric positive semidefinite linear system
  - size of the linear system equals the size of the model ⟹ can be large
  - (approximate) Hessian often not formed explicitly ⟹ use an iterative method
Condition Number
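Let $A \in \mathbb{R}^{n \times n}$ be SPD. Then the Schur decomposition is

$$A = U \Sigma U^\top,$$

with $U U^\top = U^\top U = I_n$ and $\Sigma = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.

The condition number is defined as

$$\operatorname{cond}(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}.$$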
Positive (semi)-definiteness - 1
Let A ∈ Rn×n be symmetric and consider the quadratic form
$$f(x) = x^\top A x.$$

Then A is
- positive definite if f(x) > 0 for all x ≠ 0
- positive semi-definite if f(x) ≥ 0 for all x
- negative definite if f(x) < 0 for all x ≠ 0
- indefinite if ∃ x, y ∈ R^n : f(x) > 0 and f(y) < 0.
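For a symmetric matrix these properties can be read off the eigenvalues, since f(x) is a weighted sum of the eigenvalues in the eigenvector basis. The sketch below, in Julia 1.x, is one possible way to classify a small matrix; the function name `definiteness`, the tolerance, and the extra case `:negative_semidefinite` (not listed above) are assumptions made for this illustration.

```julia
using LinearAlgebra

# classify a symmetric matrix by the signs of its eigenvalues;
# returns the strongest property that holds (up to the tolerance tol)
function definiteness(A::AbstractMatrix; tol = 1e-12)
    lam = eigvals(Symmetric(A))      # real eigenvalues in ascending order
    if lam[1] > tol
        return :positive_definite
    elseif lam[1] >= -tol
        return :positive_semidefinite
    elseif lam[end] < -tol
        return :negative_definite
    elseif lam[end] <= tol
        return :negative_semidefinite
    else
        return :indefinite
    end
end

kinds = [definiteness(A) for A in ([2.0 0.0; 0.0 1.0], [1.0 1.0; 1.0 1.0],
                                   [-1.0 0.0; 0.0 -2.0], [1.0 0.0; 0.0 -1.0])]
```

Computing all eigenvalues is only practical for small matrices; for large sparse systems one would rather attempt a Cholesky factorization or rely on structure, as with the matrix $G^\top \operatorname{diag}(\cdot)\, G$ above.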
Positive (semi)-definiteness - 2
[Figure: four contour plots of quadratic forms on [−2, 2] × [−2, 2]]
Matrix Factorizations
Let $A \in \mathbb{R}^{n \times n}$ be nonsingular.
- LU factorization: $PA = LU$, with P a permutation, L unit lower triangular, U upper triangular. Cost ≈ 2n³/3.
- Cholesky (LU for A SPD): $P^\top A P = L L^\top$. (Symmetric) permutations can be used to keep L sparse. Cost ≈ n³/3.
- QR factorization: $AP = QR$, where Q is orthogonal. Cost ≈ 4n³/3.
- Singular value decomposition: $A = U \begin{bmatrix} S \\ 0 \end{bmatrix} V^\top$, with $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ orthogonal, $S \in \mathbb{R}^{n \times n}$ diagonal.
Conjugate Gradient Method

Let $A \in \mathbb{R}^{n \times n}$ be SPSD and $b \in \mathbb{R}^n$ be given. Solve

$$\min_x \; \tfrac12 x^\top A x - b^\top x \quad\Leftrightarrow\quad A x^* = b$$

Algorithm 1 CG

    set r = b, p = r
    γ_0 = ‖p‖², x = 0
    for i = 1, 2, ... do
        w = A p
        α = γ_{i−1} / (pᵀ w)
        x = x + α p
        r = r − α w
        γ_i = ‖r‖²
        p = r + (γ_i / γ_{i−1}) p
    end for

- exact solution in n steps
- often fast convergence
- only needs Ap
- in practice requires a preconditioner
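The algorithm translates almost line by line into code. The following is a minimal sketch in Julia 1.x; the stopping rule (relative residual below `tol`), the keyword names, and the 1D Laplacian test matrix are assumptions added for this illustration and are not part of the algorithm as stated above.

```julia
using LinearAlgebra

# plain conjugate gradient method; only products A*p are needed
function cg(A, b; tol = 1e-10, maxiter = 2 * length(b))
    x = zeros(length(b))
    r = copy(b)                  # residual for the initial guess x = 0
    p = copy(r)
    gamma = dot(r, r)
    for i in 1:maxiter
        w = A * p
        alpha = gamma / dot(p, w)
        x .+= alpha .* p
        r .-= alpha .* w
        gamma_new = dot(r, r)
        sqrt(gamma_new) <= tol * norm(b) && return x, i   # converged
        p .= r .+ (gamma_new / gamma) .* p
        gamma = gamma_new
    end
    return x, maxiter
end

n = 50
A = SymTridiagonal(fill(2.0, n), fill(-1.0, n - 1))   # 1D Laplacian stencil, SPD
b = ones(n)
x, iters = cg(A, b)
```

Because `cg` only calls `A * p`, the argument `A` can be any object that supports multiplication with a vector, which is what makes the method attractive when the matrix is never formed explicitly.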
Preconditioned Conjugate Gradient Method

Convergence of CG depends on spectral properties.
Idea:

$$Ax = b \quad\Leftrightarrow\quad M^{-1}Ax = M^{-1}b,$$

where M is a preconditioner that satisfies $M^{-1}A \approx I$.

Some common examples:
- Jacobi preconditioner: $M^{-1} = D^{-1}$, where $D = \operatorname{diag}(A)$
- SSOR preconditioner (relaxation parameter 1): $M = L D^{-1} U$, i.e. $M^{-1} = U^{-1} D L^{-1}$, with L, U the lower and upper triangular parts of A (including the diagonal), resp.
- Incomplete Cholesky: $M^{-1} \approx A^{-1}$ (sparse approximation)

Example: 2D Poisson, see exPCGforPoisson.ipynb
Crash Course: Numerical Optimization
Course Outline
Goal: Given discrete receivers $p_k$, sources $q_j$ and data $d_{jk}$, find the model by solving the reduced formulation of a PDE-constrained problem

$$\min_m \; \sum_{j,k} \| p_k^\top A(\sigma(m))^{-1} q_j - d_{jk} \|^2 + \alpha \| L m \|^2,$$

where α > 0 and L are given and A is our discretized PDE.

Outline:
- Discretizing the Forward Problem
- Crash Course on Numerical Linear Algebra
- Crash Course on Numerical Optimization
- Sensitivity Computation
- Implementation and Testing
- Outlook and Perspectives
Line Search Methods: Motivation
Let $\Phi : \mathbb{R}^n \to \mathbb{R}$ be smooth and consider

$$\min_{m \in \mathbb{R}^n} \Phi(m).$$

Recall: The necessary condition is ∇Φ(m) = 0 (n equations / n unknowns).

If n = 1 we can simply plot Φ or Φ′ on some interval [a, b] and "see" the minimum.

If n ≫ 1, plotting is not an option. Idea: Start with some $m_0 \in \mathbb{R}^n$ and for k = 0, 1, 2, ... do
1. Pick a good direction $p_k$ (descent direction)
2. Solve $\alpha_k = \arg\min_\alpha \Phi(m_k + \alpha p_k)$ (line search)
3. Update $m_{k+1} = m_k + \alpha_k p_k$

until tired or $\|\nabla\Phi(m_k)\|$ is small enough.
Line Search Problem
[Figure: objective function (left) and line search function (right)]

Let $\Phi : \mathbb{R}^n \to \mathbb{R}$ be smooth and $p_k$ be a descent direction at $m_k$. The line search problem is:

$$\min_{\alpha > 0} \; \Phi(m_k + \alpha p_k).$$
Backtracking
Armijo's condition may not be sufficient (steps might be too small).
Wolfe's conditions require the derivative of $\varphi(\alpha) = \Phi(m_k + \alpha p_k)$.
Idea: Avoid too small steps by backtracked Armijo:

    function armijo(f,fc,df,xc,pc;maxIter=10,alpha=1,c1=.01,b=.5)
        LS = 1
        while LS<=maxIter
            # accept alpha once the sufficient decrease condition holds
            if f(xc+alpha*pc)<=fc + alpha*c1*dot(df,pc); break; end
            alpha *= b; LS += 1     # otherwise shrink the step
        end
        if LS>maxIter               # line search failed
            LS = -1; alpha = 0.0
        end
        return alpha,LS
    end

alpha=1 works for damped Newton. Other methods require a different initial value of alpha and a careful update of b.
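To see the backtracking line search inside the generic descent loop, here is a minimal sketch in Julia 1.x. It restates `armijo` in a compact form and uses the negative gradient as descent direction on a small quadratic; the test problem, the function name `descend`, and the tolerances are assumptions made for this illustration.

```julia
using LinearAlgebra

# backtracked Armijo line search: returns (alpha, number of trials) or (0.0, -1) on failure
function armijo(f, fc, df, xc, pc; maxIter = 10, alpha = 1.0, c1 = 0.01, b = 0.5)
    for LS in 1:maxIter
        f(xc + alpha * pc) <= fc + alpha * c1 * dot(df, pc) && return alpha, LS
        alpha *= b
    end
    return 0.0, -1
end

A = [3.0 1.0; 1.0 2.0]; rhs = [1.0, -1.0]
Phi(m) = 0.5 * dot(m, A * m) - dot(rhs, m)
gradPhi(m) = A * m - rhs

# descent loop: direction, line search, update, until the gradient is small
function descend(m; maxIter = 5000, tol = 1e-6)
    for k in 1:maxIter
        g = gradPhi(m)
        norm(g) <= tol && return m, k - 1
        alpha, LS = armijo(Phi, Phi(m), g, m, -g)
        LS == -1 && break            # line search failed, give up
        m = m - alpha * g
    end
    return m, maxIter
end

mstar, iters = descend([0.0, 0.0])
```

The minimizer of this quadratic is $A^{-1} \text{rhs} = (0.6, -0.8)$, so the result is easy to check by hand.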
Steepest Descent Method
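Iteration:

$$m_{k+1} = m_k + \alpha p_k \quad\text{with}\quad p_k = -\frac{\nabla\Phi_k}{\|\nabla\Phi_k\|}.$$

Interpretation: Minimizer of the first-order Taylor approximation

$$\min_p \; \Phi(m_k) + p^\top \nabla\Phi_k \quad\text{subject to}\quad \|p\|_2 = 1.$$

Why? Drop the constant term in the objective and apply Cauchy-Schwarz:

$$p^\top \nabla\Phi_k = \|p\| \|\nabla\Phi_k\| \cos(\theta) = \|\nabla\Phi_k\| \cos(\theta).$$

Easy to see that this is minimal when

$$\cos(\theta) = -1 \quad\Rightarrow\quad p_k = -\nabla\Phi_k / \|\nabla\Phi_k\|.$$

- simple
- robust
- line search can be tricky
- often very slow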
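Gradient Flow

Idea: To solve ∇Φ(m) = 0, we consider the time-dependent problem

$$\partial_t m = -\nabla \Phi(m)$$

with some starting guess $m_0$. Steady state gives ∇Φ = 0.

Example: Convergent for $\Phi(m) = \tfrac12 m^\top A m - b^\top m$ when A is SPD, since

$$\partial_t m = b - A m \quad\Rightarrow\quad m(t) = A^{-1} b + \exp(-tA)(m_0 - A^{-1} b).$$

Discretizing the ODE using forward Euler, we get

$$\frac{m_{k+1} - m_k}{h} = -\nabla\Phi(m_k) \quad\Leftrightarrow\quad m_{k+1} = m_k - h \nabla \Phi(m_k).$$

Stability of forward Euler requires

$$|1 - \lambda_{\max}(A)\, h| \le 1 \quad\Rightarrow\quad h \le \frac{2}{|\lambda_{\max}|}.$$

Gradient flow is a slow version of steepest descent, since in general the exact line search step satisfies

$$\alpha^* = \frac{\|Am-b\|^2}{(Am-b)^\top A (Am-b)} \ge \frac{1}{\lambda_{\max}(A)}.$$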
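The stability limit of forward Euler can be observed on a 2 × 2 example. The sketch below, in Julia 1.x, integrates the gradient flow of a quadratic with one step size just below and one just above $2/\lambda_{\max}$; the matrix, right-hand side, and number of steps are arbitrary choices for this illustration.

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]; b = [1.0, 2.0]
mstar = A \ b                        # steady state of the gradient flow
lmax = eigmax(Symmetric(A))

# forward Euler for ∂t m = b - A m, started at m = 0; returns the final error
function euler(h; steps = 200)
    m = zeros(2)
    for k in 1:steps
        m = m - h * (A * m - b)
    end
    return norm(m - mstar)
end

err_stable = euler(1.9 / lmax)     # inside the stability region
err_unstable = euler(2.1 / lmax)   # slightly outside: the error grows
```

Each Euler step multiplies the error component along an eigenvector by $1 - h\lambda$, so the component belonging to $\lambda_{\max}$ is damped by 0.9 per step in the first run and amplified by 1.1 per step in the second.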
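Example: Steepest Descent for Quadratic Objective

Let $A \in \mathbb{R}^{n \times n}$ be SPD and $b \in \mathbb{R}^n$. Consider

$$\min_m \; \Phi(m) \quad\text{with}\quad \Phi(m) = \tfrac12 m^\top A m - m^\top b.$$

The solution is, of course, $m^* = A^{-1} b$.
The steepest descent direction is p = −∇Φ = b − Am and, for exact line search, we solve

$$\min_\alpha \; \tfrac12 (m+\alpha p)^\top A (m + \alpha p) - (m+\alpha p)^\top b.$$

The necessary optimality condition is

$$\alpha\, p^\top A p + p^\top (Am - b) = 0 \quad\Rightarrow\quad \alpha^* = -\frac{p^\top \nabla\Phi(m)}{p^\top A p} = \frac{p^\top p}{p^\top A p}.$$

Global linear convergence with factor

$$1 - \frac{\lambda_{\min}(A)}{\lambda_{\max}(A)} = 1 - \operatorname{cond}(A)^{-1}$$

[Figures: Example 1 and Example 2]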
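The convergence factor can be checked numerically for a small problem. This is a minimal sketch in Julia 1.x that runs steepest descent with the exact step $\alpha^* = p^\top p / (p^\top A p)$ and records the gap $\Phi(m_k) - \Phi(m^*)$; the matrix with condition number 10 and the name `sd_exact` are assumptions chosen for illustration.

```julia
using LinearAlgebra

A = [10.0 0.0; 0.0 1.0]; b = [1.0, 1.0]
Phi(m) = 0.5 * dot(m, A * m) - dot(m, b)
mstar = A \ b
rho = 1 - 1 / cond(A)        # convergence factor from the slide

# steepest descent with exact line search; records the objective gap per iteration
function sd_exact(m; iters = 20)
    gaps = [Phi(m) - Phi(mstar)]
    for k in 1:iters
        p = b - A * m                        # steepest descent direction
        alpha = dot(p, p) / dot(p, A * p)    # exact step length
        m = m + alpha * p
        push!(gaps, Phi(m) - Phi(mstar))
    end
    return m, gaps
end

m, gaps = sd_exact([0.0, 0.0])
ratios = gaps[2:end] ./ gaps[1:end-1]   # observed reduction per iteration
```

Every entry of `ratios` should stay below `rho`; the closer cond(A) is to 1, the faster the gap shrinks.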
Example: Solving Poisson’s Equations with SD
Consider: ∆u = b on a rectangular domain with $u(1, x_2) = u(0, x_2) = u(x_1, 1) = u(x_1, 0) = 0$.

For given h > 0 consider the optimization problem

$$\min_u \; \tfrac12 u^\top \Delta_h u - u^\top b.$$

Solver: Steepest descent with exact line search.

Check out exPoisson2D.ipynb.

SD convergence is "mesh dependent" (that is, the number of iterations needed to achieve a given tolerance depends on the mesh size).
Newton's Method

Iteration:

$$m_{k+1} = m_k + p_k \quad\text{with}\quad p_k = -(\nabla^2\Phi_k)^{-1} \nabla\Phi_k$$

Interpretation 1: Minimizer of the second-order Taylor expansion

$$p_k = \arg\min_p \; \Phi_k + p^\top \nabla\Phi_k + \tfrac12 p^\top \nabla^2\Phi_k\, p.$$

Interpretation 2: First-order approximation of the optimality condition

$$0 = \nabla\Phi(m_k + p_k) = \nabla\Phi(m_k) + \nabla^2\Phi(m_k)\, p_k + O(\|p_k\|^2).$$

Discussion
- fast convergence if Φ is smooth around $m_k$ or close to a minimum
- iterates are the same for Φ and g(m) = βΦ(m) for any β > 0
- not robust: extra work required if $\nabla^2\Phi_k$ is not SPD
- need to solve a linear system in each iteration
Example: Himmelblau
$$\min_m \; \Phi(m) = (m_1^2 + m_2 - 11)^2 + (m_1 + m_2^2 - 7)^2$$

Local minimization: Different methods ⟹ different solutions
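A related effect is that the same local method can return different minimizers depending on where it starts. The sketch below, in Julia 1.x, applies the plain Newton iteration (no line search) to Himmelblau's function with hand-derived gradient and Hessian; the two starting points are assumptions chosen close to two of the known minimizers so that the undamped iteration converges.

```julia
using LinearAlgebra

# Himmelblau's function with gradient and Hessian
Phi(m) = (m[1]^2 + m[2] - 11)^2 + (m[1] + m[2]^2 - 7)^2
function grad(m)
    a = m[1]^2 + m[2] - 11
    c = m[1] + m[2]^2 - 7
    return [4 * m[1] * a + 2 * c, 2 * a + 4 * m[2] * c]
end
hess(m) = [12 * m[1]^2 + 4 * m[2] - 42   4 * m[1] + 4 * m[2];
           4 * m[1] + 4 * m[2]           12 * m[2]^2 + 4 * m[1] - 26]

# undamped Newton iteration: solve one linear system per step
function newton(m; iters = 20, tol = 1e-10)
    for k in 1:iters
        g = grad(m)
        norm(g) <= tol && break
        m = m - hess(m) \ g
    end
    return m
end

mA = newton([3.2, 1.8])     # start near the minimizer (3, 2)
mB = newton([-2.6, 3.0])    # start near another minimizer
```

Both results are stationary points with objective value close to zero, yet they lie far apart, which is the practical meaning of "local" minimization.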
Nonlinear Least-Squares: Gauss-Newton

Consider the nonlinear least-squares problem

$$\min_m \; \tfrac12 \|r(m)\|^2,$$

for a smooth residual $r : \mathbb{R}^n \to \mathbb{R}^m$. Linearizing r around $m_k$ gives

$$r(m_k + p) = r(m_k) + J(m_k)\, p + O(\|p\|^2).$$

This gives a linear least-squares problem for the step

$$\min_p \|J(m_k)\, p + r(m_k)\|^2 \quad\Rightarrow\quad J(m_k)^\top J(m_k)\, p^* = -J(m_k)^\top r(m_k).$$

Compare to the Newton step

$$\Big( J(m_k)^\top J(m_k) + \sum_{j=1}^m r_j(m_k) \nabla^2 r_j(m_k) \Big)\, p_{nt} = -J(m_k)^\top r(m_k).$$

- $J^\top J$ is always SPSD
- globally convergent (under some assumptions)
- good convergence when the residual is small or the problem is almost linear
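As a small illustration of the Gauss-Newton step, the following Julia 1.x sketch fits a two-parameter exponential model to noise-free synthetic data. The model $m_2 \exp(-m_1 t)$, the data, the starting guess, and the use of `J \ r` (a least-squares solve that avoids forming $J^\top J$) are all assumptions made for this example; no line search is used, so it relies on a reasonable starting guess.

```julia
using LinearAlgebra

t = collect(0:0.25:2)                      # sample points
mtrue = [1.5, 2.0]
model(m) = m[2] .* exp.(-m[1] .* t)
d = model(mtrue)                           # synthetic, noise-free data

r(m) = model(m) .- d                       # residual
J(m) = hcat(-m[2] .* t .* exp.(-m[1] .* t), exp.(-m[1] .* t))   # Jacobian of r

# Gauss-Newton: solve the linearized least-squares problem in each iteration
function gauss_newton(m; iters = 20, tol = 1e-12)
    for k in 1:iters
        Jk, rk = J(m), r(m)
        norm(Jk' * rk) <= tol && break
        p = -(Jk \ rk)      # minimizes ‖Jk p + rk‖, equivalent to the normal equations
        m = m + p
    end
    return m
end

mfit = gauss_newton([1.2, 1.8])
```

Since the data are noise-free, the residual vanishes at the solution and the neglected second-order term of the Newton step is zero there, which is the regime where Gauss-Newton is expected to behave like Newton's method.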
Σ : Numerical Optimization

We covered the basics on line search methods.

Steepest Descent (similar for other first-order methods)
- easy, cheap iterations, and globally convergent
- line search can be tricky
- ill-conditioning of the Hessian affects convergence
- convergence for PDE problems is mesh dependent
- often very slow convergence (runtime and iterations)

(Gauss-)Newton Methods
- (approximately) mesh-independent convergence
- robust against ill-conditioning
- no Hessian modification needed for Gauss-Newton
- costly iterations: need to solve a linear system in each iteration
- large scale: inexact / matrix-free Newton methods

PDE-Constrained Optimization
- recall: the reduced formulation requires one PDE solve per iteration / line search
- use a method with few iterations such as (Gauss-)Newton
- main challenge: solve the approximated Newton system where J is not available as a matrix ⟹ PCG!
Sensitivity Calculations
Computing Sensitivities

Key ingredients for optimization are the sensitivity matrices $J_j \in \mathbb{R}^{n_j \times n}$ that satisfy

$$u_j(m + v) \approx u_j(m) + J_j(m)\, v,$$

for any small perturbation $v \in \mathbb{R}^n$.

Compute $J_j$ by differentiating both sides of the constraint $C_j(m, u_j(m)) = 0$ with respect to m:

$$0 = \nabla_m \big( C_j(m, u_j(m)) \big) = \nabla_m C_j(m, u_j) + \nabla_{u_j} C_j(m, u_j)\, J_j(m)$$

$$\Rightarrow\quad J_j(m) = -\big( \nabla_{u_j} C_j(m, u_j) \big)^{-1} \nabla_m C_j(m, u_j).$$

Example: For a linear PDE constraint, we have

$$J_j(m) = -A_j(m)^{-1} \nabla_m \big( A_j(m) u_j \big).$$

- building $J_j$ can be very expensive (dense, many solves required)
- matrix-free implementation: provide $v \mapsto J_j(m) v$ and $w \mapsto w^\top J_j(m)$
- option to keep temporary results (factorizations, preconditioners, fields)
Computing Sensitivities - 2
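For a linear PDE constraint, we have

$$J_j(m) = -A_j(m)^{-1} \nabla_m \big( A_j(m) u_j \big).$$

Now focus on $\nabla_m (A_j(m) u_j)$ where (with V = diag(v))

$$A_j(m) u_j = G^\top \operatorname{diag}\big( A_v^\top V \sigma(m) \big)\, G u_j = G^\top \operatorname{diag}(G u_j)\, A_v^\top V \sigma(m).$$

Therefore, in our case

$$\nabla_m \big( A_j(m) u_j \big) = G^\top \operatorname{diag}(G u_j)\, A_v^\top V\, \nabla_m \sigma.$$

Some comments:
- in our example $\nabla_m \sigma = \operatorname{diag}(\exp(m))$
- forming $J_j$ requires n PDE solves where $m \in \mathbb{R}^n$
- multiplication with $J_j$ or $J_j^\top$ requires one(!) PDE solve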
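The matrix-free products can be tried on a 1D toy version of the problem. The sketch below, in Julia 1.x, is a simplified stand-in and not the jInv implementation: in 1D the gradient already lives in the cell centers, so no averaging matrix is needed, the matrices are dense to keep the code short, and all names (`getA`, `solveField`, `sensMatVec`, `sensTMatVec`) are hypothetical. It compares $J v$ with a finite difference and checks the adjoint identity $w^\top (J v) = (J^\top w)^\top v$.

```julia
using LinearAlgebra

n = 20; h = 1 / n
D = zeros(n, n + 1)                      # dense 1D difference matrix (nodes to cells)
for k in 1:n
    D[k, k] = -1 / h
    D[k, k+1] = 1 / h
end
q = zeros(n + 1); q[5] = 1.0; q[15] = -1.0     # dipole source

# discretized PDE A(m) = Dᵀ diag(h exp(m)) D, with one point fixed
function getA(m)
    A = D' * Diagonal(h .* exp.(m)) * D
    A[1, 1] += 1 / h                     # removes the constant nullspace
    return A
end
solveField(m) = getA(m) \ q

# J v = -A⁻¹ (∇m(A u) v) with ∇m(A u) = Dᵀ diag(D u) diag(h exp(m))
function sensMatVec(m, v)
    A = getA(m); u = A \ q
    return -(A \ (D' * ((D * u) .* (h .* exp.(m) .* v))))
end

# Jᵀ w = -diag(h exp(m)) diag(D u) D A⁻ᵀ w
function sensTMatVec(m, w)
    A = getA(m); u = A \ q
    return -(h .* exp.(m)) .* (D * u) .* (D * (A' \ w))
end

m = 0.1 .* sin.(1:n); v = cos.(1:n); w = sin.(2 .* (1:n+1))
Jv = sensMatVec(m, v)
Jtw = sensTMatVec(m, w)
fd = (solveField(m .+ 1e-6 .* v) .- solveField(m)) ./ 1e-6   # finite-difference check
```

For clarity each function recomputes the field u; in practice u and the factorization of A would be stored after the forward solve, so that each product costs a single additional solve.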
Programmer’s Note: Check Derivative
See checkDerivative.ipynb.
Solving the Inverse Problem
Discrete Inverse Problem

Assume

$$d_{ij} = (p_i, u_j(m)) + \epsilon_{ij} \quad \text{for all } i, j,$$

where for given m and $q_j$, the field $u_j(m)$ is defined implicitly by

$$C(m, q_j, u_j(m)) = 0.$$

Notation:
- $d_{ij}$: discrete data
- (·, ·): inner product
- $\epsilon_{ij}$: noise
- m: model function
- $q_j$, $p_i$: source / receiver function, respectively
- $u_j$: field, aka state
- C: PDE constraint (linear or nonlinear)
Goal: Reconstruct m
Overview: How to Code Your Forward Problem
jInv is easy to extend. To use your own forward problem in jInv you need to
1. Define a ForwardProbType that describes your problem.
2. Write getData that solves the forward problem: d(m) = F(σ(m), p, q, ω)
3. Write getSensTMatVec that computes $J(d(m))^\top w$
4. Write getSensMatVec that computes $J(d(m))\, v$

jInv helps with this by providing
- discrete differential operators ∇, ∇×, ∇· on different meshes
- tools for derivative and adjoint testing

Benefits: Use jInv's inversion tools (misfit, regularization, optimization, parallel computing, meshes, ...).
Step 1: Creating ForwardProbType for Your Problem

Example: Consider the DC Resistivity problem

$$\nabla \cdot (\sigma(m) \nabla u_j) = q_j \text{ on } \Omega \quad\text{with}\quad \nabla u_j \cdot n = 0 \text{ on } \partial\Omega$$

Idea: Describe the problem by using a type (abstraction, multiple dispatch).

    type DivSigGradParam <: ForwardProbType
        Mesh::AbstractMesh            # mesh
        Sources::SparseMatrixCSC      # sources
        Receivers::SparseMatrixCSC    # receivers
        Fields::Array{Float64}        # store fields
        Ainv::AbstractSolver          # PDE solver
    end

Comments:
- Sources, Receivers: can also be dense
- Fields: typically empty upon initialization
- Ainv: chooses / configures the PDE solver
Solve Forward Problem
    function getData(m::Vector,pFor::DivSigGradParam)
        pFor.Ainv.doClear=1   # clear factorization/preconditioner

        A = getDivSigGradMatrix(m,pFor.Mesh)
        pFor.Fields,pFor.Ainv = solveLinearSystem(A,pFor.Sources,pFor.Ainv)
        D = pFor.Receivers'*pFor.Fields

        pFor.Ainv.doClear=0   # keep factorization/preconditioner
        return D, pFor
    end

Comments:
- doClear=1: assume the model has changed ⟹ clear intermediates
- getDivSigGradMatrix: pFor.Mesh is flexible as long as the PDE can be discretized on the mesh type
- solveLinearSystem: allows changing the PDE solver via pFor.Ainv
- doClear=0: keep those results in memory to accelerate sensitivity computations
Step 3: Compute Matvecs with Sensitivity

Recall:

$$J_j(\sigma) = -A_j(\sigma)^{-1} \nabla_\sigma \big( A_j(\sigma) u_j \big) \quad\text{with}\quad \nabla_\sigma \big( A_j(\sigma) u_j \big) = G^\top \operatorname{diag}(G u_j)\, A_v^\top V.$$

    function getSensMatVec(x::Vector,m::Vector,pFor::DivSigGradParam)
        A  = getDivSigGradMatrix(m,pFor.Mesh)
        G  = getNodalGradientMatrix(pFor.Mesh)
        Ae = getEdgeAverageMatrix(pFor.Mesh)
        V  = getVolume(pFor.Mesh)

        Z  = G'*(sdiag(Ae'*(V*x))*(G*pFor.Fields))
        Z, = solveLinearSystem(A,Z,pFor.Ainv)
        Jv = -pFor.Receivers'*Z
        return vec(Jv)
    end

Comment (first computation of Z): note that we might have multiple fields
(Some) Advanced Topics
Costs of an Inversion
                         toy problem    Marine EM
    # sources            256            10^6 - 10^8
    # frequencies        3              3
    # receivers          1024           10^6 - 10^8
    GN iterations        20             20
    PCG iterations       5              5
    line search steps    1              1
    PDE solves           ≈ 180,000      ≈ 8 · 10^11

Too big to be solved? Maybe not if we ...
- reduce costs per PDE solve ⟹ model order reduction
- reduce number of PDE solves ⟹ stochastic optimization
- reduce computation time ⟹ parallel / distributed computing
Mesh Decoupling
[Figure: OcTree for one forward problem (left) and OcTree for the inverse problem (right)]

Thus, the objective function becomes

$$\Phi(m) = \sum_{j,k}^{N} D\big( p_k^\top A(P_j\, \sigma(m))^{-1} q_j,\; d_{jk} \big) + R(m),$$

where $P_j$ is an interpolation matrix (fine mesh to coarse mesh).

Reference: E. Haber and C. Schwarzbach. Parallel inversion of large-scale airborne time-domain EM data with multiple OcTree meshes. Inverse Problems, 30, 28 p.
Mesh Decoupling in jInv
Mesh decoupling: Use (fine) inverse mesh for m and (locally refined) meshes forcomputing fields, uj.
Assume:
- Minv: fine mesh for inversion
- pFor[k].M: mesh for the kth forward problem

Generate interpolation matrix

    for k=1:length(pFor)   # loop over all forward problems
        Mesh2Mesh[k] = getInterpolationMatrix(Minv,pFor[k].M)
    end

Compute data (in parallel)

    Dobs,pFor = getData(sigma,pFor,Mesh2Mesh)
Step 1: Prepare forward problems on workers
jInv allows for automatic or user-defined scheduling. Here: Automatic.
[Diagram: the main process holds (d_1, q_1, p_1, ω_1), (d_2, q_2, p_2, ω_2), (d_3, q_3, p_3, ω_3) together with Ref[1], Ref[2], Ref[3]. Worker 1 prepares pFor[1] and then pFor[3]; worker 2 prepares pFor[2]. Each worker returns a RemoteRef to its (d_j, pFor[j]).]

1. Send data 1 to worker 1 and data 2 to worker 2
2. Get remote reference from worker 1 and send problem 3
3. Get remote references from worker 1 and 2
Computing Misfit with two Workers
Assume N = 3 and two workers.
[Diagram: the main process holds m, d_j, R_j, ... and accumulates the misfit: D = 0, ∇D = 0, then D = D_1, ∇D = ∇D_1, then D = D_1 + D_2, ∇D = ∇D_1 + ∇D_2, while watching the time. Worker 1 computes D_1 from (m, d_1, R_1, ...) and then D_3 from (m, d_3, R_3, ...); worker 2 computes D_2 from (m, d_2, R_2, ...). Workers return D_j(m), ∇D_j(m) and keep "A_j^{-1}".]

1. Send problem 1 to worker 1 and problem 2 to worker 2
2. Get result from worker 1, update and send problem 3
3. Get result from worker 2, update and set timer
4. Wait for worker 1 for some time, then interrupt
Simple Parallelization using Multiple Dispatch

Example: DC Resistivity problem with 10 sources.

Option 1: Use a single pFor for sequential computation:

    pFor = DivSigGradParam(Mesh,Sources,Receivers,[],Ainv)

Option 2: Split up the sources and use Array{DivSigGradParam} for parallelization:

    pFor1 = DivSigGradParam(Mesh,Sources[1:5],Receivers,[],Ainv)
    pFor2 = DivSigGradParam(Mesh,Sources[6:10],Receivers,[],Ainv)
    pForp = [pFor1; pFor2]

Option 3: Distribute forward problems a priori to reduce communication:

    pFord = Array{RemoteRef{Channel{Any}}}(2)
    pFord[1] = @spawnat processor1 identity(pFor1)
    pFord[2] = @spawnat processor2 identity(pFor2)

Then: getData(m,pFor) = getData(m,pForp) = getData(m,pFord)

jInv uses multiple dispatch to find the correct method of getData
Weak Scaling: DC Resistivity Forward Problem
Test environment:
- julia-0.4.6 on AWS (c4.large)
- 50 instances, each with 2 virtual cores
- 3.75 GB RAM

Test problem:
- mesh size: 48 × 48 × 24
- 10 sources per worker
- Direct solver: MUMPS.jl
- Iterative solvers: cg and blockCG from KrylovMethods.jl

[Figure: efficiency in % versus number of workers for MUMPS, Block PCG, PCG, and ideal scaling]

Almost optimal scalability on a distributed memory system
Summary
Σ: Numerical Methods for PDE-Constrained Optimization

PDE-constrained optimization is highly interdisciplinary:
- (geo)physicists: modeling, measurement design, interpreting/assessing results, ...
- PDE people: analyzing well-posedness, regularization, numerical solvers, ...
- linear algebraists: solving linear PDEs, helping the optimizer, ...
- optimizers: efficient sampling, globalization strategies, ...
- computer scientists: software engineering, parallel implementation, ...

Learning Objectives
- introduction to optimization with PDE constraints
- overview of some terminology and fundamental ideas
- shed light on the "black box" optimization approach
- give pointers to software and literature for future study
- set up and solve a PDE parameter estimation problem

(Some) Advanced Topics
- model order reduction (multiscale, adaptive discretization, ...)
- multiphysics inversion (exploit multimodality)
- parallel computing
- all-at-once methods
- statistical inversion ⟹ stochastic PDEs
- ...