Numerical Methods for PDE-Constrained Optimization ...lruthot/courses/PDECO/2016... · PDE-Constrained Optimization Doktorandenkolleg, Weißensee 2016 ... PDE-Constrained Optimization

© Lars Ruthotto

Numerical Methods for PDE-Constrained Optimization

Doktorandenkolleg, Weißensee 2016

Lars Ruthotto
Department of Mathematics and Computer Science
Emory University

[email protected]




Thank You

- Monika Wolfmayr and Marie-Therese Wolfram for the kind invitation and for covering travel expenses
- Doktorandenkolleg for picking this great place, organizing, shuttle, accommodation, ...
- Dr. Ettinger and Dr. Carr for teaching my classes
- Participants




Motivation and Introduction




Outline and Assumptions

About me
- 2012: PhD in Münster (advisors: Martin Burger, Jan Modersitzki)
- 2013-14: PostDoc at UBC, Vancouver (advisor: Eldad Haber)
- since 2014: Assistant professor at Emory University, Atlanta, GA

My assumptions on you
- solid background in PDEs and variational calculus
- interest and experience in PDE modeling and applications
- basic knowledge of numerical linear algebra and optimization
- some programming skills

Learning objectives
- introduction to optimization with PDE constraints
- overview of some terminology and fundamental ideas
- shed light into the "black box" optimization approach
- give pointers to software and literature for future study
- set up and solve a PDE parameter estimation problem

course website: http://tinyurl.com/PDECO





PDE Constrained Optimization Problem

aka Optimal Control

$$\min_{u,m} \; \Phi(u,m) \quad \text{subject to} \quad C(u,m) = 0,$$

where
- $m$ - control (what we can modify to minimize the objective)
- $u$ - state (the response of the PDE system)
- $\Phi$ - objective function
- $C$ - PDE constraint. Examples:
  - Poisson's: $C(u,m) = \nabla \cdot (\sigma(m)\nabla u) - q = 0$
  - Advection: $C(u,m) = \partial_t u(x,t) + m(x,t)\nabla u(x,t) = 0$ for all $t > 0$ and $u(x,0) = u_0$
  - Eikonal Eq.: $C(u,m) = \|\nabla u(x)\|_2^2 - m(x) = 0$ with $u(x_0) = 0$ for some $x_0$

(Some) challenges in PDE Constrained Optimization:
- ensure sound interplay of numerical PDEs, linear algebra, and optimization
- problem is infinite dimensional both in m and C




Reduced Formulation

aka "black box" or simulation-based approach

Assume u is uniquely defined through the constraint (boundary conditions). Use the Implicit Function Theorem and define u(m) such that

$$C(u(m),m) = 0.$$

Use this to eliminate the PDE constraint and solve

$$\min_m \; \Phi(u(m),m)$$

- unconstrained problem
- PDE solved in each iteration
- need to compute sensitivity matrix $J = \nabla_m u(m)$
- no need to store fields and Lagrange multipliers




Course Outline

Goal: Given receivers $p_i$, sources $q_j$ and data $d_{ij}$, find the model by solving the reduced formulation of a PDE-constrained problem

$$\min_m \sum_{i,j} \| p_i^\top u_j(m) - d_{ij} \|^2 + \alpha \| L m \|^2$$

where $\alpha > 0$ and $L$ are given and $u_j$ solves Poisson's equation with coefficients $\sigma(m) = \exp(m)$

$$\nabla \cdot (\sigma(m)\nabla u_j) = q_j \ \text{on } \Omega \quad \text{with} \quad \nabla u_j \cdot n = 0 \ \text{on } \partial\Omega$$

Outline:
- Discretizing the Forward Problem
- Crash Course on Numerical Linear Algebra
- Crash Course on Numerical Optimization
- Sensitivity Computation
- Implementation and Testing
- Outlook and Perspectives




Direct Current Resistivity (Geophysics)

Goal: Estimate conductivity of subsurface.

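[Figure: three panels showing potential fields and sources (u1, q1), (u2, q2), (u3, q3)]

Let $\Omega \subset \mathbb{R}^3$. For a given source $q_j$, the potential field $u_j$ can be simulated via

$$\nabla \cdot (\sigma(m)\nabla u_j(x)) = q_j(x) \ \text{for } x \in \Omega, \qquad \vec{n} \cdot \nabla u_j(x) = 0 \ \text{for } x \in \partial\Omega, \qquad u_j(0) = 1.$$

E. Haber. Computational Methods in Geophysical Electromagnetics. SIAM, Philadelphia, 2014.

L. Ruthotto, E. Treister, E. Haber. jInv - A Flexible Julia Package for PDE Parameter Estimation. Preprint available at arXiv, 2016.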




Some References for This Course

PDE-Constrained Optimization

- E. Haber. Computational Methods in Geophysical Electromagnetics. SIAM, 2015.
- A. Borzì and V. Schulz. Computational Optimization of Systems Governed by Partial Differential Equations. SIAM, 2012.
- R. Herzog and K. Kunisch. Algorithms for PDE-constrained optimization. GAMM-Mitteilungen, 2010.
- M. Hinze, R. Pinnau, M. Ulbrich, S. Ulbrich. Optimization with PDE Constraints. Springer, 2009.

Numerical Optimization

- A. Beck. Introduction to Nonlinear Optimization. SIAM, 2014.
- J. Nocedal and S. Wright. Numerical Optimization. Springer, 2009.

Numerical Linear Algebra

- C. Greif and U. Ascher. A First Course on Numerical Methods. SIAM, 2011.
- Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.

Notes: small (subjective) selection; ask me for more reading!




Discretizing the Forward Problem




Discretizing Poisson’s Equation

[Figure: regular mesh with nodal values u(x_i, y_j), u(x_{i+1}, y_j), u(x_i, y_{j+1}), u(x_{i+1}, y_{j+1}) and the partial derivatives ∂_x^h u, ∂_y^h u located at the edge midpoints]

- m, σ - model and conductivity discretized in cell-centers
- u_j - fields discretized in nodes
- approximate partial derivatives on edges

$$\partial_x^h u\left(x_i + \tfrac{h}{2},\, y_j\right) \approx \frac{1}{h}\left(u(x_{i+1}, y_j) - u(x_i, y_j)\right)$$




Julia: Building the Gradient

Represent ∇ = [∂x, ∂y, ∂z] as sparse matrix.

The 1D finite difference operator is

    ddx(n,h) = spdiagm((-1/h*ones(n),1/h*ones(n)), (0,1), n, n+1)

Testing it:

    n  = 512;
    D  = ddx(n,1/n);
    x  = linspace(0,1,n+1);          # nodes
    xc = x[1:end-1]+diff(x)/2;       # cell centers
    plot(xc,2*pi*cos(2*pi*xc),xc,D*sin(2*pi*x))

Homework: change n and see how accurate you get
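One way to approach the homework is sketched below. It is written for current Julia (1.x), where the sparse operator is built with `sparse` from the `SparseArrays` standard library rather than the Julia 0.4 `spdiagm` call used on the slide. The helper name `derivative_error` and the chosen values of n are illustrative assumptions.

```julia
using SparseArrays

# 1D difference operator: maps n+1 nodal values to n cell-centered derivatives
ddx(n, h) = sparse([1:n; 1:n], [1:n; 2:n+1],
                   [fill(-1 / h, n); fill(1 / h, n)], n, n + 1)

# maximum error of the discrete derivative of sin(2πx) on [0, 1]
function derivative_error(n)
    h  = 1 / n
    x  = range(0, 1, length = n + 1)        # nodes
    xc = (x[1:end-1] .+ x[2:end]) ./ 2      # cell centers
    D  = ddx(n, h)
    return maximum(abs.(D * sin.(2π .* x) .- 2π .* cos.(2π .* xc)))
end

errs  = [derivative_error(n) for n in (16, 32, 64, 128)]
rates = errs[1:end-1] ./ errs[2:end]        # close to 4 for a second-order scheme
```

Halving h reduces the error by a factor close to 4, which is consistent with the midpoint difference being second-order accurate at the cell centers.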




Coding ∇u = (∂xu, ∂yu) in 2D

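Example: Regular 2 × 2 grid. D is the 1D derivative matrix, U is the 2D field reshaped into a matrix.

$$U_x \approx \frac{1}{h}\begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} U = D\,U\,I$$

Similarly

$$U_y \approx \frac{1}{h}\, U \begin{pmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix} = I\,U\,D^\top$$

Use the Kronecker product rule

$$\mathrm{vec}(A U B^\top) = (B \otimes A)\,\mathrm{vec}(U)$$

to get $U_x \approx (I \otimes D)\,\mathrm{vec}(U)$ and $U_y \approx (D \otimes I)\,\mathrm{vec}(U)$.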
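The Kronecker product rule can be checked numerically on small matrices. The sketch below uses current Julia (1.x); the matrices A, U, B are arbitrary illustrative values, not taken from the course material.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]                              # 3 × 2
U = [1.0 0.0 2.0; -1.0 3.0 1.0]                              # 2 × 3
B = [2.0 1.0 0.0; 0.0 1.0 -1.0; 1.0 0.0 1.0; 3.0 2.0 1.0]    # 4 × 3

lhs = vec(A * U * B')          # vec stacks the columns (Julia is column-major)
rhs = kron(B, A) * vec(U)      # (B ⊗ A) vec(U)
```

Both vectors have length 12 and agree up to rounding, which is why the 2D derivative operators can be assembled from the 1D matrix D with `kron`.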




Coding ∇u = [∂xu, ∂yu] in 2/3D

Use it to code the gradient

    function getNodalGradient(n,h)

        # setup the operator in 1D
        ddx(n,h) = spdiagm((-1/h*ones(n),1/h*ones(n)), (0,1), n, n+1)

        # compute 1D matrices
        d1 = ddx(n[1],h[1]); d2 = ddx(n[2],h[2]);
        Grad = [kron(speye(n[2]+1),d1); kron(d2,speye(n[1]+1))]

        return Grad
    end

Since ∇· = (∂x ∂y) = ∇^⊤ we use

    Div = Grad'
    Lap = Grad'*Grad




Discretizing the PDE in 2D

Goal: Discretize Poisson's equation in 2D

$$\nabla \cdot (\sigma(m)\nabla u) = q.$$

Writing the equation in weak form gives

$$-(\sigma(m)\nabla u, \nabla p)_{L_2} = (q, p)_{L_2} \quad \text{for all } p.$$

Terms in $(\cdot,\cdot)_{L_2}$ are discretized on a staggered grid.
Example: $\partial_x u$ is cell-centered in x and nodal in y direction.

Consider the 1D inner product $(\sigma f, e)_{L_2}$ where σ is discretized in cell-centers and f, e are discretized in nodes.

Two options to discretize the inner product:
1. average first, then multiply: $(\sigma f, e)_{L_2} \approx \frac{h}{4}\sum_{k=1}^n \sigma_{k+1/2}(f_k + f_{k+1})(e_k + e_{k+1})$
2. multiply first, then average: $(\sigma f, e)_{L_2} \approx \frac{h}{2}\sum_{k=1}^n \sigma_{k+1/2}(f_k e_k + f_{k+1} e_{k+1})$

The first option introduces an artificial nullspace! Take $f = (1,-1,1,-1,\ldots,1,-1)^\top$.




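Discretizing Inner Product in 1D

Goal: Discretize $(\sigma f, e)_{L_2}$ for σ discretized in cell-centers and f, e discretized in nodes.

$$(\sigma f, e)_{L_2} \approx \frac{h}{2}\sum_{k=1}^n \sigma_{k+1/2}(f_k e_k + f_{k+1} e_{k+1}) = (h \odot \sigma)^\top A (f \odot e) = f^\top \mathrm{diag}\left(A^\top (h \odot \sigma)\right) e.$$

Here,
- ⊙ is the (component-wise) Hadamard product; in the matrix form, h denotes the vector of cell widths (all equal to h on a regular mesh)
- A is a node-to-center average matrix

$$A = \frac{1}{2}\begin{pmatrix} 1 & 1 & & & \\ & 1 & 1 & & \\ & & \ddots & \ddots & \\ & & & 1 & 1 \end{pmatrix} \in \mathbb{R}^{n \times (n+1)}$$

In Julia, build this matrix by

    av(n) = spdiagm((fill(.5,n),fill(.5,n)),(0,1),n,n+1)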
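The difference between the two discretizations of the inner product can be seen on the alternating vector f = (1, -1, 1, ...). The sketch below uses current Julia (1.x) and builds the averaging matrix with `sparse`; the coefficient values σ and the mesh size are illustrative assumptions.

```julia
using SparseArrays, LinearAlgebra

# node-to-cell-center averaging matrix, size n × (n+1)
av(n) = sparse([1:n; 1:n], [1:n; 2:n+1], fill(0.5, 2n), n, n + 1)

n = 6
h = fill(1 / n, n)                              # cell widths
σ = collect(range(1.0, 2.0, length = n))        # cell-centered coefficient (made up)
f = [(-1.0)^k for k in 1:n+1]                   # alternating nodal vector
e = f
A = av(n)

# option 1: average f and e first, then multiply
opt1 = sum(h .* σ .* (A * f) .* (A * e))

# option 2: multiply f and e first, then average
opt2 = (h .* σ)' * (A * (f .* e))

# option 2 written as a weighted nodal inner product f' * diag(A' * (h ⊙ σ)) * e
opt2_matrix = f' * Diagonal(A' * (h .* σ)) * e
```

Option 1 returns zero for this nonzero f, so the resulting quadratic form cannot "see" the oscillation. Option 2 returns a positive value, and its two forms agree.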




Discretizing Poisson’s Equation in 3D

Recall: $\nabla \cdot (\exp(m)\nabla u) = q$ is discretized as $G^\top \mathrm{diag}\left(A_e^\top (v \odot \exp(m))\right) G\, u = q$

STEP 1: Build average matrices for each component using Kronecker products and stack them

    I(i) = speye(Mesh.n[i])
    A1 = kron(I(3),kron(I(2),av(Mesh.n[1])))
    A2 = kron(I(3),kron(av(Mesh.n[2]),I(1)))
    A3 = kron(av(Mesh.n[3]),kron(I(2),I(1)))
    Ae = [A1 A2 A3]

STEP 2: Combine with gradient approximation to get discretized PDE

    G = getNodalGradient(Mesh)
    V = getVolume(Mesh)
    A = G'*sdiag(Ae'*(V*sigma))*G
    A[1,1] += h[1]   # fix one point

[Figure: 2D sketch]




Example: Discretization of Differential Operators

Julia's multiple dispatch simplifies comparison of different discretizations.

[Figure: regular mesh (left) and tensor mesh (right)]

For regular mesh

    using jInv.Mesh
    M    = getRegularMesh(domain,n)
    Div  = getDivergenceMatrix(M)
    Grad = getNodalGradientMatrix(M)
    Curl = getCurlMatrix(M)
    Ae   = getEdgeAverageMatrix(M)

For tensor mesh

    using jInv.Mesh
    M    = getTensorMesh3D(h1,h2,h3)
    Div  = getDivergenceMatrix(M)
    Grad = getNodalGradientMatrix(M)
    Curl = getCurlMatrix(M)
    Ae   = getEdgeAverageMatrix(M)




Σ : Discretizing the Forward Problem

Finite volume discretization:

$$\nabla \cdot (\sigma(m)\nabla u) = q \quad\longrightarrow\quad G^\top \mathrm{diag}\left(A_e^\top (v \odot \exp(m))\right) G\, u = q \quad\Leftrightarrow\quad A(\sigma(m))\, u = q$$

- u - field, discretized on nodes
- m - model, discretized on cell-centers (could be different mesh)
- A - discretized PDE
- solving PDE ⇒ solving sparse linear system
- size of linear system depends on number of nodes in mesh
- modularity w.r.t. mesh
- next: PDE solvers
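To make the summary concrete, here is a minimal 2D sketch of the assembly in current Julia (1.x). It is not jInv code: the function name `divSigGrad2D`, the helper `speye`, and the way edges are averaged to cell centers are assumptions made for this illustration.

```julia
using SparseArrays, LinearAlgebra

ddx(n, h) = sparse([1:n; 1:n], [1:n; 2:n+1],
                   [fill(-1 / h, n); fill(1 / h, n)], n, n + 1)
av(n)     = sparse([1:n; 1:n], [1:n; 2:n+1], fill(0.5, 2n), n, n + 1)
speye(n)  = sparse(1.0I, n, n)

# assemble A(m) = G' * diag(Ae' * (v ⊙ exp(m))) * G on a regular n[1] × n[2] mesh
function divSigGrad2D(m, n, h)
    G1 = kron(speye(n[2] + 1), ddx(n[1], h[1]))   # ∂x, lives on x-edges
    G2 = kron(ddx(n[2], h[2]), speye(n[1] + 1))   # ∂y, lives on y-edges
    G  = [G1; G2]
    A1 = kron(av(n[2]), speye(n[1]))              # x-edges -> cell centers
    A2 = kron(speye(n[2]), av(n[1]))              # y-edges -> cell centers
    Ae = [A1 A2]
    v  = fill(h[1] * h[2], prod(n))               # cell volumes
    w  = Ae' * (v .* exp.(m))                     # edge weights
    return G' * (Diagonal(w) * G)
end

n = (8, 6)
h = (1 / 8, 1 / 6)
m = [0.1 * sin(k) for k in 1:prod(n)]             # made-up cell-centered model
A = divSigGrad2D(m, n, h)
```

The matrix is symmetric positive semi-definite with the constant vector in its nullspace, which reflects the Neumann boundary conditions and explains why the slides fix one point before solving.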




Crash Course: Numerical LinearAlgebra




NLA Challenges in PDE Constrained Optimization

Goal: Given discrete receivers $p_k$, sources $q_j$ and data $d_{jk}$, find the model by solving the reduced formulation of a PDE-constrained problem

$$\min_m \sum_{j,k} \| p_k^\top A(\sigma(m))^{-1} q_j - d_{jk} \|^2 + \alpha \| L m \|^2,$$

where α > 0 and L are given and A is our discretized PDE.

- Linear PDE solvers
  - linear PDEs ⇒ large sparse linear systems
  - potentially many different linear systems involved (frequencies, meshes, ...)
  - coefficients can be very inhomogeneous (Ex: water, air, salt, ...)
  - depending on size / computing power use iterative or direct solver
- Numerical optimization (will be discussed later)
  - Gauss-Newton requires solving a real symmetric positive semidefinite linear system
  - size of linear system equals size of model ⇒ can be large
  - (approximate) Hessian often not formed explicitly ⇒ use iterative method




Condition Number

Let A ∈ Rn×n be SPD. Then the Schur decomposition is

A = UΣU>,

with UU> = U>U = In and Σ = diag(λ1, . . . , λn).

The condition number is defined as

$$\mathrm{cond}(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}$$

[Figure: two plots on [−2, 2] × [−2, 2]]




Positive (semi)-definiteness - 1

Let A ∈ Rn×n be symmetric and consider the quadratic form

f (x) = x>Ax.

Then A is
- positive definite if f(x) > 0 for all x ≠ 0
- positive semi-definite if f(x) ≥ 0 for all x
- negative definite if f(x) < 0 for all x ≠ 0
- indefinite if ∃ x, y ∈ R^n : f(x) > 0 and f(y) < 0.
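For a symmetric matrix, these cases can be read off from the signs of the eigenvalues. The sketch below (current Julia 1.x) is one possible check; the function names `definiteness` and `cond_spd` and the tolerance are illustrative assumptions.

```julia
using LinearAlgebra

# classify a symmetric matrix by the signs of its eigenvalues
function definiteness(A::AbstractMatrix; tol = 1e-12)
    λ = eigvals(Symmetric(A))
    if all(λ .> tol)
        return "positive definite"
    elseif all(λ .>= -tol)
        return "positive semi-definite"
    elseif all(λ .< -tol)
        return "negative definite"
    elseif all(λ .<= tol)
        return "negative semi-definite"
    else
        return "indefinite"
    end
end

# condition number of an SPD matrix as ratio of extreme eigenvalues
function cond_spd(A::AbstractMatrix)
    λ = eigvals(Symmetric(A))
    return maximum(λ) / minimum(λ)
end

kinds = [definiteness(M) for M in ([2.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 0.0],
                                   [-1.0 0.0; 0.0 -2.0], [1.0 0.0; 0.0 -1.0])]
κ = cond_spd([2.0 0.0; 0.0 0.5])
```

For large sparse matrices one would not compute all eigenvalues; this is only meant to connect the definitions to something executable.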




Positive (semi)-definiteness - 2

[Figure: four contour plots on [−2, 2] × [−2, 2]]




Matrix Factorizations

Let A ∈ R^{n×n} be nonsingular.

- LU-Factorization

  $$PA = LU,$$

  with P permutation, L unit lower triangular, U upper triangular. Cost ≈ 2n³/3.
- Cholesky (LU for A SPD)

  $$P^\top A P = L L^\top.$$

  (Symmetric) permutations can be used to keep sparsity of L. Cost ≈ n³/3.
- QR-Factorization

  $$AP = QR,$$

  Q is orthogonal. Cost ≈ 4n³/3.
- Singular Value Decomposition

  $$A = U \begin{bmatrix} S \\ 0 \end{bmatrix} V^\top,$$

  U ∈ R^{m×m} and V ∈ R^{n×n} orthogonal, S ∈ R^{n×n} diagonal (stated for A ∈ R^{m×n} with m ≥ n).
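All four factorizations are available in Julia's `LinearAlgebra` standard library. The sketch below (current Julia 1.x) applies them to a small SPD matrix chosen for illustration; note that the dense `cholesky` and `qr` calls used here do not pivot by default, so P is the identity in those two cases.

```julia
using LinearAlgebra

A = [4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0]   # small SPD example matrix (made up)
b = [1.0, 2.0, 3.0]

F = lu(A)            # F.L * F.U == A[F.p, :]
C = cholesky(A)      # C.L * C.L' == A
Q, R = qr(A)         # Q orthogonal, R upper triangular
S = svd(A)           # S.U * Diagonal(S.S) * S.Vt == A

x_lu   = F \ b       # solve A x = b by forward/backward substitution
x_chol = C \ b
```

Once a factorization is stored, additional right-hand sides only cost triangular solves, which is the reason the forward problems later keep factorizations in memory.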




Conjugate Gradient Method

Let A ∈ R^{n×n} be SPSD and b ∈ R^n be given. Solve

$$\min_x \; \tfrac{1}{2} x^\top A x - b^\top x \quad\Leftrightarrow\quad A x^* = b$$

Algorithm 1 CG
 1: set r = b, p = r
 2: γ_0 = ‖p‖², x = 0
 3: for i = 1, 2, ... do
 4:     w = Ap
 5:     α = γ_{i−1} / (p^⊤ w)
 6:     x = x + αp
 7:     r = r − αw
 8:     γ_i = ‖r‖²
 9:     p = r + (γ_i / γ_{i−1}) p
10: end for

- exact solution in n steps
- often fast convergence
- only needs Ap
- in practice requires preconditioner
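Algorithm 1 translates almost line by line into Julia. The following sketch (current Julia 1.x) adds a relative residual stopping test, which the algorithm on the slide leaves open; the function name `cg`, the tolerance, and the test matrix are illustrative assumptions.

```julia
using LinearAlgebra

# conjugate gradient method for SPD A, starting from x = 0
function cg(A, b; tol = 1e-10, maxiter = length(b))
    x = zeros(length(b))
    r = copy(b)                  # residual b - A*x for x = 0
    p = copy(r)
    γ = dot(r, r)
    for i in 1:maxiter
        w = A * p                # the only place where A is needed
        α = γ / dot(p, w)
        x .+= α .* p
        r .-= α .* w
        γnew = dot(r, r)
        sqrt(γnew) <= tol * norm(b) && return x, i
        p .= r .+ (γnew / γ) .* p
        γ = γnew
    end
    return x, maxiter
end

A = SymTridiagonal(fill(2.0, 10), fill(-1.0, 9))   # 1D Laplacian, SPD
b = ones(10)
x, iters = cg(A, b; maxiter = 50)
```

Since only the product `A * p` is required, `A` can be any object that supports multiplication with a vector, which is what makes the method suitable for matrix-free Hessian approximations.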




Preconditioned Conjugate Gradient Method

Convergence of CG depends on spectral properties.
Idea:

$$Ax = b \quad\Leftrightarrow\quad M^{-1}Ax = M^{-1}b,$$

where M is a preconditioner that satisfies $M^{-1}A \approx I$.

Some common examples:
- Jacobi preconditioner: $M^{-1} = D^{-1}$, where $D = \mathrm{diag}(A)$
- SSOR preconditioner: $M^{-1} = U^{-1} D L^{-1}$, where L, U are the lower and upper triangular parts of A (including the diagonal), resp.
- Incomplete Cholesky: $M^{-1} \approx A^{-1}$ (sparse approximation)

Example: 2D Poisson, see exPCGforPoisson.ipynb
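As a small complement to the notebook, the sketch below shows preconditioned CG with a Jacobi preconditioner in current Julia (1.x). The function name `pcg`, the callback `applyMinv`, and the badly scaled test matrix are assumptions made for this illustration and are not taken from the notebook.

```julia
using LinearAlgebra

# preconditioned CG; applyMinv(r) should return an approximation of A \ r
function pcg(A, b, applyMinv; tol = 1e-10, maxiter = 500)
    x = zeros(length(b))
    r = copy(b)
    z = applyMinv(r)
    p = copy(z)
    γ = dot(r, z)
    for i in 1:maxiter
        w = A * p
        α = γ / dot(p, w)
        x .+= α .* p
        r .-= α .* w
        norm(r) <= tol * norm(b) && return x, i
        z = applyMinv(r)
        γnew = dot(r, z)
        p .= z .+ (γnew / γ) .* p
        γ = γnew
    end
    return x, maxiter
end

n = 20
T = Matrix(SymTridiagonal(fill(2.0, n), fill(-1.0, n - 1)))
s = 10.0 .^ range(0, 2, length = n)
A = s .* T .* s'                 # SPD, but rows and columns are badly scaled
b = A * ones(n)
d = diag(A)

x_cg,  it_cg  = pcg(A, b, r -> r)         # no preconditioner
x_pcg, it_pcg = pcg(A, b, r -> r ./ d)    # Jacobi: M^{-1} = D^{-1}
```

Comparing `it_cg` and `it_pcg` indicates how much the diagonal scaling helps on this particular matrix; for PDE problems with strongly varying coefficients, stronger preconditioners are typically needed.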




Crash Course: NumericalOptimization




Course Outline


$$\min_m \sum_{j,k} \| p_k^\top A(\sigma(m))^{-1} q_j - d_{jk} \|^2 + \alpha \| L m \|^2$$

where α > 0 and L are given and A is our discretized PDE.





Line Search Methods: Motivation

Let Φ : Rn → R be smooth and consider

$$\min_{m \in \mathbb{R}^n} \Phi(m).$$

Recall: Necessary condition is ∇Φ(m) = 0 (n equations / n unknowns).

If n = 1 we can simply plot Φ or Φ′ on some interval [a, b] and "see" the minimum.

If n ≫ 1 plotting is no option. Idea: Start with some m_0 ∈ R^n and for k = 0, 1, 2, ... do
1. Pick a good direction p_k (descent direction)
2. Solve α_k = arg min_α Φ(m_k + α p_k) (line search)
3. Update m_{k+1} = m_k + α_k p_k

until tired or ‖∇Φ(mk)‖ small enough.




Line Search Problem

[Figure: objective function (left) and line search function (right)]

Let Φ : R^n → R smooth be given and p_k be a descent direction at m_k. The line search problem is:

$$\min_{\alpha > 0} \Phi(m_k + \alpha p_k).$$




Backtracking

Armijo's condition may not be sufficient (steps might be too small).
Wolfe's conditions require the derivative of ϕ(α) = Φ(m + αp_k).
Idea: Avoid too small steps by backtracked Armijo:

    function armijo(f,fc,df,xc,pc;maxIter=10,
                    alpha=1,c1=.01,b=.5)
        LS = 1;
        while LS<=maxIter
            # accept alpha if sufficient decrease is achieved
            if f(xc+alpha*pc)<=fc + alpha*c1*dot(df,pc); break; end
            alpha *= b; LS += 1
        end
        if LS>maxIter   # line search failed
            LS= -1; alpha = 0.0
        end
        return alpha,LS
    end

alpha=1 works for damped Newton. A different initial value of alpha and a careful update of b are required for other methods.
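A short usage sketch for backtracked Armijo, written for current Julia (1.x). The function below repeats the logic of the line search above so that the example is self-contained; the quadratic test function and the starting point are made up for illustration.

```julia
using LinearAlgebra

# backtracked Armijo line search (same logic as above, restated for Julia 1.x)
function armijo(f, fc, df, xc, pc; maxIter = 10, alpha = 1.0, c1 = 0.01, b = 0.5)
    LS = 1
    while LS <= maxIter
        # sufficient decrease condition
        f(xc + alpha * pc) <= fc + alpha * c1 * dot(df, pc) && break
        alpha *= b
        LS += 1
    end
    if LS > maxIter          # no acceptable step found
        LS = -1
        alpha = 0.0
    end
    return alpha, LS
end

# quadratic objective with an ill-scaled second coordinate
f(x)  = 0.5 * (x[1]^2 + 10 * x[2]^2)
xc    = [1.0, 1.0]
df    = [xc[1], 10 * xc[2]]      # gradient at xc
pc    = -df                      # steepest descent direction
alpha, LS = armijo(f, f(xc), df, xc, pc)
```

For this example the steps 1, 1/2 and 1/4 are rejected and 1/8 is accepted, so the search returns after four function evaluations.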




Steepest Descent Method

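Iteration:

$$m_{k+1} = m_k + \alpha p_k \quad\text{with}\quad p_k = -\frac{\nabla\Phi_k}{\|\nabla\Phi_k\|}.$$

Interpretation: Minimizer of first-order Taylor approximation

$$\min_p \; \Phi(m_k) + p^\top \nabla\Phi_k \quad\text{subject to}\quad \|p\|_2 = 1.$$

Why? Drop the constant term in the objective and apply Cauchy-Schwarz:

$$p^\top \nabla\Phi_k = \|p\| \|\nabla\Phi_k\| \cos(\theta) = \|\nabla\Phi_k\| \cos(\theta).$$

Easy to see that this is minimal at p_k when

$$\cos(\theta) = -1 \quad\Rightarrow\quad p_k = -\nabla\Phi_k / \|\nabla\Phi_k\|.$$

- simple
- robust
- linesearch can be tricky
- often very slow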




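Gradient Flow

Idea: To solve ∇Φ(m) = 0, we consider the time-dependent problem

$$\partial_t m = -\nabla\Phi(m)$$

with some starting guess m_0. Steady state gives ∇Φ = 0.

Example: Convergent for $\Phi(m) = \frac{1}{2} m^\top A m - b^\top m$ when A is SPD since

$$\partial_t m = b - Am \quad\Rightarrow\quad m(t) = A^{-1}b + \exp(-tA)(m_0 - A^{-1}b).$$

Discretizing the ODE using forward Euler, we get

$$\frac{m_{k+1} - m_k}{h} = -\nabla\Phi(m_k) \quad\Leftrightarrow\quad m_{k+1} = m_k - h\nabla\Phi(m_k)$$

Stability of forward Euler requires

$$|1 - \lambda_{\max}(A) h| \leq 1 \quad\Rightarrow\quad h \leq \frac{2}{|\lambda_{\max}|}.$$

Gradient flow is a slow version of steepest descent, since in general

$$\alpha^* = \frac{\|Am - b\|^2}{(Am - b)^\top A (Am - b)} \geq \frac{1}{|\lambda_{\max}|}.$$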




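Example: Steepest Descent for Quadratic Objective

Let A ∈ R^{n×n} be SPD and b ∈ R^n. Consider

$$\min_m \Phi(m) \quad\text{with}\quad \Phi(m) = \frac{1}{2} m^\top A m - m^\top b.$$

The solution is, of course, m^* = A^{-1}b.
The steepest descent direction is p = −∇Φ = b − Am and for exact line search, solve

$$\min_\alpha \; \frac{1}{2}(m + \alpha p)^\top A (m + \alpha p) - (m + \alpha p)^\top b.$$

The necessary optimality condition is

$$\alpha\, p^\top A p + p^\top (Am - b) = 0 \quad\Rightarrow\quad \alpha^* = -\frac{p^\top \nabla\Phi(m)}{p^\top A p} = \frac{p^\top p}{p^\top A p}.$$

Global linear convergence with factor

$$1 - \frac{\lambda_{\min}(A)}{\lambda_{\max}(A)} = 1 - \mathrm{cond}(A)^{-1}$$

[Figures: Example 1 and Example 2]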
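The effect of the condition number can be reproduced with a few lines of Julia (1.x). This is a sketch under the assumptions stated in the comments; the function name `steepest_descent` and the two diagonal test matrices are illustrative and not the examples shown in the figures.

```julia
using LinearAlgebra

# steepest descent with exact line search for Φ(m) = 0.5*m'*A*m - m'*b, A SPD
function steepest_descent(A, b, m; tol = 1e-8, maxiter = 10_000)
    for k in 1:maxiter
        p = b - A * m                        # p = -∇Φ(m)
        norm(p) <= tol && return m, k - 1
        α = dot(p, p) / dot(p, A * p)        # exact minimizer along p
        m = m + α * p
    end
    return m, maxiter
end

b = [1.0, 1.0]
A_good = [1.0 0.0; 0.0 2.0]      # cond = 2
A_bad  = [1.0 0.0; 0.0 100.0]    # cond = 100

m_good, its_good = steepest_descent(A_good, b, zeros(2))
m_bad,  its_bad  = steepest_descent(A_bad,  b, zeros(2))
```

Both runs reach the solution A \ b, but the ill-conditioned problem needs far more iterations, in line with the convergence factor above.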




Example: Solving Poisson’s Equations with SD

Consider: ∆u = b on a rectangular domain with u(1, x_2) = u(0, x_2) = u(x_1, 1) = u(x_1, 0) = 0.

For given h > 0 consider the optimization problem

$$\min_u \; \frac{1}{2} u^\top \Delta_h u - u^\top b.$$

Solver: Steepest Descent with exact linesearch.

Check out exPoisson2D.ipynb.

SD convergence is "mesh dependent" (that is, the number of iterations needed to achieve a given tolerance depends on the mesh size)




Newton's Method

Iteration:

$$m_{k+1} = m_k + p_k \quad\text{with}\quad p_k = -(\nabla^2\Phi_k)^{-1}\nabla\Phi_k$$

Interpretation 1: Minimizer of second-order Taylor expansion

$$p_k = \arg\min_p \; \Phi_k + p^\top\nabla\Phi_k + \frac{1}{2} p^\top \nabla^2\Phi_k\, p.$$

Interpretation 2: First-order approximation of optimality condition

$$0 = \nabla\Phi(m_{k+1}) = \nabla\Phi(m_k) + \nabla^2\Phi(m_k)\, p_k + O(\|p_k\|^2).$$

Discussion
- fast convergence if Φ is smooth around m_k or close to minimum
- iterates are the same for Φ and g(m) = βΦ(m) for any β > 0
- not robust: extra work required if ∇²Φ_k is not SPD
- need to solve linear system in each iteration




Example: Himmelblau

$$\min_m \; \Phi(m) = (m_1^2 + m_2 - 11)^2 + (m_1 + m_2^2 - 7)^2$$

Local minimization: Different methods ⇒ different solutions
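A minimal Newton iteration for the Himmelblau function in current Julia (1.x) is sketched below. Gradient and Hessian are coded by hand; the starting point is an illustrative choice near the minimizer (3, 2), and the sketch takes full Newton steps without a line search or Hessian modification.

```julia
using LinearAlgebra

himmelblau(m) = (m[1]^2 + m[2] - 11)^2 + (m[1] + m[2]^2 - 7)^2

function grad(m)
    a = m[1]^2 + m[2] - 11
    b = m[1] + m[2]^2 - 7
    return [4 * m[1] * a + 2 * b, 2 * a + 4 * m[2] * b]
end

function hess(m)
    a = m[1]^2 + m[2] - 11
    b = m[1] + m[2]^2 - 7
    h11 = 4 * a + 8 * m[1]^2 + 2
    h12 = 4 * m[1] + 4 * m[2]
    h22 = 2 + 4 * b + 8 * m[2]^2
    return [h11 h12; h12 h22]
end

# plain Newton iteration: full steps, no globalization
function newton(m; tol = 1e-10, maxiter = 20)
    for k in 1:maxiter
        g = grad(m)
        norm(g) <= tol && return m, k - 1
        m = m - hess(m) \ g
    end
    return m, maxiter
end

m, its = newton([3.5, 2.5])
```

From this starting point the iteration converges to (3, 2). Other starting points or other methods can end up at a different stationary point, which is the message of the slide.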




Nonlinear Least-Squares: Gauss Newton

Consider the non-linear least-squares problem

$$\min_m \; \frac{1}{2}\|r(m)\|^2,$$

for smooth residual r : R^n → R^m. Linearizing r around m_k gives

$$r(m_k + p) = r(m_k) + J(m_k)p + O(\|p\|^2).$$

This gives a linear least-squares problem for the step

$$\min_p \|J(m_k)p + r(m_k)\|^2 \quad\Rightarrow\quad J(m_k)^\top J(m_k)\, p^* = -J(m_k)^\top r(m_k).$$

Compare to Newton step

$$\Big(J(m_k)^\top J(m_k) + \sum_{j=1}^m r_j(m_k)\nabla^2 r_j(m_k)\Big)\, p_{nt} = -J(m_k)^\top r(m_k).$$

- J^⊤J always SPSD
- globally convergent (some assumptions)
- good convergence when residual small or problem almost linear
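The Gauss-Newton step can be illustrated on a tiny curve-fitting problem. The sketch below (current Julia 1.x) fits m_1 exp(m_2 t) to noise-free synthetic data; the model, the data, and the starting guess are made up for illustration, and no line search is used.

```julia
using LinearAlgebra

t      = [0.0, 1.0, 2.0]
m_true = [2.0, -1.0]
model(m) = m[1] .* exp.(m[2] .* t)
d      = model(m_true)                       # synthetic, noise-free data

r(m) = model(m) .- d                         # residual
J(m) = hcat(exp.(m[2] .* t), m[1] .* t .* exp.(m[2] .* t))   # Jacobian of r

# Gauss-Newton with full steps
function gauss_newton(m; tol = 1e-12, maxiter = 20)
    for k in 1:maxiter
        rk, Jk = r(m), J(m)
        norm(Jk' * rk) <= tol && return m, k - 1
        p = Jk \ (-rk)                       # least-squares solve of J p = -r
        m = m + p
    end
    return m, maxiter
end

m, its = gauss_newton([1.5, -0.8])
```

The backslash solve is equivalent to the normal equations J'J p = -J'r but avoids forming J'J. In the PDE setting J is not available as a matrix, so the system is instead solved iteratively using products with J and its transpose.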




Σ : Numerical Optimization

We covered the basics on line search methods.

Steepest Descent (similar for other first order methods)
- easy, cheap iterations, and globally convergent
- line search can be tricky
- ill-conditioning of Hessian affects convergence
- convergence for PDE problems mesh dependent
- often very slow convergence (runtime and iterations)

(Gauss) Newton Methods
- (approximately) mesh independent convergence
- robust against ill-conditioning
- no Hessian modification needed for Gauss Newton
- costly iterations: need to solve linear system in each iteration
- large scale: inexact / matrix-free Newton methods

PDE Constrained Optimization
- recall: reduced formulation requires one PDE solve per iteration / line search
- use method with few iterations such as (Gauss)-Newton
- main challenge: solve approximated Newton system where J is not available as matrix ⇒ PCG!




Sensitivity Calculations




Course Outline


$$\min_m \sum_{j,k} \| p_k^\top A(\sigma(m))^{-1} q_j - d_{jk} \|^2 + \alpha \| L m \|^2$$

where α > 0 and L are given and A is our discretized PDE.





Computing Sensitivities

Key ingredients for optimization are the sensitivity matrices $J_j \in \mathbb{R}^{n_j \times n}$ that satisfy

$$u_j(m + v) \approx u_j(m) + J_j(m)\, v,$$

for any small perturbation v ∈ R^n.

Compute J_j by differentiating both sides of the constraint with respect to m:

$$0 = \nabla_m \big( C_j(m, u_j(m)) \big) = \nabla_m C_j(m, u_j) + \nabla_{u_j} C_j(m, u_j)\, J_j(m)$$

$$\Rightarrow\quad J_j(m) = -\big( \nabla_{u_j} C_j(m, u_j) \big)^{-1} \nabla_m C_j(m, u_j).$$

Example: For linear PDE constraint, we have

$$J_j(m) = -A_j(m)^{-1} \nabla_m \big( A_j(m) u_j \big).$$

- building J_j can be very expensive (dense, many solves required)
- matrix-free implementation: provide v ↦ J_j(m)v and w ↦ w^⊤J_j(m)
- option to keep temp results (factorizations, preconditioners, fields)




Computing Sensitivities - 2

For linear PDE constraint, we have

$$J_j(m) = -A_j(m)^{-1} \nabla_m \big( A_j(m) u_j \big).$$

Now focus on $\nabla_m (A_j(m) u_j)$ where (with V = diag(v))

$$A_j(m) u_j = G^\top \mathrm{diag}\big(A_e^\top V \sigma(m)\big)\, G u_j = G^\top \mathrm{diag}(G u_j)\, A_e^\top V \sigma(m).$$

Therefore, in our case

$$\nabla_m \big( A_j(m) u_j \big) = G^\top \mathrm{diag}(G u_j)\, A_e^\top V\, \nabla_m \sigma.$$

Some comments:
- in our example ∇_m σ = diag(exp(m))
- forming J_j requires n PDE solves where m ∈ R^n
- multiplication with J_j or J_j^⊤ requires one(!) PDE solve




Programmer’s Note: Check Derivative

See checkDerivative.ipynb.
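The idea behind a derivative check can be sketched independently of the notebook: for a correct sensitivity, the error of the zeroth-order Taylor approximation decays like O(t) and the error of the first-order approximation like O(t²) as the step t is halved. The example below (current Julia 1.x) uses a small 1D version of the forward problem; the mesh size, source, model, and the way one point is fixed are assumptions made for this illustration.

```julia
using LinearAlgebra

n  = 8
h  = 1 / n
# dense 1D nodal gradient, n × (n+1)
G  = [j == i ? -1 / h : (j == i + 1 ? 1 / h : 0.0) for i in 1:n, j in 1:n+1]
E1 = zeros(n + 1, n + 1); E1[1, 1] = 1.0     # fixes one point so that A is invertible
q  = [sin(k) for k in 1:n+1]                 # made-up source

A(m) = G' * Diagonal(h .* exp.(m)) * G + E1  # discretized PDE, σ(m) = exp(m)
u(m) = A(m) \ q                              # forward problem
# sensitivity times vector: J v = -A⁻¹ G' diag(G u) (h ⊙ exp(m) ⊙ v)
Jv(m, v) = -(A(m) \ (G' * ((G * u(m)) .* (h .* exp.(m) .* v))))

m = [0.1 * sin(k) for k in 1:n]
v = [cos(k) for k in 1:n]

steps = [2.0^(-k) for k in 6:10]
e0 = [norm(u(m + t * v) - u(m)) for t in steps]                  # expected O(t)
e1 = [norm(u(m + t * v) - u(m) - t * Jv(m, v)) for t in steps]   # expected O(t²)
```

If the ratios of consecutive entries of `e1` are not close to 4, the sensitivity code most likely contains a bug (or the steps are so small that rounding errors dominate).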




Solving the Inverse Problem




Discrete Inverse Problem

Assume

$$d_{ij} = (p_i, u_j(m)) + \epsilon_{ij} \quad\text{for all } i, j,$$

where for given m, q_j, the field u_j(m) is defined implicitly by

$$C(m, q_j, u_j(m)) = 0.$$

Notation:
- d_ij - discrete data
- (·, ·) - inner product
- ε_ij - noise
- m - model function
- q_j, p_i - source / receiver function, respectively
- u_j - field, aka state
- C - PDE constraint (linear or nonlinear)

Goal: Reconstruct m




Overview: How to Code Your Forward Problem

jInv is easy to extend. To use your own forward problem in jInv you need to
1. Define a ForwardProbType that describes your problem.
2. Write getData that solves the forward problem: d(m) = F(σ(m), p, q, ω)
3. Write getSensTMatVec that computes J(d(m))^⊤ w
4. Write getSensMatVec that computes J(d(m)) v

jInv helps doing this by
- providing discrete differential operators ∇, ∇×, ∇· on different meshes
- tools for derivative and adjoint testing

Benefits: Use jInv's inversion tools (misfit, regularization, optimization, parallel computing, meshes, ...).




Step 1: Creating ForwardProbType for Your Problem

Example: Consider the DC Resistivity problem

$$\nabla \cdot (\sigma(m)\nabla u_j) = q_j \ \text{on } \Omega \quad\text{with}\quad \nabla u_j \cdot n = 0 \ \text{on } \partial\Omega$$

Idea: Describe problem by using a type (abstraction, multiple dispatch).

    type DivSigGradParam <: ForwardProbType
        Mesh::AbstractMesh            # mesh
        Sources::SparseMatrixCSC      # sources
        Receivers::SparseMatrixCSC    # receivers
        Fields::Array{Float64}        # store fields
        Ainv::AbstractSolver          # PDE solver
    end

Comments:
- Sources, Receivers: sources and receivers can also be dense
- Fields: typically empty upon initialization
- Ainv: chooses / configures PDE solver




Solve Forward Problem

    function getData(m::Vector,pFor::DivSigGradParam)
        pFor.Ainv.doClear=1   # clear factorization/preconditioner

        A = getDivSigGradMatrix(m,pFor.Mesh)
        pFor.Fields,pFor.Ainv = solveLinearSystem(A,
                                    pFor.Sources,pFor.Ainv)
        D = pFor.Receivers'*pFor.Fields

        pFor.Ainv.doClear=0   # keep factorization/preconditioner
        return D, pFor
    end

Comments:
- doClear=1: assume the model has changed ⇒ clear intermediates
- getDivSigGradMatrix: pFor.Mesh is flexible as long as the PDE can be discretized on the mesh type
- solveLinearSystem: allows changing the PDE solver via pFor.Ainv
- doClear=0: keep those results in memory to accelerate sensitivity computations




Step 3: Compute Matvecs with Sensitivity

Recall:

$$J_j(\sigma) = -A_j(\sigma)^{-1} \nabla_\sigma \big( A_j(\sigma) u_j \big) \quad\text{with}\quad \nabla_\sigma \big( A_j(\sigma) u_j \big) = G^\top \mathrm{diag}(G u_j)\, A_e^\top V.$$

    function getSensMatVec(x::Vector,m::Vector,
                           pFor::DivSigGradParam)

        A  = getDivSigGradMatrix(m,pFor.Mesh)
        G  = getNodalGradientMatrix(pFor.Mesh)
        Ae = getEdgeAverageMatrix(pFor.Mesh)
        V  = getVolume(pFor.Mesh)

        Z = G'*(sdiag(Ae'*(V*x))*(G*pFor.Fields))
        Z, = solveLinearSystem(A,Z,pFor.Ainv)
        Jv = -pFor.Receivers'*Z
        return vec(Jv)
    end

Comment (computation of Z): note that we might have multiple fields




(Some) Advanced Topics




Costs of an Inversion

                       toy problem     Marine EM
    # sources          256             10^6 - 10^8
    # frequencies      3               3
    # receivers        1024            10^6 - 10^8
    GN iterations      20              20
    PCG iterations     5               5
    line search steps  1               1
    PDE solves         ≈ 180,000       ≈ 8 · 10^11

Too big to be solved? Maybe not if we...
- reduce costs per PDE solve ⇒ model order reduction
- reduce number of PDE solves ⇒ stochastic optimization
- reduce computation time ⇒ parallel / distributed computing




Mesh Decoupling

[Figure: OcTree for one forward problem (left) and OcTree for inverse problem (right)]

Thus, the objective function becomes

$$\Phi(m) = \sum_{j,k}^{N} D\big( p_k^\top A(P_j\, \sigma(m))^{-1} q_j,\; d_{jk} \big) + R(m),$$

where P_j is an interpolation matrix (fine mesh to coarse mesh).

E. Haber and C. Schwarzbach. Parallel inversion of large-scale airborne time-domain EM data with multiple OcTree meshes. Inverse Problems, 30, 28 p




Mesh Decoupling in jInv

Mesh decoupling: Use (fine) inverse mesh for m and (locally refined) meshes for computing fields, u_j.

Assume:
- Minv: fine mesh for inversion
- pFor[k].M: mesh for kth forward problem

Generate interpolation matrix

    for k=1:length(pFor)   # loop over all forward problems
        Mesh2Mesh[k] = getInterpolationMatrix(Minv,pFor[k].M)
    end

Compute data (in parallel)

    Dobs,pFor = getData(sigma,pFor,Mesh2Mesh)




Step 1: Prepare forward problems on workers

jInv allows for automatic or user-defined scheduling. Here: Automatic.

[Figure: the main process holds (d_j, q_j, p_j, ω_j) and a reference Ref[j] for j = 1, 2, 3; worker 1 prepares pFor[1] and pFor[3], worker 2 prepares pFor[2]; each worker returns a RemoteRef to (d_j, pFor[j])]

1. Send data 1 to worker 1 and data 2 to worker 2
2. Get remote reference from worker 1 and send problem 3
3. Get remote reference from worker 1 and 2




Computing Misfit with two Workers

Assume N = 3 and two workers.

[Figure: the main process sends (m, d_j, R_j, ...) to the workers and accumulates D and ∇D from the returned D_j(m), ∇D_j(m); worker 1 computes D_1 and then D_3, worker 2 computes D_2; the main process watches the time]

1. Send problem 1 to worker 1 and problem 2 to worker 2
2. Get result from worker 1, update and send problem 3
3. Get result from worker 2, update and set timer
4. Wait for worker 1 for some time, then interrupt




Simple Parallelization using Multiple Dispatch

Example: DC Resistivity problem with 10 sources.

Option 1: Use single pFor for sequential computation:

    pFor = DivSigGradParam(Mesh,Sources,Receivers,[],Ainv)

Option 2: Split up sources and use Array{DivSigGradParam} for parallelization:

    pFor1 = DivSigGradParam(Mesh,Sources[1:5],Receivers,[],Ainv)
    pFor2 = DivSigGradParam(Mesh,Sources[6:10],Receivers,[],Ainv)
    pForp = [pFor1; pFor2]

Option 3: Distribute forward problems a priori to reduce communication:

    pFord = Array{RemoteRef{Channel{Any}}}(2)
    pFord[1] = @spawnat processor1 identity(pFor1)
    pFord[2] = @spawnat processor2 identity(pFor2)

Then: getData(m,pFor) = getData(m,pForp) = getData(m,pFord)

jInv uses multiple dispatch to find the correct method of getData




Weak Scaling: DC Resistivity Forward Problem

Test environment:
- julia-0.4.6 on AWS (c4.large)
- 50 instances, each 2 virtual cores
- 3.75GB RAM

Test problem:
- mesh size: 48 × 48 × 24
- 10 sources per worker
- Direct solver: MUMPS.jl
- Iterative solvers: cg and blockCG from KrylovMethods.jl

[Figure: efficiency in % versus number of workers for MUMPS, Block PCG, PCG, and ideal scaling]

Almost optimal scalability on distributed memory system




Summary




Σ: Numerical Methods for PDE Constrained Optimization

PDE Constrained Optimization is highly interdisciplinary:
- (geo)physicist: modeling, measurement design, interpret/assess results, ...
- PDE people: analyze well-posedness, regularization, numerical solvers, ...
- linear algebraist: solve linear PDEs, help optimizer, ...
- optimizers: efficient sampling, globalization strategies, ...
- computer scientist: software engineering, parallel implementation, ...

Learning Objectives
- introduction to optimization with PDE constraints
- overview of some terminology and fundamental ideas
- shed light into the "black box" optimization approach
- give pointers to software and literature for future study
- set up and solve a PDE parameter estimation problem

(Some) Advanced Topics
- model order reduction (multiscale, adaptive discretization, ...)
- multiphysics inversion (exploit multimodality)
- parallel computing
- all-at-once methods
- statistical inversion, stochastic PDEs
- ...

