Brief Introduction to Krylov Space Solvers

Martin H. Gutknecht


of them can also be used as (fixed) preconditioners, in which case they are known as polynomial preconditioners; but in this function they are reportedly not competitive. Flexible preconditioning allows us to apply any Krylov space solver as a preconditioner of another Krylov space solver. Such combinations,

called inner-outer iteration methods, may be very effective; see, e.g., [14].

Krylov space methods for solving $Ax = b$ have the special feature that the $N \times N$ matrix $A$ needs only be given as an operator: for any $N$-vector $y$ one must be able to compute $Ay$; so $A$ may be given as a function (or procedure or subroutine). We refer to the operation $Ay$ as a matrix-vector product (MV). In practice, it may be much more complicated than the multiplication of a vector by a sparse matrix; e.g., this operation may include the application of a preconditioner, which may also require the solution of a large linear system.
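For illustration, here is a minimal sketch (not from the paper; the function name `apply_A` and the tridiagonal operator are arbitrary choices for the example) of how $A$ can be supplied purely as an operator, i.e., as a routine that computes one MV without ever storing the matrix:

```python
import numpy as np

def apply_A(y):
    """Matrix-free MV: apply the N x N tridiagonal operator
    A = tridiag(-1, 2, -1) to the vector y without forming A."""
    Ay = 2.0 * y
    Ay[:-1] -= y[1:]   # superdiagonal contribution
    Ay[1:]  -= y[:-1]  # subdiagonal contribution
    return Ay

y = np.random.rand(1000)
print(np.linalg.norm(apply_A(y)))  # one MV, no matrix stored
```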

    1 From Jacobi iteration to Krylov space methods

The simplest iterative method is Jacobi iteration. It is the same as diagonally preconditioned fixed point iteration: if the diagonal matrix $D$ with the diagonal of $A$ is nonsingular, we can transform $Ax = b$ into

$$x = Bx + \bar b \quad \text{with} \quad B := I - D^{-1}A\,, \quad \bar b := D^{-1}b \tag{1}$$

and apply the fixed point iteration $x_{n+1} := Bx_n + \bar b$.

It is easy to show by considering the powers of the Jordan canonical form of $B$ that the following convergence result is valid for Jacobi iteration:

$$x_n \to x_\star \ \text{ for every } x_0 \quad \Longleftrightarrow \quad \rho(B) < 1\,, \tag{2}$$

where $x_\star := A^{-1}b$ and $\rho(B) := \max\{|\lambda| : \lambda \text{ eigenvalue of } B\}$ is the spectral radius of $B$. From the suggested proof of (2) one actually sees that the asymptotic root convergence factor is bounded by $\rho(B)$ (if less than 1), and that this bound is sharp:

$$\limsup_{n \to \infty} \|x_n - x_\star\|^{1/n} \le \rho(B)\,, \tag{3}$$

and equality holds for some $x_0$.

Unfortunately, even the simplest examples of boundary value problems, like $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$, discretized by a standard finite difference method, show that Jacobi iteration may converge extremely slowly.
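As a concrete illustration (a minimal sketch, not from the paper; the grid size and right-hand side are arbitrary choices), the following snippet runs the fixed point iteration (1) on the standard central-difference discretization of this model problem and prints $\rho(B)$, which is close to 1 (about 0.998 for $N = 50$) and explains the slow convergence:

```python
import numpy as np

N = 50                                   # interior grid points
h = 1.0 / (N + 1)
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2   # -u'' by central differences
b = np.ones(N)                           # f = 1

D = np.diag(np.diag(A))
B = np.eye(N) - np.linalg.solve(D, A)    # B = I - D^{-1} A
b_bar = np.linalg.solve(D, b)            # b_bar = D^{-1} b

print("spectral radius rho(B) =", max(abs(np.linalg.eigvals(B))))

x = np.zeros(N)
for n in range(1000):
    x = B @ x + b_bar                    # fixed point iteration (1)
print("relative residual after 1000 sweeps:",
      np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```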

Unless we know the solution $x_\star$ we cannot compute the $n$th error (vector)

$$d_n := x_n - x_\star\,. \tag{4}$$

Thus, for checking the convergence we normally use the $n$th residual (vector)

$$r_n := b - Ax_n\,. \tag{5}$$

Note that $r_n = -A(x_n - x_\star) = -Ad_n$.


Assuming $D = I$ and letting $B := I - A$, we have $r_n = b - Ax_n = Bx_n + b - x_n = x_{n+1} - x_n$, so we can rewrite the Jacobi iteration as

$$x_{n+1} := x_n + r_n\,. \tag{6}$$

Multiplying this by $-A$ and adding $b$, we obtain a recursion for the residual:

$$r_{n+1} := r_n - Ar_n = Br_n\,. \tag{7}$$

So we can compute $r_n$ either according to definition (5) or by the recursion (7); either way it requires one MV. From (7) it follows by induction that

$$r_n = p_n(A)r_0 \in \operatorname{span}\{r_0, Ar_0, \dots, A^n r_0\}\,, \tag{8}$$

where $p_n(\zeta) = (1 - \zeta)^n$ is a polynomial of exact degree $n$. From (6) we have

$$x_n = x_0 + r_0 + \dots + r_{n-1} = x_0 + q_{n-1}(A)r_0 \tag{9}$$

with a polynomial $q_{n-1}$ of exact degree $n - 1$. So, $x_n$ lies in the affine space $x_0 + \operatorname{span}\{r_0, \dots, A^{n-1}r_0\}$ obtained by shifting the subspace containing $r_{n-1}$. Computing $q_{n-1}(A)r_0$ and $p_n(A)r_0$ requires $n + 1$ MVs, because we need to build up the subspace $\operatorname{span}\{r_0, Ar_0, \dots, A^{n-1}r_0\}$. This is the main work. We may ask: Is there a better choice for $x_n$ in the same affine space?

    The subspace that appears in (8) and (9) is what we call a Krylov space:

Definition 1. Given a nonsingular $A \in \mathbb{C}^{N \times N}$ and $y \neq o \in \mathbb{C}^N$, the $n$th Krylov (sub)space $\mathcal{K}_n(A, y)$ generated by $A$ from $y$ is

$$\mathcal{K}_n := \mathcal{K}_n(A, y) := \operatorname{span}(y, Ay, \dots, A^{n-1}y)\,. \tag{10}$$

Clearly, $\mathcal{K}_1 \subseteq \mathcal{K}_2 \subseteq \mathcal{K}_3 \subseteq \cdots$, and the dimension increases at most by one in each step. But when does the equal sign hold? Moreover, it seems clever to choose the $n$th approximate solution $x_n$ in $x_0 + \mathcal{K}_n(A, r_0)$. But can we expect to find the exact solution $x_\star$ in one of these affine spaces? These questions are answered in the following lemmas and corollaries, which we cite without proof.

Lemma 1. There is a positive integer $\nu := \nu(y, A)$, called the grade of $y$ with respect to $A$, such that

$$\dim \mathcal{K}_n(A, y) = \begin{cases} n & \text{if } n \le \nu\,, \\ \nu & \text{if } n \ge \nu\,. \end{cases}$$

Corollary 1. $\mathcal{K}_\nu(A, y)$ is the smallest $A$-invariant subspace that contains $y$.

Lemma 2. The nonnegative integer $\nu$ of Lemma 1 satisfies

$$\nu = \min\left\{ n : A^{-1}y \in \mathcal{K}_n(A, y) \right\}.$$
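To make Lemmas 1 and 2 concrete, here is a small numerical sketch (not from the paper; the matrix and vector are arbitrary choices made so that the grade is smaller than $N$) that builds the Krylov matrices $[y, Ay, \dots, A^{n-1}y]$, watches where their rank stagnates, and checks that $A^{-1}y$ lies in $\mathcal{K}_\nu(A, y)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
# block-diagonal example: y lies in a 3-dimensional invariant subspace
A = np.block([[rng.random((3, 3)), np.zeros((3, 3))],
              [np.zeros((3, 3)),   rng.random((3, 3))]])
y = np.concatenate([rng.random(3), np.zeros(3)])

K = np.empty((N, 0))
v = y.copy()
for n in range(1, N + 1):
    K = np.column_stack([K, v])                    # K_n = [y, Ay, ..., A^{n-1} y]
    print(f"dim K_{n} =", np.linalg.matrix_rank(K))
    v = A @ v                                      # one more MV

nu = np.linalg.matrix_rank(K)                      # the grade of y w.r.t. A
x = np.linalg.solve(A, y)                          # A^{-1} y
coeffs, *_ = np.linalg.lstsq(K[:, :nu], x, rcond=None)
print("A^{-1} y in K_nu(A, y)?", np.allclose(K[:, :nu] @ coeffs, x))
```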


Corollary 2. Let $x_\star$ be the solution of $Ax = b$, let $x_0$ be any initial approximation of it, and let $r_0 := b - Ax_0$ be the corresponding residual. Moreover, let $\nu := \nu(r_0, A)$. Then

$$x_\star \in x_0 + \mathcal{K}_\nu(A, r_0)\,.$$

The idea behind Krylov space solvers is to generate a sequence of approximate solutions $x_n \in x_0 + \mathcal{K}_n(A, r_0)$ of $Ax = b$, so that the corresponding residuals $r_n \in \mathcal{K}_{n+1}(A, r_0)$ converge to the zero vector $o$. Here, "converge" may also mean that after a finite number of steps $r_n = o$, so that $x_n = x_\star$ and the process stops. This is in particular true (in exact arithmetic) if a method ensures that the residuals are linearly independent: then $r_\nu = o$. In this case we say that the method has the finite termination property.

It is not so easy to give a formal definition of Krylov space solvers that covers all relevant cases yet is not so general that it no longer captures some relevant aspects of Krylov space solvers. Here is our proposal:

Definition 2. A (standard) Krylov space method for solving a linear system $Ax = b$ or, briefly, a Krylov space solver is an iterative method starting from some initial approximation $x_0$ and the corresponding residual $r_0 := b - Ax_0$ and generating for all, or at least most $n$, until it possibly finds the exact solution, iterates $x_n$ such that

$$x_n - x_0 = q_{n-1}(A)r_0 \in \mathcal{K}_n(A, r_0) \tag{11}$$

with a polynomial $q_{n-1}$ of exact degree $n - 1$. For some $n$, $x_n$ may not exist or $q_{n-1}$ may have lower degree.

    A similar statement can be made for the residuals, if they exist:

Lemma 3. The residuals $r_n$ of a Krylov space solver satisfy

$$r_n = p_n(A)r_0 \in r_0 + A\mathcal{K}_n(A, r_0) \subseteq \mathcal{K}_{n+1}(A, r_0)\,, \tag{12}$$

where $p_n$ is a polynomial of degree $n$, called the $n$th residual polynomial, which is related to the polynomial $q_{n-1}$ of (11) by

$$p_n(\zeta) = 1 - \zeta\, q_{n-1}(\zeta)\,. \tag{13}$$

In particular, it satisfies the consistency condition $p_n(0) = 1$.

The vague expression "for all, or at least most $n$" in Definition 2 is needed because in some widely used Krylov space solvers (e.g., BiCG) there may exist exceptional situations where, for some $n$, the iterate $x_n$ and the residual $r_n$ are not defined. In other Krylov space solvers (e.g., CR), there may be indices where $x_n$ exists, but the polynomial $q_{n-1}$ is of lower degree than $n - 1$.

There are also nonstandard Krylov space methods where the search space for $x_n - x_0$ is still a Krylov space, but one that differs from $\mathcal{K}_n(A, r_0)$.


When applied to large real-world problems, Krylov space solvers often converge very slowly, if at all. In practice, Krylov space solvers are therefore nearly always applied with preconditioning: $Ax = b$ is replaced by

$$\underbrace{CA}_{\widehat A}\, x = \underbrace{Cb}_{\widehat b} \qquad \text{or} \qquad \underbrace{AC}_{\widehat A}\, \underbrace{C^{-1}x}_{\widehat x} = b \qquad \text{or} \qquad \underbrace{C_L A C_R}_{\widehat A}\, \underbrace{C_R^{-1}x}_{\widehat x} = \underbrace{C_L b}_{\widehat b}\,.$$

The first two cases are referred to as left and right preconditioning, respectively, while in the last case we apply a split preconditioner $C_L C_R$. Actually, here $C$ and $C_L C_R$ are approximate inverses of $A$. Often we use instead of $C$ a preconditioner $M \approx A$ with the property that the system $My = z$ is easily solved for any $z$. Then, in the above formulas we have to replace $C$ by $M^{-1}$ and $C_L$, $C_R$ by $M_L^{-1}$, $M_R^{-1}$. Applying a preconditioned Krylov space solver just means to apply the method to $\widehat A \widehat x = \widehat b$.
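As a small illustration (a sketch, not from the paper; the Jacobi-type choice $M := \operatorname{diag}(A)$ and the function names are assumptions made for the example), left preconditioning with $M \approx A$ again yields an operator $\widehat A = M^{-1}A$ that is only available through a matrix-free MV:

```python
import numpy as np

def make_left_preconditioned(A):
    """Return matrix-free MVs for the left-preconditioned system
    M^{-1} A x = M^{-1} b with the (assumed) choice M = diag(A)."""
    d = np.diag(A)                        # M = diag(A); My = z is trivially solvable
    def hat_matvec(y):
        return (A @ y) / d                # hat_A y = M^{-1} (A y)
    def hat_rhs(b):
        return b / d                      # hat_b = M^{-1} b
    return hat_matvec, hat_rhs

# usage: a Krylov space solver now only sees hat_matvec and hat_rhs
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
hat_matvec, hat_rhs = make_left_preconditioned(A)
print(hat_matvec(b), hat_rhs(b))
```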

2 The conjugate gradient method

The conjugate gradient (CG) method is due to Hestenes and Stiefel [7]. It should only be applied to systems that are symmetric positive definite (spd) or Hermitian positive definite (Hpd), and it is still the method of choice for this case. We will assume real data here.

CG is the archetype of a Krylov space solver that is an orthogonal projection method and satisfies a minimality condition: the error is minimal in the so-called energy norm or $A$-norm of the error vector $d := x - x_\star$,

$$\|d\|_A := \sqrt{d^T A d}\,.$$

In nature, stable states are characterized by minimum energy. Discretization leads to the minimization of a quadratic function:

$$\Phi(x) := \tfrac{1}{2}\, x^T A x - b^T x + \gamma \tag{14}$$

with an spd matrix $A$. $\Phi$ is convex and has a unique minimum. Its gradient is $\nabla\Phi(x) = Ax - b = -r$, where $r$ is the residual corresponding to $x$. Hence,

$$x \ \text{minimizer of}\ \Phi \quad \Longleftrightarrow \quad \nabla\Phi(x) = o \quad \Longleftrightarrow \quad Ax = b\,. \tag{15}$$

If $x_\star$ denotes the minimizer and $d := x - x_\star$ the error vector, and if we choose $\gamma := \tfrac{1}{2}\, b^T A^{-1} b$, it is easily seen that

$$\|d\|_A^2 = \|x - x_\star\|_A^2 = \|Ax - b\|_{A^{-1}}^2 = \|r\|_{A^{-1}}^2 = 2\,\Phi(x)\,. \tag{16}$$

In summary: if $A$ is spd, to minimize the quadratic function $\Phi$ means to minimize the energy norm of the error vector of the linear system $Ax = b$. The minimizer $x_\star$ is the solution of $Ax = b$.
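The identities in (16) are easy to verify numerically; the following sketch (not from the paper; matrix, right-hand side, and test point are arbitrary) checks that $\|d\|_A^2 = \|r\|_{A^{-1}}^2 = 2\Phi(x)$ for a random spd matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
G = rng.random((N, N))
A = G @ G.T + N * np.eye(N)            # a random spd matrix
b = rng.random(N)
x_star = np.linalg.solve(A, b)         # exact solution / minimizer
gamma = 0.5 * b @ x_star               # gamma = (1/2) b^T A^{-1} b

def Phi(x):                            # quadratic function (14)
    return 0.5 * x @ A @ x - b @ x + gamma

x = rng.random(N)                      # arbitrary test point
d = x - x_star                         # error vector
r = b - A @ x                          # residual
print(d @ A @ d,                       # ||d||_A^2
      r @ np.linalg.solve(A, r),       # ||r||_{A^{-1}}^2
      2 * Phi(x))                      # 2 Phi(x): all three agree
```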


The above discussion suggests finding the minimizer of $\Phi$ by descending on the surface representing $\Phi$, following the direction of steepest descent. If we took infinitely many infinitesimal steps, we would find the minimum by following a curved line. However, each determination of the gradient requires an MV, and therefore we need to take long steps and follow a piecewise straight line. In each step we go to the lowest point in the chosen descent direction. Since $A$ is spd, the level curves $\Phi(x) = \text{const}$ are just concentric ellipses if $N = 2$ and concentric ellipsoids if $N = 3$. As can be seen from a sketch, even for a $2 \times 2$ system many steps may be needed to get close to the solution. We can do much better: by choosing the second direction $v_1$ conjugate or $A$-orthogonal to $v_0$, the first one, i.e., so that $v_1^T A v_0 = 0$, we find the minimum in at most two steps, because for ellipses, any radius vector and the corresponding tangent direction are conjugate to each other.

How does this generalize to $N$ dimensions, and what can be said about the intermediate results? We choose search directions or direction vectors $v_n$ that are conjugate ($A$-orthogonal) to each other:

$$v_n^T A v_k = 0\,, \qquad k = 0, \dots, n - 1\,, \tag{17}$$

and define

$$x_{n+1} := x_n + v_n \omega_n\,, \tag{18}$$

so that $r_{n+1} = r_n - A v_n \omega_n$. Here, $\omega_n$ is again chosen such that the $A$-norm of the error is minimized on the line $x_n + v_n \omega$, $\omega \in \mathbb{R}$. This means that

$$\omega_n := \frac{\langle r_n, v_n \rangle}{\langle v_n, A v_n \rangle}\,. \tag{19}$$

Definition 3. Any iterative method satisfying (17), (18), and (19) is called a conjugate direction (CD) method.

By definition, such a method chooses the step length $\omega_n$ so that $x_{n+1}$ is locally optimal on the search line. But does it also yield the best

$$x_{n+1} \in x_0 + \operatorname{span}\{v_0, \dots, v_n\} \tag{20}$$

with respect to the $A$-norm of the error? By verifying that

$$\Phi(x_{n+1}) = \Phi(x_n) - \omega_n\, v_n^T r_0 + \tfrac{1}{2}\, \omega_n^2\, v_n^T A v_n\,,$$

one can prove that this is indeed the case [3]:

Theorem 1. For a conjugate direction method the problem of minimizing the energy norm of the error of an approximate solution of the form (20) decouples into $n + 1$ one-dimensional minimization problems on the lines $x_k + v_k\omega$, $k = 0, 1, \dots, n$. Therefore, a conjugate direction method yields after $n + 1$ steps the approximate solution of the form (20) that minimizes the energy norm of the error in this affine space.


In general, conjugate direction methods are not Krylov space solvers, but with suitably chosen search directions they are. Since

$$x_{n+1} = x_0 + v_0 \omega_0 + \dots + v_n \omega_n \in x_0 + \operatorname{span}\{v_0, v_1, \dots, v_n\}\,,$$

we need that

$$\operatorname{span}\{v_0, \dots, v_n\} = \mathcal{K}_{n+1}(A, r_0)\,, \qquad n = 0, 1, 2, \dots\,. \tag{21}$$

Definition 4. The conjugate gradient (CG) method is the conjugate direction method with the choice (21).

Theorem 1 now yields the main result on CG:

Theorem 2. The CG method yields approximate solutions $x_n \in x_0 + \mathcal{K}_n(A, r_0)$ that are optimal in the sense that they minimize the energy norm ($A$-norm) of the error (i.e., the $A^{-1}$-norm of the residual) for $x_n$ from this affine space.

Associated with this minimality is the Galerkin condition

$$r_n \perp \mathcal{K}_n\,, \qquad r_n \in \mathcal{K}_{n+1}\,, \tag{22}$$

which implies that the residuals $\{r_n\}_{n=0}^{\nu-1}$ are orthogonal to each other and form an orthogonal basis of $\mathcal{K}_\nu$. On the other hand, by assumption, the search directions $\{v_n\}_{n=0}^{\nu-1}$ form a conjugate basis of $\mathcal{K}_\nu$.

Here is a formulation of the standard version of CG due to Hestenes and Stiefel, which is nowadays sometimes called the OrthoMin or OMin version.

Algorithm 1 (OMin form of the CG method). For solving $Ax = b$ let $x_0$ be an initial approximation, and let $v_0 := r_0 := b - Ax_0$ and $\delta_0 := \|r_0\|^2$. Then, for $n = 0, 1, 2, \dots$, compute

$$\begin{aligned}
\delta_n' &:= \|v_n\|_A^2\,, & \text{(23a)}\\
\omega_n &:= \delta_n / \delta_n'\,, & \text{(23b)}\\
x_{n+1} &:= x_n + v_n \omega_n\,, & \text{(23c)}\\
r_{n+1} &:= r_n - A v_n \omega_n\,, & \text{(23d)}\\
\delta_{n+1} &:= \|r_{n+1}\|^2\,, & \text{(23e)}\\
\psi_n &:= -\delta_{n+1} / \delta_n\,, & \text{(23f)}\\
v_{n+1} &:= r_{n+1} - v_n \psi_n\,. & \text{(23g)}
\end{aligned}$$

If $\|r_{n+1}\| \le \mathrm{tol}$, the algorithm terminates and $x_{n+1}$ is a sufficiently accurate approximation of the solution.
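A direct transcription of Algorithm 1 into Python/NumPy (a sketch; the stopping tolerance and the dense matrix-vector products are illustrative choices) looks as follows:

```python
import numpy as np

def cg_omin(A, b, x0, tol=1e-10, max_iter=1000):
    """OMin (Hestenes-Stiefel) form of CG for spd A, following (23a)-(23g)."""
    x = x0.copy()
    r = b - A @ x                      # r_0
    v = r.copy()                       # v_0 := r_0
    delta = r @ r                      # delta_0 := ||r_0||^2
    for n in range(max_iter):
        Av = A @ v                     # the one MV per iteration
        delta_prime = v @ Av           # (23a)  ||v_n||_A^2
        omega = delta / delta_prime    # (23b)
        x = x + omega * v              # (23c)
        r = r - omega * Av             # (23d)
        delta_new = r @ r              # (23e)
        if np.sqrt(delta_new) <= tol:
            return x, n + 1
        psi = -delta_new / delta       # (23f)
        v = r - psi * v                # (23g)
        delta = delta_new
    return x, max_iter

# usage on a small spd system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, its = cg_omin(A, b, np.zeros(2))
print(x, its, np.linalg.norm(b - A @ x))
```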

The conjugate residual (CR) method is defined in analogy to the CG method, but the 2-norm is replaced by the $A$-norm. So what is minimized is now the $A^2$-norm of the error, which is the 2-norm of the residual. Here, the residuals $\{r_n\}_{n=0}^{\nu-1}$ form an $A$-orthogonal basis of $\mathcal{K}_\nu$ and the search directions $\{v_n\}_{n=0}^{\nu-1}$ form an $A^2$-orthogonal basis of $\mathcal{K}_\nu$. The challenge is to find recurrences so that still only one MV is needed per iteration.
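One common way to meet this challenge (a sketch of a standard CR implementation for symmetric $A$, not spelled out in the paper) is to update $Av_n$ by the same recurrence as $v_n$, so that the only MV per iteration is $Ar_{n+1}$:

```python
import numpy as np

def cr(A, b, x0, tol=1e-10, max_iter=1000):
    """Conjugate residual method (symmetric A): minimizes ||r_n||_2,
    with one MV (A @ r) per iteration."""
    x = x0.copy()
    r = b - A @ x
    v = r.copy()
    Ar = A @ r
    Av = Ar.copy()
    rho = r @ Ar                       # <r, Ar>
    for n in range(max_iter):
        alpha = rho / (Av @ Av)        # minimizes ||r - alpha A v||_2
        x = x + alpha * v
        r = r - alpha * Av
        if np.linalg.norm(r) <= tol:
            return x, n + 1
        Ar = A @ r                     # the single MV of this iteration
        rho_new = r @ Ar
        beta = rho_new / rho
        v = r + beta * v               # search direction recurrence
        Av = Ar + beta * Av            # update A v without an extra MV
        rho = rho_new
    return x, max_iter

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, its = cr(A, b, np.zeros(2))
print(x, np.linalg.norm(b - A @ x))
```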


    3 Methods for nonsymmetric systems

Solving nonsymmetric (or non-Hermitian) linear systems iteratively with Krylov space solvers is considerably more difficult and costly than solving symmetric (or Hermitian) systems. There are two different ways to generalize CG:

• Maintain the orthogonality of the projection and the related minimality of the error by constructing either orthogonal residuals $r_n$ (generalized CG (GCG)) or $A^T A$-orthogonal search directions $v_n$ (generalized CR (GCR)). Then, the recursions involve all previously constructed residuals or search directions and all previously constructed iterates.

• Maintain short recurrence formulas for residuals, direction vectors, and iterates (biconjugate gradient (BiCG) method, Lanczos-type product methods (LTPM)). The resulting methods are at best oblique projection methods. There is no minimality property of error or residual vectors.

3.1 The biconjugate gradient (BiCG) method

While CG (for spd $A$) has mutually orthogonal residuals $r_n$ with

$$r_n = p_n(A)r_0 \in \operatorname{span}\{r_0, Ar_0, \dots, A^n r_0\} =: \mathcal{K}_{n+1}(A, r_0)\,,$$

BiCG constructs in the same spaces residuals that are orthogonal to a dual Krylov space spanned by shadow residuals

$$\widetilde r_n = p_n(A^T)\widetilde r_0 \in \operatorname{span}\{\widetilde r_0, A^T \widetilde r_0, \dots, (A^T)^n \widetilde r_0\} =: \mathcal{K}_{n+1}(A^T, \widetilde r_0) =: \widetilde{\mathcal{K}}_{n+1}\,.$$

    T . But thereare still short recurrences for xn , rn , and r n . Now there are two Galerkinconditions

    Kn

    r n

    Kn +1 ,

    Kn

    r n

    Kn +1 ,

but only the first one is relevant for determining $x_n$.

The residuals $\{r_n\}_{n=0}^{m}$ and the shadow residuals $\{\widetilde r_n\}_{n=0}^{m}$ form biorthogonal bases or dual bases of $\mathcal{K}_{m+1}$ and $\widetilde{\mathcal{K}}_{m+1}$:

$$\langle \widetilde r_m, r_n \rangle = \begin{cases} 0 & \text{if } m \neq n\,, \\ \delta_n \neq 0 & \text{if } m = n\,. \end{cases}$$

The search directions $\{v_n\}_{n=0}^{m}$ and the shadow search directions $\{\widetilde v_n\}_{n=0}^{m}$ form biconjugate bases of $\mathcal{K}_{m+1}$ and $\widetilde{\mathcal{K}}_{m+1}$:

$$\langle \widetilde v_m, A v_n \rangle = \begin{cases} 0 & \text{if } m \neq n\,, \\ \delta_n' \neq 0 & \text{if } m = n\,. \end{cases}$$

BiCG goes back to Lanczos [8], but was brought to its current, CG-like form later. For a detailed discussion of versions and of difficulties such as breakdowns and a possibly somewhat erratic convergence, see [6].
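For orientation, here is a sketch of a common textbook formulation of BiCG (the paper does not spell out the recurrences; the coefficient names and the choice $\widetilde r_0 := r_0$ are assumptions), showing the two MVs per step, one with $A$ and one with $A^T$:

```python
import numpy as np

def bicg(A, b, x0, tol=1e-10, max_iter=1000):
    """A common textbook form of BiCG: short recurrences for x_n, r_n,
    and the shadow quantities; two MVs (A and A^T) per iteration."""
    x = x0.copy()
    r = b - A @ x
    rt = r.copy()                      # shadow residual, here chosen as r_0
    v, vt = r.copy(), rt.copy()        # search and shadow search directions
    delta = rt @ r                     # <r~_0, r_0>
    for n in range(max_iter):
        Av, Atvt = A @ v, A.T @ vt     # the two MVs of this iteration
        alpha = delta / (vt @ Av)
        x = x + alpha * v
        r = r - alpha * Av
        rt = rt - alpha * Atvt
        if np.linalg.norm(r) <= tol:
            return x, n + 1
        delta_new = rt @ r
        beta = delta_new / delta
        v = r + beta * v
        vt = rt + beta * vt
        delta = delta_new
    return x, max_iter

A = np.array([[3.0, 1.0], [0.5, 2.0]])   # a small nonsymmetric example
b = np.array([1.0, 2.0])
x, its = bicg(A, b, np.zeros(2))
print(x, np.linalg.norm(b - A @ x))
```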


    3.2 Lanczos-type product methods (LTPMs)

Sonneveld [17] found with the (bi)conjugate gradient squared method (CGS) a way to replace the multiplication with $A^T$ by a second one with $A$. The $n$th residual polynomial of CGS is $p_n^2$, where $p_n$ is still the $n$th BiCG residual polynomial. In each step the dimension of the Krylov space and of the search space increases by two. Convergence is nearly twice as fast as for BiCG, but even more erratic.

BiCGStab by Van der Vorst [18] includes some local optimization and smoothing, and therefore the residual norm histories tend to be much smoother. The $n$th residual polynomial is $p_n t_n$, where now $t_n$ satisfies the recursion

$$t_{n+1}(\zeta) = (1 - \chi_{n+1}\zeta)\, t_n(\zeta)$$

with $\chi_{n+1}$ chosen by residual minimization. However, in BiCGStab all zeros of $t_n$ are real (if $A$, $b$ are real). If $A$ has a complex spectrum, it is better to choose two possibly complex new zeros in every other iteration. This is the idea behind BiCGStab2 [5]. Further generalizations of BiCGStab include BiCGStab($\ell$) by Sleijpen and Fokkema [15], [16] and GPBi-CG by Zhang [19]; see [6] for references to yet other proposals.

In view of the form of the residual polynomials, this family of methods has been called Lanczos-type product methods. These are often the most efficient solvers. They have short recurrences, they are typically about twice as fast as BiCG, and they do not require $A^T$. Unlike in GMRes, the memory needed does not increase with the iteration index $n$.

3.3 Solving the system in coordinate space: MinRes, SymmLQ, GMRes, and QMR

There is yet another class of Krylov space solvers, which includes well-known methods like MinRes, SymmLQ, GMRes, and QMR. It was pioneered by Paige and Saunders [10]. Their approach was later adapted to more general cases by Saad and Schultz [12] and Freund and Nachtigal [2].

The basic idea is to successively construct a basis of the Krylov space by combining the extension of the space with Gram-Schmidt orthogonalization or biorthogonalization, and to update at each iteration the approximate solution of $Ax = b$ in coordinate space; a sketch of the orthogonalization step is given after the following list. There are essentially three cases:

• symmetric Lanczos process: MinRes, SymmLQ [10]
• nonsymmetric Lanczos process: QMR [2]
• Arnoldi process: GMRes [12]
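A minimal sketch of the Arnoldi process (an illustration under the usual textbook formulation, not taken verbatim from the paper): it extends the Krylov space by one MV per step and orthogonalizes the new vector against the existing basis by modified Gram-Schmidt; GMRes then solves a small least squares problem in the resulting coordinates.

```python
import numpy as np

def arnoldi(matvec, r0, m):
    """Build an orthonormal basis V of K_{m+1}(A, r0) and the
    (m+1) x m Hessenberg matrix H with A V[:, :m] = V H."""
    N = len(r0)
    V = np.zeros((N, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = matvec(V[:, j])                 # extend the space: one MV
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:              # invariant subspace reached
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

A = np.array([[3.0, 1.0, 0.0], [0.5, 2.0, 1.0], [0.0, 0.5, 1.0]])
r0 = np.array([1.0, 0.0, 0.0])
V, H = arnoldi(lambda y: A @ y, r0, 2)
print(np.allclose(A @ V[:, :2], V @ H))     # Arnoldi relation holds
```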

3.4 Further reading

For further study we suggest recent books on Krylov space solvers such as those of Greenbaum [4], Meurant [9], and Saad [11], as well as the review article by Simoncini and Szyld [13], which also covers developments of the last ten years. Nonsymmetric Lanczos-type solvers were reviewed in [6].


    References

1. J. Dongarra and F. Sullivan. Guest editors' introduction to the top 10 algorithms. Computing in Science and Engineering, 2(1):22-23, 2000.
2. R. W. Freund and N. M. Nachtigal. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math., 60:315-339, 1991.
3. G. H. Golub and C. F. van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.
4. A. Greenbaum. Iterative Methods for Solving Linear Systems. SIAM, Philadelphia, PA, 1997.
5. M. H. Gutknecht. Variants of BiCGStab for matrices with complex spectrum. SIAM J. Sci. Comput., 14:1020-1033, 1993.
6. M. H. Gutknecht. Lanczos-type solvers for nonsymmetric linear systems of equations. Acta Numerica, 6:271-397, 1997.
7. M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bureau Standards, 49:409-435, 1952.
8. C. Lanczos. Solution of systems of linear equations by minimized iterations. J. Res. Nat. Bureau Standards, 49:33-53, 1952.
9. G. Meurant. Computer Solution of Large Linear Systems, volume 28 of Studies in Mathematics and its Applications. North-Holland, Amsterdam, 1999.
10. C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12:617-629, 1975.
11. Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia, 2nd edition, 2003.
12. Y. Saad and M. H. Schultz. Conjugate gradient-like algorithms for solving nonsymmetric linear systems. Math. Comp., 44:417-424, 1985.
13. V. Simoncini and D. B. Szyld. Recent developments in Krylov subspace methods for linear systems. Numer. Linear Algebra Appl. To appear.
14. V. Simoncini and D. B. Szyld. Flexible inner-outer Krylov subspace methods. SIAM J. Numer. Anal., 40(6):2219-2239 (electronic) (2003), 2002.
15. G. L. G. Sleijpen and D. R. Fokkema. BiCGstab(l) for linear equations involving unsymmetric matrices with complex spectrum. Electronic Trans. Numer. Anal., 1:11-32, 1993.
16. G. L. G. Sleijpen, H. A. van der Vorst, and D. R. Fokkema. BiCGstab(l) and other hybrid Bi-CG methods. Numerical Algorithms, 7:75-109, 1994.
17. P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 10:36-52, 1989.
18. H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 13:631-644, 1992.
19. S.-L. Zhang. GPBI-CG: generalized product-type methods based on Bi-CG for solving nonsymmetric linear systems. SIAM J. Sci. Comput., 18(2):537-551, 1997.