Iterative Methods for Linear Systems - BGUnoia192/wiki.files/NOIA... · To minimize f(x), we require rf(x) = 0 and get a linear system Ax = b In GS, we zero the residual r i for each

Iterative Methods for Linear Systems

Eran Treister

Computer Science Department,Ben-Gurion University of the Negev,

Israel.

March 31, 2019

1 / 24

Iterative methods for linear systems

Definition

An iterative method is defined as

x(k+1) = φ(x(k)),

which simply “looks” only at one previous vector or

x(k+1) = φ(x(k), ..., x(0)),

designed to solveAx = b

2 / 24

Iterative methods for linear systems

Definition

An iterative method is defined as

x(k+1) = φ(x(k)),

which simply “looks” only at one previous vector or

x(k+1) = φ(x(k), ..., x(0)),

designed to solveAx = b

Initialize with an arbitrary guess x(0).

Iteratively improve this guess until the solution of the linearsystem is achieved up to some accuracy.

Usually applied when direct methods are too expensive orimpossible to use.

3 / 24

Requirements for Iterative methods

The first requirement from φ is that it converges:

limk→∞{x(k)} = x∗

where Ax∗ = b

The second requirement is that the method converges as fastas possible. We define a convergence rate to be

limk→∞

‖x(k+1) − x∗‖‖x(k) − x∗‖p

= C ,

where p is the order of convergence and C is called theconvergence factor

4 / 24

Error and Residual

Definition (Error vector)

The vectore(k) = x∗ − x(k)

is called the error vector at iteration k .

For convergence, it should hold limk→∞ e(k) = 0

Definition (Residual vector)

The vectorr(k) = b− Ax(k) = Ae(k)

is called the residual vector.

5 / 24

Error and Residual

Definition (Error vector)

e(k) = x∗ − x(k)

Definition (Residual vector)

r(k) = b− Ax(k) = Ae(k)

The key difference between the two is that we cannotmeasure the error without knowing the solution, but wecan measure the residual. Note that convergence means

limk→∞{e(k)} = lim

k→∞{r(k)} = 0.

6 / 24

Simple iterative methods

Split the matrix A into two: A = M + N. Then the linearsystem is written as

Mx + Nx = b,

The iteration is defined as:

x(k+1) = M−1(b− Nx(k)) = x(k) + M−1(b− Ax(k)),

7 / 24

Simple iterative methods

Split the matrix A into two: A = M + N. Then the linearsystem is written as

Mx + Nx = b,

The iteration is defined as:

x(k+1) = M−1(b− Nx(k)) = x(k) + M−1(b− Ax(k)),

Remark

M (called the preconditioner) is ”inverted” every iteration.

The cost of the solution is naturally comprised from thenumber of iterations times the work needed to “invert” M.

8 / 24

Practical stopping conditions

We usually stop iterating if one of the following is satisfied forsome tolerance ε:

‖Ax(k) − b‖‖b‖

< ε or‖x(k) − x(k−1)‖‖x(k)‖

< ε.

The left term indicates that the residual is low enoughcompared to a zero solution

The second criterion indicates that the relative change in theiterations is small enough.

9 / 24

General Iterative Method

Input: A ∈ Rn×n, b ∈ Rn, x(0) ∈ Rn, M,N ∈ Rn×n,maxIter , ε, Convergence criterion Output: x s.t

Ax ≈ b

k = 1, ...,maxIter Apply iteration:x(k) = M−1(b− Nx(k−1)) orx(k) = x(k−1) + M−1(b− Ax(k−1)),

If ‖Ax(k)−b‖‖b‖ < ε or alternatively ‖x(k)−x(k−1)‖

‖x(k)‖ < ε.

Convergence is reached, stop the iterations.

Return x(k) as the solution.

10 / 24

The Jacobi method

Example

Assume that we need to solve Ax = b:4x1 − x2 + x3 = 74x1 − 8x2 + x3 = −21−2x1 + x2 + 5x3 = 15

(1)

Rewrite: 4 −1 14 −8 1−2 1 5

× x1

x2x3

=

7−2115

(2)

11 / 24

The Jacobi method

Example

Assume that we need to solve Ax = b: 4 −1 14 −8 1−2 1 5

× x1

x2x3

=

7−2115

(3)

A is a diagonal dominant, so it can be approximated well by adiagonal matrix. Let us split the matrix:

A = D+L+U =

4−8

5

+

4−2 1

+

−1 11

,

12 / 24

The Jacobi method

Example

A = D+L+U =

4−8

5

+

4−2 1

+

−1 11

,Choosing M = D,the method then becomes (in matrix form):

x(k+1) = D−1(b− (L + U)x(k)) = x(k) + D−1(b− Ax(k)). (4)

In our example this will be: x(k+1)1

x(k+1)2

x(k+1)3

=

14(7 + x

(k)2 − x

(k)3 )

18(21 + 4x

(k)1 + x

(k)3 )

15(15 + 2x

(k)1 − x

(k)2 )

13 / 24

The Jacobi method

Running the iterations from a guess x (0) = [1, 2, 2] yields:Iter: 0: [1.0, 2.0, 2.0]

Iter: 1: [1.75, 3.375, 3.0]

Iter: 2: [1.84375, 3.875, 3.025]

Iter: 3: [1.9625, 3.925, 2.9625]

Iter: 4: [1.99063, 3.97656, 3.0]

Iter: 5: [1.99414, 3.99531, 3.00094]

Iter: 6: [1.99859, 3.99719, 2.99859]

Iter: 7: [1.99965, 3.99912, 3.0]

Iter: 8: [1.99978, 3.99982, 3.00004]

Iter: 9: [1.99995, 3.99989, 2.99995]

14 / 24

The Jacobi method

Figure: The residual and error history norm for the Jacobi iterations. Notethe logarithmic scale of the y axis, when plotting convergence history.

15 / 24

The Gauss-Seidel method

The GS method is achieved by the split:

(L + D)x = b− Ux⇒ (L + D)x(k+1) = b− Ux(k),

Choosing M = L + D, each iteration reads

x(k+1) = (L+D)−1(b− Ux(k)

)= x(k)+(L+D)−1

(b− Ax(k)

).

In scalar form, the method is given by

x(k+1)i =

1

aii

bi −∑j<i

aijx(k+1)j −

∑j>i

aijx(k)j

, i = 1, ..., n.

16 / 24

The Gauss-Seidel method

GS Iteration:

x(k+1) = (L + D)−1(b− Ux(k)

)= x(k) + (L + D)−1

(b− Ax(k)

).

Example x(k+1)1

x(k+1)2

x(k+1)3

=

14(7 + x

(k)2 − x

(k)3 )

18(21 + 4x

(k+1)1 + x

(k)3 )

15(15 + 2x

(k+1)1 − x

(k+1)2 )

Convergence is much faster than the Jacobi method:Iter: 0: [1.0, 2.0, 2.0]

Iter: 1: [1.75, 3.75, 2.95]

Iter: 2: [1.95, 3.96875, 2.98625]

Iter: 3: [1.99562, 3.99609, 2.99903]

Iter: 4: [1.99927, 3.99951, 2.9998]

Iter: 5: [1.99993, 3.99994, 2.99998]

Iter: 6: [1.99999, 3.99999, 3.0]

Iter: 7: [2.0, 4.0, 3.0]17 / 24

Convergence of Iterative methods

We saw the general problem:

x(k+1) = x(k) + M−1(b− Ax(k)).

The error at (k+1)-th iteration:

e(k+1) = x∗ − x(k+1) = x∗ − x(k) −M−1(Ax∗ − Ax(k)).

The iteration matrix for the error is given by

e(k+1) = (I −M−1A)︸︷︷︸T

e(k).

18 / 24


Assuming T is diagonaizable with eigenpairs (λi , vi ), ande(0) =

∑ni=1 αivi :

e(k+1) = T k+1e(0) = T k+1n∑

i=1

αivi =n∑

i=1

αiλk+1i vi

The error e(k+1) will go to 0 (as k →∞) only if the largesteigenvalue in magnitude is smaller than 1.

Recall: the largest eigenvalue in magnitude is defined as thespectral radius.

19 / 24


Theorem

Given Ax = b where A is invertible,the general iteration

x(k+1) = x(k) + M−1(b− Ax(k))

converges for any starting vector x(0) if and only if

ρ(I −M−1A) < 1.

This spectral radius is also the convergence factor of the iteration.That is, for every vector norm

limk→∞

‖e(k+1)‖‖e(k)‖

= ρ(I −M−1A).

20 / 24

Checking Convergence

Remark

Spectral radius is hard to compute. Therefore, we often try to usematrix norms to check convergence, since any matrix norm upperbounds the spectral radius. That is,

‖I −M−1A‖ < 1⇒ ρ(I −M−1A) < 1,

and if we found a norm for which ‖I −M−1A‖ < 1, then ourmethod converges.

Example

In the previous examples, the error iteration matrix is: 0 14

−14

48 0 1

825−15 0

⇒ ‖T‖∞ =5

8< 1.

21 / 24

Practical Convergence test

Definition (Strictly diagonally dominant matrices (SDD))

A matrix A is strictly diagonally dominant in rows if for every row i

|aii | >∑j 6=i

|aij |

.

Theorem

If the matrix A is strictly diagonally dominant in rows, then bothJacobi and Gauss Seidel methods converge.

22 / 24

The variational meaning of GS

Consider the following problem:

f (x) =1

2‖x− x∗‖2A =

1

2x>Ax− x>b +

1

2(x∗)>b,

where A is positive definite.

To minimize f (x), we require ∇f (x) = 0 and get a linearsystem Ax = b

In GS, we zero the residual ri for each i, given the other x’s.

The residuals are basically the equations of the gradient.Thus, for each i we require ∂f

∂xi= 0, thus ∇f (x (k)) = 0

Corollary

The updates of Gauss-Seidel for each xi are equivalent tominimizing f (x)

23 / 24

variational GS

Example (Variational property of Gauss Seidel)

Consider the following linear system:

A =

[2 11 3

], b =

[34

].

It is easy to show that f (x) = x21 + x1x2 + 1.5x22 − 3x1 − 4x2. Thecondition ∇f = 0 in this case is

∂f

∂x1= 2x1 + x2 − 3 = 0 (5)

∂f

∂x2= x1 + 3x2 − 4 = 0 (6)

(7)

24 / 24

Documents

Iterative Methods for Linear Systems - BGUnoia192/wiki.files/NOIA... · To minimize f(x), we require rf(x) = 0 and get a linear system Ax = b In GS, we zero the residual r i for each