Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Iterative Methods for Linear Systems
Eran Treister
Computer Science Department,Ben-Gurion University of the Negev,
Israel.
March 31, 2019
1 / 24
Iterative methods for linear systems
Definition
An iterative method is defined as
x(k+1) = φ(x(k)),
which simply “looks” only at one previous vector or
x(k+1) = φ(x(k), ..., x(0)),
designed to solveAx = b
2 / 24
Iterative methods for linear systems
Definition
An iterative method is defined as
x(k+1) = φ(x(k)),
which simply “looks” only at one previous vector or
x(k+1) = φ(x(k), ..., x(0)),
designed to solveAx = b
Initialize with an arbitrary guess x(0).
Iteratively improve this guess until the solution of the linearsystem is achieved up to some accuracy.
Usually applied when direct methods are too expensive orimpossible to use.
3 / 24
Requirements for Iterative methods
The first requirement from φ is that it converges:
limk→∞{x(k)} = x∗
where Ax∗ = b
The second requirement is that the method converges as fastas possible. We define a convergence rate to be
limk→∞
‖x(k+1) − x∗‖‖x(k) − x∗‖p
= C ,
where p is the order of convergence and C is called theconvergence factor
4 / 24
Error and Residual
Definition (Error vector)
The vectore(k) = x∗ − x(k)
is called the error vector at iteration k .
For convergence, it should hold limk→∞ e(k) = 0
Definition (Residual vector)
The vectorr(k) = b− Ax(k) = Ae(k)
is called the residual vector.
5 / 24
Error and Residual
Definition (Error vector)
e(k) = x∗ − x(k)
Definition (Residual vector)
r(k) = b− Ax(k) = Ae(k)
The key difference between the two is that we cannotmeasure the error without knowing the solution, but wecan measure the residual. Note that convergence means
limk→∞{e(k)} = lim
k→∞{r(k)} = 0.
6 / 24
Simple iterative methods
Split the matrix A into two: A = M + N. Then the linearsystem is written as
Mx + Nx = b,
The iteration is defined as:
x(k+1) = M−1(b− Nx(k)) = x(k) + M−1(b− Ax(k)),
7 / 24
Simple iterative methods
Split the matrix A into two: A = M + N. Then the linearsystem is written as
Mx + Nx = b,
The iteration is defined as:
x(k+1) = M−1(b− Nx(k)) = x(k) + M−1(b− Ax(k)),
Remark
M (called the preconditioner) is ”inverted” every iteration.
The cost of the solution is naturally comprised from thenumber of iterations times the work needed to “invert” M.
8 / 24
Practical stopping conditions
We usually stop iterating if one of the following is satisfied forsome tolerance ε:
‖Ax(k) − b‖‖b‖
< ε or‖x(k) − x(k−1)‖‖x(k)‖
< ε.
The left term indicates that the residual is low enoughcompared to a zero solution
The second criterion indicates that the relative change in theiterations is small enough.
9 / 24
General Iterative Method
Input: A ∈ Rn×n, b ∈ Rn, x(0) ∈ Rn, M,N ∈ Rn×n,maxIter , ε, Convergence criterion Output: x s.t
Ax ≈ b
k = 1, ...,maxIter Apply iteration:x(k) = M−1(b− Nx(k−1)) orx(k) = x(k−1) + M−1(b− Ax(k−1)),
If ‖Ax(k)−b‖‖b‖ < ε or alternatively ‖x(k)−x(k−1)‖
‖x(k)‖ < ε.
Convergence is reached, stop the iterations.
Return x(k) as the solution.
10 / 24
The Jacobi method
Example
Assume that we need to solve Ax = b:4x1 − x2 + x3 = 74x1 − 8x2 + x3 = −21−2x1 + x2 + 5x3 = 15
(1)
Rewrite: 4 −1 14 −8 1−2 1 5
× x1
x2x3
=
7−2115
(2)
11 / 24
The Jacobi method
Example
Assume that we need to solve Ax = b: 4 −1 14 −8 1−2 1 5
× x1
x2x3
=
7−2115
(3)
A is a diagonal dominant, so it can be approximated well by adiagonal matrix. Let us split the matrix:
A = D+L+U =
4−8
5
+
4−2 1
+
−1 11
,
12 / 24
The Jacobi method
Example
A = D+L+U =
4−8
5
+
4−2 1
+
−1 11
,Choosing M = D,the method then becomes (in matrix form):
x(k+1) = D−1(b− (L + U)x(k)) = x(k) + D−1(b− Ax(k)). (4)
In our example this will be: x(k+1)1
x(k+1)2
x(k+1)3
=
14(7 + x
(k)2 − x
(k)3 )
18(21 + 4x
(k)1 + x
(k)3 )
15(15 + 2x
(k)1 − x
(k)2 )
13 / 24
The Jacobi method
Running the iterations from a guess x (0) = [1, 2, 2] yields:Iter: 0: [1.0, 2.0, 2.0]
Iter: 1: [1.75, 3.375, 3.0]
Iter: 2: [1.84375, 3.875, 3.025]
Iter: 3: [1.9625, 3.925, 2.9625]
Iter: 4: [1.99063, 3.97656, 3.0]
Iter: 5: [1.99414, 3.99531, 3.00094]
Iter: 6: [1.99859, 3.99719, 2.99859]
Iter: 7: [1.99965, 3.99912, 3.0]
Iter: 8: [1.99978, 3.99982, 3.00004]
Iter: 9: [1.99995, 3.99989, 2.99995]
14 / 24
The Jacobi method
Figure: The residual and error history norm for the Jacobi iterations. Notethe logarithmic scale of the y axis, when plotting convergence history.
15 / 24
The Gauss-Seidel method
The GS method is achieved by the split:
(L + D)x = b− Ux⇒ (L + D)x(k+1) = b− Ux(k),
Choosing M = L + D, each iteration reads
x(k+1) = (L+D)−1(b− Ux(k)
)= x(k)+(L+D)−1
(b− Ax(k)
).
In scalar form, the method is given by
x(k+1)i =
1
aii
bi −∑j<i
aijx(k+1)j −
∑j>i
aijx(k)j
, i = 1, ..., n.
16 / 24
The Gauss-Seidel method
GS Iteration:
x(k+1) = (L + D)−1(b− Ux(k)
)= x(k) + (L + D)−1
(b− Ax(k)
).
Example x(k+1)1
x(k+1)2
x(k+1)3
=
14(7 + x
(k)2 − x
(k)3 )
18(21 + 4x
(k+1)1 + x
(k)3 )
15(15 + 2x
(k+1)1 − x
(k+1)2 )
Convergence is much faster than the Jacobi method:Iter: 0: [1.0, 2.0, 2.0]
Iter: 1: [1.75, 3.75, 2.95]
Iter: 2: [1.95, 3.96875, 2.98625]
Iter: 3: [1.99562, 3.99609, 2.99903]
Iter: 4: [1.99927, 3.99951, 2.9998]
Iter: 5: [1.99993, 3.99994, 2.99998]
Iter: 6: [1.99999, 3.99999, 3.0]
Iter: 7: [2.0, 4.0, 3.0]17 / 24
Convergence of Iterative methods
We saw the general problem:
x(k+1) = x(k) + M−1(b− Ax(k)).
The error at (k+1)-th iteration:
e(k+1) = x∗ − x(k+1) = x∗ − x(k) −M−1(Ax∗ − Ax(k)).
The iteration matrix for the error is given by
e(k+1) = (I −M−1A)︸ ︷︷ ︸T
e(k).
18 / 24
Convergence of Iterative methods
Assuming T is diagonaizable with eigenpairs (λi , vi ), ande(0) =
∑ni=1 αivi :
e(k+1) = T k+1e(0) = T k+1n∑
i=1
αivi =n∑
i=1
αiλk+1i vi
The error e(k+1) will go to 0 (as k →∞) only if the largesteigenvalue in magnitude is smaller than 1.
Recall: the largest eigenvalue in magnitude is defined as thespectral radius.
19 / 24
Convergence of Iterative methods
Theorem
Given Ax = b where A is invertible,the general iteration
x(k+1) = x(k) + M−1(b− Ax(k))
converges for any starting vector x(0) if and only if
ρ(I −M−1A) < 1.
This spectral radius is also the convergence factor of the iteration.That is, for every vector norm
limk→∞
‖e(k+1)‖‖e(k)‖
= ρ(I −M−1A).
20 / 24
Checking Convergence
Remark
Spectral radius is hard to compute. Therefore, we often try to usematrix norms to check convergence, since any matrix norm upperbounds the spectral radius. That is,
‖I −M−1A‖ < 1⇒ ρ(I −M−1A) < 1,
and if we found a norm for which ‖I −M−1A‖ < 1, then ourmethod converges.
Example
In the previous examples, the error iteration matrix is: 0 14
−14
48 0 1
825−15 0
⇒ ‖T‖∞ =5
8< 1.
21 / 24
Practical Convergence test
Definition (Strictly diagonally dominant matrices (SDD))
A matrix A is strictly diagonally dominant in rows if for every row i
|aii | >∑j 6=i
|aij |
.
Theorem
If the matrix A is strictly diagonally dominant in rows, then bothJacobi and Gauss Seidel methods converge.
22 / 24
The variational meaning of GS
Consider the following problem:
f (x) =1
2‖x− x∗‖2A =
1
2x>Ax− x>b +
1
2(x∗)>b,
where A is positive definite.
To minimize f (x), we require ∇f (x) = 0 and get a linearsystem Ax = b
In GS, we zero the residual ri for each i, given the other x’s.
The residuals are basically the equations of the gradient.Thus, for each i we require ∂f
∂xi= 0, thus ∇f (x (k)) = 0
Corollary
The updates of Gauss-Seidel for each xi are equivalent tominimizing f (x)
23 / 24
variational GS
Example (Variational property of Gauss Seidel)
Consider the following linear system:
A =
[2 11 3
], b =
[34
].
It is easy to show that f (x) = x21 + x1x2 + 1.5x22 − 3x1 − 4x2. Thecondition ∇f = 0 in this case is
∂f
∂x1= 2x1 + x2 − 3 = 0 (5)
∂f
∂x2= x1 + 3x2 − 4 = 0 (6)
(7)
24 / 24