Set 5: Linear Least Squares
Kyle A. Gallivan
Department of Mathematics
Florida State University

Foundations of Computational Math 1, Fall 2009
Simple Example
Arithmetic Mean: $\mu = \frac{1}{n}\sum_{i=1}^{n}\xi_i$

$$\mu = \operatorname*{argmin}_{\alpha\in\mathbb{R}} f(\alpha), \qquad f(\alpha) = \sum_{i=1}^{n}(\xi_i-\alpha)^2$$

$$f'(\alpha) = \sum_{i=1}^{n} -2(\xi_i-\alpha) \quad\text{and}\quad f'(\alpha)=0$$

$$0 = \sum_{i=1}^{n} -2(\xi_i-\alpha) \;\Longrightarrow\; \sum_{i=1}^{n}\alpha = \sum_{i=1}^{n}\xi_i \;\Longrightarrow\; \alpha = \frac{\sum_{i=1}^{n}\xi_i}{n} = \mu$$
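As a quick numerical sanity check (not part of the original slides), the sketch below confirms that the arithmetic mean minimizes the sum-of-squares cost $f(\alpha)$; the data values are arbitrary:

```python
import numpy as np

# Sample data xi_1, ..., xi_n (arbitrary values for illustration)
xi = np.array([2.0, 3.0, 5.0, 7.0])
mu = xi.sum() / len(xi)  # arithmetic mean = 4.25

# f(alpha) = sum_i (xi_i - alpha)^2
def f(alpha):
    return np.sum((xi - alpha) ** 2)

# The mean beats any perturbed candidate
for delta in [-0.5, -0.1, 0.1, 0.5]:
    assert f(mu) < f(mu + delta)
```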
Slightly Less Simple Example
1-dimensional Linear Regression:

Given a set of points $(\xi_i, \eta_i)$, $i = 1, \dots, n$, determine a line $\eta = l(\xi) = \alpha + \beta\xi$ that minimizes
$$f(\alpha,\beta) = \sum_{i=1}^{n}(\eta_i - l(\xi_i))^2$$

Solution in terms of means, variance, covariance:
$$s_{\xi\eta} = \frac{1}{n-1}\sum_{i=1}^{n}(\xi_i-\mu_\xi)(\eta_i-\mu_\eta), \qquad \beta = \frac{s_{\xi\eta}}{s_\xi^2}$$
$$s_\xi^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\xi_i-\mu_\xi)^2$$
$$l(\xi) = \beta(\xi-\mu_\xi) + \mu_\eta$$
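These covariance formulas are easy to verify numerically; a sketch with arbitrary illustrative data, cross-checked against NumPy's built-in least-squares fit (not part of the original slides):

```python
import numpy as np

# Illustrative data points (xi_i, eta_i); values are arbitrary
xi  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
eta = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

mu_xi, mu_eta = xi.mean(), eta.mean()

# Sample covariance and variance (1/(n-1) normalization)
s_xe = np.sum((xi - mu_xi) * (eta - mu_eta)) / (len(xi) - 1)
s_xx = np.sum((xi - mu_xi) ** 2) / (len(xi) - 1)

beta = s_xe / s_xx                             # slope
line = lambda x: beta * (x - mu_xi) + mu_eta   # l(x)

# Cross-check against NumPy's polynomial least-squares fit
slope_np, intercept_np = np.polyfit(xi, eta, 1)
assert abs(beta - slope_np) < 1e-8
assert abs(line(0.0) - intercept_np) < 1e-8
```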
Matrix Least Squares Form
Arithmetic Mean:
$$\min_{\mu}\left\|\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_n\end{pmatrix} - \begin{pmatrix}1\\ 1\\ \vdots\\ 1\end{pmatrix}\mu\right\|_2$$

1-D Linear Regression:
$$\min_{\alpha,\beta}\left\|\begin{pmatrix}\eta_1\\ \eta_2\\ \vdots\\ \eta_n\end{pmatrix} - \begin{pmatrix}\xi_1 & 1\\ \xi_2 & 1\\ \vdots & \vdots\\ \xi_n & 1\end{pmatrix}\begin{pmatrix}\beta\\ \alpha\end{pmatrix}\right\|_2$$

Overdetermined set of equations, $Ac = y$. Is there a solution?
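In NumPy the overdetermined system for 1-D regression can be formed and "solved" in the least-squares sense directly (a sketch with arbitrary illustrative data, not from the slides):

```python
import numpy as np

# Arbitrary illustrative data (xi_i, eta_i)
xi  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
eta = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix A = [xi 1] and right-hand side y = eta
A = np.column_stack([xi, np.ones_like(xi)])
y = eta

# Least-squares "solution" of the overdetermined system A c = y
c, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = c
assert rank == 2  # A has full column rank
```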
Existence of a Solution
Lemma 5.1. Suppose $A \in \mathbb{R}^{n\times k}$, $k \le n$, has full column rank and let $\mathrm{span}[a_1, \dots, a_k] = \mathcal{R}(A) = \mathcal{S}$. Then:

If $y \in \mathcal{S}$ then there exists a unique $c \in \mathbb{R}^k$ such that $Ac = y$. If $y \notin \mathcal{S}$ then no $c \in \mathbb{R}^k$ exists such that $Ac = y$.
Observations
If $k = n$ then $c = A^{-1}y$.

We must redefine "solving a system" when $y \notin \mathcal{S}$.

Essentially, we must generalize the notion of $A^{-1}$ to define an operator $A^{\dagger} \in \mathbb{R}^{k\times n}$.

The new definition must be consistent with the definitions used when $y \in \mathcal{S}$ and/or when $k = n$.
The Problem
Assume that $A \in \mathbb{R}^{n\times k}$, $k \le n$, has column rank $k$ and $b \in \mathbb{R}^n$. Find the vector $x \in \mathbb{R}^k$ that minimizes, over all vectors in $\mathbb{R}^k$, the norm of the residual
$$\|b - Ax\|_2$$
That is,
$$x_{\min} = \operatorname*{argmin}_{x\in\mathbb{R}^k} \|b - Ax\|_2 = \operatorname*{argmin}_{x\in\mathbb{R}^k} \|r\|_2$$
Transformation-based Solution
Assume $A \in \mathbb{R}^{n\times k}$ is full rank and $b \in \mathbb{R}^n$.
$$\min_x \|b - Ax\|_2 = \|r\|_2$$
Let $Q \in \mathbb{R}^{n\times n}$ be an orthogonal matrix; then
$$\|r\|_2 = \|Qr\|_2$$
Change the coordinate system of the problem to
$$\min_x \|b - Ax\|_2 \;\equiv\; \min_x \|Qb - QAx\|_2$$
Same solution, different defining coefficients.
Transformation-based Solution
What coordinate system, i.e., what constraints on $Qb$ and $QA$?

$k$ degrees of freedom: what $k \times k$ system defines $x_{\min}$?

How do we determine/construct $Q$?
Transformation-based Solution
Compute $H \in \mathbb{R}^{n\times n}$, an orthogonal matrix, and $R \in \mathbb{R}^{k\times k}$, a nonsingular upper triangular matrix, such that
$$HA = \begin{pmatrix} R \\ 0 \end{pmatrix} \quad\text{and}\quad Hb = \begin{pmatrix} c \\ d \end{pmatrix}$$

The transformation introduces structure with 0 values.

It compresses the basis into $k$ directions with $O(k^2)$ scalars.
Transformation-based Solution
$$\|r\|_2^2 = \|Hr\|_2^2 = \left\|\begin{pmatrix} c \\ d \end{pmatrix} - \begin{pmatrix} R \\ 0 \end{pmatrix}x\right\|_2^2 = \left\|\begin{pmatrix} c - Rx \\ d \end{pmatrix}\right\|_2^2 = \|c - Rx\|_2^2 + \|d\|_2^2.$$

Minimized when $Rx = c$.

$x_{\min} = R^{-1}c$ is unique.
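A sketch of this transformation-based solve using NumPy's QR factorization (so $Q^T$ plays the role of $H$; the random data and variable names are ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # n = 6, k = 3, full column rank
b = rng.standard_normal(6)

# Full QR: Q is n x n orthogonal, so Q^T acts as H
Q, Rfull = np.linalg.qr(A, mode="complete")
k = A.shape[1]
R = Rfull[:k, :]                  # k x k upper triangular, nonsingular
cd = Q.T @ b                      # H b = (c; d)
c, d = cd[:k], cd[k:]

# Solve R x = c (back substitution suffices; a general solver is used
# here for brevity)
x_min = np.linalg.solve(R, c)

# Cross-checks: agrees with lstsq, and ||r_min||_2 = ||d||_2
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x_min, x_ref)
assert np.isclose(np.linalg.norm(b - A @ x_min), np.linalg.norm(d))
```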
Elementary Orthogonal Matrices
Theorem 5.2. Given $x \in \mathbb{R}^n$ with $\|x\|_2 \ne 0$,
$$Q = I + \alpha x x^T \text{ is orthogonal} \;\Longleftrightarrow\; \alpha = -\frac{2}{\|x\|_2^2} \;\text{ or }\; \alpha = 0$$

Note that
$$Q = I + \alpha x x^T = I - 2uu^T, \qquad u = \frac{x}{\|x\|_2}, \quad \|u\|_2 = 1$$

$Q$ is called an elementary reflector.
Householder Reflector
Theorem 5.3. Given a vector $v \in \mathbb{R}^n$, let $\sigma = \pm\|v\|_2$. If
$$x = v + \sigma e_1 \quad\text{and}\quad \alpha = -\frac{2}{\|x\|_2^2},$$
then
$$Hv = (I + \alpha x x^T)v = -\sigma e_1$$
The sign of $\sigma$ is chosen as $\mathrm{sgn}(e_1^T v)$ for numerical reasons.
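A minimal sketch of this construction (the helper name `householder` is ours, not from the slides); the reflector is applied without ever forming the matrix $H$:

```python
import numpy as np

def householder(v):
    """Return (x, alpha, sigma) such that (I + alpha x x^T) v = -sigma e1."""
    sigma = np.copysign(np.linalg.norm(v), v[0])  # sgn(e1^T v) * ||v||_2
    x = v.astype(float).copy()
    x[0] += sigma                                 # x = v + sigma e1
    alpha = -2.0 / np.dot(x, x)
    return x, alpha, sigma

v = np.array([3.0, 4.0, 0.0, 12.0])               # ||v||_2 = 13
x, alpha, sigma = householder(v)

Hv = v + alpha * x * np.dot(x, v)                 # apply H without forming it
assert np.allclose(Hv, [-13.0, 0.0, 0.0, 0.0])    # zeros below the first entry
```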
Example
$$v = \begin{pmatrix}1\\1\\1\\1\end{pmatrix}, \quad \sigma = 2, \quad x = \begin{pmatrix}3\\1\\1\\1\end{pmatrix}, \quad \|x\|_2^2 = 12, \quad \alpha = -\frac{1}{6}$$

$$H = \frac{1}{6}\begin{pmatrix} -3 & -3 & -3 & -3 \\ -3 & 5 & -1 & -1 \\ -3 & -1 & 5 & -1 \\ -3 & -1 & -1 & 5 \end{pmatrix}$$

$$Hv = -2e_1$$
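The worked example can be checked numerically (minus signs are easy to lose, so a quick verification of the slide's numbers is worthwhile; this check is ours, not part of the original slides):

```python
import numpy as np

v = np.ones(4)
sigma = np.linalg.norm(v)            # = 2
x = v.copy(); x[0] += sigma          # x = (3, 1, 1, 1)
alpha = -2.0 / np.dot(x, x)          # = -2/12 = -1/6

H = np.eye(4) + alpha * np.outer(x, x)

assert np.allclose(H, H.T)                 # H is symmetric
assert np.allclose(H @ H, np.eye(4))       # and orthogonal (an involution)
assert np.allclose(H @ v, [-2, 0, 0, 0])   # Hv = -2 e1
```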
Householder Reflector
It is easy to generalize to introducing 0s in positions $i+1, \dots, n$, leaving elements $1, \dots, i-1$ untouched and element $i$ an updated nonzero value:

$$H_i v = \begin{pmatrix} I_{i-1} & 0 \\ 0 & H \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} v_1 \\ -\sigma e_1 \end{pmatrix}$$

$$H = I_{n-i+1} + \alpha x x^T, \quad x \in \mathbb{R}^{n-i+1}$$

$$H_i = \begin{pmatrix} I_{i-1} & 0 \\ 0 & I_{n-i+1} \end{pmatrix} + \alpha\begin{pmatrix} 0 \\ x \end{pmatrix}\begin{pmatrix} 0^T & x^T \end{pmatrix}$$
Transformation-based Factorization
Assume $A \in \mathbb{R}^{n\times k}$ with $n \ge k$ and full rank. The $H_i$ are Householder reflectors. $R \in \mathbb{R}^{k\times k}$ is upper triangular.

$$H_k H_{k-1}\cdots H_2 H_1 A = B, \qquad B = \begin{pmatrix} R \\ 0 \end{pmatrix}$$

$$HA = B \;\Longrightarrow\; A = H^T B$$
Transformation-based Factorization
$H = H_k H_{k-1}\cdots H_2 H_1$ is an $n\times n$ orthogonal matrix.

Note that $H$ and $H^T$ do not have a simple structure like $L$ in the LU factorization.

$H$ is never computed explicitly; the individual parameters for the $H_i$ are stored.
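A compact sketch of the factorization loop, storing only the reflector vectors as the slide suggests (function names and storage format are ours, not from the slides):

```python
import numpy as np

def householder_qr(A):
    """Return (reflectors, R) with H_k ... H_1 A = [R; 0]."""
    A = A.astype(float).copy()
    n, k = A.shape
    reflectors = []
    for i in range(k):
        v = A[i:, i]
        sigma = np.copysign(np.linalg.norm(v), v[0])
        x = v.copy(); x[0] += sigma
        alpha = -2.0 / np.dot(x, x)
        reflectors.append((i, x, alpha))
        # Apply H_i = I + alpha x x^T to the trailing submatrix
        A[i:, i:] += alpha * np.outer(x, x @ A[i:, i:])
    return reflectors, np.triu(A[:k, :])

def apply_H(reflectors, b):
    """Compute H b = H_k ... H_1 b without forming H."""
    b = b.astype(float).copy()
    for i, x, alpha in reflectors:
        b[i:] += alpha * x * np.dot(x, b[i:])
    return b

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
refs, R = householder_qr(A)

# HA = [R; 0]: applying the stored reflectors to each column of A
# recovers R on top and zeros below
HA = np.column_stack([apply_H(refs, A[:, j]) for j in range(3)])
assert np.allclose(HA[:3, :], R)
assert np.allclose(HA[3:, :], 0)
```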
Nonzero Pattern
$$H_1 A = A^{(1)}$$
$$H_1\begin{pmatrix}
\times & \times & \times & \times\\
\times & \times & \times & \times\\
\times & \times & \times & \times\\
\times & \times & \times & \times\\
\times & \times & \times & \times\\
\times & \times & \times & \times
\end{pmatrix}
=
\begin{pmatrix}
\times & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times
\end{pmatrix}$$
Nonzero Pattern
$$H_2 A^{(1)} = A^{(2)}$$
$$H_2\begin{pmatrix}
\times & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times\\
0 & \times & \times & \times
\end{pmatrix}
=
\begin{pmatrix}
\times & \times & \times & \times\\
0 & \times & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & \times & \times
\end{pmatrix}$$
Nonzero Pattern
$$H_3 A^{(2)} = A^{(3)}$$
$$H_3\begin{pmatrix}
\times & \times & \times & \times\\
0 & \times & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & \times & \times
\end{pmatrix}
=
\begin{pmatrix}
\times & \times & \times & \times\\
0 & \times & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & 0 & \times\\
0 & 0 & 0 & \times\\
0 & 0 & 0 & \times
\end{pmatrix}$$
Nonzero Pattern
$$H_4 A^{(3)} = A^{(4)}$$
$$H_4\begin{pmatrix}
\times & \times & \times & \times\\
0 & \times & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & 0 & \times\\
0 & 0 & 0 & \times\\
0 & 0 & 0 & \times
\end{pmatrix}
=
\begin{pmatrix}
\times & \times & \times & \times\\
0 & \times & \times & \times\\
0 & 0 & \times & \times\\
0 & 0 & 0 & \times\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}$$

$A^{(4)}$ is in upper trapezoidal form as desired.
Geometry of Linear Least Squares
At the minimizer, $x_{\min} = R^{-1}c$ and $\|r_{\min}\|_2^2 = \|d\|_2^2$.

$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|Hb - HAx\|_2^2 = \|Hr\|_2^2$$

$$\left\|\begin{pmatrix} c \\ d \end{pmatrix} - \begin{pmatrix} R \\ 0 \end{pmatrix}x\right\|_2^2 = \left\|\begin{pmatrix} c - Rx \\ d \end{pmatrix}\right\|_2^2 = \|c - Rx\|_2^2 + \|d\|_2^2.$$

$$Hr_{\min} = \begin{pmatrix} c - Rx_{\min} \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ d \end{pmatrix}$$
Geometry of Linear Least Squares
$$Hr_{\min} = \begin{pmatrix} 0 \\ d \end{pmatrix} \;\Longrightarrow\; r_{\min} = H^T\begin{pmatrix} 0 \\ d \end{pmatrix}$$

$$r_{\min}^T A = \begin{pmatrix} 0^T & d^T \end{pmatrix}HA = \begin{pmatrix} 0^T & d^T \end{pmatrix}\begin{pmatrix} R \\ 0 \end{pmatrix} = 0^T$$

The inner product of $r_{\min}$ with each column of $A$ is 0.
Geometry of Linear Least Squares
Theorem 5.4. Assume that $A \in \mathbb{R}^{n\times k}$, $k \le n$, has column rank $k$ and $b \in \mathbb{R}^n$. If
$$x_{\min} = \operatorname*{argmin}_{x\in\mathbb{R}^k} \|b - Ax\|_2$$
minimizes the least squares cost function then $r_{\min} = b - Ax_{\min}$ satisfies
$$r_{\min}^T A = 0^T$$
That is, the residual is orthogonal to $\mathcal{R}(A)$.
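Theorem 5.4 is straightforward to check numerically (a sketch with random data, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_min = np.linalg.lstsq(A, b, rcond=None)[0]
r_min = b - A @ x_min

# The residual is orthogonal to every column of A
assert np.allclose(r_min @ A, 0)
```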
Arithmetic Mean
$$\min_{\mu\in\mathbb{R}} \|y - \mu e\|_2 = \min_{\mu\in\mathbb{R}}\left\|\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_n\end{pmatrix} - \mu\begin{pmatrix}1\\ 1\\ \vdots\\ 1\end{pmatrix}\right\|_2$$

$$\mu_{\min} = \mu = \frac{\sum_{i=1}^{n}\xi_i}{n}$$

$$r_{\min}^T e = (y - \mu e)^T e = \sum_{i=1}^{n}\xi_i - \mu\sum_{i=1}^{n}1 = \sum_{i=1}^{n}\xi_i - \mu n = 0$$
Orthogonal Complements
Definition 5.1. If $\mathcal{S}$ is a subspace of $\mathbb{R}^n$ with dimension $k$ then
$$\mathcal{S}^{\perp} = \{x \in \mathbb{R}^n \mid \forall y \in \mathcal{S},\ \langle x, y\rangle = x^T y = 0\}$$

Lemma 5.5. If $\mathcal{S}$ is a subspace of $\mathbb{R}^n$ and $x \in \mathbb{R}^n$ then there exist two unique vectors $u \in \mathcal{S}$ and $v \in \mathcal{S}^{\perp}$ such that $x = u + v$. Equivalently, $\mathbb{R}^n = \mathcal{S} \oplus \mathcal{S}^{\perp}$.
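The decomposition in Lemma 5.5 can be computed from an orthonormal basis of $\mathcal{S}$ (a sketch, not from the slides; here $\mathcal{S}$ is taken to be the range of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))   # S = R(A), a 2-dimensional subspace of R^5
x = rng.standard_normal(5)

# Orthonormal basis Q for S via (reduced) QR
Q, _ = np.linalg.qr(A)

u = Q @ (Q.T @ x)   # projection of x onto S
v = x - u           # remainder lies in S-perp

assert np.allclose(u + v, x)
assert np.allclose(v @ A, 0)   # v is orthogonal to every column of A
```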
Orthogonal Complements
Proof. Let $x = u + v = \tilde u + \tilde v$. Then
$$x - x = 0 = (u - \tilde u) + (v - \tilde v) \;\Longrightarrow\; d_u = -d_v$$
where $d_u = (u - \tilde u) \in \mathcal{S}$ and $d_v = (v - \tilde v) \in \mathcal{S}^{\perp}$, so $d_u^T d_v = 0$.

Since $\|d_u + d_v\|_2^2 = \|d_u\|_2^2 + \|d_v\|_2^2 + 2d_u^T d_v$,
$$0 = \|d_u\|_2^2 + \|d_v\|_2^2 + 0 \;\Longrightarrow\; d_u = d_v = 0$$
$$\Longrightarrow\; u = \tilde u \;\text{ and }\; v = \tilde v$$
Geometry of Linear Least Squares
Theorem 5.6. Assume that $A \in \mathbb{R}^{n\times k}$, $k \le n$, has column rank $k$ and $b \in \mathbb{R}^n$ with $b = b_1 + b_2$, where $b_1 \in \mathcal{R}(A)$ and $b_2 \in \mathcal{R}(A)^{\perp}$. If
$$x_{\min} = \operatorname*{argmin}_{x\in\mathbb{R}^k}\|b - Ax\|_2$$
minimizes the least squares cost function then
$$b_1 = Ax_{\min}, \qquad r_{\min} = b - Ax_{\min} = b_2 \in \mathcal{R}(A)^{\perp}$$
The linear map $A^{\dagger} : \mathbb{R}^n \to \mathbb{R}^k$ that maps $b \mapsto x_{\min}$ is called the generalized (or pseudo) inverse of $A$. It is one-to-one and onto as a map $\mathcal{R}(A) \to \mathbb{R}^k$.
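NumPy exposes the pseudoinverse directly, which lets us check Theorem 5.6 numerically (a sketch with random data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)

x_min = np.linalg.lstsq(A, b, rcond=None)[0]

# The pseudoinverse realizes the map b -> x_min
A_dagger = np.linalg.pinv(A)
assert np.allclose(A_dagger @ b, x_min)

# b1 = A x_min is the part of b in R(A); b2 = r_min lies in R(A)-perp
b1 = A @ x_min
b2 = b - b1
assert np.allclose(b2 @ A, 0)
```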
Householder Reflectors and Projections
Let $A \in \mathbb{R}^{n\times k}$ have full rank, $H \in \mathbb{R}^{n\times n}$ be an orthogonal matrix, and $R \in \mathbb{R}^{k\times k}$ a nonsingular upper triangular matrix such that
$$HA = \begin{pmatrix} R \\ 0 \end{pmatrix} \quad\text{and}\quad Hb = \begin{pmatrix} c \\ d \end{pmatrix}$$

$$H^T = \begin{pmatrix} Q & \bar{Q} \end{pmatrix}, \qquad Q \in \mathbb{R}^{n\times k} \;\text{ and }\; \bar{Q} \in \mathbb{R}^{n\times(n-k)}$$
Householder Reflectors and Projections
We have
$$A = QR \quad\text{and}\quad b = Qc + \bar{Q}d = b_1 + b_2$$
Therefore, given a Householder reflector factorization of $H$, we can compute

- an orthonormal basis for $\mathcal{R}(A)$,
- the projection of $b$ onto $\mathcal{R}(A)$, and the projection of $b$ onto $\mathcal{R}(A)^{\perp}$,

by applying the reflectors to a series of vectors.
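These quantities fall out of a full QR factorization (a sketch using NumPy's QR in place of explicit reflector application; the partition of $H^T$ into $Q$ and $\bar Q$ follows the slide):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 2
A = rng.standard_normal((n, k))
b = rng.standard_normal(n)

# Full QR gives H^T = [Q Qbar], with the columns of Q spanning R(A)
HT, Rfull = np.linalg.qr(A, mode="complete")
Q, Qbar = HT[:, :k], HT[:, k:]
cd = HT.T @ b                 # H b = (c; d)
c, d = cd[:k], cd[k:]

b1 = Q @ c        # projection of b onto R(A)
b2 = Qbar @ d     # projection of b onto R(A)-perp

assert np.allclose(b1 + b2, b)
assert np.allclose(b2 @ A, 0)
assert np.allclose(Q.T @ Q, np.eye(k))   # orthonormal basis for R(A)
```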