Set 5: Linear Least Squares
Kyle A. Gallivan
Department of Mathematics
Florida State University
Foundations of Computational Math 1, Fall 2009


Simple Example

Arithmetic Mean: $\mu = \frac{1}{n} \sum_{i=1}^{n} y_i$

$$\mu = \operatorname{argmin}_{\mu \in \mathbb{R}} f(\mu) = \operatorname{argmin}_{\mu \in \mathbb{R}} \sum_{i=1}^{n} (y_i - \mu)^2$$

$$f'(\mu) = \sum_{i=1}^{n} -2(y_i - \mu) \quad \text{and set} \quad f'(\mu) = 0$$

$$0 = \sum_{i=1}^{n} -2(y_i - \mu) \;\Longrightarrow\; \sum_{i=1}^{n} \mu = \sum_{i=1}^{n} y_i \;\Longrightarrow\; \mu = \frac{\sum_{i=1}^{n} y_i}{n}$$
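A quick numerical check of this derivation (a sketch with made-up data, not from the slides): the mean is the stationary point of $f$ and minimizes it.

```python
import numpy as np

# made-up sample data
y = np.array([2.0, 3.0, 5.0, 10.0])
mu = y.mean()  # arithmetic mean (1/n) * sum_i y_i

def f(m):
    # least squares cost f(m) = sum_i (y_i - m)^2
    return np.sum((y - m) ** 2)

# stationarity: f'(mu) = -2 * sum_i (y_i - mu) = 0
assert abs(np.sum(y - mu)) < 1e-12

# the cost at the mean is no larger than at nearby trial values
trials = mu + np.linspace(-1.0, 1.0, 201)
assert all(f(mu) <= f(t) for t in trials)
```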

Slightly Less Simple Example

1-dimensional Linear Regression:

Given a set of points $(x_i, y_i)$, $i = 1, \ldots, n$

Determine a line $y = l(x) = \alpha x + \beta$ that minimizes
$$f(\alpha, \beta) = \sum_{i=1}^{n} (y_i - l(x_i))^2$$

Solution in terms of means, variance, and covariance:

$$\sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y), \qquad \alpha = \frac{\sigma_{xy}}{\sigma_x^2}$$

$$\sigma_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_x)^2$$

$$l(x) = \alpha (x - \mu_x) + \mu_y$$
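The closed-form slope and intercept can be checked against a generic least squares solver; a sketch with invented data points, using NumPy's `lstsq` as the reference:

```python
import numpy as np

# invented sample data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
n = len(x)

mu_x, mu_y = x.mean(), y.mean()
sigma_xy = np.sum((x - mu_x) * (y - mu_y)) / (n - 1)  # covariance
sigma_x2 = np.sum((x - mu_x) ** 2) / (n - 1)          # variance

alpha = sigma_xy / sigma_x2  # slope of l(x) = alpha*(x - mu_x) + mu_y

# compare with a least squares fit of [x 1] (alpha, beta)^T = y
A = np.column_stack([x, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.isclose(coef[0], alpha)
assert np.isclose(coef[1], mu_y - alpha * mu_x)
```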

Matrix Least Squares Form

Arithmetic Mean:
$$\min_{\mu \in \mathbb{R}} \left\| \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \mu \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \right\|^2$$

1-D Linear Regression:
$$\min_{\alpha, \beta} \left\| \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \right\|^2$$

Overdetermined set of equations, $Ac = y$ $\Longrightarrow$ Is there a solution?
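For instance, the mean as an overdetermined system $Ac = y$ generally has no exact solution, but `np.linalg.lstsq` returns the minimizer of the residual norm instead (a sketch with invented data):

```python
import numpy as np

# overdetermined system A c = y for the mean: A is a single column of ones
y = np.array([1.0, 2.0, 4.0])
A = np.ones((3, 1))

# y is not a multiple of the ones vector, so no exact solution exists;
# lstsq returns the minimizer of ||y - A c||_2
c, residual_ss, rank, sv = np.linalg.lstsq(A, y, rcond=None)
assert np.isclose(c[0], y.mean())  # the minimizer is the arithmetic mean
assert not np.allclose(A @ c, y)   # but A c != y exactly
```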

Existence of a Solution

Lemma 5.1. Suppose $A \in \mathbb{R}^{n \times k}$, $k \leq n$, has full column rank and let $\operatorname{span}[a_1, \ldots, a_k] = \mathcal{R}(A) = \mathcal{S}$. Then:

If $y \in \mathcal{S}$ then there exists a unique $c \in \mathbb{R}^k$ such that $Ac = y$. If $y \notin \mathcal{S}$ then no $c \in \mathbb{R}^k$ exists such that $Ac = y$.

Observations

If $k = n$ then $c = A^{-1} y$.

We must redefine solving a system when $y \notin \mathcal{S}$.

Essentially, we must generalize the notion of $A^{-1}$ to define an operator $A^{\dagger} \in \mathbb{R}^{k \times n}$.

The new definition must be consistent with the definitions used when $y \in \mathcal{S}$ and/or when $k = n$.

The Problem

Assume that $A \in \mathbb{R}^{n \times k}$, $k \leq n$, has column rank $k$ and $b \in \mathbb{R}^n$. Find the vector $x \in \mathbb{R}^k$ that minimizes over all vectors in $\mathbb{R}^k$ the norm of the residual

$$\|b - Ax\|_2$$

That is,
$$x_{\min} = \operatorname{argmin}_{x \in \mathbb{R}^k} \|b - Ax\|_2 = \operatorname{argmin}_{x \in \mathbb{R}^k} \|r\|_2$$

Transformation-based Solution

Assume $A \in \mathbb{R}^{n \times k}$ is full rank and $b \in \mathbb{R}^n$.

$$\min_x \|b - Ax\|_2 = \|r\|_2$$

Let $Q \in \mathbb{R}^{n \times n}$ be an orthogonal matrix and we have

$$\|r\|_2 = \|Qr\|_2$$

Change the coordinate system of the problem to

$$\min_x \|b - Ax\|_2 \equiv \min_x \|Qb - QAx\|_2$$

Same solution, different defining coefficients.

Transformation-based Solution

What coordinate system, i.e., what constraints on $Qb$ and $QA$?

$k$ degrees of freedom: what $k \times k$ system defines $x_{\min}$?

How do we determine/construct $Q$?

Transformation-based Solution

Compute $H \in \mathbb{R}^{n \times n}$, an orthogonal matrix, and $R \in \mathbb{R}^{k \times k}$, a nonsingular upper triangular matrix, such that

$$HA = \begin{pmatrix} R \\ 0 \end{pmatrix} \quad \text{and} \quad Hb = \begin{pmatrix} c \\ d \end{pmatrix}$$

The transformation introduces structure with 0 values: it compresses the basis into $k$ directions described by $O(k^2)$ scalars.

Transformation-based Solution

$$\|r\|_2^2 = \|Hr\|_2^2 = \left\| \begin{pmatrix} c \\ d \end{pmatrix} - \begin{pmatrix} R \\ 0 \end{pmatrix} x \right\|_2^2 = \left\| \begin{pmatrix} c - Rx \\ d \end{pmatrix} \right\|_2^2 = \|c - Rx\|_2^2 + \|d\|_2^2.$$

Minimized when $Rx = c$.

$x_{\min} = R^{-1} c$ is unique.
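This solution procedure can be sketched with NumPy's full QR factorization, where $Q^T$ plays the role of $H$ above (random data for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
A = rng.standard_normal((n, k))
b = rng.standard_normal(n)

# full QR: Q is n x n orthogonal, so H = Q.T
Q, Rfull = np.linalg.qr(A, mode="complete")
R = Rfull[:k, :]        # k x k upper triangular block
Hb = Q.T @ b
c, d = Hb[:k], Hb[k:]   # Hb = (c; d)

x = np.linalg.solve(R, c)  # solve R x = c
r = b - A @ x

# matches a generic least squares solver, and ||r||_2 = ||d||_2
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
assert np.isclose(np.linalg.norm(r), np.linalg.norm(d))
```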

Elementary Orthogonal Matrices

Theorem 5.2. Given $x \in \mathbb{R}^n$ with $\|x\|_2 \neq 0$,

$$Q = I + \gamma x x^T \ \text{is orthogonal} \iff \gamma = \frac{-2}{\|x\|_2^2} \ \text{or} \ \gamma = 0$$

Note that
$$Q = I + \gamma x x^T = I - 2 u u^T$$

where $\|u\|_2 = 1$.

$Q$ is called an elementary reflector.

Householder Reflector

Theorem 5.3. Given a vector $v \in \mathbb{R}^n$, and $\mu = \pm\|v\|_2$, if

$$x = v + \mu e_1 \quad \text{and} \quad \gamma = \frac{-2}{\|x\|_2^2},$$

then
$$Hv = (I + \gamma x x^T) v = -\mu e_1$$

The sign of $\mu$ is chosen as $\operatorname{sgn}(e_1^T v)$ for numerical reasons.

Example

$$v = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad \mu = 2, \quad x = \begin{pmatrix} 3 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad \|x\|_2^2 = 12, \quad \gamma = -\frac{1}{6}$$

$$H = \frac{1}{6} \begin{pmatrix} -3 & -3 & -3 & -3 \\ -3 & 5 & -1 & -1 \\ -3 & -1 & 5 & -1 \\ -3 & -1 & -1 & 5 \end{pmatrix}$$

$$Hv = -2 e_1$$
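The reflector of Theorem 5.3 and this example can be reproduced directly (a NumPy sketch; `householder` is a hypothetical helper name, not from the slides):

```python
import numpy as np

def householder(v):
    # hypothetical helper: build H = I + gamma x x^T with H v = -mu e_1;
    # mu carries the sign of v[0] for numerical stability
    mu = np.linalg.norm(v) if v[0] >= 0 else -np.linalg.norm(v)
    x = v.astype(float).copy()
    x[0] += mu
    gamma = -2.0 / (x @ x)
    H = np.eye(len(v)) + gamma * np.outer(x, x)
    return H, mu

v = np.array([1.0, 1.0, 1.0, 1.0])
H, mu = householder(v)
assert np.isclose(mu, 2.0)
assert np.allclose(H @ v, np.array([-2.0, 0.0, 0.0, 0.0]))  # H v = -mu e_1
assert np.allclose(H @ H.T, np.eye(4))                      # H is orthogonal
assert np.allclose(H, np.array([[-3, -3, -3, -3],
                                [-3,  5, -1, -1],
                                [-3, -1,  5, -1],
                                [-3, -1, -1,  5]]) / 6.0)    # matches the example
```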

Householder Reflector

Easy to generalize to introducing 0s in positions $i+1, \ldots, n$, leaving elements $1, \ldots, i-1$ untouched and element $i$ an updated nonzero value:

$$H_i v = \begin{pmatrix} I_{i-1} & 0 \\ 0 & H \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} v_1 \\ -\mu e_1 \end{pmatrix}$$

$$H = I_{n-i+1} + \gamma x x^T, \quad x \in \mathbb{R}^{n-i+1}$$

$$H_i = \begin{pmatrix} I_{i-1} & 0 \\ 0 & I_{n-i+1} \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ x \end{pmatrix} \begin{pmatrix} 0^T & x^T \end{pmatrix}$$

Transformation-based Factorization

Assume $A \in \mathbb{R}^{n \times k}$ with $n \geq k$ and full rank. The $H_i$ are Householder reflectors. $R \in \mathbb{R}^{k \times k}$ is upper triangular.

$$H_k H_{k-1} \cdots H_2 H_1 A = B, \qquad B = \begin{pmatrix} R \\ 0 \end{pmatrix}$$

$$HA = B \iff A = H^T B$$

Transformation-based Factorization

$H = H_k H_{k-1} \cdots H_2 H_1$ is an $n \times n$ orthogonal matrix.

Note that $H$ and $H^T$ do not have a simple structure like $L$ in the LU factorization.

$H$ is never computed explicitly; the individual parameters for the $H_i$ are stored.

Nonzero Pattern

$H_1 A = A^{(1)}$:

$$H_1 \begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix}$$

Nonzero Pattern

$H_2 A^{(1)} = A^{(2)}$:

$$H_2 \begin{pmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{pmatrix}$$

Nonzero Pattern

$H_3 A^{(2)} = A^{(3)}$:

$$H_3 \begin{pmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & \times \end{pmatrix}$$

Nonzero Pattern

$H_4 A^{(3)} = A^{(4)}$:

$$H_4 \begin{pmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

$A^{(4)}$ is in upper trapezoidal form as desired.
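The column-by-column elimination illustrated above can be sketched by embedding a small reflector in the trailing rows at each step (random $6 \times 4$ data matching the pattern's shape; `embed_reflector` is a hypothetical helper, not from the slides):

```python
import numpy as np

def embed_reflector(x, n, i):
    # hypothetical helper: H_i = diag(I_{i-1}, I + gamma x x^T),
    # acting only on rows i..n-1 (0-based indexing)
    gamma = -2.0 / (x @ x)
    H = np.eye(n)
    H[i:, i:] += gamma * np.outer(x, x)
    return H

rng = np.random.default_rng(1)
n, k = 6, 4
A = rng.standard_normal((n, k))

B = A.copy()
for i in range(k):
    v = B[i:, i]
    mu = np.linalg.norm(v) if v[0] >= 0 else -np.linalg.norm(v)
    x = v.copy()
    x[0] += mu
    B = embed_reflector(x, n, i) @ B
    # after step i, columns 0..i are zero below the diagonal
    assert np.allclose(np.tril(B[:, :i + 1], -1), 0.0, atol=1e-12)

# B = H_k ... H_1 A = [R; 0] is upper trapezoidal
assert np.allclose(B[k:, :], 0.0, atol=1e-12)
```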

Geometry of Linear Least Squares

At the minimizer, $x_{\min} = R^{-1} c$ and $\|r_{\min}\|_2^2 = \|d\|_2^2$:

$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|Hb - HAx\|_2^2 = \|Hr\|_2^2 = \left\| \begin{pmatrix} c \\ d \end{pmatrix} - \begin{pmatrix} R \\ 0 \end{pmatrix} x \right\|_2^2 = \|c - Rx\|_2^2 + \|d\|_2^2.$$

$$H r_{\min} = \begin{pmatrix} c - R x_{\min} \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ d \end{pmatrix}$$

Geometry of Linear Least Squares

$$H r_{\min} = \begin{pmatrix} 0 \\ d \end{pmatrix} \quad \Longrightarrow \quad r_{\min} = H^T \begin{pmatrix} 0 \\ d \end{pmatrix}$$

$$r_{\min}^T A = \begin{pmatrix} 0^T & d^T \end{pmatrix} H A = \begin{pmatrix} 0^T & d^T \end{pmatrix} \begin{pmatrix} R \\ 0 \end{pmatrix} = 0^T$$

The inner product of $r_{\min}$ with each column of $A$ is 0.

Geometry of Linear Least Squares

Theorem 5.4. Assume that $A \in \mathbb{R}^{n \times k}$, $k \leq n$, has column rank $k$ and $b \in \mathbb{R}^n$. If

$$x_{\min} = \operatorname{argmin}_{x \in \mathbb{R}^k} \|b - Ax\|_2$$

minimizes the least squares cost function, then with $r_{\min} = b - A x_{\min}$,

$$r_{\min}^T A = 0^T$$

That is, the residual is orthogonal to $\mathcal{R}(A)$.
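Theorem 5.4 is easy to verify numerically for a random full-rank problem (a sketch, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)

# minimizer of ||b - A x||_2
x_min = np.linalg.lstsq(A, b, rcond=None)[0]
r_min = b - A @ x_min

# the residual is orthogonal to every column of A: r_min^T A = 0^T
assert np.allclose(r_min @ A, np.zeros(3), atol=1e-10)
```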

Arithmetic Mean

$$\min_{\mu \in \mathbb{R}} \|y - \mu e\|_2^2 = \min_{\mu \in \mathbb{R}} \left\| \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \mu \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \right\|_2^2$$

$$\mu_{\min} = \bar{\mu} = \frac{\sum_{i=1}^{n} y_i}{n}$$

$$r_{\min}^T e = (y - \bar{\mu} e)^T e = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{\mu} = \sum_{i=1}^{n} y_i - n\bar{\mu} = 0$$

Orthogonal Complements

Definition 5.1. If $\mathcal{S}$ is a subspace of $\mathbb{R}^n$ with dimension $k$ then

$$\mathcal{S}^{\perp} = \{ x \in \mathbb{R}^n \mid \forall y \in \mathcal{S}, \ \langle x, y \rangle = x^T y = 0 \}$$

Lemma 5.5. If $\mathcal{S}$ is a subspace of $\mathbb{R}^n$ and $x \in \mathbb{R}^n$ then there exist two unique vectors $u \in \mathcal{S}$ and $v \in \mathcal{S}^{\perp}$ such that $x = u + v$. Equivalently, $\mathbb{R}^n = \mathcal{S} \oplus \mathcal{S}^{\perp}$.

Orthogonal Complements

Proof. Let $x = u + v = \tilde{u} + \tilde{v}$.

$$x - x = 0 = (u - \tilde{u}) - (\tilde{v} - v) = d_u - d_v \;\Longrightarrow\; d_u = d_v$$

$$d_u = (u - \tilde{u}) \in \mathcal{S} \ \text{and} \ d_v = (\tilde{v} - v) \in \mathcal{S}^{\perp} \;\Longrightarrow\; d_u^T d_v = 0$$

Since $\|d_u - d_v\|_2^2 = \|d_u\|_2^2 + \|d_v\|_2^2 - 2 d_u^T d_v$,

$$0 = \|d_u\|_2^2 + \|d_v\|_2^2 + 0 \;\Longrightarrow\; d_u = d_v = 0 \;\Longrightarrow\; u = \tilde{u} \ \text{and} \ v = \tilde{v}$$

Geometry of Linear Least Squares

Theorem 5.6. Assume that $A \in \mathbb{R}^{n \times k}$, $k \leq n$, has column rank $k$ and $b \in \mathbb{R}^n$ with $b = b_1 + b_2$, where $b_1 \in \mathcal{R}(A)$ and $b_2 \in \mathcal{R}(A)^{\perp}$. If

$$x_{\min} = \operatorname{argmin}_{x \in \mathbb{R}^k} \|b - Ax\|_2$$

minimizes the least squares cost function, then

$$b_1 = A x_{\min}, \qquad r_{\min} = b - A x_{\min} = b_2 \in \mathcal{R}(A)^{\perp}$$

The linear map $A^{\dagger} : \mathbb{R}^n \to \mathbb{R}^k$ that maps $b \mapsto x_{\min}$ is called the generalized (or pseudo) inverse of $A$. It is one-to-one and onto for $\mathcal{R}(A) \to \mathbb{R}^k$.

Householder Reflectors and Projections

Let $A \in \mathbb{R}^{n \times k}$ have full rank, $H \in \mathbb{R}^{n \times n}$ be an orthogonal matrix, and $R \in \mathbb{R}^{k \times k}$ a nonsingular upper triangular matrix such that

$$HA = \begin{pmatrix} R \\ 0 \end{pmatrix} \quad \text{and} \quad Hb = \begin{pmatrix} c \\ d \end{pmatrix}$$

$$H^T = \begin{pmatrix} Q & \tilde{Q} \end{pmatrix}, \quad Q \in \mathbb{R}^{n \times k} \ \text{and} \ \tilde{Q} \in \mathbb{R}^{n \times (n-k)}$$

Householder Reflectors and Projections

We have

$$A = QR \quad \text{and} \quad b = Qc + \tilde{Q}d = b_1 + b_2$$

Therefore, given a Householder reflector factorization of $H$, we can compute

an orthonormal basis for $\mathcal{R}(A)$,

the projection of $b$ onto $\mathcal{R}(A)$, and the projection of $b$ onto $\mathcal{R}(A)^{\perp}$

by applying the reflectors to a series of vectors.
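A sketch of these computations using NumPy's full QR in place of an explicit reflector product (random data; `Q_perp` here corresponds to $\tilde{Q}$ above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
A = rng.standard_normal((n, k))
b = rng.standard_normal(n)

# full QR gives H^T = [Q  Q_perp]; the columns of Q are an
# orthonormal basis for R(A)
Qfull, Rfull = np.linalg.qr(A, mode="complete")
Q, Q_perp = Qfull[:, :k], Qfull[:, k:]

c = Q.T @ b
d = Q_perp.T @ b

b1 = Q @ c        # projection of b onto R(A)
b2 = Q_perp @ d   # projection of b onto the orthogonal complement of R(A)

assert np.allclose(A, Q @ Rfull[:k, :])        # A = Q R
assert np.allclose(b1 + b2, b)                 # b = b1 + b2
assert np.allclose(A.T @ b2, 0.0, atol=1e-10)  # b2 is orthogonal to R(A)
```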