Least Squares

Christopher Carbone

We will connect the Best Approximation Theorem to least squares approximation through a motivating example. We will then explore data modeling by using least squares to find a line of best fit or a curve of best fit.

Suppose you complete an experiment in which you weigh Fiddler Crabs that have been submerged in a saline solution, recording the mean change of weight over time.

Time (minutes)    Mean Change of Weight (grams)
0                 0.0
20                0.9
40                1.1
60                1.1
80                1.1
100               1.2

Table 1

We want to predict the mean change of weight of the Fiddler Crabs at 120 minutes.

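Let $x_1$ be the intercept (the modeled change of weight at time 0) and let $x_2$ be the weight gain per minute. Requiring the line $x_1 + t\,x_2$ to match the measured change of weight at each time $t$ in Table 1 gives six equations in two unknowns: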

$$
\begin{aligned}
x_1 + 0x_2 &= 0.0\\
x_1 + 20x_2 &= 0.9\\
x_1 + 40x_2 &= 1.1\\
x_1 + 60x_2 &= 1.1\\
x_1 + 80x_2 &= 1.1\\
x_1 + 100x_2 &= 1.2
\end{aligned}
$$
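As a quick check, the sketch below (assuming NumPy is available; the variable names are ours) stores this system as $Ax = b$ and confirms that it has no exact solution. It uses the fact that a linear system is consistent exactly when $A$ and the augmented matrix $[A \mid b]$ have the same rank.

```python
import numpy as np

# Table 1: time in minutes and mean change of weight in grams
t = np.array([0, 20, 40, 60, 80, 100], dtype=float)
b = np.array([0.0, 0.9, 1.1, 1.1, 1.1, 1.2])

# Row i of A is [1, t_i], so row i of Ax = b reads x1 + t_i * x2 = b_i
A = np.column_stack([np.ones_like(t), t])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
print(rank_A, rank_Ab)  # 2 3, so b is not in the column space of A and Ax = b is inconsistent
```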

Consider the problem of approximating a vector $u$ by vectors in a plane $W$.

Given a finite-dimensional subspace $W$ and a vector $u \notin W$, we would like to write $u$ as a linear combination of vectors in $W$, but this cannot be done exactly. Approximating $u$ by a vector $v \in W$ amounts to minimizing $\|u - v\|$ over all $v \in W$. We will show that the best we can do is $\operatorname{proj}_W u$: for every $v \in W$ with $v \neq \operatorname{proj}_W u$,

$$\|u - \operatorname{proj}_W u\| < \|u - v\|.$$

Therefore, $\operatorname{proj}_W u$ is the "best approximation" of $u$ by vectors in $W$.

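Proof:

- Let $W$ be a finite-dimensional subspace of an inner product space $V$, with $u \in V$ and $v \in W$.
- Since $\operatorname{proj}_W u$ and $v$ both lie in $W$, so does $\operatorname{proj}_W u - v$.
- The vector $u - \operatorname{proj}_W u$ is orthogonal to $W$, so it is orthogonal to $\operatorname{proj}_W u - v$ as well.
- Write $u - v = (u - \operatorname{proj}_W u) + (\operatorname{proj}_W u - v)$. Because the two terms are orthogonal, the Pythagorean Theorem gives
$$\|u - v\|^2 = \|u - \operatorname{proj}_W u\|^2 + \|\operatorname{proj}_W u - v\|^2.$$
- If $v \neq \operatorname{proj}_W u$, then $\|\operatorname{proj}_W u - v\|^2 > 0$, so $\|u - v\|^2 > \|u - \operatorname{proj}_W u\|^2$. Therefore, $\|u - v\| > \|u - \operatorname{proj}_W u\|$. ☐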
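A numerical illustration of the theorem, as a sketch assuming NumPy (the random plane $W$ in $\mathbb{R}^5$ and the vector $u$ are arbitrary choices, not from the source): we compute $\operatorname{proj}_W u$ from an orthonormal basis of $W$ obtained by a QR factorization, then check the orthogonality, the Pythagorean identity, and the inequality from the proof against many other vectors $v \in W$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: W is the column space of a random 5x2 matrix M (a plane in R^5)
M = rng.standard_normal((5, 2))
u = rng.standard_normal(5)

# Orthonormal basis q1, q2 of W from QR, then proj_W u = <u,q1>q1 + <u,q2>q2
Q, _ = np.linalg.qr(M)
proj = Q @ (Q.T @ u)
best = np.linalg.norm(u - proj)

# u - proj_W u is orthogonal to W (to each column of M)
print(np.allclose(M.T @ (u - proj), 0))  # True

# Compare with 1000 other vectors v = M w in W
ws = rng.standard_normal((1000, 2))
for w in ws:
    v = M @ w
    dist = np.linalg.norm(u - v)
    # Pythagorean identity from the proof: ||u-v||^2 = ||u-proj||^2 + ||proj-v||^2
    assert np.isclose(dist**2, best**2 + np.linalg.norm(proj - v) ** 2)
    assert best <= dist
print("proj_W u was closest in every trial")
```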

Now let $A$ be an $m \times n$ matrix and let $W$ be the column space of $A$, so $W$ consists of the vectors $Ax$ for all $x \in \mathbb{R}^n$. If the system $Ax = b$ is inconsistent, then $b \notin W$; that is, there is no $x \in \mathbb{R}^n$ with $Ax = b$. Instead, we will find $x' \in \mathbb{R}^n$ such that $Ax' \in W$ is the best approximation of $b$ by vectors in $W$.

- Given the inconsistent linear system $Ax = b$, we want to find a vector $x'$ that minimizes $\|Ax - b\|$. If $\|Ax' - b\|$ is large, then $x'$ is regarded as a poor approximate solution; if it is small, $x'$ is a good one. A vector $x'$ that minimizes $\|Ax - b\|$ over all $x \in \mathbb{R}^n$ is called a least squares solution of $Ax = b$.

- Let $e = Ax - b$ and write it component-wise as $e = (e_1, e_2, \dots, e_m)$. Minimizing the error means minimizing its length
$$\|e\| = \sqrt{e_1^2 + e_2^2 + \cdots + e_m^2}.$$
Since the square root is an increasing function, the same $x$ also minimizes $\|e\|^2 = e_1^2 + e_2^2 + \cdots + e_m^2$. That is, we minimize a sum of squares, which is where the name "least squares" comes from.
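The quantity being minimized is easy to compute directly. In this sketch (assuming NumPy; `squared_error` and the trial values $x_1 = 0.5$, $x_2 = 0.01$ are hypothetical), we evaluate $\|e\|^2$ for one guessed line through the crab data. A least squares solution is any $x$ that makes this number as small as possible.

```python
import numpy as np

def squared_error(A, x, b):
    """Return ||Ax - b||^2 = e1^2 + ... + em^2, where e = Ax - b."""
    e = A @ x - b
    return float(e @ e)

# Crab data from Table 1
t = np.array([0, 20, 40, 60, 80, 100], dtype=float)
b = np.array([0.0, 0.9, 1.1, 1.1, 1.1, 1.2])
A = np.column_stack([np.ones_like(t), t])

# An arbitrary trial line (not a least squares solution): x1 = 0.5, x2 = 0.01
print(round(squared_error(A, np.array([0.5, 0.01]), b), 6))  # 0.46
```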

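- By the Best Approximation Theorem, the vector in $W$ closest to $b$ is the orthogonal projection $\operatorname{proj}_W b$.
- Hence $x'$ is a least squares solution of $Ax = b$ exactly when $Ax' = \operatorname{proj}_W b$.
- The vector $b - Ax' = b - \operatorname{proj}_W b$ is orthogonal to $W$. Since $W$ is the column space of $A$, its orthogonal complement is the null space of $A^T$, so $b - Ax'$ lies in the null space of $A^T$.
- Therefore, a least squares solution $x'$ satisfies $A^T(b - Ax') = 0$, or equivalently
$$A^TAx' = A^Tb,$$
which is called the normal system associated with $Ax = b$.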

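- The normal system is always consistent, even though $Ax = b$ is not: $\operatorname{proj}_W b$ lies in $W$, so some $x'$ satisfies $Ax' = \operatorname{proj}_W b$, and every such $x'$ satisfies $A^TAx' = A^Tb$.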
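To illustrate that the normal system has a solution even when $Ax = b$ does not, here is a sketch (assuming NumPy; the $3 \times 2$ system is a hypothetical example, not from the source) in which the columns of $A$ are dependent, so $A^TA$ is singular. NumPy's `np.linalg.lstsq` still finds a minimizer of $\|Ax - b\|$, and that vector satisfies $A^TAx' = A^Tb$.

```python
import numpy as np

# Hypothetical system: the second column is twice the first (dependent columns),
# and b is not a multiple of (1, 1, 1), so Ax = b has no solution
A = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

# lstsq minimizes ||Ax - b|| directly; with dependent columns it returns the minimum-norm minimizer
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# The normal system A^T A x = A^T b is satisfied, so it is consistent
print(np.allclose(A.T @ A @ x_ls, A.T @ b))  # True

# A x' is proj_W b; W is spanned by (1, 1, 1), so every entry equals the mean of b, 7/3
print(np.allclose(A @ x_ls, 7 / 3))  # True
```

With dependent columns there are infinitely many least squares solutions; `lstsq` returns the one with the smallest length, here $(7/15, 14/15)$.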

For an $m \times n$ matrix $A$, the column vectors of $A$ are linearly independent if and only if the square $n \times n$ matrix $A^TA$ is invertible.
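The rank condition can be checked numerically. This sketch (assuming NumPy; `has_independent_columns` and the matrix `A_dep` are hypothetical) compares the crab coefficient matrix, whose columns are independent, with a matrix whose second column is twice the first.

```python
import numpy as np

def has_independent_columns(A):
    """True when rank(A) equals the number of columns of A."""
    return np.linalg.matrix_rank(A) == A.shape[1]

# Crab coefficient matrix: columns [1, t] are linearly independent
t = np.array([0, 20, 40, 60, 80, 100], dtype=float)
A_ind = np.column_stack([np.ones_like(t), t])

# Hypothetical matrix whose second column is twice the first (dependent columns)
A_dep = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])

for A in (A_ind, A_dep):
    print(has_independent_columns(A), np.linalg.det(A.T @ A))
# A_ind: True and a determinant of about 42000, so A^T A is invertible
# A_dep: False and a determinant of 0, so A^T A is singular
```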

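Thus, when $A$ has linearly independent columns, multiplying both sides of $A^TAx' = A^Tb$ on the left by $(A^TA)^{-1}$ gives the unique least squares solution

$$x' = (A^TA)^{-1}A^Tb.$$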
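The formula translates directly into code. This sketch (assuming NumPy; `least_squares_formula` is our name) applies it to the crab system and cross-checks the result against NumPy's `np.linalg.lstsq`.

```python
import numpy as np

def least_squares_formula(A, b):
    """x' = (A^T A)^(-1) A^T b; valid only when A has linearly independent columns."""
    return np.linalg.inv(A.T @ A) @ A.T @ b

t = np.array([0, 20, 40, 60, 80, 100], dtype=float)
b = np.array([0.0, 0.9, 1.1, 1.1, 1.1, 1.2])
A = np.column_stack([np.ones_like(t), t])

x_formula = least_squares_formula(A, b)
# np.linalg.lstsq minimizes ||Ax - b|| without forming or inverting A^T A
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_formula, x_lstsq))  # True
```

Explicitly inverting $A^TA$ mirrors the formula, but forming $A^TA$ squares the condition number of $A$. For larger or ill-conditioned problems, `np.linalg.lstsq` (or `np.linalg.solve` on the normal system) is generally the safer choice.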

Recall the Fiddler Crab example. The regulation of water in the crabs is called osmoregulation: in the solution the crabs undergo osmosis and thus gain weight. The linear equations from before are:

$$
\begin{aligned}
x_1 + 0x_2 &= 0.0\\
x_1 + 20x_2 &= 0.9\\
x_1 + 40x_2 &= 1.1\\
x_1 + 60x_2 &= 1.1\\
x_1 + 80x_2 &= 1.1\\
x_1 + 100x_2 &= 1.2
\end{aligned}
$$

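$$
A = \begin{bmatrix} 1 & 0 \\ 1 & 20 \\ 1 & 40 \\ 1 & 60 \\ 1 & 80 \\ 1 & 100 \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 0.0 \\ 0.9 \\ 1.1 \\ 1.1 \\ 1.1 \\ 1.2 \end{bmatrix}
$$

$$
A^T = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 20 & 40 & 60 & 80 & 100 \end{bmatrix}
$$

$$
A^TA = \begin{bmatrix} 6 & 300 \\ 300 & 22000 \end{bmatrix}, \qquad
(A^TA)^{-1} \approx \begin{bmatrix} 0.52381 & -0.00714 \\ -0.00714 & 0.00014 \end{bmatrix}, \qquad
A^Tb = \begin{bmatrix} 5.4 \\ 336 \end{bmatrix}
$$

$$
x' = (A^TA)^{-1}A^Tb \approx \begin{bmatrix} 0.42857 \\ 0.00943 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
$$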
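The arithmetic above can be reproduced with a short sketch (assuming NumPy). It also answers the question from the start of the example by evaluating the fitted line at 120 minutes.

```python
import numpy as np

t = np.array([0, 20, 40, 60, 80, 100], dtype=float)
b = np.array([0.0, 0.9, 1.1, 1.1, 1.1, 1.2])
A = np.column_stack([np.ones_like(t), t])

AtA = A.T @ A  # [[6, 300], [300, 22000]]
Atb = A.T @ b  # approximately [5.4, 336]
x1, x2 = np.linalg.inv(AtA) @ Atb
print(round(x1, 5), round(x2, 5))  # 0.42857 0.00943

# Evaluate the fitted line at t = 120 minutes
prediction = x1 + 120 * x2
print(round(prediction, 2))  # 1.56
```

Because 120 minutes lies outside the measured times, this prediction of about 1.56 grams is an extrapolation of the linear model.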

Given data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, we model the data by fitting an equation to them.

A set of $n + 1$ data points with distinct $x$-values can be modeled by polynomials of every degree up to $n$, and the polynomial of degree $n$ passes through all of them exactly. However, real measurements include error and rounding, so a curve chosen to capture the underlying trend will not fit the data exactly. Therefore, we use the least squares technique.

We measure the error by the vertical distance $d_j$ between each data point $(x_j, y_j)$ and the line, and we choose the line that minimizes $d_1^2 + d_2^2 + \cdots + d_n^2$.

We assume that the error is additive and occurs in the vertical coordinates, not the horizontal coordinates, so that $y_j = a_0 + a_1x_j + d_j$.

The line is $y = a_0 + a_1x$. Requiring it to pass through every data point gives the linear system

$$
\begin{aligned}
y_1 &= a_0 + a_1x_1\\
y_2 &= a_0 + a_1x_2\\
&\;\;\vdots\\
y_n &= a_0 + a_1x_n
\end{aligned}
$$

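Or equivalently, $Ax = b$, where

$$
A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad
x = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}, \qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
$$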
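For a straight line, $A^TA$ is only $2 \times 2$, so the normal system can be solved by hand:

$$A^TA = \begin{bmatrix} n & \sum x_j \\ \sum x_j & \sum x_j^2 \end{bmatrix}, \qquad A^Tb = \begin{bmatrix} \sum y_j \\ \sum x_jy_j \end{bmatrix}.$$

The plain-Python sketch below (the function name `fit_line` is ours) forms these sums and solves the $2 \times 2$ system with Cramer's rule.

```python
def fit_line(xs, ys):
    """Least squares line y = a0 + a1*x from the 2x2 normal system, via Cramer's rule.

    Assumes at least two distinct x-values; otherwise the columns of A are
    dependent and the determinant below is zero.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of A^T A
    a0 = (sy * sxx - sx * sxy) / det
    a1 = (n * sxy - sx * sy) / det
    return a0, a1

a0, a1 = fit_line([0, 20, 40, 60, 80, 100], [0.0, 0.9, 1.1, 1.1, 1.1, 1.2])
print(f"y = {a0:.5f} + {a1:.5f}x")  # y = 0.42857 + 0.00943x
```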

In the Fiddler Crab example, we carried out the least squares solution for a straight line of best fit. The line is $y = 0.42857 + 0.00943x$.

For a quadratic curve of best fit, the equation is $y = a_0 + a_1x + a_2x^2$.

For the Fiddler Crab data, the least squares quadratic is $y = 0.14286 + 0.03086x - 0.00021x^2$.

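$$
A = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}, \qquad
x = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}, \qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
$$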

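For the Fiddler Crab data:

$$
A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 20 & 400 \\ 1 & 40 & 1600 \\ 1 & 60 & 3600 \\ 1 & 80 & 6400 \\ 1 & 100 & 10000 \end{bmatrix}, \qquad
x = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 0.0 \\ 0.9 \\ 1.1 \\ 1.1 \\ 1.1 \\ 1.2 \end{bmatrix}
$$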
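The quadratic fit can be computed the same way. In this sketch (assuming NumPy), `np.vander(t, 3, increasing=True)` builds the matrix with columns $1, x, x^2$, and the normal system $A^TAx = A^Tb$ is solved with `np.linalg.solve` rather than an explicit inverse.

```python
import numpy as np

t = np.array([0, 20, 40, 60, 80, 100], dtype=float)
b = np.array([0.0, 0.9, 1.1, 1.1, 1.1, 1.2])

# Columns 1, t, t^2 (increasing=True puts the powers in this order)
A = np.vander(t, 3, increasing=True)

# Solve the normal system A^T A x = A^T b for x = (a0, a1, a2)
a0, a1, a2 = np.linalg.solve(A.T @ A, A.T @ b)
print(f"{a0:.5f} {a1:.5f} {a2:.5f}")  # 0.14286 0.03086 -0.00021
```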

For a cubic curve of best fit, the equation is $y = a_0 + a_1x + a_2x^2 + a_3x^3$.

For the Fiddler Crab data, the least squares cubic is $y = 0.0150794 + 0.0600331x - 0.0010129x^2 + 0.0000053x^3$.

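$$
A = \begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 & x_n^3 \end{bmatrix}, \qquad
x = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}, \qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
$$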

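For the Fiddler Crab data:

$$
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 20 & 400 & 8000 \\ 1 & 40 & 1600 & 64000 \\ 1 & 60 & 3600 & 216000 \\ 1 & 80 & 6400 & 512000 \\ 1 & 100 & 10000 & 1000000 \end{bmatrix}, \qquad
x = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}, \qquad
b = \begin{bmatrix} 0.0 \\ 0.9 \\ 1.1 \\ 1.1 \\ 1.1 \\ 1.2 \end{bmatrix}
$$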
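For the cubic, the entries of $A^TA$ reach about $10^{12}$ (the sum of the $x_j^6$), so this sketch (assuming NumPy) uses `np.linalg.lstsq`, which minimizes $\|Ax - b\|$ without forming $A^TA$. It also compares the squared error $\|Ax' - b\|^2$ of the line, quadratic, and cubic fits.

```python
import numpy as np

t = np.array([0, 20, 40, 60, 80, 100], dtype=float)
b = np.array([0.0, 0.9, 1.1, 1.1, 1.1, 1.2])

# Cubic: columns 1, t, t^2, t^3
A = np.vander(t, 4, increasing=True)
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
print(", ".join(f"{c:.7f}" for c in coeffs))  # 0.0150794, 0.0600331, -0.0010129, 0.0000053

# Squared error ||Ax' - b||^2 for the line, quadratic, and cubic fits
sse = {}
for degree in (1, 2, 3):
    A_d = np.vander(t, degree + 1, increasing=True)
    x_d, *_ = np.linalg.lstsq(A_d, b, rcond=None)
    r = A_d @ x_d - b
    sse[degree] = float(r @ r)
    print(degree, round(sse[degree], 5))
# 1 0.39771
# 2 0.12343
# 3 0.00587
```

The error cannot increase as the degree grows, because each added column enlarges the column space $W$ and the minimum of $\|Ax - b\|$ is taken over a larger set.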

Each increase in the degree of the proposed polynomial adds one column to $A$. In general, we can fit a polynomial of degree $n$ to $m$ data points through the system

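$$
\begin{aligned}
y_1 &= a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_nx_1^n\\
y_2 &= a_0 + a_1x_2 + a_2x_2^2 + \cdots + a_nx_2^n\\
&\;\;\vdots\\
y_m &= a_0 + a_1x_m + a_2x_m^2 + \cdots + a_nx_m^n
\end{aligned}
$$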

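As before, we solve for the least squares solution $x' = (A^TA)^{-1}A^Tb$, where row $j$ of $A$ is $(1, x_j, x_j^2, \dots, x_j^n)$. When the $x_j$ are distinct, this works for any degree $n \le m - 1$; at $n = m - 1$ the matrix $A$ is square and invertible, so the polynomial passes through all $m$ data points.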
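A general-degree version, as a sketch assuming NumPy (`fit_poly` is a hypothetical helper): it builds the matrix with rows $(1, x_j, \dots, x_j^n)$ and, as a swapped-in technique, solves with `np.linalg.lstsq` instead of inverting $A^TA$. It is cross-checked against `np.polyfit`, and a hypothetical three-point data set shows that degree $m - 1$ reproduces the data exactly.

```python
import numpy as np

def fit_poly(xs, ys, degree):
    """Least squares coefficients [a0, a1, ..., an] for y = a0 + a1 x + ... + an x^n.

    Uses np.linalg.lstsq on the matrix with rows (1, x_j, ..., x_j^n) instead of
    inverting A^T A; both give the same x' when the columns of A are independent.
    """
    xs = np.asarray(xs, dtype=float)
    A = np.vander(xs, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(ys, dtype=float), rcond=None)
    return coeffs

t = [0, 20, 40, 60, 80, 100]
y = [0.0, 0.9, 1.1, 1.1, 1.1, 1.2]

# Cross-check against np.polyfit, which lists the highest power first
for n in (1, 2, 3):
    print(n, np.allclose(fit_poly(t, y, n), np.polyfit(t, y, n)[::-1]))  # True for each n

# Hypothetical m = 3 points with distinct x-values: degree m - 1 = 2 interpolates them
xs, ys = [0, 1, 2], [1, 3, 7]
a = fit_poly(xs, ys, 2)
print(np.round(a, 10))  # [1. 1. 1.], i.e. y = 1 + x + x^2
```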

