GeoSci 236: The Fundamental Theorem of Linear Algebra

Gidon Eshel
491 Hinds
Dept. of the Geophysical Sciences,
5734 S. Ellis Ave., The Univ. of Chicago,
Chicago, IL 60637
(773) 702-0440, geshel@midway.uchicago.edu


 
Figure 1: The forward problem (the fundamental theorem of linear algebra). A's domain is the upper-left space, while its range is the lower-right one. In the domain, A's row-space is shown in red, while its nullspace in blue. A generic vector comprising both a row-space and a nullspace components is the vector on which A operates, mapping it onto the adjoint space (lower-right). In the latter space, the shown b comprises components from A's range (column-space) and left nullspace.
\begin{figure}\centerline{\psfig{file=spaces.ps,width=140mm}}\end{figure}

Figure 1 represents the operation of a matrix ${\bf A}_{M\times N}$ on a vector ${\bf x}\in\Re^N$ (the upper-left space). That is, it shows schematically what happens when an arbitrary vector from ${\bf A}$'s domain (the space corresponding dimensionally to ${\bf A}$'s row dimension N) is mapped by ${\bf A}$ onto the range space (the space corresponding dimensionally to ${\bf A}$'s column dimension M). Hence the schematic shows what happens to ${\bf x}$ from the upper-left space as ${\bf A}$ transforms it to the range, the lower-right space. Put differently, this schematic represents the forward problem, while the inverse problem will be shown later.

To enable visualization, both the range and the domain spaces are depicted as three-dimensional. This, of course, is just a metaphor, meant to make the schematic comprehensible by our feeble sense of space; both the domain and the range can obviously be of arbitrary dimensions.

Both spaces are divided into 2 orthogonal subspaces; $\Re^N$ (upper left) comprises the row and null spaces, ${\bf I\!R}^T$ and $\cal N$, respectively. The row space is shown in red, while ${\bf A}$'s nullspace is blue. [Note that ${\bf I\!R}^T$ is a shorthand for ${\bf I\!R}({\bf A}^T)$.] An arbitrary $\Re^N$ vector ${\bf x}$ comprises 2 orthogonal contributions; ${\bf x}={\bf x}_n+{\bf x}_r$, where ${\bf
x}_n$ is the nullspace part, while ${\bf x}_r$ is the row-space part.

You might find it puzzling (or overly restrictive) that the 2 spaces are presented as aligned with the principal coordinates. This is a fair point - there in nothing to guarantee that this will prove the case! However, we can always transform the coordinate system so that this will be true. This will be the case when $\Re^N$ (domain, upper-left) is spanned by $\{{\bf v}_i\}$, the columns of ${\bf V}$, while $\Re^M$ (range, lower-right) is spanned by $\{{\bf u}_i\}$, the columns of ${\bf U}$, ${\bf V}$ and ${\bf U}$ arising from ${\bf A}$'s SVD representation ${\bf A}={\bf U\Sigma V}^T$.

When we take ${\bf Ax}={\bf Ax}_n+{\bf Ax}_r$, a number of things happen. First, ${\bf
x}_n$ is mapped to the zero vector in the `target' space (the adjoint space, in $\Re^M$); ${\bf Ax}_n=0$by construction. This is shown by the uppermost arrow, from ${\bf
x}_n$'s tip in the upper-left space to ${\bf0}\in\Re^M$ in the lower-right space. Conversely, ${\bf x}$'s row-space part, ${\bf x}_r$, is mapped by ${\bf A}$ onto the corresponding point in the adjoint space ${\bf b}_r$; ${\bf Ax}_r={\bf b}_r$. Note that this is exactly the same as ${\bf A}$ operating on the entire ${\bf x}$, because the nullspace component is mapped to zero, by construction, thus not changing the end result in any way. Note also that there is nothing the combination of ${\bf A}$ and ${\bf x}$ can do to give rise to ${\bf b}_n\in{\cal N}({\bf
A}^T)\subseteq \Re^M$.

You might ask yourself the following question: If we are solving here the forward problem, why in the world are we going to be so stupid as to choose an ${\bf x}$ which has a nonzero nullspace component? This is a very sensible question, but one which can be answered very easily. The forward problem suggests a physical model. That is, we envision certain physics, which collectively account for the particular ${\bf A}$ of our problem, and with which the problem is integrated forward, ${\bf x}_{t+1}={\bf Ax}_t$, ${\bf
x}_{t+2}={\bf Ax}_{t+1}$, etc. Without boundary conditions, however, ${\bf A}$ will be rank-deficient. In other words, it will have a nonempty nullspace, a situation also explicitly addressed by our schematic.


 
Figure 2: The inverse problem. Figure 1's notation is mantained.
\begin{figure}\centerline{\psfig{file=inverse.ps,width=140mm}}\end{figure}

Now we turn to the inverse problem, shown in figure 2. While the spaces appear the same, the problem is different. Here we start by obtaining a set of observations, which we put in the components of the vector ${\bf b}$. Next, we envision certain physics which gave rise to ${\bf b}$, and, as before, construct ${\bf A}$ according to those physics. However, the parameters on which the solution depends (the components of ${\bf x}$) are now solved-for optimally. The optimality of the solution means that the left nullspace part of ${\bf b}$, ${\bf
b}_n$ (which we think of as `noise', hence the $\epsilon$ notation), is as small as possible. However the solution is obtained [i.e., whether the mapping ${\bf Bb}$ from $\Re^M$ (lower-right) to $\Re^N$(upper-left) is achieved using ${\bf B}={\bf A}^+$ derived from ${\bf A}$'s SVD representation, or simply by ${\bf B}={\bf A}^{-1}$ if ${\bf A}$ is both square and full-rank], there is nothing we can do about the (hopefully minor) inconsistencies ${\bf
b}_n$ in the set of linear equations ${\bf Ax}={\bf b}$; they are simply mapped onto the zero vector ${\bf B}$ ${\bf b}_n={\bf0}\in\Re^N$, as the uppermost arrow shows.

Given this, the best we can do is solve the closest problem to ${\bf Ax}={\bf b}$ we know how to solve, ${\bf
A}\hat{\bf x}={\bf p}$, where ${\bf p}$ is the projection of the data vector ${\bf b}$ onto ${\bf A}$'s column space; ${\bf p}={\bf A}^T{\bf b}$.

If you think our troubles are now over, hold your horses! there still exists the all-too-realistic possibility that while we have reduced the insoluble problem ${\bf Ax}={\bf b}$ to the tractable ${\bf
A}\hat{\bf x}={\bf p}$, the solution may not be unique! This will happen when ${\bf A}$'s rank q is smaller than its row dimension; q<N. In this case, ${\bf V}$'s $q+1\rightarrow N$ columns, $\{{\bf
v}_{q+1\rightarrow N}\}$, are going to span a nonempty nullspace in $\Re^N$. In such a case, any nullspace component ${\bf
x}_n$ can be added to the solution without any change in the goodness of the fit. Then, the solution ${\bf Bp}={\bf Bb}$ performs a second optimization, much less straightforward than minimizing $\Vert{\bf
\epsilon}\Vert=\Vert{\bf b}_n\Vert$: it simply picks the shortest possible x, the one with no nullspace component. That is, it sets ${\bf
x}_n={\bf0}$. This may be justified as the least conjectural, or most parsimonious, solution (given our imperfect information), but it is far from clear that these are the solution's most desirable attributes.