A Linear Algebra Mini-Primer I

Gidon Eshel
Dept. of the Geophysical Sciences
The Univ. of Chicago
401 Hinds
5734 S. Ellis Ave.
Chicago, IL 60637
Office Tel: (773) 702-0440
Home Tel: (773) 955-4058
geshel@uchicago.edu

Geophysical data analysis almost universally boils down to linear algebraic techniques. Unfortunately, there is no way we can do justice to this extremely important and wonderful topic in these humble notes. We will just introduce a few absolutely essential ideas. If you would like to learn more (as you should!), there is clearly no better introduction to the subject than Strang, G. Linear Algebra and its Applications 3rd Ed., Harcourt Brace Jovanovich, San Diego (1988) ISBN 0-15-551005-3.

Matrices and vectors

Matrices are the most fundamental building block of linear algebra. They arise in many, highly diverse, situations, which we will get to later. A matrix is a rectangular array of numbers, e.g.,

\begin{displaymath}\left(\begin{array}{rrr} 1 & 1 & -4 \\ 0 & 3 & 2 \\ 5 & 11 & 24 \\
-1 & 31 & 4 \end{array}\right).\end{displaymath}

A matrix is said to be $M\times N$(M by N) when it comprises M rows and N columns. The typical notation reserves boldfaced uppercase letters to matrix variables (e.g., ${\bf A}$). It is sometime helpful to explicitly report the matrix size as a subscript when it is first introduced, ${\bf
A}_{M\times N}$. A vector, typically denoted by boldfaced lowercase letters, is a special case of matrix, one for which either M or Nare 1 (e.g., ${\bf x}={\bf x}_{M\times 1}$, ${\bf y}={\bf y}_{1\times
N}$). The former (${\bf x}$) is column vector, while ${\bf y}$ is a row vector. Unless otherwise stated, we will treat vectors as column vectors.

It is useful at this point to introduce some basic matrix operations. Addition (and subtraction) are simple; for ${\bf C}={\bf A}+{\bf B}$ to be defined, ${\bf A}$ and ${\bf B}$ must have the same dimensions. Then, ${\bf C}$ `inherits' these dimensions, and

\begin{displaymath}{\bf C}_{ij}={\bf A}_{ij}+{\bf B}_{ij}.\end{displaymath}

The transpose of a matrix is constructed by swapping columns and rows. That is, if ${\bf B}={\bf A}^T$ (where the superscript Tdenotes the transpose of a real matrix), then ${\bf B}_{ij}={\bf
A}_{ji}$, where the first and second indices index rows and columns, respectively (i.e., element ij resides in row i and column j). Note that taking the transpose changes the matrix dimension; if ${\bf A}$ is $M\times N$, ${\bf A}^T$ is $N\times M$, and if ${\bf x}$is $M\times 1$, ${\bf x}^T$ is $1\times M$.

The product of 2 matrices is much more restrictive than that of 2 scalars. To see this, let us first introduce the inner product of 2 vectors, which yields a scalar. Given 2 column vectors ${\bf x}$and ${\bf y}$ of the same length N,

\begin{displaymath}a={\bf x}^T{\bf
y}\equiv\sum_{i=1}^N x_i y_i.\end{displaymath}

It is immediately obvious that the inner product is only defined when both participating vectors are of the same length, the first a row vector, the second a column vector. An easy way to check if a given pair of vectors can be multiplied together is to write down their dimensions in the order of the multiplication; if the inner dimensions agree (are the same), then the multiplication `consumes' the inner dimension, leaving the result a $1\times 1$ matrix, better known as a scalar. For example, if we wish to consider ${\bf xy}$ with ${\bf x}_{N\times 1}$ and ${\bf
y}_{N\times 1}$, we write $(N\times 1)\times(N\times 1)$. Since the inner dimensions do not match, the operation is not permitted. Conversely, ${\bf x}^T{\bf y}$ is permitted, because $(1\times N)\times(N\times 1)\rightarrow(1\times\not N)\times(\not N
\times 1)\rightarrow (1\times 1)$, a scalar. Note that ${\bf x}^T{\bf
y}={\bf y}^T{\bf x}$, i.e., the inner product is commutative.

The product of 2 matrices ${\bf A}$ and ${\bf B}$ is given by

\begin{displaymath}{\bf C}={\bf AB},\;\;\;\; {\bf C}_{ij}={\bf a}_i{\bf b}_j,\end{displaymath}

where ${\bf a}_i$ is ${\bf A}$'s ith row, and ${\bf b}_j$ is ${\bf B}$'s jth column. Thus ${\bf C}_{M\times N}={\bf A}_{M\times
K} {\bf B}_{K\times N}$, where K is some shared inner dimension which gets `consumed' by the multiplication process. To visualize this, consider

\begin{displaymath}{\bf A}=\left(\begin{array}{rr}1 & 4 \\ 2 & -3 \\ 3 & 1 \\ -1...
...\begin{array}{rrr}-1 & 2 & 1 \\ -9 & 2 & -3
\end{array}\right),\end{displaymath}

whose product ${\bf C}_{4\times 3}={\bf AB}=$

\begin{displaymath}=\left(\begin{array}{rrr}
\left(\begin{array}{rr} 1 & 4\end{a...
... 25 & -2 & 11 \\ -12
& 8 & 0 \\ 1 & -2 & -1 \end{array}\right).\end{displaymath}

Note that in general the matrix product is not commutative; ${\bf AB}\neq {\bf BA}$. In the above example, ${\bf BA}$ is not even permitted [it is $(2\times
3)\times(2\times 3)$ which does not have the required common inner dimension; $3\neq2$]. For this reason, we must explicitly distinguish post-multiplication from pre-multiplication, unless there is no ambiguity.

Vector Spaces

I want to be as concise and informal as possible, often at the expense of rigor, completeness and generality.

For our purposes, it is sufficient to think of a vector space as the set of all vectors of a certain kind. While the vectors need not in fact be actual vectors like the ones we have just discussed (they can also be functions, matrices, etc.), it will suffice here to consider `vectors' as literally column vectors. An example will clarify this; $\Re^1$ (pronounced ``r-one'') is the real line (we will only consider here real matrices, whose elements are all real numbers). In $\Re^1$, one and only one kind of inhabitants can be found: 1-vectors whose single element is any one of the real numbers from $-\infty$ to $\infty$. The numerical value of $v\in\Re^1$ (``v which is an element from within $\Re^1$'') is the distance along the real line from the origin, (0) (it is not boldfaced here, because it is one-dimensional, or a scalar). $\Re^0$ is a vector space too, but it comprises a single point, (0); not too interesting. Vector spaces must have a few simple rules: they are commutative and have a zero vector ${\bf o}$ satisfying ${\bf v} + {\bf o} = {\bf v}$, vectors and their negative counterparts satisfy ${\bf v} + (-{\bf v})={\bf o}$, and addition and multiplication are defined, yielding vectors still within the space.

The familiar plane is $\Re^2$, where all 2-element vectors exist. Each of these 2-vectors describe a point in the plane. That point is one end of the vector, the other end being the origin, (0,0). Thus the 2 elements are the distance from the origin to the projections of the point on the 2 coordinates. Likewise, the familiar physical space is $\Re^3$, where each vector ${\bf v}=(\begin{array}{ccc} v_1 & v_2 &
v_3 \end{array})^T$ is stretched between the origin (0,0,0) and ( v1,v2, v3).

Vector spaces must be spanned. By this we mean a particular (non-unique) choice of N vectors from $\Re^N$ which we choose to express any other arbitrary $\Re^N$ vector as a linear combination of. Once the choice of these vectors is made, the vectors are collectively referred-to as a `basis' for $\Re^N$, and each one of them is a basis vector. For example, for spanning $\Re^3$ (`the physical space'), the Cartesian basis is often chosen. This basis comprises the 3 vectors $\hat{\bf i}$, $\hat{\bf j}$ and $\hat{\bf k}$(sometimes denoted $\hat{\bf x}$, $\hat{\bf y}$ and $\hat{\bf z}$; the `hat' denotes a basis vector)

\begin{displaymath}\hat{\bf
i}=\left(\begin{array}{c}1\\ 0\\ 0\end{array}\right)...
... \hat{\bf
k}=\left(\begin{array}{c}0\\ 0\\ 1\end{array}\right).\end{displaymath}

Any $\Re^3$ vector can be expressed as a linear combination of these basis vectors;

\begin{displaymath}{\bf v}=\left(\begin{array}{c}v_1\\ v_2\\ v_3\end{array}\righ...
...nd{array}\right)= v_1\hat{\bf
i}+v_2\hat{\bf j}+v_3\hat{\bf k}.\end{displaymath}

Note, again, that this is not a unique choice for spanning $\Re^3$; there are infinitely many such choices. The only constrain on the choice is that in order to fully span $\Re^3$, the 3 vectors must be linearly independent, that is, that no non-trivial $\alpha$, $\beta$ and $\gamma$satisfying

\begin{displaymath}\alpha\hat{\bf i}+\beta\hat{\bf j}+\gamma\hat{\bf
k}={\bf0}\end{displaymath}

can be found. Linearly dependent vectors are mutually parallel.

The requirement for mutual linear independence of the basis vector follows from the fact that a 3-vector has 3 independent pieces of information, v1, v2 and v3. Given these 3 degrees of freedom (much more on that later in the course), we must have corresponding 3 basis vectors to work with. If one of the basis vectors is a linear combination of other ones, e.g., if $\hat{\bf j}=\alpha\hat{\bf k}$say, then $\hat{\bf j}$ and $\hat{\bf k}$ no longer represent 2 directions in $\Re^3$, but just 1. To show how this happens, consider the choice

\begin{displaymath}\hat{\bf
i}=\left(\begin{array}{c}1\\ 0\\ 0\end{array}\right)...
...rray}{c}2\\ 3\\ 0\end{array}\right)= 2\hat{\bf
i}+3\hat{\bf j}.\end{displaymath}

With this basis set, the vector $(\begin{array}{ccc}0&0&v_3\end{array})$ cannot be represented. Thus this choice of a basis doesn't fully span $\Re^3$ (it does span $\Re^2$, though, just as well as $\hat{\bf i}$ and $\hat{\bf j}$alone, but for 3 vectors to span $\Re^2$ is not very impressive). Note that this is not because none of our basis vectors has a nonzero 3rd element; try to come up with $\alpha$, $\beta$ and $\gamma$ so that

\begin{displaymath}\left(\begin{array}{r}1\\ -2\\ 1\end{array}\right)=\alpha
\le...
...ght)+\gamma
\left(\begin{array}{r}2\\ 1\\ 0\end{array}\right)!!\end{displaymath}

You cannot find such coefficients [other than (0,0,0)] no matter how hard you try, because

\begin{displaymath}\left(\begin{array}{r}2\\ 1\\ 0\end{array}\right)=
\left(\beg...
...ay}\right)+
\left(\begin{array}{r}1\\ 0\\ -1\end{array}\right).\end{displaymath}

To better understand these issues, it is useful to visualize the geometry of the problem, which will be made easier with

\begin{displaymath}\hat{\bf
i}=\left(\begin{array}{c}1\\ 0\\ 0\end{array}\right)...
...array}{c}2\\ 3\\ 0\end{array}\right)= 2\hat{\bf
i}+3\hat{\bf j}\end{displaymath}

as in the previous example. The first 2 basis vectors span $\Re^2$, as was mentioned above. In order to now add a 3rd basis vector that will complete the spanning of $\Re^3$, we need a vector, any vector, that rises above the plane z=0. Since $\left(\begin{array}{ccc}2&3&0\end{array}\right)^T$ does not do that, it doesn't help us at all; it is fully contained within the plane already successfully spanned by the previous 2 basis vectors. The same happens with

\begin{displaymath}\hat{\bf
i}=\left(\begin{array}{c}1\\ 1\\ 1\end{array}\right)...
...; \hat{\bf
k}=\left(\begin{array}{c}2\\ 1\\ 0\end{array}\right)\end{displaymath}

only this situation is harder to visualize because the redundancy takes place in a plane that is not parallel to any one of the usual basis vectors [ $(\begin{array}{ccc}1&0&0\end{array})^T$, $(\begin{array}{ccc}0&1&0\end{array})^T$ and $(\begin{array}{ccc}0&0&1\end{array})^T$], but in a plane inclined with respect to all of them. (Note, however, that we can transform the coordinates, as we will learn later, so that this redundant plane becomes a fixed value of one coordinate, which we can then eliminate from the problem, thus reducing the apparent dimensionality, 3, to the actual dimensionality, 2.)

Now let's go back to the previous example [ $\hat{\bf i}=
(\begin{array}{ccc}1&0&0\end{array})^T$, $\hat{\bf j}=
(\begin{array}{ccc}0&1&0\end{array})^T$], which is easier to picture. We have realized above that in order to assist in spanning $\Re^3$, the additional basis vector must rise above the z=0plane. However, we are still left with a choice of exactly which vector among all those satisfying this requirement we are going to add; based on this criterion, we can equally well add $\left(\begin{array}{ccc}1&1&1\end{array}\right)^T$, $\left(\begin{array}{ccc}0&0&1\end{array}\right)^T$, $\left(\begin{array}{ccc}1&1&-4\end{array}\right)^T$, etc., all rising to |z|>0, as required. Given this indeterminacy, the choice is ours; any one of these vectors will do just fine. It is often useful to choose the basis vectors so that they are mutually orthogonal.

At the simplest, two vectors ${\bf u}$ and ${\bf v}$ are said to be orthogonal if ${\bf u}^T{\bf v}={\bf v}^T{\bf u}=0$; the information that one contains is entirely absent from the other, and vice versa. With $\left(\begin{array}{ccc}1&0&0\end{array}\right)^T$and $\left(\begin{array}{ccc}0&1&0\end{array}\right)^T$ already chosen, the vector that is orthogonal to both is $\left(\begin{array}{ccc}0&0&\alpha\end{array}\right)^T$, with arbitrary $\alpha$. To show this, denote the sought vector by $\hat{\bf k}=\left(\begin{array}{ccc}a&b&c\end{array}\right)^T$. To be orthogonal to both $\left(\begin{array}{ccc}1&0&0\end{array}\right)^T$and $\left(\begin{array}{ccc}0&1&0\end{array}\right)^T$, $\hat{\bf k}$must satisfy

\begin{displaymath}\left(\begin{array}{ccc}1 &0 &0 \end{array}\right)
\left(\beg...
...}\right)
\left(\begin{array}{c }a\\ b\\ c \end{array}\right)=0,\end{displaymath}

which can only hold if a=b=0. When these conditions are met, any c will satisfy the orthogonality conditions. Hence the undetermined $\alpha$.

The arbitrariness of $\alpha$ can be alleviated by the further requirement, which is customary but not essential, to normalize basis vectors to unit norm, making them orthonormal. The norm of a vector can be defined in a number of ways; the most commonly used is the so-called L2-norm, defined for a general column vector ${\bf
v}\in\Re^N$ by

\begin{displaymath}\Vert{\bf v}\Vert\equiv \sqrt{{\bf v}^T{\bf v}}= \sqrt{\sum_{i=1}^N
v_i^2}.\end{displaymath}

Given this definition, the normalization of the basis vectors means that

\begin{displaymath}\hat{\bf i}^T\hat{\bf i}=\hat{\bf j}^T\hat{\bf j}=\hat{\bf
k}^T\hat{\bf k}=1,\end{displaymath}

i.e.,

\begin{displaymath}\left(\begin{array}{ccc}0& 0 &\alpha\end{array}\right)
\left(\begin{array}{c }0\\ 0\\ \alpha\end{array}\right)=1,\end{displaymath}


\begin{displaymath}\alpha^2=1\rightarrow\alpha=1.\end{displaymath}

Fundamental Spaces and the Rank of A Matrix

An attribute of matrices that is very important in multivariate statistics (as well as in most situations where matrices arise) is the matrix rank. The simplest definition of a rank is the number of independent columns or rows; the rank q of an $M\times N$ matrix satisfies $q\le min(M,N)$. But this is not very revealing or satisfying. To better understand what this means and why it is important, let us introduce now a new view of matrices, that of a matrix as a map.

When an $M\times N$ matrix operates on (pre-multiplies) a vector (as in ${\bf Ax}$, where ${\bf A}$ operates on ${\bf x}$) the vector must be a column vector of the same length as ${\bf A}$'s rows; ${\bf
A}_{M\times N}{\bf x}_{N\times 1}$. That is, ${\bf x}$ is a vector from $\Re^N$; ${\bf x}\in\Re^N$. But in general $M\neq N$, which is most often the case with data matrices (for reasons we will explore later). Consequently, ${\bf b}={\bf Ax}\not\in\Re^N$; ${\bf
b}\in\Re^M$. That is, the outcome vector ${\bf b}$ is not from $\Re^N$, but rather from $\Re^M$. Thus we say that ${\bf A}$ maps $\Re^N$ vectors to $\Re^M$ ones. For example, if ${\bf A}$ is $41\times 12$ and ${\bf x}$ is $12\times 1$, ${\bf b}$ will be $41\times 1$; ${\bf x}\in\Re^{12}\stackrel{\bf A}{\rightarrow}{\bf
b}\in\Re^{41}$. Thus a matrix is associated with 2 fundamental spaces, that of its rows (the row-space, where vectors it operates on come from), and that of its columns (the column-space, where vectors that result from its operation on row-space vectors come from). That the resultant vectors ${\bf b}$ are from ${\bf A}$'s column-space can be seen by noting that

\begin{displaymath}{\bf Ax}=\left(\begin{array}{cccc}
a_{11} & a_{12} & \dots & ...
...begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_N \end{array}\right)=\end{displaymath}


\begin{displaymath}=x_1{\bf a}_1 + x_2{\bf a}_2 + \dots + x_N{\bf a}_N = \sum_{i=1}^N
x_i{\bf a}_i={\bf b}.\end{displaymath}

This demonstrates visually the fact that the resulting vector ${\bf b}$ is a linear combination of ${\bf A}$'s columns, i.e., ${\bf b}$ is a vector from ${\bf A}$'s column-space. Note also that the weighting factors are none other than ${\bf x}$'s elements, $x_i,\;i=1,2\cdots N$. The row-space of ${\bf A}$ is best viewed as simply the column-space of ${\bf A}^T$, following trivially from the definition of the transpose.

Now comes a slightly tricky twist. We have established above that the column- and row-spaces of ${\bf
A}_{M\times N}$ are associated with $\Re^M$ and $\Re^N$, respectively. However, the pairs need not coincide! that is, the column-space may simply be $\Re^M$, but it doesn't have to be; it can be a subspace of $\Re^M$. Similarly, the row-space may be either the full $\Re^N$, or a subspace thereof. Let's clarify this with an example. Consider an ${\bf A}$ given by

\begin{displaymath}{\bf A}=\left(\begin{array}{ccc}1 & 1 & 4 \\ 1 & 2 & 3 \\ 1 & 3 & 2
\\ 1 & 4 & 1 \end{array}\right).\end{displaymath}

Clearly this ${\bf A}$'s column-space is associated with $\Re^4$; but is it $\Re^4$?! Right off the bat we notice that ${\bf A}$ has only 3 columns. Thus at the most, ${\bf A}$'s column-space is 3-dimensional (because 3 vectors can span at most a 3-dimensional space, even if they are 4-vectors). Thus we already know that ${\bf A}$'s column-space cannot possibly be $\Re^4$; it can, at most, be a 3-dimensional subspace of $\Re^4$. This demonstrates the importance of the minimum operator we used before to describe q, ${\bf A}$'s rank [ $q\le min(M,N)$].

Next, we want to find out if ${\bf A}$'s column-space is a 3-dimensional subspace of $\Re^4$, or an even lower-dimensional subspace thereof. Recall that for ${\bf A}$'s column-space to be $\Re^3$, ${\bf A}$'s 3 columns must be linearly independent. This is the case when no non-trivial $\alpha$, $\beta$ and $\gamma$ can be found that satisfy

\begin{displaymath}\alpha\left(\begin{array}{c}1\\ 1\\ 1\\ 1\end{array}\right) +...
...=
\left(\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}\right)={\bf0}.\end{displaymath}

We need to solve the above system. One way to proceed is Gaussian elimination. We operate on ${\bf A}$ with a sequence of so-called elementary operations (adding to various rows multiples of rows above them), to reduce ${\bf A}$'s elements below the main diagonal (the ij positions) to zero. We start by adding to rows 2-4 a copy of the 1st row, so that their left-most elements will vanish. With

\begin{displaymath}{\bf E}_1=\left(\begin{array}{rrrr}
1&0&0&0\\ -1&1&0&0\\ -1&0...
...array}{rrr}
1&1&4\\ 0&1&-1\\ 0&2&-2\\ 0&3&-3\end{array}\right).\end{displaymath}

Next, we turn to the 3rd row; With

\begin{displaymath}{\bf E}_2=\left(\begin{array}{rrrr}
1&0&0&0\\ 0&1&0&0\\ 0&1&-...
...n{array}{rrr}
1&1&4\\ 0&1&-1\\ 0&0&0\\ 0&0&0\end{array}\right).\end{displaymath}

We have achieved our objective (all sub-diagonal elements are zero), but with a twist; both bottom rows became all zeros. This is an indication that ${\bf A}$'s rank is 2, not 3; $q_{\bf A}=2$. In retrospect, we should have expected the last row to vanish; as we said before, because ${\bf A}$ has only 3 columns, its rank can be at most 3. And since there are 4 rows, the last one must vanish. The fact that the second-to-last row vanished too is a consequence (or a demonstration, if you will) of the fact that $q_{\bf A}=2$. And this, in turn, is true because ${\bf A}$'s first column is one fifth the sum of the other 2 columns;

\begin{displaymath}\frac{1}{5}\left[
\left(\begin{array}{r}4\\ 3\\ 2\\ 1\end{ar...
...right]=
\left(\begin{array}{r}1\\ 1\\ 1\\ 1\end{array}\right).\end{displaymath}

In other words, with

\begin{displaymath}\left(\begin{array}{r}\alpha\\ \beta\\ \gamma\end{array}\right)=
\left(\begin{array}{r} -5 \\ 1 \\ 1 \end{array}\right),\end{displaymath}


\begin{displaymath}{\bf A}\left(\begin{array}{r} -5 \\ 1 \\ 1 \end{array}\right)...
...=
\left(\begin{array}{r}0\\ 0\\ 0\\ 0\end{array}\right)={\bf0},\end{displaymath}

which is the previously emphasized condition for ${\bf A}$'s columns being linearly dependent.

A matrix whose q<min(M,N) is said to be rank-deficient; if q=min(M,N), the matrix is full-rank. The meaning of these terms becomes clearer when ${\bf A}$ is considered in the context of a set of coupled linear algebraic equations, like the set we have just solved [for $(\begin{array}{ccc}\alpha&\beta&\gamma\end{array})^T$]. Then, the system ${\bf A}_{M\times N}{\bf x}_{N\times 1}={\bf b}_{M
\times 1}$ is a system of M equations with N unknowns. The question is whether there are enough equations (constraints) to fully determine ${\bf x}$ (a related, and equally important, question is whether or not the system is consistent; we will get to it later). Let's consider again the same ${\bf A}$, but introduce a rhs (right hand side) vector. To emphasize the equation set, I will write it out explicitly;

\begin{displaymath}\begin{array}{ccrcrcc} \alpha & + & \beta & + & 4\gamma & = &...
...t(\begin{array}{c}8\\ 7\\ 6\\ 5\end{array}\right)\equiv{\bf b}.\end{displaymath}

Recall the 2 elementary matrices that we have obtained above to reduce ${\bf A}$ to an upper-triangular form (i.e., to make all sub-main-diagonal elements vanish); we called them ${\bf E}_1$ and ${\bf E}_2$, and their product (in the appropriate order) turned out to be

\begin{displaymath}{\bf E}_2{\bf E}_1=
\left(\begin{array}{rrrr}
1&0&0&0\\ 0&1&0...
...frac{1}{2}&0\\
-\frac{2}{3}&1&0&-\frac{1}{3}\end{array}\right)\end{displaymath}

and

\begin{displaymath}{\bf E}_2{\bf E}_1{\bf A}=\left(\begin{array}{rrr} 1&1&4\\ 0&1&-1\\
0&0&0\\ 0&0&0\end{array}\right).\end{displaymath}

To conserve the conditions of the equations, we must operate on the rhs vector in the same way;

\begin{displaymath}{\bf E}_2{\bf E}_1{\bf b}= \left(\begin{array}{rrrr}
1&0&0&0\...
...right)=
\left(\begin{array}{r}8\\ -1\\ 0\\ 0\end{array}\right).\end{displaymath}

What we have done here is to transform the original problem, ${\bf Ax}={\bf b}$ to an equivalent, but readily solvable, problem, ${\bf Ux}={\bf d}$, simply by premultiplying both sides by ${\bf E}_2{\bf E}_1$,

\begin{displaymath}{\bf E}_2{\bf E}_1{\bf Ax}={\bf Ux}=\left(\begin{array}{rrr}
...
...(\begin{array}{r}8\\ -1\\ 0\\ 0\end{array}\right)\equiv{\bf d}.\end{displaymath}

Because ${\bf A}$ is rank-deficient, the bottom 2 equations do not tell us anything. The 2nd equation says that $\beta-\gamma=-1$, or $\beta=\gamma-1$. The 1st equations, $\alpha+\beta+4\gamma=8$, can be combined with $\beta=\gamma-1$, yielding

\begin{displaymath}\alpha=9-5\gamma.\end{displaymath}

That is, any

\begin{displaymath}{\bf x}=\left(\begin{array}{c} 9-5\gamma \\ \gamma-1 \\ \gamma
\end{array}\right)\end{displaymath}

satisfies the system ${\bf Ax}={\bf
x}$. In other words, we can solve the equation up to a point; the value of $\gamma$ is totally undetermined, and any choice will satisfy the constraints (equations) perfectly (and equally) well; ${\bf A}$ is rank-deficient. It is convenient to split ${\bf x}$,

\begin{displaymath}{\bf x}= \left(\begin{array}{r}9\\ -1\\ 0\end{array}\right)+
...
...gin{array}{r}-5\\ 1\\ 1\end{array}\right)
={\bf x}_p+{\bf x}_h,\end{displaymath}

the particular and homogeneous parts of the solution.

The first vector, ${\bf x}_p$, solves the equations;

\begin{displaymath}\left(\begin{array}{ccc}1 & 1 & 4 \\ 1 & 2
& 3 \\ 1 & 3 & 2 \...
...\right)=
\left(\begin{array}{r}8\\ 7\\ 6\\ 5\end{array}\right),\end{displaymath}

as required.

However, that's not the end of the story. The vector

\begin{displaymath}\hat{\bf n}=\gamma\left(\begin{array}{r} -5 \\ 1 \\ 1
\end{array}\right)\end{displaymath}

spans the null-space of ${\bf A}$'s column-space. That is, any vector parallel to $\hat{\bf n}$ (any choice of $\gamma$) is mapped by ${\bf A}$ to (0,0,0,0)T, so we can add it to the solution without violating the constraints. For example, let's choose $\gamma=1$, in which case

\begin{displaymath}\left(\begin{array}{ccc}1 & 1 & 4 \\ 1 & 2
& 3 \\ 1 & 3 & 2 \...
...right)=
\left(\begin{array}{r}0\\ 0\\ 0\\ 0\end{array}\right)
.\end{displaymath}

Now let's contrast the above situation with

\begin{displaymath}{\bf A}=\left(\begin{array}{rrr}
1 & 1 & 4 \\
0 & 2 & 3 \\...
...right)=
\left(\begin{array}{r}7\\ 5\\ 3\\ 5\end{array}\right)
.\end{displaymath}

Now, with

\begin{displaymath}{\bf E}_1=\left(\begin{array}{rrrr}
1 & 0 & 0 & 0 \\
0 & 1 &...
...0 & 0 \\
0 & -2 & 1 & 0 \\
0 & -2 & 0 & 1 \end{array}\right),\end{displaymath}


\begin{displaymath}{\bf E}_2{\bf E}_1=\left(\begin{array}{rrrr}
1 & 0 & 0 & 0 \\...
...0 & 0 \\
1 & -2 & 1 & 0 \\
0 & -2 & 0 & 1 \end{array}\right),\end{displaymath}


\begin{displaymath}{\bf E}_2{\bf E}_1{\bf A}=\left(\begin{array}{rrr}
1 & 1 & 4 ...
...b}=\left(\begin{array}{r} 7 \\ 5 \\ 0 \\ -5
\end{array}\right).\end{displaymath}

That is,

\begin{displaymath}{\bf Ux}=\left(\begin{array}{rrr}
1 & 1 & 4 \\
0 & 2 & 3 \\ ...
...\begin{array}{r}7\\ 5\\ 0\\ -5\end{array}\right)\equiv{\bf d}
.\end{displaymath}

We now employ back-substitution; The 4th equation indicates that $\gamma=1$. The 2nd then becomes $2\beta+3\gamma=5$, or $\beta=1$. Finally, the 1st equation yields $\alpha=2$, and

\begin{displaymath}{\bf x}=\left(\begin{array}{r} 2 \\ 1 \\ 1 \end{array}\right).\end{displaymath}

Now ${\bf x}$ is completely and fully determined, with no lingering ambiguity or indeterminacy. This is because this ${\bf A}$ is full rank, q=3=min(4,3). In turn, this is so because ${\bf A}$'s left-most column is independent of the other 2, and thus ${\bf A}$'s columns are all mutually independent, yielding full-rank.

The number of independent columns (or rows) of an arbitrary ${\bf A}$is a very fundamental question, one that has a number of ways to answer (all of which, of course, will lead to the same conclusion!) It is also a question that is somewhat ambiguous in practice. To see why, recall that linearly dependent vectors are parallel. In simple examples, vectors are either mutually parallel or not; there is no in-between. Noisy data matrices, on the other hand, can have columns that are almost parallel, but not quite. Then, the question is a more subtle one. The conditioning number of a matrix (which we will define later) is meant to answer the question, but in a somewhat subjective way. We need to know more about matrices in order to fully appreciate this dilemma.

So for now, we conclude this brief discussion by noting that the rank of an $M\times N$ matrix is the number of independent rows or columns; if q=min(M,N), ${\bf A}$ is full-rank. Otherwise [ q<min(M,N)], ${\bf A}$ is rank-deficient. Rank-deficiency implies the presence of a nontrivial null-space in the appropriate space [the smaller of (M,N)]. If $M\neq N$, the larger of (M,N) must have a non-trivial null-space, because ${\bf A}$'s highest possible rank is min(M,N).

With the rank q defined, and the distinction between rank-deficient and full-rank matrices established, we are now ready to introduce the all-important inverse of a square, full-rank, matrix (to be clear: only full-rank square matrices have an inverse in the sense we use here). The inverse is denoted by ${\bf A}^{-1}$, and

\begin{displaymath}{\bf A}^{-1}{\bf A}={\bf A}{\bf A}^{-1}={\bf I}_N,\end{displaymath}

where ${\bf I}_N$, the $N\times N$ identity matrix, is the N-dimensional generalization of 1;

\begin{displaymath}{\bf I}_{ij}=\left\{\begin{array}{cc}1&i=
j\\ 0&i\neq j\end{array}\right.\;\;\;\;\;{\rm or}\end{displaymath}


\begin{displaymath}{\bf I}_N=\left(\begin{array}{ccccccc} 1 & 0 & 0 & \dots & 0 ...
... 0 & 1 & 0 \\ 0 & 0 & 0 & \dots & 0 & 0 & 1 \end{array}\right).\end{displaymath}

It leaves intact vectors it operates on, changing neither their direction nor their magnitude.