Gidon Eshel
Dept. of the Geophysical Sciences
The Univ. of Chicago
401 Hinds
5734 S. Ellis Ave.
Chicago, IL 60637
Office Tel: (773) 702-0440
Home Tel: (773) 955-4058
geshel@uchicago.edu
Geophysical data analysis almost universally boils down to linear algebraic techniques. Unfortunately, there is no way we can do justice to this extremely important and wonderful topic in these humble notes. We will just introduce a few absolutely essential ideas. If you would like to learn more (as you should!), there is clearly no better introduction to the subject than Strang, G. Linear Algebra and its Applications 3rd Ed., Harcourt Brace Jovanovich, San Diego (1988) ISBN 0-15-551005-3.
Matrices are the most fundamental building block of linear
algebra. They arise in many, highly diverse, situations, which we will
get to later. A matrix is a rectangular array of numbers, e.g.,
It is useful at this point to introduce some basic matrix operations.
Addition (and subtraction) are simple; for
to be
defined,
and
must have the same dimensions. Then,
`inherits' these dimensions, and
The transpose of a matrix is constructed by swapping columns and
rows. That is, if
(where the superscript Tdenotes the transpose of a real matrix), then
,
where the first and second indices index rows and columns,
respectively (i.e., element ij resides in row i and column
j). Note that taking the transpose changes the matrix dimension; if
is
,
is
,
and if
is
,
is
.
The product of 2 matrices is much more restrictive than that of
2 scalars. To see this, let us first introduce the inner product
of 2 vectors, which yields a scalar. Given 2 column vectors
and
of the same length N,
The product of 2 matrices
and
is given by
I want to be as concise and informal as possible, often at the expense of rigor, completeness and generality.
For our purposes, it is sufficient to think of a vector space as the
set of all vectors of a certain kind. While the vectors need not in
fact be actual vectors like the ones we have just discussed (they can
also be functions, matrices, etc.), it will suffice here to consider
`vectors' as literally column vectors. An example will clarify this;
(pronounced ``r-one'') is the real line (we will only consider
here real matrices, whose elements are all real numbers). In
,
one and only one kind of inhabitants can be found: 1-vectors whose
single element is any one of the real numbers from
to
.
The numerical value of
(``v which is an
element from within
'') is the distance along the real line
from the origin, (0) (it is not boldfaced here, because it is
one-dimensional, or a scalar).
is a vector space too, but it
comprises a single point, (0); not too interesting. Vector spaces
must have a few simple rules: they are commutative and have a zero
vector
satisfying
,
vectors and
their negative counterparts satisfy
,
and addition and multiplication are defined, yielding vectors still
within the space.
The familiar plane is
,
where all 2-element vectors exist. Each
of these 2-vectors describe a point in the plane. That point is one
end of the vector, the other end being the origin, (0,0). Thus the 2
elements are the distance from the origin to the projections of the
point on the 2 coordinates. Likewise, the familiar physical space is
,
where each vector
is stretched between the origin (0,0,0) and
(
v1,v2, v3).
Vector spaces must be spanned. By this we mean a particular
(non-unique) choice of N vectors from
which we choose to
express any other arbitrary
vector as a linear combination
of. Once the choice of these vectors is made, the vectors are
collectively referred-to as a `basis' for
,
and each one of
them is a basis vector. For example, for spanning
(`the
physical space'), the Cartesian basis is often chosen. This basis
comprises the 3 vectors
,
and
(sometimes denoted
,
and
;
the
`hat' denotes a basis vector)
The requirement for mutual linear independence of the basis vector
follows from the fact that a 3-vector has 3 independent pieces of
information, v1, v2 and v3. Given these 3 degrees of freedom
(much more on that later in the course), we must have corresponding 3
basis vectors to work with. If one of the basis vectors is a linear
combination of other ones, e.g., if
say, then
and
no longer represent 2
directions in
,
but just 1. To show how this happens, consider
the choice
does not do
that, it doesn't help us at all; it is fully contained within the
plane already successfully spanned by the previous 2 basis
vectors. The same happens with
Now let's go back to the previous example [
,
], which is easier to
picture. We have realized above that in order to assist in spanning
,
the additional basis vector must rise above the z=0plane. However, we are still left with a choice of exactly which
vector among all those satisfying this requirement we are going to
add; based on this criterion, we can equally well add
,
,
,
etc., all rising
to |z|>0, as required. Given this indeterminacy, the choice is ours;
any one of these vectors will do just fine. It is often useful to
choose the basis vectors so that they are mutually orthogonal.
At the simplest, two vectors
and
are said to be
orthogonal if
;
the
information that one contains is entirely absent from the other, and
vice versa. With
and
already
chosen, the vector that is orthogonal to both is
,
with
arbitrary
.
To show this, denote the sought vector by
.
To be
orthogonal to both
and
,
must satisfy
The arbitrariness of
can be alleviated by the further
requirement, which is customary but not essential, to normalize basis
vectors to unit norm, making them orthonormal. The norm of
a vector can be defined in a number of ways; the most commonly used is
the so-called L2-norm, defined for a general column vector
by
An attribute of matrices that is very important in multivariate
statistics (as well as in most situations where matrices arise) is the
matrix rank. The simplest definition of a rank is the number of
independent columns or rows; the rank q of an
matrix
satisfies
.
But this is not very revealing or
satisfying. To better understand what this means and why it is
important, let us introduce now a new view of matrices, that of a
matrix as a map.
When an
matrix operates on (pre-multiplies) a vector (as
in
,
where
operates on
)
the vector must
be a column vector of the same length as
's rows;
.
That is,
is a vector
from
;
.
But in general
,
which is
most often the case with data matrices (for reasons we will explore
later). Consequently,
;
.
That is, the outcome vector
is not from
,
but rather from
.
Thus we say that
maps
vectors to
ones. For example, if
is
and
is
,
will be
;
.
Thus a matrix is associated with 2 fundamental spaces,
that of its rows (the row-space, where vectors it operates on come
from), and that of its columns (the column-space, where vectors that
result from its operation on row-space vectors come from). That the
resultant vectors
are from
's column-space can be
seen by noting that
Now comes a slightly tricky twist. We have established above that the
column- and row-spaces of
are associated with
and
,
respectively. However, the pairs need not
coincide! that is, the column-space may simply be
,
but
it doesn't have to be; it can be a subspace of
.
Similarly, the row-space may be either the full
,
or a
subspace thereof. Let's clarify this with an example. Consider an
given by
Next, we want to find out if
's column-space is a
3-dimensional subspace of
,
or an even lower-dimensional
subspace thereof. Recall that for
's column-space to be
,
's 3 columns must be linearly independent. This is
the case when no non-trivial
,
and
can be
found that satisfy
A matrix whose
q<min(M,N) is said to be rank-deficient; if
q=min(M,N), the matrix is full-rank. The meaning of these
terms becomes clearer when
is considered in the context of a
set of coupled linear algebraic equations, like the set we have just
solved [for
].
Then, the system
is a system of M equations with N unknowns. The
question is whether there are enough equations (constraints) to fully
determine
(a related, and equally important, question is
whether or not the system is consistent; we will get to it
later). Let's consider again the same
,
but introduce a rhs
(right hand side) vector. To emphasize the equation set, I will write
it out explicitly;
The first vector,
,
solves the equations;
However, that's not the end of the story. The vector
Now let's contrast the above situation with
The number of independent columns (or rows) of an arbitrary
is a very fundamental question, one that has a number of ways to
answer (all of which, of course, will lead to the same conclusion!) It
is also a question that is somewhat ambiguous in practice. To see why,
recall that linearly dependent vectors are parallel. In simple
examples, vectors are either mutually parallel or not; there is no
in-between. Noisy data matrices, on the other hand, can have columns
that are almost parallel, but not quite. Then, the question is a more
subtle one. The conditioning number of a matrix (which we will
define later) is meant to answer the question, but in a somewhat
subjective way. We need to know more about matrices in order to fully
appreciate this dilemma.
So for now, we conclude this brief discussion by noting that the rank
of an
matrix is the number of independent rows or columns;
if
q=min(M,N),
is full-rank. Otherwise [
q<min(M,N)],
is rank-deficient. Rank-deficiency implies the presence of
a nontrivial null-space in the appropriate space [the smaller of
(M,N)]. If
,
the larger of (M,N) must have a non-trivial
null-space, because
's highest possible rank is min(M,N).
With the rank q defined, and the distinction between rank-deficient
and full-rank matrices established, we are now ready to introduce the
all-important inverse of a square, full-rank, matrix (to be
clear: only full-rank square matrices have an inverse in the sense we
use here). The inverse is denoted by
,
and