GeoSci 236: Empirical Orthogonal Functions

Gidon Eshel
491 Hinds
Dept. of the Geophysical Sciences,
5734 S. Ellis Ave., The Univ. of Chicago,
Chicago, IL 60637
(773) 702-0440, geshel@uchicago.edu

One of the most ubiquitous uses of eigenanalysis in data analysis is the construction of EOFs, the topic of this section. EOFs are a transform of the data; the original set of numbers is transformed into a different set with some desirable properties. In this sense the EOF transform is similar to other transforms such as the Fourier or Laplace transforms. In all these cases, we project the original data onto a set of orthogonal functions, thus replacing the original data with the set of projection coefficients on the basis vectors. However, the choice of the specific basis set varies from case to case. In the Fourier case, for example, the choice is a set of sines and cosines of various frequencies. This is motivated by the desire to identify the principal modes of oscillation of the system. Thus if the signal projects strongly on sine waves of 2 frequencies, we will say that the signal is approximately the linear combination of these 2 frequencies. We will then attribute the remainder to other processes that are more weakly represented in the signal (the signal has low projection on them), and are thus assumed unimportant for the signal. Another important property for a basis is orthogonality (like sines or various frequencies); we would like to account for a certain component of the signal only once. An alternative to the sine/cosine set is a set of orthogonal polynomials, such as those named after Legendre. (Orthogonality often holds only over a specific interval, and sometime requires `weighting functions'. These are related to the choice of metric, which we will talk about a bit.) The representation of the signal in terms of the projection coefficients on a basis set is often very useful at separating cleanly various scales. For example, if our data is the sea surface temperature of a given ocean basin, we can think of the projection on the lowest frequency wave (the one which has one crest and one trough within the spatial extent of the domain) as representing the ocean's `large-scale', while that on wavelengths of order 10-100 km as `eddies'.

In EOF analysis we also project the original data on a set of orthogonal basis vectors. However, the choice of the basis is different. Here, the first EOF is chosen to be the pattern, without the constraint of a particular analytic form, on which the data project most strongly. In other words, the leading EOF (sometime called the `gravest', or `leading', mode) is the pattern most frequently realized. The second mode is the one most commonly realized under the constraint of orthogonality to the first one, the third is the most frequently realized pattern that is orthogonal to both higher modes, and so on. Hence the term `empirical'; we still have an orthogonal basis, like the Fourier or Legendre bases, but whose members are not chosen based on analytic considerations, but based on maximization of the projection of the data on them.

Often data matrices have 2 distinct dimensions that correspond to different physical units. For example, suppose we have ${\bf
A}_{M\times N}$ representing monthly surface air temperatures along the
35$^\circ$N parallel at fixed spatial intervals over N/12 years. The column M-vector ${\bf a}_j$ comprising all space points at time j is ${\bf A}$'s jth column, $j=1,2\cdots N$ for the N times. Using the SVD representation ${\bf A}={\bf U\Sigma V}^T$ we get the modes of ${\bf A}$; ${\bf U}$'s columns are ${\bf A}$'s EOFs, while ${\bf V}$'s columns are the corresponding `principal components'. (For this reason, in some fields the exact same analysis is called `principal component analysis'.) Given the above arrangement of ${\bf A}$, with time running along the rows and space running along the columns (which is a very common convention), ${\bf U}$'s columns span ${\bf A}$'s column space, which corresponds to the spatial dimension. They are ${\bf A}$'s EOFs. Similarly, ${\bf V}$'s columns span ${\bf A}$'s row space, which corresponds to the timeseries at a given spatial location. Because the modes are arranged in descending order ( $\sigma_i>\sigma_{i+1}$), ${\bf u}_1$, ${\bf U}$'s 1st column, is the spatial pattern most frequently realized, the 2nd is the spatial pattern orthogonal to ${\bf U}$'s 1st column that is most frequently realized, and so on.

Examples will clarify matters. Consider the signal

\begin{displaymath}f(x,y) =
A\; cos\left(\frac{2\pi x}{196}\right)\; cos\left(\...
...{2\pi x}{ 98}\right)\; cos\left(\frac{2\pi y}{ 50}\right) +
\xi\end{displaymath}

with

\begin{displaymath}\begin{array}{rr}
A \sim N\left(0,0.7\right) & x = 0,1 \cdots...
...y = 0,1 \cdots 50 \\
\xi\sim N\left(0,0.1\right) & \end{array}\end{displaymath}


  
Figure 1: Four spatial patterns used to generate the combined synthetic signal discussed in the text.
\begin{figure}\centerline{\psfig{file=eof1.ps,width=5in}}
\end{figure}

The signal is thus a linear combination of (primarily) the first rhs term (because A's variance is 7 times larger than other additive terms), (some of) the 2nd rhs term, and unstructured noise $\xi(x,y)$. The 2 deterministic patterns are shown in Fig. 1, panels b and d. Note that they are mutually orthogonal (the cosines in both x and y are Fourier frequencies).

Now imagine 50 such f(x,y) fields (xy maps representing random combinations of the 2 patterns plus noise as given above), or a series of $51\times 98$ matrices ${\bf F}_i, \;\;i=1,2\cdots 50$. This is meant to simulate a geophysical situation in which a certain time-dependent field, say sea-level pressure, is generated by some known, deterministic, physics, plus other, low-amplitude, processes, collectively represented here as $\xi(x,y,t)$. Given 50 realizations of this process, we want to identify the dominant spatial patterns of ${\bf F}$, or, put differently, the spatial structures of ${\bf F}$that are most frequently realized.

To identify these structures, we first make the 3-dimensional array ${\bf F}(t)=\left\{ F_{ij}^k \right\}$ (where i is the latitude index, j is the longitude index, and k is the time index) into a 2-dimensional array (matrix), by storing an entire field in one column vector. That is

\begin{displaymath}{\bf A}=\left(\begin{array}{rrrr} \vdots & \vdots & & \vdots ...
... {\bf a}_{50} \\ \vdots & \vdots & & \vdots
\end{array}\right),\end{displaymath}

where each of ${\bf A}$'s columns, ${\bf a}_k$, comprises the Fijk for all i and j of a given k. The order of the reshaping of each of the ${\bf F}^k$ matrices into a single column vector is not important (MATLAB does it column-wise, but the row-wise reshaping is also perfectly legitimate, as long as you remember how to undo it later. Let's choose to do it the MATLAB (column-wise) way, which, with

\begin{displaymath}{\bf F}^k=\left(\begin{array}{rrrr}
\vdots & \vdots & & \vdo...
...f}_N^1 & {\bf f}_N^2 & \cdots & {\bf f}_N^K
\end{array}\right).\end{displaymath}

Now all the information we have about ${\bf F}$is condensed into a single matrix, ${\bf A}$. If we next use ${\bf A}$'s SVD representation, ${\bf A}={\bf U\Sigma V}^T$, and reshape ${\bf U}$'s columns in a manner exactly opposite to the one we employed while forming ${\bf A}$ from ${\bf F}$, we get ${\bf F}$'s EOFs. The kth EOF is in ${\bf u}^k$, ${\bf U}$'s kth column;

\begin{displaymath}{\bf E}_k=\left(\begin{array}{llll} u_1^k & u_{M+1}^k & \cdot...
...s & \\ u_M^k & u_{2M}^k & \cdots & u_{MN}^k
\end{array}\right).\end{displaymath}

Note that the actual numbers (${\bf E}$'s elements) have essentially no interpretable meaning; it is only their relative magnitudes, and signs, that identify the spatial structures of potentially important physical processes.


  
Figure 2: The 3 leading spatial modes (EOFs) of 2 signals. The left panels are for the signal comprising cosines only. The right panels show the EOFs of the signal with both cosine and exponential terms, as described in the text.
\begin{figure}\centerline{\psfig{file=eof2.ps,width=5in}}
\end{figure}

To demonstrate the method in action, let's use the 50 fields of the above f(x,y,t), generated from the patterns shown in Fig. 1b and d. Fig. 2a, b and c show the 3 leading EOFs of the cosine signal. Note how both generating patterns are well reproduced by the method (the leading 2 patterns), despite the noise and the random blending of the the signals by the amplitudes A and B. Note also the sign reversal, which is completely immaterial - the singular values (and hence the EOFs and Principal Components) are all known to within a sign, as they are the square root of the eigenvalues of ${\bf AA}^T$ and ${\bf A}^T{\bf
A}$. Clearly, the 3rd pattern, a lot like a Jackson Pollock painting, is structureless noise.

A possible critique of the previous example is that we made the method's job particularly (and artificially) easy by using 2 mutually orthogonal generating patterns. This can be fair - if the method is designed to turn arbitrary signals into an orthogonal decomposition of those signals, the real test of the method is with non-orthogonal signals.

To test this, we use the signal

\begin{displaymath}\begin{array}{lcl} f(x,y) & = & A\; cos\left(\frac{2\pi
x}{19...
...(-\frac{(X-79)^2}{50} - \frac{(Y-35)^2}{40} \right)
\end{array}\end{displaymath}

with $C\sim D\sim N\left(0,0.1\right)$. While this expression certainly looks hideously complicated, it is, in fact, simply a noisy random blend of the four patterns of Fig. 1. Thus the only difference from the previous EOF analysis is that now we add both orthogonal and non-orthogonal components, to address the possible weakness of our previous analysis. Fig. 2d-f show that the method functions well even when the input signal is not artificially orthogonal. The 2 leading modes are nicely reproduced with good fidelity (compared with the generating patterns). The 3rd, while clearly structured (unlike the case with the 2 orthogonal cosines), is a combination of the generating patterns, not an individual pattern. This is the consequence of the non-orthogonality of the exponential generating patterns. Since they are not orthogonal either to each other or to the cosines, they project on them, resulting in the blend shown in Fig. 2f.


  
Figure 3: The singular spectra of the 2 synthetic cases discussed in the text. The empty blue diamonds are for the cosines-only case, while the solid red circles are for the case with all 4 generating patterns.
\begin{figure}\centerline{\psfig{file=eof3.ps,width=4in}}
\end{figure}

It is always extremely important to examine the fraction of the total variance the various modes account for. For the 2 synthetic cases above, this is shown in Fig. 3. (Note that only the leading 9 are shown, of of the 50 total. The rest are very near zero in both cases.) The cosine signals are very similar in both (modes 1 and 2). Higher modes differ. In the cosine only, where the only reminder is noise, it is roughly equally distributed over the entire spectrum. Conversely, in the case of the added exponentials, the remainder has 2 structured modes (the 2 exponential terms, and indeed the singular values 3 and 4 are distinct from zero. The rest, just like in the pure cosine case, are statistically indistinguishable from zero.


  
Figure 4: Leading mode of the winter 700 mb geopotential height anomalies over the Atlantic sector during the indicated period.
\begin{figure}\centerline{\psfig{file=f1.ps,width=5.6in}}
\end{figure}

Note that the decrease in amplitude with mode number (the falloff of the singular spectrum) is a property of the analysis, and does not always contain useful information. There is a substantial body of literature about the issue of the appropriate cutoff of the singular spectrum, beyond which, it is assumed, there is little or no useful information. The most commonly used cutoff rule in geophysics is the so-called `Rule N', which basically retains only those modes whose amplitudes stand out above the population of singular spectra extracted from a large number of synthetic matrices of the same dimensions as the one being tested. We will not treat this issue here.


  
Figure 5: The singular spectrum of the observed winter (DJF) 700 mb geopotential height anomalies between 1958 and 2000.
\begin{figure}\centerline{\psfig{file=svZ7.ps,width=5.6in}}
\end{figure}

Finally, Fig. 4 shows an example of a real-life use of EOFs. Height anomalies within this domain (which encompasses $\sim$5000 grid points) obviously display a very rich spectrum of variability in time and space. And yet, when piped through the EOF algorithm, a very clear and coherent large scale structure emerges. This information is corroborated by Fig. 5, where the singular spectrum falloff clearly singles out the gravest mode as substantially more important than the 2nd mode, accounting for approximately twice as much variance.