Introduction to Spatio-temporal Data Analysis Methods
A book to be published by the Princeton University Press

Gidon Eshel
Department of Geophysical Sciences
University of Chicago
5734 S. Ellis Ave.
Chicago, IL 60637
Tel: (773) 702-0440
Fax: (773) 702-9505
Email: geshel@uchicago.edu, geshel@gmail.com
Home: http://geosci.uchicago.edu/~gidon

The book covers a number of mostly linear data analysis methods that are widely used in such diverse fields as geophysics, climatology, meteorology, and ecology, among others. While broadly applied, to my knowledge these methods have never found an adequate home in an introductory-level book that is readily accessible to beginning or even advanced graduate students, or to professional researchers with backgrounds in fields other than applied mathematics. In fact, while the mathematical methods underlying those data analysis methods are discussed in countless volumes, very few books expressly address their application to data analysis in the above-mentioned fields (and the applications are not always trivially related to their algebraic origins); in climatology and meteorology (the area I am most familiar with), for example, there is only one such book. Given the relatively technical tone of the few books that do exist, it is not entirely surprising that they have not enjoyed a wide following and are not often consulted. The proposed book is thus meant to fill an obvious void in the existing physical science data analysis literature: an application-centered, user-friendly yet thorough introduction to analysis methods for data that vary in both time and space.

Book Outline:

Matlab Introduction: When combined with today's computers, modern high-level, interpreted computer languages offer unparalleled access to computational power. This can be an invaluable pedagogical tool for mathematics students, as mathematical ideas are clearly best learned by hands-on trial and error. In this book I rely heavily on this tool. For uniformity and usability, this requires the use of a single language throughout. The language chosen for this book is Matlab, which is rapidly becoming the unofficial global standard for the practice of linear algebra. In addition to Matlab's ubiquity, which will substantially broaden the book's appeal, the choice of Matlab is based on the fact that it is offered to students at a deep discount that makes it reasonably affordable.

The uniformity of computing language augments the discussion of examples in the text by removing ambiguity; when bug-free, efficient and transparent code that solves a given problem is available to the reader (with numerous explanatory comments), there can be no lingering confusion as to exactly which equation is appropriate for the case at hand, or why.

For completeness, this reliance on Matlab requires an introduction to its use and basic principles. Some of this Introduction is already written (see http://geosci.uchicago.edu/~gidon/geosci236/organize/matlab.html), but it will need to be expanded somewhat. However, given the numerous excellent Matlab resources now available both online and in book form, for more technical topics the reader will be referred to existing Matlab references.

Introductory Linear Algebra: The book continues with several optional introductory linear algebra chapters that bring the less mathematically savvy reader to the level of mathematical (and specifically algebraic) sophistication required for the successful use of the later, more applied and specific, chapters. Topics covered include matrices and vectors; linear vector spaces; definition of permissible operations; fundamental (row, column) spaces of a matrix; matrix rank, rank deficiency and invertibility; left and right nullspaces; Gaussian elimination and systems of coupled linear algebraic equations; various matrix decompositions; Gram-Schmidt orthogonalization; eigenvalues and eigenvectors; matrix representation of systems of coupled ordinary differential equations and difference equations; asymptotic stability; and transient (non-asymptotic) stability associated with non-self-adjointness (non-normality). Most of this knowledge is brought together in the concluding discussion of this part of the book: the presentation, explanation and pictorial representation of the fundamental theorem of linear algebra (see http://geosci.uchicago.edu/~gidon/geosci236/fundam/index.html). While this theorem is invariably outlined in linear algebra texts, its power to tie together most topics of linear algebra logically and coherently is often untapped.

Introduction to Basic Statistical Ideas: The next chapter addresses a few basic statistical ideas that repeatedly arise in data analysis. Since these topics are very well covered by numerous excellent texts, no attempt is made here to be particularly original or exhaustive; the aim is simply to make the book more self-contained. Topics covered are probability and probability density functions, significance of and confidence in empirical results, traditional parametric tests, non-parametric significance tests, and Monte-Carlo tests. Some time series material is also covered, including stochastic vs. deterministic signals, autocovariance and autocorrelation functions, derivation of theoretical autocorrelation functions for specific processes, the Yule-Walker equations, and harmonic and Fourier analyses.
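
As an illustration of the kind of time series material listed above, here is a minimal Matlab sketch that simulates a second-order autoregressive process, computes its sample autocorrelation function, and recovers the coefficients from the Yule-Walker equations. The process, its coefficients, and all variable names (phiTrue, acf, phiHat, rho) are assumptions made for this example only; they are not taken from the book.

```matlab
% Sketch: sample autocorrelation and Yule-Walker estimation for an AR(2) process.
% The process and all names below are illustrative assumptions.
rng(1);                          % fix the random seed for reproducibility
N = 5000;                        % series length
phiTrue = [0.6; -0.3];           % AR(2) coefficients (a stationary choice)
x = zeros(N, 1);
e = randn(N, 1);                 % white-noise forcing
for t = 3:N
    x(t) = phiTrue(1)*x(t-1) + phiTrue(2)*x(t-2) + e(t);
end
x = x - mean(x);                 % remove the sample mean

% Sample autocorrelation at lags 0..maxLag (acf(1) is lag 0)
maxLag = 5;
acf = zeros(maxLag+1, 1);
for k = 0:maxLag
    acf(k+1) = sum(x(1:N-k) .* x(1+k:N)) / sum(x.^2);
end

% Yule-Walker equations for AR(2): [1 r1; r1 1] * phi = [r1; r2]
R = toeplitz(acf(1:2));
phiHat = R \ acf(2:3);

% Theoretical autocorrelation of the same process, from the AR(2) recursion
rho = zeros(maxLag+1, 1);
rho(1) = 1;
rho(2) = phiTrue(1) / (1 - phiTrue(2));
for k = 3:maxLag+1
    rho(k) = phiTrue(1)*rho(k-1) + phiTrue(2)*rho(k-2);
end

disp([acf rho]);                 % sample vs. theoretical autocorrelation
disp([phiHat phiTrue]);          % estimated vs. true coefficients
```

With a long series the estimated coefficients should lie close to the prescribed ones, and the sample autocorrelation should track the theoretical one at short lags.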

Empirical Orthogonal Function and Singular Value Decomposition Analyses: This section of the book is devoted to the two principal methods for the analysis of spatiotemporal data, the so-called Empirical Orthogonal Function (EOF) Analysis and the Singular Value Decomposition (SVD) Analysis. The discussion begins by motivating each analysis method in the context of several specific examples, and proceeds to a detailed discussion of the algebra and code used in the solutions. A crucial aspect of both analyses is the truncation problem of singular spectra. While absolutely essential for the results, this problem, along with the related one of the significance of individual modes, is rarely addressed in a thorough and adequate manner. The proposed book first offers a somewhat detailed discussion of the traditional so-called Rule N truncation. It then proceeds to describe some problems and shortcomings of Rule N in practice. Finally, an alternative based on Monte-Carlo simulation is proposed and described in full detail, including a real-data comparison of truncations based on each of the two methods.
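
To make the truncation problem concrete, the following Matlab sketch computes the EOFs of a synthetic space-time data matrix by SVD and then compares the resulting variance fractions with those of white-noise matrices of the same size. This is a generic noise-based Monte-Carlo threshold in the spirit of Rule N, not necessarily the alternative procedure the book develops, which is not specified here. The synthetic field, the 95% level, and all variable names are assumptions made for illustration.

```matlab
% Sketch: EOF analysis by SVD with a noise-based Monte-Carlo truncation test.
% The synthetic data, the 95% level and all names are illustrative assumptions.
rng(2);                                   % fix the random seed
M = 20;  N = 200;                         % grid points, time steps
s = linspace(0, 1, M)';                   % spatial coordinate (column)
t = 1:N;                                  % time index (row)
p1 = sin(pi*s);  p2 = sin(2*pi*s);        % two prescribed spatial patterns
A = 3*p1*cos(2*pi*t/50) + 2*p2*sin(2*pi*t/17) + 0.5*randn(M, N);
A = A - repmat(mean(A, 2), 1, N);         % remove the time mean at each point

[U, S, V] = svd(A, 'econ');               % columns of U: EOFs; columns of V: time series
sv = diag(S);
frac = sv.^2 / sum(sv.^2);                % fraction of variance per mode

% Singular spectra of white-noise matrices of the same size
nMC = 200;
noiseFrac = zeros(M, nMC);
for j = 1:nMC
    B = randn(M, N);
    B = B - repmat(mean(B, 2), 1, N);
    sb = svd(B);
    noiseFrac(:, j) = sb.^2 / sum(sb.^2);
end
noiseSorted = sort(noiseFrac, 2);         % sort realizations for each rank
thresh = noiseSorted(:, ceil(0.95*nMC));  % empirical 95th percentile per rank

% Retain the leading modes that exceed the noise threshold
nKeep = find(frac <= thresh, 1) - 1;
if isempty(nKeep)
    nKeep = M;                            % every mode exceeded its threshold
end
fprintf('Modes retained: %d\n', nKeep);
```

Because two patterns were built into the synthetic field, one would expect two retained modes here. Note that normalizing by the total variance lets strong leading modes depress the fractions of all later ones, which is one of the practical difficulties a truncation rule of this kind has to contend with.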

Advanced Standard Methods: These chapters discuss methods that are directly derived from EOF and SVD analyses but are less commonly used. These include Extended EOF Analysis, presented as the identification of 'favored phase-space trajectories'; joint EOF analysis of several fields; Rotated EOFs and various rotation methods; and Singular Spectrum Analysis. In keeping with the book's spirit, the mathematical discussion of each method is followed by several specific real-data examples that are solved with provided code segments.

Advanced Non-Standard Methods: The final chapter introduces less common analytic methods, principally the use of multi-dimensional regression techniques for statistical forecasting of spatiotemporal phenomena. Following the formal introduction of the topics, exhaustive discussions are offered on the need for statistical forecasting for deterministic dynamical systems, the adequacy, strengths and shortcomings of statistical forecasting, the various objections raised against it, and the circumstances under which those objections are justified.

Table of Contents

Part I: Preliminaries

Part II: Geophysical Data Analysis