modelutil: Writing Flexible and Robust Models


As is the case for many software packages, balancing model reliability and flexibility is a critical issue for climate models. In climate models, flexibility is commonly an issue when we desire a single code set to support multiple domains (e.g. 1-D, 2-D and 3-D models). Reliability problems often come in difficulties with variable management. In this section we describe how to write models based on the modelutil framework that seek to optimize both flexibility and reliability.

Coding to a Generalized Model Domain

We would like to write model code so that it can be used in a variety of circumstances, ranging from 1-D single column models to 3-D domains, from localized regions to global domains, without the addition of pre-processor directives and other cluttering constructs. The modelutil package framework provides a few methods and functions that coupled with Numpy/numarray array syntax enable this to be done straightforwardly.

Using Array Notation

The first requirement for coding to a generalized model domain is to write expressions using array notation rather than for loops. The resulting code will work regardless of the rank of the variables passed into it. For example, say you have rank 1 Field variables a and b each with n elements, and you want the sum of a and b to be c. Using loops, you would write (all code below assumes an import modelutil.field as field has already been executed):

c = field.zeros(n)
for i in range(n):
    c[i] = a[i] + b[i]

In array notation, the above code becomes:

c = a + b

The second form, however, will work regardless of the rank of a and b, whereas the for loop form (sometimes called the "unrolled" form) will only work if a and b are rank 1.

The unrolled form has one distinct advantage in that it makes element-wise conditional testing very straightforward; you just add if statements inside the loop. How then to do element-wise testing in array notation? Masks are one solution.

As an example, say you wanted c to be set to 1.5 where a is less than 5 and a+b otherwise. To do so, first create masks of where this is true and where it is not true, using 1.0 for True and 0.0 for False.

a_lt_5 = field.where(a < 5.0, 1.0, 0.0)
a_ge_5 = field.logical_not(a_lt_5).astype(N.Float)

c then is:

c = (a_lt_5 * 1.5) + (a_ge_5 * (a + b))

Again, since these calculations are done in array notation, this code will work with variables of any rank. (Note the masks are set to floating type so there is no type-conversion issues and I assume numarray is aliased as N.)

Generalized Slicing

The slice_fixed method provides a method of slicing Field objects by selecting elements based on the value along a dimension rather than the index along a dimension. For instance, say you have a Field object var with a shape dimensioned (level, latitude, longitude). The Field objects for the dimensions have ids lev, lat, and lon, respectively, and are all stored in the dims State object. Using array slicing syntax, if you wanted the slice at latitude 60N and knew from searching dims["lat"] that dims["lat"][9]=60, the syntax to set this slice to variable sub1 would be:

sub1 = Field(var[:,9,:])

Recall that when a Field objects uses array slicing syntax a Numpy/numarray data array is returned. Thus, if you want sub to be a Field object, you have to create one initialized by the slice. Note though all the metadata attributes in sub1 are empty.

But say you didn't know which dimension was latitude? This is commonly the case if you want your model to work for a domain with an arbitrary number of spatial dimensions. The above slicing operation then may be accomplished by the following call:

sub1 = var.slice_fixed(dims, lat=60)

Note that the metadata (including extra_meta.axis) for sub1 produced by slice_fixed correctly describes the slice.

slice_fixed doesn't work only for a fixed dimension value but also for slicing "across" different values of dimensions. For instance, say latval was an array dimensioned (level, longitude) that contained various values of latitude you wanted to obtain at each (level, longitude) location. That slice sub2 would be:

sub2 = var.slice_fixed(dims, lat=latval)

slice_fixed can also slice multiple dimensions by adding the slicing values in those dimensions as keywords.

The slice_fixed_setdata method functions like the slice_fixed method except that it replaces the elements selected by slice_fixed with values from another array or a scalar. Together these two methods allow you to manipulate subarrays of a Field variable without knowing ahead of time the dimension structure of the variable.

Mixing Scalars and Arrays

Say you have three Field variables a, b, and c, all the same units and ids that are the same as the variable name. In one model instance, a and b are scalars while c's data is a rank 2 array. Then:

d = a + b + c
d.replace_all_meta(c) = "d"

will result in Field variable d that is also rank 2, the same shape as c, and with the correct metadata (assuming is also "d").

But let's say in another model instance a and c are scalar while b's data is a rank 3 array. (This situation might occur because the new model is a 3-D model whereas the previous model was a 2-D model. This type of situation can also occur if you decide that while b was a constant in the 2-D model, in this 3-D model it varies in space.) The above code would give incorrect metadata, since in this second case d is a rank 3 array of the same shape as b, but the code assigns it the axes metadata for a scalar c.

The copymeta_largest function in the field module solves this problem by copying and returning the metadata of the Field variable with the "largest" data array, which is the shape of the result d. Thus:

from field import copymeta_largest
d = a + b + c
d.replace_all_meta( copymeta_largest(a,b,c) ) = "d"

will result in metadata for d from the first variable on the righthand-side (a, b, or c) that has the shape of d. Numpy/numarray array operations (on which Field operations are based) automatically upcast the size of the result array to the appropriate size to hold the results.

Note copymeta_largest will only work for this purpose if all the Field variables involved are conformable. Otherwise, the array operations on the data are undefined.

Coding So Metadata Propagates Correctly Throughout the Model

Because variables in a model undergo many transformations, it's wise to code in such a way that when metadata needs to be updated in a derived variable, the metadata is obtained from an input variable rather than written by hand. For instance, consider the Field variables var1 and var2 that have the same shape and units but different ids. If in a function var1 is already defined and var2 is calculated, set most of the var2 metadata by using var1, setting by hand only those attributes that are different:

var2.replace_all_meta( var1.copymeta() ) = "variable2"

Note though to be careful to make sure you make the new variable's id different from the id of the variable the metadata was copied from. This is important since it's easy to copy metadata from one Field to another, forget to change the id, and then add the new Field to a State that includes the metadata source Field, overwriting that original Field. One way of preventing some cases of this problem is to add new Fields all at once to a State using add; then at least the add method will check that all the Fields in the argument list are unique.

Variable Management

In the past, most climate models were written in Fortran. Fortran, through common blocks, makes altering variables in memory trivially easy. This facility is tremendously useful to scientists who would like global access to model variables, but makes models brittle and less reliable.

A modeling framework should enable the user to access a maximum number of variables at all model/sub-model levels while ensuring variables are not accidentally overwritten and altered. In modelutil models, examples of how these goals are accomplished include:

Return to modelutil Reference Manual Index.
Return to modelutil Home Page.

Valid HTML 4.01! Valid CSS! Updated: January 12, 2005 by Johnny Lin <email address>. License.