modelutil: Writing Flexible and Robust Models
As is the case for many software packages, balancing
model reliability and flexibility is a
critical issue for climate models.
In climate models,
flexibility is commonly an issue when we desire a single
code set to support multiple domains (e.g. 1-D, 2-D and 3-D models).
Reliability problems often come in difficulties
with variable management.
In this section we describe how to write models based on the
modelutil framework that seek to optimize both
flexibility and reliability.
We would like to write model code so that it can be used in a
variety of circumstances, ranging from 1-D single column models
to 3-D domains, from localized regions to global domains, without
the addition of pre-processor directives and other cluttering
modelutil package framework provides a few
methods and functions that coupled with Numpy/numarray array syntax
enable this to be done straightforwardly.
The first requirement for coding to a generalized model domain is
to write expressions using array notation rather than
loops. The resulting code will work regardless of the rank of the
variables passed into it. For example, say you have rank 1
b each with
elements, and you want the sum of
c. Using loops, you would write
(all code below assumes an
import modelutil.field as field
has already been executed):
c = field.zeros(n) for i in range(n): c[i] = a[i] + b[i]
In array notation, the above code becomes:
c = a + b
The second form, however, will work regardless of the rank of
b, whereas the
loop form (sometimes called the "unrolled" form)
will only work if
b are rank 1.
The unrolled form has one distinct advantage in that it
makes element-wise conditional testing very straightforward;
you just add
if statements inside the loop.
How then to do element-wise testing in array notation?
Masks are one solution.
As an example, say you wanted
c to be set to
a is less than 5 and
To do so, first create masks of where this is true
and where it is not true, using 1.0 for
and 0.0 for
a_lt_5 = field.where(a < 5.0, 1.0, 0.0) a_ge_5 = field.logical_not(a_lt_5).astype(N.Float)
c then is:
c = (a_lt_5 * 1.5) + (a_ge_5 * (a + b))
Again, since these calculations are done in array notation, this
code will work with variables of any rank.
(Note the masks are set to
floating type so there is no type-conversion issues and
numarray is aliased as
method provides a method of slicing
Field objects by selecting elements based on the value
along a dimension rather than the index along a dimension.
For instance, say you have a
var with a shape dimensioned
(level, latitude, longitude). The
Field objects for
the dimensions have
lon, respectively, and are all stored in the
Using array slicing syntax, if you wanted the slice at latitude
60N and knew from searching
dims["lat"]=60, the syntax to set this
slice to variable
sub1 would be:
sub1 = Field(var[:,9,:])
Recall that when a
Field objects uses array slicing syntax
a Numpy/numarray data array is returned.
Thus, if you want
sub to be a
you have to create one initialized by the slice. Note though all the
metadata attributes in
sub1 are empty.
But say you didn't know which dimension was latitude? This is commonly the case if you want your model to work for a domain with an arbitrary number of spatial dimensions. The above slicing operation then may be accomplished by the following call:
sub1 = var.slice_fixed(dims, lat=60)
Note that the metadata (including
sub1 produced by
slice_fixed correctly describes the slice.
slice_fixed doesn't work only for a fixed dimension
value but also for slicing "across" different values of dimensions.
For instance, say
latval was an array dimensioned
(level, longitude) that contained various values of latitude you
wanted to obtain at each (level, longitude) location.
sub2 would be:
sub2 = var.slice_fixed(dims, lat=latval)
slice_fixed can also slice multiple dimensions by adding
the slicing values in those dimensions as keywords.
method functions like the
slice_fixed method except that it replaces the
elements selected by
slice_fixed with values from
another array or a scalar. Together these two methods allow you
to manipulate subarrays of a
Field variable without
knowing ahead of time the dimension structure of the variable.
Say you have three
all the same units and
ids that are the
same as the variable name.
In one model instance,
are scalars while
c's data is a rank 2 array. Then:
d = a + b + c d.replace_all_meta(c) d.id = "d"
will result in
is also rank 2, the same shape as
c, and with
the correct metadata (assuming
d.id is also
But let's say in another model instance
are scalar while
b's data is a rank 3 array.
might occur because the new model is a 3-D model whereas the previous
model was a 2-D model. This type of situation can also occur if you
decide that while
b was a constant in the 2-D model, in
this 3-D model it varies in space.)
The above code would give incorrect metadata,
since in this second case
is a rank 3 array of the same shape as
b, but the code
assigns it the axes metadata for a scalar
function in the
field module solves this problem by
copying and returning the metadata of the
with the "largest" data array, which is the shape of the
from field import copymeta_largest d = a + b + c d.replace_all_meta( copymeta_largest(a,b,c) ) d.id = "d"
will result in metadata for
the first variable on the righthand-side
that has the shape of
Numpy/numarray array operations
Field operations are based)
automatically upcast the size of the result array to
the appropriate size to hold the results.
copymeta_largest will only work for this purpose
if all the
Field variables involved are conformable.
Otherwise, the array operations on the data are undefined.
Because variables in a model undergo many transformations, it's wise
to code in such a way that when metadata needs to be updated in a
derived variable, the metadata is obtained from an input variable
rather than written by hand.
For instance, consider
var2 that have the same shape and units but
ids. If in a function
is already defined and
var2 is calculated, set
most of the
var2 metadata by using
setting by hand only those attributes that are different:
var2.replace_all_meta( var1.copymeta() ) var2.id = "variable2"
Note though to be careful to make sure you make the new variable's
id different from the
id of the variable
the metadata was copied from. This is important since it's easy
to copy metadata from one
Field to another, forget to
id, and then add the new
State that includes the metadata source
Field, overwriting that original
One way of preventing some cases of this problem
is to add new
Fields all at once
add; then at least the
add method will check that all the
in the argument list are unique.
In the past, most climate models were written in Fortran. Fortran, through common blocks, makes altering variables in memory trivially easy. This facility is tremendously useful to scientists who would like global access to model variables, but makes models brittle and less reliable.
A modeling framework should enable the user to access a maximum
number of variables at all model/sub-model levels while ensuring
variables are not accidentally overwritten and altered.
modelutil models, examples of how these goals are
GenericModelto enable one model to access another model's variables,
subsetto add and group subsets of
meta_ok_to_interfacemethod (using sub-model
InterfaceMetaobjects as input), coupled with object-oriented encapsulation, to ensure models do not overwrite variables when they are not supposed to.