modelutil: Writing Flexible and Robust
Models
As is the case for many software packages, balancing
model reliability and flexibility is a
critical issue for climate models.
In climate models,
flexibility is commonly an issue when we desire a single
code set to support multiple domains (e.g. 1-D, 2-D and 3-D models).
Reliability problems often come in difficulties
with variable management.
In this section we describe how to write models based on the
modelutil framework that seek to optimize both
flexibility and reliability.
We would like to write model code so that it can be used in a
variety of circumstances, ranging from 1-D single column models
to 3-D domains, from localized regions to global domains, without
the addition of pre-processor directives and other cluttering
constructs.
The modelutil package framework provides a few
methods and functions that coupled with Numpy/numarray array syntax
enable this to be done straightforwardly.
The first requirement for coding to a generalized model domain is
to write expressions using array notation rather than for
loops. The resulting code will work regardless of the rank of the
variables passed into it. For example, say you have rank 1
Field variables
a and b each with n
elements, and you want the sum of a and b
to be c. Using loops, you would write
(all code below assumes an import modelutil.field as field
has already been executed):
c = field.zeros(n)
for i in range(n):
c[i] = a[i] + b[i]
In array notation, the above code becomes:
c = a + b
The second form, however, will work regardless of the rank of
a and b, whereas the for
loop form (sometimes called the "unrolled" form)
will only work if a and b are rank 1.
The unrolled form has one distinct advantage in that it
makes element-wise conditional testing very straightforward;
you just add if statements inside the loop.
How then to do element-wise testing in array notation?
Masks are one solution.
As an example, say you wanted c to be set to
1.5 where a is less than 5 and a+b otherwise.
To do so, first create masks of where this is true
and where it is not true, using 1.0 for True
and 0.0 for False.
a_lt_5 = field.where(a < 5.0, 1.0, 0.0) a_ge_5 = field.logical_not(a_lt_5).astype(N.Float)
c then is:
c = (a_lt_5 * 1.5) + (a_ge_5 * (a + b))
Again, since these calculations are done in array notation, this
code will work with variables of any rank.
(Note the masks are set to
floating type so there is no type-conversion issues and
I assume numarray is aliased as N.)
The slice_fixed
method provides a method of slicing
Field objects by selecting elements based on the value
along a dimension rather than the index along a dimension.
For instance, say you have a Field object
var with a shape dimensioned
(level, latitude, longitude). The Field objects for
the dimensions have ids lev, lat,
and lon, respectively, and are all stored in the
dims State object.
Using array slicing syntax, if you wanted the slice at latitude
60N and knew from searching dims["lat"]
that dims["lat"][9]=60, the syntax to set this
slice to variable sub1 would be:
sub1 = Field(var[:,9,:])
Recall that when a Field objects uses array slicing syntax
a Numpy/numarray data array is returned.
Thus, if you want sub to be a Field object,
you have to create one initialized by the slice. Note though all the
metadata attributes in sub1 are empty.
But say you didn't know which dimension was latitude? This is commonly the case if you want your model to work for a domain with an arbitrary number of spatial dimensions. The above slicing operation then may be accomplished by the following call:
sub1 = var.slice_fixed(dims, lat=60)
Note that the metadata (including extra_meta.axis)
for sub1 produced by
slice_fixed correctly describes the slice.
slice_fixed doesn't work only for a fixed dimension
value but also for slicing "across" different values of dimensions.
For instance, say latval was an array dimensioned
(level, longitude) that contained various values of latitude you
wanted to obtain at each (level, longitude) location.
That slice sub2 would be:
sub2 = var.slice_fixed(dims, lat=latval)
slice_fixed can also slice multiple dimensions by adding
the slicing values in those dimensions as keywords.
The
slice_fixed_setdata
method functions like the
slice_fixed method except that it replaces the
elements selected by slice_fixed with values from
another array or a scalar. Together these two methods allow you
to manipulate subarrays of a Field variable without
knowing ahead of time the dimension structure of the variable.
Say you have three Field variables
a, b, and c,
all the same units and ids that are the
same as the variable name.
In one model instance, a and b
are scalars while c's data is a rank 2 array. Then:
d = a + b + c d.replace_all_meta(c) d.id = "d"
will result in Field variable d that
is also rank 2, the same shape as c, and with
the correct metadata (assuming d.id is also
"d").
But let's say in another model instance a and c
are scalar while b's data is a rank 3 array.
(This situation
might occur because the new model is a 3-D model whereas the previous
model was a 2-D model. This type of situation can also occur if you
decide that while b was a constant in the 2-D model, in
this 3-D model it varies in space.)
The above code would give incorrect metadata,
since in this second case d
is a rank 3 array of the same shape as b, but the code
assigns it the axes metadata for a scalar c.
The copymeta_largest
function in the field module solves this problem by
copying and returning the metadata of the Field variable
with the "largest" data array, which is the shape of the
result d. Thus:
from field import copymeta_largest d = a + b + c d.replace_all_meta( copymeta_largest(a,b,c) ) d.id = "d"
will result in metadata for d from
the first variable on the righthand-side
(a, b, or c)
that has the shape of d.
Numpy/numarray array operations
(on which Field operations are based)
automatically upcast the size of the result array to
the appropriate size to hold the results.
Note copymeta_largest will only work for this purpose
if all the Field variables involved are conformable.
Otherwise, the array operations on the data are undefined.
Because variables in a model undergo many transformations, it's wise
to code in such a way that when metadata needs to be updated in a
derived variable, the metadata is obtained from an input variable
rather than written by hand.
For instance, consider
the Field variables var1 and
var2 that have the same shape and units but
different ids. If in a function var1
is already defined and var2 is calculated, set
most of the var2 metadata by using var1,
setting by hand only those attributes that are different:
var2.replace_all_meta( var1.copymeta() ) var2.id = "variable2"
Note though to be careful to make sure you make the new variable's
id different from the id of the variable
the metadata was copied from. This is important since it's easy
to copy metadata from one Field to another, forget to
change the id, and then add the new Field
to a State that includes the metadata source
Field, overwriting that original Field.
One way of preventing some cases of this problem
is to add new Fields all at once
to a State using add; then at least the
add method will check that all the Fields
in the argument list are unique.
In the past, most climate models were written in Fortran. Fortran, through common blocks, makes altering variables in memory trivially easy. This facility is tremendously useful to scientists who would like global access to model variables, but makes models brittle and less reliable.
A modeling framework should enable the user to access a maximum
number of variables at all model/sub-model levels while ensuring
variables are not accidentally overwritten and altered.
In modelutil models, examples of how these goals are
accomplished include:
asStateSet
method of GenericModel
to enable one model to access another model's variables,State object methods
add,
copysubset,
and
subset
to add and group subsets of Field variables,meta_ok_to_interface
method (using sub-model
InterfaceMeta objects as input),
coupled with object-oriented encapsulation, to ensure models do not
overwrite variables when they are not supposed to.