modelutil
: Writing Flexible and Robust
Models
As is the case for many software packages, balancing
model reliability and flexibility is a
critical issue for climate models.
In climate models,
flexibility is commonly an issue when we desire a single
code set to support multiple domains (e.g. 1-D, 2-D and 3-D models).
Reliability problems often come in difficulties
with variable management.
In this section we describe how to write models based on the
modelutil
framework that seek to optimize both
flexibility and reliability.
We would like to write model code so that it can be used in a
variety of circumstances, ranging from 1-D single column models
to 3-D domains, from localized regions to global domains, without
the addition of pre-processor directives and other cluttering
constructs.
The modelutil
package framework provides a few
methods and functions that coupled with Numpy/numarray array syntax
enable this to be done straightforwardly.
The first requirement for coding to a generalized model domain is
to write expressions using array notation rather than for
loops. The resulting code will work regardless of the rank of the
variables passed into it. For example, say you have rank 1
Field
variables
a
and b
each with n
elements, and you want the sum of a
and b
to be c
. Using loops, you would write
(all code below assumes an import modelutil.field as field
has already been executed):
c = field.zeros(n) for i in range(n): c[i] = a[i] + b[i]
In array notation, the above code becomes:
c = a + b
The second form, however, will work regardless of the rank of
a
and b
, whereas the for
loop form (sometimes called the "unrolled" form)
will only work if a
and b
are rank 1.
The unrolled form has one distinct advantage in that it
makes element-wise conditional testing very straightforward;
you just add if
statements inside the loop.
How then to do element-wise testing in array notation?
Masks are one solution.
As an example, say you wanted c
to be set to
1.5 where a
is less than 5 and a+b
otherwise.
To do so, first create masks of where this is true
and where it is not true, using 1.0 for True
and 0.0 for False
.
a_lt_5 = field.where(a < 5.0, 1.0, 0.0) a_ge_5 = field.logical_not(a_lt_5).astype(N.Float)
c
then is:
c = (a_lt_5 * 1.5) + (a_ge_5 * (a + b))
Again, since these calculations are done in array notation, this
code will work with variables of any rank.
(Note the masks are set to
floating type so there is no type-conversion issues and
I assume numarray
is aliased as N
.)
The slice_fixed
method provides a method of slicing
Field
objects by selecting elements based on the value
along a dimension rather than the index along a dimension.
For instance, say you have a Field
object
var
with a shape dimensioned
(level, latitude, longitude). The Field
objects for
the dimensions have id
s lev
, lat
,
and lon
, respectively, and are all stored in the
dims
State
object.
Using array slicing syntax, if you wanted the slice at latitude
60N and knew from searching dims["lat"]
that dims["lat"][9]=60
, the syntax to set this
slice to variable sub1
would be:
sub1 = Field(var[:,9,:])
Recall that when a Field
objects uses array slicing syntax
a Numpy/numarray data array is returned.
Thus, if you want sub
to be a Field
object,
you have to create one initialized by the slice. Note though all the
metadata attributes in sub1
are empty.
But say you didn't know which dimension was latitude? This is commonly the case if you want your model to work for a domain with an arbitrary number of spatial dimensions. The above slicing operation then may be accomplished by the following call:
sub1 = var.slice_fixed(dims, lat=60)
Note that the metadata (including extra_meta.axis
)
for sub1
produced by
slice_fixed
correctly describes the slice.
slice_fixed
doesn't work only for a fixed dimension
value but also for slicing "across" different values of dimensions.
For instance, say latval
was an array dimensioned
(level, longitude) that contained various values of latitude you
wanted to obtain at each (level, longitude) location.
That slice sub2
would be:
sub2 = var.slice_fixed(dims, lat=latval)
slice_fixed
can also slice multiple dimensions by adding
the slicing values in those dimensions as keywords.
The
slice_fixed_setdata
method functions like the
slice_fixed
method except that it replaces the
elements selected by slice_fixed
with values from
another array or a scalar. Together these two methods allow you
to manipulate subarrays of a Field
variable without
knowing ahead of time the dimension structure of the variable.
Say you have three Field
variables
a
, b
, and c
,
all the same units and id
s that are the
same as the variable name.
In one model instance, a
and b
are scalars while c
's data is a rank 2 array. Then:
d = a + b + c d.replace_all_meta(c) d.id = "d"
will result in Field
variable d
that
is also rank 2, the same shape as c
, and with
the correct metadata (assuming d.id
is also
"d"
).
But let's say in another model instance a
and c
are scalar while b
's data is a rank 3 array.
(This situation
might occur because the new model is a 3-D model whereas the previous
model was a 2-D model. This type of situation can also occur if you
decide that while b
was a constant in the 2-D model, in
this 3-D model it varies in space.)
The above code would give incorrect metadata,
since in this second case d
is a rank 3 array of the same shape as b
, but the code
assigns it the axes metadata for a scalar c
.
The copymeta_largest
function in the field
module solves this problem by
copying and returning the metadata of the Field
variable
with the "largest" data array, which is the shape of the
result d
. Thus:
from field import copymeta_largest d = a + b + c d.replace_all_meta( copymeta_largest(a,b,c) ) d.id = "d"
will result in metadata for d
from
the first variable on the righthand-side
(a
, b
, or c
)
that has the shape of d
.
Numpy/numarray array operations
(on which Field
operations are based)
automatically upcast the size of the result array to
the appropriate size to hold the results.
Note copymeta_largest
will only work for this purpose
if all the Field
variables involved are conformable.
Otherwise, the array operations on the data are undefined.
Because variables in a model undergo many transformations, it's wise
to code in such a way that when metadata needs to be updated in a
derived variable, the metadata is obtained from an input variable
rather than written by hand.
For instance, consider
the Field
variables var1
and
var2
that have the same shape and units but
different id
s. If in a function var1
is already defined and var2
is calculated, set
most of the var2
metadata by using var1
,
setting by hand only those attributes that are different:
var2.replace_all_meta( var1.copymeta() ) var2.id = "variable2"
Note though to be careful to make sure you make the new variable's
id
different from the id
of the variable
the metadata was copied from. This is important since it's easy
to copy metadata from one Field
to another, forget to
change the id
, and then add the new Field
to a State
that includes the metadata source
Field
, overwriting that original Field
.
One way of preventing some cases of this problem
is to add new Field
s all at once
to a State
using add
; then at least the
add
method will check that all the Field
s
in the argument list are unique.
In the past, most climate models were written in Fortran. Fortran, through common blocks, makes altering variables in memory trivially easy. This facility is tremendously useful to scientists who would like global access to model variables, but makes models brittle and less reliable.
A modeling framework should enable the user to access a maximum
number of variables at all model/sub-model levels while ensuring
variables are not accidentally overwritten and altered.
In modelutil
models, examples of how these goals are
accomplished include:
asStateSet
method of GenericModel
to enable one model to access another model's variables,State
object methods
add
,
copysubset
,
and
subset
to add and group subsets of Field
variables,meta_ok_to_interface
method (using sub-model
InterfaceMeta
objects as input),
coupled with object-oriented encapsulation, to ensure models do not
overwrite variables when they are not supposed to.