# `modelutil`: Writing Flexible and Robust Models

## Overview

As is the case for many software packages, balancing model reliability and flexibility is a critical issue for climate models. In climate models, flexibility is commonly an issue when we desire a single code set to support multiple domains (e.g. 1-D, 2-D and 3-D models). Reliability problems often come in difficulties with variable management. In this section we describe how to write models based on the `modelutil` framework that seek to optimize both flexibility and reliability.

## Coding to a Generalized Model Domain

We would like to write model code so that it can be used in a variety of circumstances, ranging from 1-D single column models to 3-D domains, from localized regions to global domains, without the addition of pre-processor directives and other cluttering constructs. The `modelutil` package framework provides a few methods and functions that coupled with Numpy/numarray array syntax enable this to be done straightforwardly.

### Using Array Notation

The first requirement for coding to a generalized model domain is to write expressions using array notation rather than `for` loops. The resulting code will work regardless of the rank of the variables passed into it. For example, say you have rank 1 `Field` variables `a` and `b` each with `n` elements, and you want the sum of `a` and `b` to be `c`. Using loops, you would write (all code below assumes an `import modelutil.field as field` has already been executed):

```c = field.zeros(n)
for i in range(n):
c[i] = a[i] + b[i]
```

In array notation, the above code becomes:

```c = a + b
```

The second form, however, will work regardless of the rank of `a` and `b`, whereas the `for` loop form (sometimes called the "unrolled" form) will only work if `a` and `b` are rank 1.

The unrolled form has one distinct advantage in that it makes element-wise conditional testing very straightforward; you just add `if` statements inside the loop. How then to do element-wise testing in array notation? Masks are one solution.

As an example, say you wanted `c` to be set to 1.5 where `a` is less than 5 and `a+b` otherwise. To do so, first create masks of where this is true and where it is not true, using 1.0 for `True` and 0.0 for `False`.

```a_lt_5 = field.where(a < 5.0, 1.0, 0.0)
a_ge_5 = field.logical_not(a_lt_5).astype(N.Float)
```

`c` then is:

```c = (a_lt_5 * 1.5) + (a_ge_5 * (a + b))
```

Again, since these calculations are done in array notation, this code will work with variables of any rank. (Note the masks are set to floating type so there is no type-conversion issues and I assume `numarray` is aliased as `N`.)

### Generalized Slicing

The `slice_fixed` method provides a method of slicing `Field` objects by selecting elements based on the value along a dimension rather than the index along a dimension. For instance, say you have a `Field` object `var` with a shape dimensioned (level, latitude, longitude). The `Field` objects for the dimensions have `id`s `lev`, `lat`, and `lon`, respectively, and are all stored in the `dims` `State` object. Using array slicing syntax, if you wanted the slice at latitude 60N and knew from searching `dims["lat"]` that `dims["lat"][9]=60`, the syntax to set this slice to variable `sub1` would be:

```sub1 = Field(var[:,9,:])
```

Recall that when a `Field` objects uses array slicing syntax a Numpy/numarray data array is returned. Thus, if you want `sub` to be a `Field` object, you have to create one initialized by the slice. Note though all the metadata attributes in `sub1` are empty.

But say you didn't know which dimension was latitude? This is commonly the case if you want your model to work for a domain with an arbitrary number of spatial dimensions. The above slicing operation then may be accomplished by the following call:

```sub1 = var.slice_fixed(dims, lat=60)
```

Note that the metadata (including `extra_meta.axis`) for `sub1` produced by `slice_fixed` correctly describes the slice.

`slice_fixed` doesn't work only for a fixed dimension value but also for slicing "across" different values of dimensions. For instance, say `latval` was an array dimensioned (level, longitude) that contained various values of latitude you wanted to obtain at each (level, longitude) location. That slice `sub2` would be:

```sub2 = var.slice_fixed(dims, lat=latval)
```

`slice_fixed` can also slice multiple dimensions by adding the slicing values in those dimensions as keywords.

The `slice_fixed_setdata` method functions like the `slice_fixed` method except that it replaces the elements selected by `slice_fixed` with values from another array or a scalar. Together these two methods allow you to manipulate subarrays of a `Field` variable without knowing ahead of time the dimension structure of the variable.

### Mixing Scalars and Arrays

Say you have three `Field` variables `a`, `b`, and `c`, all the same units and `id`s that are the same as the variable name. In one model instance, `a` and `b` are scalars while `c`'s data is a rank 2 array. Then:

```d = a + b + c
d.replace_all_meta(c)
d.id = "d"
```

will result in `Field` variable `d` that is also rank 2, the same shape as `c`, and with the correct metadata (assuming `d.id` is also `"d"`).

But let's say in another model instance `a` and `c` are scalar while `b`'s data is a rank 3 array. (This situation might occur because the new model is a 3-D model whereas the previous model was a 2-D model. This type of situation can also occur if you decide that while `b` was a constant in the 2-D model, in this 3-D model it varies in space.) The above code would give incorrect metadata, since in this second case `d` is a rank 3 array of the same shape as `b`, but the code assigns it the axes metadata for a scalar `c`.

The `copymeta_largest` function in the `field` module solves this problem by copying and returning the metadata of the `Field` variable with the "largest" data array, which is the shape of the result `d`. Thus:

```from field import copymeta_largest
d = a + b + c
d.replace_all_meta( copymeta_largest(a,b,c) )
d.id = "d"
```

will result in metadata for `d` from the first variable on the righthand-side (`a`, `b`, or `c`) that has the shape of `d`. Numpy/numarray array operations (on which `Field` operations are based) automatically upcast the size of the result array to the appropriate size to hold the results.

Note `copymeta_largest` will only work for this purpose if all the `Field` variables involved are conformable. Otherwise, the array operations on the data are undefined.

## Coding So Metadata Propagates Correctly Throughout the Model

Because variables in a model undergo many transformations, it's wise to code in such a way that when metadata needs to be updated in a derived variable, the metadata is obtained from an input variable rather than written by hand. For instance, consider the `Field` variables `var1` and `var2` that have the same shape and units but different `id`s. If in a function `var1` is already defined and `var2` is calculated, set most of the `var2` metadata by using `var1`, setting by hand only those attributes that are different:

```var2.replace_all_meta( var1.copymeta() )
var2.id = "variable2"
```

Note though to be careful to make sure you make the new variable's `id` different from the `id` of the variable the metadata was copied from. This is important since it's easy to copy metadata from one `Field` to another, forget to change the `id`, and then add the new `Field` to a `State` that includes the metadata source `Field`, overwriting that original `Field`. One way of preventing some cases of this problem is to add new `Field`s all at once to a `State` using `add`; then at least the `add` method will check that all the `Field`s in the argument list are unique.

## Variable Management

In the past, most climate models were written in Fortran. Fortran, through common blocks, makes altering variables in memory trivially easy. This facility is tremendously useful to scientists who would like global access to model variables, but makes models brittle and less reliable.

A modeling framework should enable the user to access a maximum number of variables at all model/sub-model levels while ensuring variables are not accidentally overwritten and altered. In `modelutil` models, examples of how these goals are accomplished include: