Averaging over an axis

Return to Python Lectures

A very common data manipulation in climate data analysis is averaging over one of the axes, or dimensions of the data. Examples include time averages, zonal means and global means (broken down into an average over latitude followed by an average over longitude). The MV module provides a very convenient and versatile function, MV.average(...), to facilitate this operation. The following examples use Jones_climatology.nc, which contains a climatology of monthly mean surface temperature as a three dimensional array.

The following example yields a two dimensional array containing the annual average surface temperature as a function of latitude and longitude, and gives the array a sensible name in case we want to write it out later for plotting.

>>>import cdms,MV
>>> f = cdms.open('Jones_climatology.nc')
>>> Tsurf = f['temp']
>>> Ts_ann_mean = MV.average(Tsurf)
>>> Ts_ann_mean.id = 'T_annual'

If you examine the axes of the mean data by issuing the command Ts_ann_mean.getAxisList(), you will see that MV.average(...) copies over the correct axis information from the original data.

Note that by default, MV.average(...) averages over the first axis, in this case time. If you want to average over a different axis, you specify the axis using the 'axis' keyword parameter. For example, to obtain a two dimensional array consisting of the zonal mean (i.e. average over longitude) for each month, we would use

>>>Ts_zonal_mean = MV.average(Tsurf,axis=2)

Axis indices start from zero, so the first axis is given by axis=0 (the default), the second by axis=1, and so forth. In the Jones data set, longitude is the third axis.

The preceding examples carried out the average with uniform weights. Often, however, one needs to do a weighted average, as in area-weighting an average over latitudes or mass-weighting vertical averages. MV.average(...) allows a vector of weights with the same length as the axis to be specified. The following example computes area weights for the latitude axis, and computes the time series of global mean temperatures from Ts_zonal_mean. In this data set, the latitude axis is called 'Y'. (see the topic About Axes for information on how to get such information using cdms).

>>>from math import pi
>>>latitude = f['Y']
>>>AreaWeight = MV.cos(latitude[:]*pi/180.) # Use cross section to get at the latitude data
>>> AreaWeight = AreaWeight/MV.average(AreaWeight) #Normalize weight
>>> T_global = MV.average(Ts_zonal_mean,axis=1,weights = AreaWeight)
>>> T_global[:]
array([ 13.60712385, 13.91143632, 14.51118444, 15.57971349, 16.439963 , 17.04888147,
17.42803931, 17.209453 , 16.56859413, 15.81402357, 14.70338267,
13.9047728 ,])

Note that we had to use an array cross section to get at the latitude data, since the entity "latitude" itself is an axis object that MV.cos doesn't know how to deal with. The reference latitude[:] creates an array with the latitude data. In the results we see that the January global mean surface temperature is 13.6C, whereas the July global mean is 17.4C -- a well-known result stemming from the fact that land heats up quickly and there is more land in the Northern Hemisphere.

The versatility of MV.average(...) is further enhanced by the fact that it works perfectly well with arbitrary array cross sections. For example, to get the Northern Hemisphere average we would do

>>> T_NH = MV.average(Ts_zonal_mean[:,18:36],axis=1,weights=AreaWeight[18:36])

The average can also be done over a cross-section with non-unit stride. For example, if we wanted the time average temperature just over every other month (i.e. Jan,March,...) we could do

>>> Ts1 = MV.average(Tsurf[0:12:2])

This particular average is pretty pointless, but if one had a data set with, say, 12 years of monthly mean data, one could use a similar construction to immediately obtain the monthly mean climatology by using an array cross section with a stride of 12, namely MV.average(Tsurf[0:144:12]).

As always, you can type help(MV.average) at the python command line to get full documentation.

Return to Python Lectures