Reading and Writing netCDF files

Return to Python Lectures

Reading a netCDF file

To do anything with a netCDF file, you need to first open it and create a file object. This is done using the command cdms.open(...), in a way similar to the handline of ordinary files using Python's open(...) command. For example, to open the netcdf file 'Jones_climatology.nc', and create a file object 'f ', you would use

f = cdms.open('Jones_climatology.nc')

By default, a file is opened with read-only access. Data in a file is referred to by name, using a dictionary lookup. You can get a list of all the variables in a file by using the listvariables() method of the file object. Once you have accessed a variable, you can find out its properties as you would for any other variable. The following session provides a complete example. If you want your own copy of the netcdf file so you can try out the example yourself,you can get it here.

>>> import cdms
>>> f = cdms.open('Jones_climatology.nc')
>>> f.listvariables()
['lat', 'lon', 'temp']
>>> Tsurf = f['temp']
>>> Tsurf.listattributes() #Find out what information is in there
['units', 'scale_factor', 'description', 'file_missing_value',
'add_offset', 'long_name', 'last_modified', 'fullname', 'missing_value']
>>> Tsurf.long_name # Get the long_name attribute
'surface temperature climatology'
>>> Tsurf.units
'degreeC'
>>> Tsurf.shape #Let's find out the dimensions
(12, 36, 72)
>>> Tsurf.getAxisList() # Find out what the axes are
[ id: T
units: months
Length: 12
First: 0.5
Last: 11.5
Other axis attributes:
gridtype: [1,]
fullname: /T
Python id: 6eab30
, id: Y
units: degree_north
Length: 36
First: -87.5
Last: 87.5
Other axis attributes:
gridtype: [0,]
fullname: /Y
Python id: 1e6be0
, id: X
units: degree_east
Length: 72
First: 2.5
Last: 357.5
Other axis attributes:
gridtype: [1,]
fullname: /X
Python id: 6ea8e0
]
>>> Tsurf[:,18,0] # Look at some of the data
temp
array([ 27.1099987 , 27.59999847, 27.70999908,
27.53999901, 26.94999886, 25.40999985,
24.46999931, 24.27999878, 24.93999863,
25.57999992, 26.43000031, 26.73999977,])

>>> latitude = f['Y'] # Get the latitude
>>> latitude[:] # Look at it
[-87.5,-82.5,-77.5,-72.5,-67.5,-62.5,-57.5,-52.5,
-47.5,-42.5,-37.5,-32.5,-27.5,
-22.5,-17.5,-12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5, 27.5,
32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5,
87.5,]

Note that in this example, cdms was able to correctly associate the axes of the variable Tsurf with the axis data in the file because: (a) the file did indeed have axis data, and (b) the axis data was named according to the proper netCDF conventions for marking an axis. If the axis data isn't in the file or can't be identified, cdms will create some plain-vanilla axes of its own. Note that 'lat' and 'lon' in this data set are variables included in the file, which are not associated as axes. The following description of the file, provided by the ncdump command from the Unix command line, illustrates the naming convention for axes:

[Snufkin:~/scratch] rtp1% ncdump -h Jones_climatology.nc
netcdf Jones_climatology {
dimensions:
T = 12 ;
Y = 36 ;
X = 72 ;
variables:
float T(T) ;
T:gridtype = 1 ;
T:units = "months" ;
T:fullname = "/T" ;
float Y(Y) ;
Y:gridtype = 0 ;
Y:units = "degree_north" ;
Y:fullname = "/Y" ;
float X(X) ;
X:gridtype = 1 ;
X:units = "degree_east" ;
X:fullname = "/X" ;
float temp(T, Y, X) ;
temp:description = "Combined land/ocean temperature climatology grids, by Phil Jones" ;
temp:file_missing_value = 32767 ;
temp:fullname = "[/JONES /clim /temp]" ;
temp:long_name = "surface temperature climatology" ;
temp:last_modified = 858781479 ;
temp:units = "degreeC" ;
temp:missing_value = 1.e+20f ;
temp:name = "temp" ;
float lat(Y) ;
lat:name = "latitude variable" ;
float lon(X) ;
lon:name = "longitude variable" ;

// global attributes:
:Conventions = "CF-1.0" ; }

The variables X, Y and T are recognized as axes because the name of the variable (e.g.X) is the same as the name of its dimension (e.g. X). If the author of this file had wanted the variable lat to be recognized as an axis, the corresponding dimension would have had to have been called "lat" as well, rather than "Y". This is purely a naming convention, used to make up for the fact that netCDF does not actually provide a way to associate axis data with a variable. It is, however, a naming convention that is widely recognized by software using netCDF files, including cdms.

Another feature to note about this example is that when we issued the command Tsurf = f['temp'], Python did not read in any actual temperature data yet. Tsurf is an example of a 'file variable' which refers to data on disk. Such data is not read into memory until it is actually needed, as in the command T[:,18,0]. This powerful feature of cdms is very convenient when dealing with enormous datasets.

Writing data to a netCDF file

You can write data to a netcdf file by simply opening a file with write permission and writing a cdms variable or a Numeric varible to the file. The following example creates a numeric array and writes it to the file 'test.nc':

import Numeric,cdms
outfile = cdms.open('test.nc','w')
data = Numeric.zeros(3,'float')
data[0] = 1.
data[1] = 5.
data[2] = 10.
outfile.write(data)
outfile.close()

Warning! Opening the file with access code 'w' will delete any existing file of the same name as that you are opening. If you want to append data to an existing file, or otherwise modify it, you need to use the access code 'a' instead.

Now, if we exit Python and check the contents of the file using ncdump, we get:

[Snufkin:~/scratch] rtp1% ncdump test.nc
netcdf test {
dimensions:
axis_0 = 3 ;
bound = 2 ;
variables:
double axis_0(axis_0) ;
axis_0:bounds = "bounds_axis_0" ;
double bounds_axis_0(axis_0, bound) ;
float variable_1(axis_0) ;
variable_1:name = "variable_1" ;

// global attributes:
:Conventions = "CF-1.0" ;
data:

axis_0 = 0, 1, 2 ;

bounds_axis_0 =
-0.5, 0.5,
0.5, 1.5,
1.5, 2.5 ;

variable_1 = 1, 5, 10 ;
}

Note that cdms created some axes and some other utility variables (bounds_axis) that some cdms routines use for data manipulation and plotting. If you don't want to clutter up your file with the bounds data, you can suppress its generation by issuing the command cdms.setAutoBounds('off') before you do any writing. cdms will still create the axis data for you.

The preceding example works, but the data you write winds up with names you have no control over. The easiest way to get better control over the names in the file is to use a Masked Variable (of the sort cdms itself creates when reading in a file). The following example is slightly modified from the previous one, and illustrates the use of masked variables.

import cdms, MV
outfile = cdms.open('test.nc','w')
data = MV.zeros(3,'float')
data[0] = 1.
data[1]=4.
data[2] = 9.
data.id = 'Z1'
data.name = 'squares'
outfile.write(data)
outfile.description = 'This is my test data set which I made myself.'
outfile.close()

This example also shows how to write a global attribute, in this case called "description," to the file.

Note the distinction between the data "id" and its "name," which is a little confusing. The id is actually the name by which netCDF routines refer to the variable. It is usually the most important thing you want to set. The "name" in the above example is used to provide a longer, more descriptive name for the variable. It is also common to provide an even more complete name by setting the attribute "data.long_name". All cdms variables have an "id", but they will only have a "name" or "long_name" attribute if the file provides one or if you set one yourself. Setting the value of the attribute automatically creates it.

The following output from ncdump will make the distinction between the id and the name more clear:

[Snufkin:~/scratch] rtp1% ncdump test.nc
netcdf test {
dimensions:
axis_0 = 3 ;
bound = 2 ;
variables:
double axis_0(axis_0) ;
axis_0:bounds = "bounds_axis_0" ;
double bounds_axis_0(axis_0, bound) ;
float Z1(axis_0) ;
Z1:name = "squares" ;
Z1:missing_value = 1.e+20f ;

// global attributes:
:Conventions = "CF-1.0" ;
:description = "This is my test data set which I made myself." ;
data:

axis_0 = 0, 1, 2 ;

bounds_axis_0 =
-0.5, 0.5,
0.5, 1.5,
1.5, 2.5 ;

Z1 = 1, 4, 9 ;
}

The axis and dimension names are still created by cdms; in the topic "About Axes" we will cover what you need to do if you want to specify your own names for these. The following example session shows how the data we just wrote would be accessed in Python:

>>> import cdms
>>> infile = cdms.open('test.nc')
>>> infile.listvariables()
['bounds_axis_0', 'Z1']
>>> Z = infile['Z1']
>>> Z[:]
Z1
array([ 1., 4., 9.,])
>>> Z.name
'squares'

Generally speaking, performing any manipulation on data creates new variable id's and names, so that if you want the results to have sensible names when you write them out, you need to set these attributes yourself. Unfortunately, MV doesn't always copy over axes to calculated results, even when it could sensibly do so. The following example shows how we could create a new temperature data set in Kelvin, based on the Jones climatology in Celsius:

>>> import cdms
>>> infile = cdms.open('Jones_climatology.nc')
>>> Tsurf = infile['temp']
>>> Tsurf.id # Verify rthe id
'temp'
>>> Tsurf = Tsurf + 273.15 # Convert to degrees K
>>> Tsurf.id # What happened to the name?
'variable_1'
>>> Tsurf.id = 'temp' # Set the id to something more sensible
>>> Tsurf.units = 'Deg Kelvin' #Set the units
>>> outfile = cdms.open('test.nc','w')
>>> outfile.write(Tsurf)
<Variable: temp, file: test.nc, shape: (12, 36, 72)>
>>> outfile.close()
>>> infile.close()

The information on this file returned by ncdump is then:

[Snufkin:~/scratch] rtp1% ncdump -h test.nc
netcdf test {
dimensions:
T = 12 ;
bound = 2 ;
Y = 36 ;
X = 72 ;
variables:
float T(T) ;
T:bounds = "bounds_T" ;
T:gridtype = 1 ;
T:units = "months" ;
T:fullname = "/T" ;
double bounds_T(T, bound) ;
float Y(Y) ;
Y:bounds = "bounds_Y" ;
Y:gridtype = 0 ;
Y:units = "degree_north" ;
Y:fullname = "/Y" ;
double bounds_Y(Y, bound) ;
float X(X) ;
X:bounds = "bounds_X" ;
X:gridtype = 1 ;
X:units = "degree_east" ;
X:fullname = "/X" ;
double bounds_X(X, bound) ;
double temp(T, Y, X) ;
temp:name = "variable_1" ;
temp:units = "Deg Kelvin" ;
temp:missing_value = 1.00000002004088e+20 ;

// global attributes:
:Conventions = "CF-1.0" ;
}

Note that because we didn't specify a name for the variable (by setting Tsurf.name), cdms left the name attribute as the internally generated one. Note also that, in this case, cdms correctly copied over the axes from the original data set. In more complex calculations, it is sometimes necessary to copy over the axes explicitly. Finally, note that the arithmetic operation promoted the temperature data to double precision. To learn how to suppress this behavior, see the explanation of "The Double Gotcha".

Modifying a file by writing to a file variable

When you change any value of a file variable, then the corresponding changes are written immediately to the file, provided the file is open and it is write-enabled. If these conditions are not fulfilled, you will get an error message when you try to change the value. This feature can occasionally be useful when you want to update or otherwise modify some data in a file without having to make a new copy of the whole thing.

The following example shows how this is done. It replaces the January temperature with an array of zeros. This is a rather dumb thing to want to do, but it serves to illustrate the point. In a more realistic application, you might instead be replacing the January values with an array of corrected values you read from another file.

So as to preserve the original data, you should copy Jones_climatology.nc to test.nc first before trying the example.

>>> import cdms,MV
>>> f = cdms.open('test.nc','a')
>>> Tsurf = f['temp']
>>> Tsurf.shape
(12, 36, 72)
>>> Tnew = MV.zeros((36,72),'float32')
>>> Tsurf[0] = Tnew
>>> f.close()

After trying the example, you can do an ncdump on the file to see the changes you have made.

A number of quirks of Python and cdms limit the utility of in-situ modification, and make the process rather unintuitive and error-prone. The first problem is that any arithmetic array operation creates a new local variable rather than writing the result to the file. This means that an operation like Tsurf = Tsurf + 273.15, which one would think would convert the temperatures in the file to degrees K, in fact just creates a new local variable and names it Tsurf, throwing away the file variable that was formally referred to as Tsurf. The following session illustrates this gotcha.

>>> Tsurf = f['temp']
>>> Tsurf
<Variable: temp, file: test.nc, shape: (12, 36, 72)>
>>> Tsurf = Tsurf + 273.15
>>> Tsurf
variable_1
array(
array (12,36,72) , type = d, has 31104 elements)

Note that, as usual, the arithmetic operation has promoted the result to double precision. The promotion to double precision makes it hard to do arithmetic on array cross sections if the file variable is single precision. The following shows one workaround which does achieve the desired result of converting temperature to Kelvin in situ.

>>> f = cdms.open('test.nc','a')
>>> Tsurf = f['temp']
>>> AbsZero = MV.zeros(1,'float32') #Set up a constant as a 1-element array
>>> AbsZero[0] = 273.15
>>> Tsurf[:] = Tsurf[:] + AbsZero
>>> f.close()

In this case, the array operation on Tsurf does not create a new array, because the assignment operates on a cross section and cross-sections always modify the data in the array rather than creating a reference to a new array.

This works, but it is awkward and ugly. One would have to be rather desperate to resort to a trick like this.

Note that in a netCDF file there can be one special dimension which is a "record dimension" (generally thought of as time). The record dimension can be recognized in an ncdump because it's dimension value is 'UNLIMITED'. When modifying a file variable in situ, it is OK to write to index values of the record dimension that are past those currently in the file. This feature is very useful in extending the time series in a file to include new data -- for example if you have a file of temperature up to January 20, and want to add in the data for the rest of the month.

Return to Python Lectures