[CF-metadata] axis attribute

Sebastien Villaume sebastien.villaume at ecmwf.int
Thu Apr 6 15:45:08 MDT 2017

Dear Mark and Jonathan,

thank you for your comments.

the short answer: in principle you can put whatever you want in that variable, because in this case it is a dummy variable that is only there to hold the axis attribute. But please read the long explanation!

the long, boring explanation:
As I understand it, the CF convention does not recognize an axis as a valid object in its own right, as it does for "dimensions" and the various types of "variables". The convention seems to make it mandatory to attach a variable to the axis, and that variable becomes a "coordinate" variable. Note that I say that it is the coordinate variable that is attached to the axis and not the opposite.

From a mathematical point of view, it is perfectly possible to define an axis without a coordinate on it (arguably it is not that useful). The common case is that a 1-D array defines positions on that axis (the coordinate). Then your 1-D data points are positioned with the help of the coordinate, itself attached to the axis.

If you have one more axis, you can define a new coordinate on it. This creates a 2-D space. Now you have the choice on how you represent your 2-D data points:
if the dataset is totally irregular you will have a 1-D array of "n" data points associated with a 1-D array of "n" positions for the first dimension and a 1-D array of "n" positions for the second dimension. It works, it is still a 2-D dataset stored in a long one dimensional vector.

Imagine that you realize that your dataset is not as irregular as you thought: it is in fact a regular grid! You identify that you only have i possible values of the first coordinate and j possible values of the second coordinate, and you also notice that i*j=n. Great, you can now represent your dataset with 2 coordinates of length i and j respectively, each of them associated with one of the 2 axes x and y, and your data is now a 2-D array of size (i,j). You can position your data using the coordinates: each data point is mapped through its indices into each coordinate array. Now you have a 2-D spatial dataset stored in a 2-D array with 2 supporting 1-D spatial coordinates stored in one-dimensional vectors.
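
To make the step from the long vector to the regular grid concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the function name to_regular_grid and the sample values are made up, and the grid is stored as rows of y and columns of x, i.e. with shape (j,i) as in the CDL declarations.

```python
def to_regular_grid(xs, ys, values):
    """Turn n (x, y, value) points into two 1-D coordinates and a 2-D grid.

    Hypothetical helper: raises ValueError if the points do not fill
    a regular i*j grid exactly once.
    """
    x_coord = sorted(set(xs))  # the i distinct values of the first coordinate
    y_coord = sorted(set(ys))  # the j distinct values of the second coordinate
    if len(x_coord) * len(y_coord) != len(values):
        raise ValueError("i*j != n: the points do not form a regular grid")
    # grid[row][col] with row indexing y and col indexing x
    grid = [[None] * len(x_coord) for _ in y_coord]
    for x, y, v in zip(xs, ys, values):
        grid[y_coord.index(y)][x_coord.index(x)] = v
    if any(cell is None for row in grid for cell in row):
        raise ValueError("some grid cells are empty: duplicated positions")
    return x_coord, y_coord, grid


# n = 6 points stored as three long vectors (made-up values)
xs = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
ys = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
values = [5, 6, 7, 8, 9, 10]

x_coord, y_coord, grid = to_regular_grid(xs, ys, values)
```

With these values i=3 and j=2, so the 6-long vector becomes a 2-D array supported by two 1-D coordinates of length 3 and 2.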

Let's say now that you take this regular grid and you distort it... your regular grid is gone and you can no longer use i and j for partitioning! Really? Well no, nobody says that you cannot slice your "n"-long vectors into i*j arrays! You could choose whatever you want for i and j as long as i*j=n. Of course if you choose (2)*(n/2) or (n/2)*(2), it is a bit useless, but you can also choose meaningful i and j, because even if your grid became irregular, it is not made of random points: it is still a grid of size i*j. This is exactly my use case! And in that situation your coordinates can be arranged in arrays of size i*j. What I need is 2 axes and 2 coordinates, each stored in a 2-D array with lengths i and j. The catch here is that I have 2-D arrays to store one "spatial" dimension! It is another case of overlapping concepts: "dimension" is used interchangeably for the dimension of arrays, the dimension of the geometrical space, and sometimes for the size of one of the dimensions of an array!!
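
As a rough sketch of this distorted-grid case (plain Python, with a made-up helper slice_into_grid and made-up values), the same n-long vectors can be cut into (j,i) arrays, so that the data and both coordinates share the same two array dimensions:

```python
def slice_into_grid(flat, j, i):
    """Cut an n-long vector into j rows of i values (requires i*j == n)."""
    if j * i != len(flat):
        raise ValueError("i*j must equal n")
    return [flat[row * i:(row + 1) * i] for row in range(j)]


# A distorted 2 x 3 grid: no two points share a longitude or a latitude,
# so 1-D coordinates of length i and j can no longer describe it.
lon_flat = [0.0, 1.1, 2.3, 0.2, 1.4, 2.7]
lat_flat = [10.0, 10.2, 10.5, 20.0, 20.3, 20.1]
sit_flat = [1.5, 1.2, 0.8, 2.0, 1.7, 1.1]

j, i = 2, 3
lon2d = slice_into_grid(lon_flat, j, i)
lat2d = slice_into_grid(lat_flat, j, i)
sit2d = slice_into_grid(sit_flat, j, i)

# The position of sit2d[1][2] is read at the same indices
# in the two 2-D coordinate arrays.
position = (lat2d[1][2], lon2d[1][2])
```

Each coordinate array is 2-D, yet every value in lon2d is a longitude and every value in lat2d is a latitude, which is the overlap of "dimension" meanings described above.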

Anyway, I should be able to define my axes like this:

int x;
    x:axis = "X";
    x:standard_name = "x_axis" ; // no standard name exists...
    x:units = "1" ; // no units, it will come with the coordinate
int y;
    y:axis = "Y";
    y:standard_name = "y_axis" ; // no standard name exists...
    y:units = "1" ; // no units, it will come with the coordinate
float longitude(j,i);
    longitude:standard_name = "longitude" ;
    longitude:units = "degrees" ;
    longitude:positive = "east" ;
    longitude:long_name = "longitude" ;
    longitude:axis_mapping = "X" ;
float latitude(j,i);
    latitude:standard_name = "latitude" ;
    latitude:units = "degrees" ;
    latitude:positive = "north" ;
    latitude:long_name = "latitude" ;
    latitude:axis_mapping = "Y" ;
float sit(j, i) ;
    sit:units = "m" ;
    sit:standard_name = "sea_ice_thickness" ;
    sit:long_name = "Ice thickness" ;
    sit:coordinates = "latitude longitude" ;

several comments:
notice how one could tell on which axis the coordinate should go using for instance an "axis_mapping" attribute. Not the "coordinates" attribute: that one should be used to tell the coordinates of my data variable!
I find this approach clearer and more flexible as it can probably cater for any situation of axes, coordinates, etc.

But because in CF one cannot create a bare axis, I follow the rules and create:

double x(i);
    x:axis = "X";
    x:standard_name = "..." ; // not an axis anymore, give me a standard name
    x:units = "1" ;
    x:long_name = "i-index of mesh grid" ;
double y(j);
    y:axis = "Y";
    y:standard_name = "..." ; // not an axis anymore, give me a standard name
    y:units = "1" ;
    y:long_name = "j-index of mesh grid" ;

and I have the choice of what I put in those arrays since it is somehow artificial.

I could populate the "primary" coordinates with 1 to i and 1 to j which would represent the indices and if I subset the grid, I then retain the information that the domain has been cropped because the indices left will not be 1 to i/j but n to m.
I don't really like this but what can I do?
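
Here is a small sketch of what I mean by retaining the cropping information, in plain Python. The sizes, the field values and the crop helper are all invented for the illustration:

```python
# Index coordinates 1..i and 1..j for a hypothetical 5 x 4 mesh
i_size, j_size = 5, 4
x = list(range(1, i_size + 1))  # [1, 2, 3, 4, 5]
y = list(range(1, j_size + 1))  # [1, 2, 3, 4]

# Made-up field: the value encodes its own (j, i) indices as 10*j + i
field = [[10 * jj + ii for ii in x] for jj in y]


def crop(x, y, field, i_slice, j_slice):
    """Subset the field and both index coordinates with the same slices."""
    return x[i_slice], y[j_slice], [row[i_slice] for row in field[j_slice]]


x_sub, y_sub, field_sub = crop(x, y, field, slice(1, 4), slice(2, 4))
# x_sub and y_sub no longer start at 1, which records where the
# sub-domain sits inside the original mesh.
```

After the crop the index coordinates run from 2 to 4 and from 3 to 4 instead of from 1, so a reader of the subset can still tell that it was cut out of a larger domain.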

If we follow this idea, it means introducing a clear concept of "axis" alongside the other types of variables, defining a new attribute to "attach" coordinates to axes, etc.

Another solution, much less disruptive, would be to heavily modify the relevant chapters in the CF document to:
- completely decouple the concepts of "axis" and "coordinate": a coordinate is not an axis and vice versa.
- completely decouple the concepts of spatio-temporal dimension, array dimension, and the size of an array dimension
- continue to use the "axis" attribute but on n-D array coordinates: the array has n dimensions but the coordinate maps to 1 axis/spatial dimension only!
- Whatever the dimensions of the array for the coordinate, all the values contained in the array must be mapped on one given axis, the one defined in the axis attribute. For instance, a 2-D latitude only contains values that are latitudes and will only map on one axis.
- In principle one could have in the same file several coordinates of possibly different "array" dimensions, different sizes and different units defined for one axis. This means that the attribute "axis=z" for instance can appear more than once in the file. The only restriction I see is that 2 data variables can only be plotted simultaneously if all their coordinates share the same units (the coordinate mapped on one axis of the first data variable must have the same units as the coordinate mapped on the same axis for the other data variable). This allows 2 data variables defined on two different grids sharing the same units to be in the same file and plotted together.
- X and Y should be clearly decoupled from longitude and latitude. X and Y are the axes, longitude and latitude are the coordinates!
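
The plotting restriction in the list above could be checked along these lines. This is only a sketch in plain Python of the rule as I propose it, not of anything that exists in CF or in any library: the dictionary layout, the variable names and the functions axis_units and can_plot_together are all assumptions made for the illustration.

```python
# Hypothetical description of the coordinates in one file: each coordinate
# declares the single axis it maps to, whatever its array dimensions are.
coords = {
    "lat_a": {"axis": "Y", "units": "degrees", "shape": (4, 5)},  # 2-D
    "lon_a": {"axis": "X", "units": "degrees", "shape": (4, 5)},  # 2-D
    "lat_b": {"axis": "Y", "units": "degrees", "shape": (7,)},    # 1-D
    "lon_b": {"axis": "X", "units": "degrees", "shape": (9,)},    # 1-D
    "y_c": {"axis": "Y", "units": "m", "shape": (3,)},
    "x_c": {"axis": "X", "units": "m", "shape": (3,)},
}


def axis_units(coord_names, coords):
    """Map each axis used by a data variable to the units found on it."""
    return {coords[name]["axis"]: coords[name]["units"] for name in coord_names}


def can_plot_together(coords_a, coords_b, coords):
    """True if, on every axis both variables use, the units are the same."""
    units_a = axis_units(coords_a, coords)
    units_b = axis_units(coords_b, coords)
    shared_axes = units_a.keys() & units_b.keys()
    return all(units_a[axis] == units_b[axis] for axis in shared_axes)


same_units = can_plot_together(["lat_a", "lon_a"], ["lat_b", "lon_b"], coords)
mixed_units = can_plot_together(["lat_a", "lon_a"], ["y_c", "x_c"], coords)
```

Here the first pair lives on two different grids (one with 2-D coordinates, one with 1-D coordinates) but shares units on both axes, so same_units is True, while the second pair mixes degrees and metres on the same axes, so mixed_units is False.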

I think the whole confusion here comes from the overlapping of concepts: axes and coordinates on one hand and dimensions of arrays and spatial dimensions on the other hand. If the relevant chapters are rewritten carefully to separate axes from coordinates and array dimensions from spatio-temporal dimensions we are good, I think.

@all: Reading more through the Trac ticket system, I noticed the nice Trac ticket 117 about "multiple" time axes. This is a nice example of mixing axes, coordinates, dimensions of arrays, the time dimension, etc!


----- Original Message -----
From: "Jonathan Gregory" <j.m.gregory at reading.ac.uk>
To: cf-metadata at cgd.ucar.edu
Sent: Thursday, 6 April, 2017 16:49:56
Subject: Re: [CF-metadata] axis attribute

Dear Jim and Sebastien

The original intention of axis was to label the independent variables as 1D
xyzt axes of the data variables.  This can be deduced from other attributes,
but it's more effort. It's partly a plotting hint, but also it's because you
might reasonably want to tell software, "give me the z-axis coordinates", or
"calculate a mean over the x-direction". The latter is often a zonal mean, but
it isn't with a rotated-pole or tripolar grid, yet the operation is still
performed sometimes.

It's useful that you've pointed out the confusion of purpose. If it were
regarded as an acceptable backwards-incompatibility, which I'm nervous about,
I'd be happy if we returned "axis" to its original purpose of identifying 1D
axes, and also for scalar coordinate variables (which are equivalent to axes
of size one), and provided another attribute to label aux coords as horizontal.

I agree that if we have 1D x and y, with 2D lat and lon, the 1D variables are
the axes. That's consistent with the original purpose of the axis attribute.

> I also find the units of latitude and longitude confusing: it looks like it was a way to squeeze the direction of the coordinate inside the units. I have the same observation for the time coordinate that has its origin in the units!

This convention was kept in CF for backwards-compatibility with COARDS. CF does
not use units in any other case to identify the quantity or sense.

> It was done correctly for z coordinate using "units" and "positive", probably because there are many types of z coordinates with various origin and directions, and no real consensus. I note however that often the origin is not always clearly defined.

The positive attribute was also kept for backwards-compatibility with COARDS.
It has the advantage of being useful to identify the vertical axis, but this
can also be done with axis="Z". CF standard names provide information which
indicates the sign convention.

If coordinate_index is confusing, I think standard_names containing x_index
or y_index would be OK, provided we change the existing standard names
to remove "_coordinate".

Best wishes
