[CF-metadata] axis attribute

Sebastien Villaume sebastien.villaume at ecmwf.int
Fri Apr 7 03:10:09 MDT 2017


Dear David,

I see your point, and you are probably right that a plotting routine will sort this out.

However, this is not what I am after: I am more interested in metadata discovery and indexing.
I need to discover what I have in a file without plotting it, without having a human looking at it to confirm what it is and that it has been plotted correctly. 
I would also like to use this metadata to perform actions like merging netCDF files, slicing, cropping, aggregating, interpolating, comparing data in different grids and representations, etc.
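
To make the discovery use case concrete, here is a minimal sketch in Python, not tied to any netCDF library. The `variables` dictionary is a hypothetical in-memory stand-in for the header of the tripolar file discussed in this thread, and `build_index` is an illustrative helper, not part of any CF tool. It records, for each data variable, the variables named in its `coordinates` attribute, their array dimensions, and any explicitly declared `axis`.

```python
# Hypothetical in-memory description of the tripolar file from this thread.
# Each variable records its array dimensions and its attributes.
variables = {
    "latitude": {"dims": ("j", "i"), "attrs": {"units": "degrees_north"}},
    "longitude": {"dims": ("j", "i"), "attrs": {"units": "degrees_east"}},
    "sit": {
        "dims": ("j", "i"),
        "attrs": {
            "units": "m",
            "standard_name": "sea_ice_thickness",
            "coordinates": "latitude longitude",
        },
    },
}


def build_index(variables):
    """Map each data variable to its listed coordinates and their explicit axis."""
    # Names referenced by any "coordinates" attribute are coordinates, not data.
    referenced = set()
    for var in variables.values():
        referenced.update(var["attrs"].get("coordinates", "").split())

    index = {}
    for name, var in variables.items():
        if name in referenced:
            continue
        coords = {}
        for coord_name in var["attrs"].get("coordinates", "").split():
            coord = variables[coord_name]
            coords[coord_name] = {
                "dims": coord["dims"],
                # None means the axis is not stated and would have to be guessed.
                "axis": coord["attrs"].get("axis"),
            }
        index[name] = coords
    return index


index = build_index(variables)
```

For this file the index reports `axis` as `None` for both latitude and longitude: nothing states explicitly which axis each 2-D coordinate belongs to, so software has to fall back on inference (for example from the units).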

I understand that implicit is fine and that explicit is not required for some applications. I have no issue with this. 
My personal point of view is that explicit is better than implicit: I tend to prefer "mandatory" over "optional".

Being implicit means that the assumptions made need to be valid 100% of the time to avoid accidents or corner cases.
I would like to be explicit so I need all the proper mechanisms (variables, semantics, etc.) in place so I can use them.
Right now it feels that I am missing some functionality.

Let me copy below a few bits of the terminology section in the CF 1.7 draft document (very similar to 1.6). Please read it keeping in mind what an axis, a coordinate, a spatio-temporal dimension and an array dimension really are. Each time you read "coordinate", "dimension" or "dimensional", ask yourself what is implied and whether it is ambiguous:

------------------------
variables
------------------------
auxiliary coordinate variable
    Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the NUG and used by this standard - see below). Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s).

coordinate variable
    We use this term precisely as it is defined in section 2.3.1 of the NUG . It is a one-dimensional variable with the same name as its dimension [e.g., time(time) ], and it is defined as a numeric data type with values that are ordered monotonically. Missing values are not allowed in coordinate variables.

grid mapping variable
    A variable used as a container for attributes that define a specific grid mapping. The type of the variable is arbitrary since it contains no data.

multidimensional coordinate variable
    An auxiliary coordinate variable that is multidimensional.

scalar coordinate variable
    A scalar variable (i.e. one with no dimensions) that contains coordinate data. Depending on context, it may be functionally equivalent either to a size-one coordinate variable (Section 5.7, "Scalar Coordinate Variables") or to a size-one auxiliary coordinate variable (Section 6.1, "Labels" and Section 9.2, "Collections, instances, and elements").

------------------------
dimensions
------------------------
latitude dimension
    A dimension of a netCDF variable that has an associated latitude coordinate variable.

longitude dimension
    A dimension of a netCDF variable that has an associated longitude coordinate variable.

spatiotemporal dimension
    A dimension of a netCDF variable that is used to identify a location in time and/or space.

time dimension
    A dimension of a netCDF variable that has an associated time coordinate variable.

vertical dimension
    A dimension of a netCDF variable that has an associated vertical coordinate variable.
------------------------
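
As an illustration of how literally the "coordinate variable" definition above can be applied, here is a small Python sketch using NumPy. The function name `is_coordinate_variable` and its arguments are hypothetical. It assumes that "ordered monotonically" means strictly increasing or strictly decreasing, and it uses NaN as a stand-in for missing values (a real reader would compare against `_FillValue` or `missing_value` instead).

```python
import numpy as np


def is_coordinate_variable(name, dims, values):
    """Apply the quoted definition: 1-D, named after its dimension,
    numeric, monotonic, and without missing values."""
    values = np.asarray(values)
    # One-dimensional, with the same name as its single dimension.
    if len(dims) != 1 or dims[0] != name:
        return False
    # Numeric data type (integers or floats only in this sketch).
    is_int = np.issubdtype(values.dtype, np.integer)
    is_float = np.issubdtype(values.dtype, np.floating)
    if not (is_int or is_float):
        return False
    # Assumption: NaN stands in for a missing value.
    if is_float and np.isnan(values).any():
        return False
    # Assumption: "monotonic" is read as strictly monotonic.
    steps = np.diff(values)
    return bool(np.all(steps > 0) or np.all(steps < 0))


print(is_coordinate_variable("time", ("time",), [0.0, 6.0, 12.0]))
print(is_coordinate_variable("latitude", ("j", "i"), np.zeros((2, 3))))
```

Under this test a 2-D latitude(j, i) array is rejected on its shape alone, whatever values it holds, which is why it can only fall into the "auxiliary" category.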

So according to this terminology, I have 2 auxiliary coordinate variables in my file, but no "real" coordinate variables (according to the NUG). So my auxiliary coordinates are auxiliary to what?
What is a "multidimensional coordinate"? If "dimension" means spatio-temporal dimension, it is nonsense, because a coordinate can only reference 1 spatio-temporal dimension. If it means array dimension, it is not clear...
What are my 2-D latitude and longitude arrays then? Are they the latitude and longitude dimensions defined in the terminology? Not really... because there are no such things as latitude and longitude dimensions: you can define latitude and longitude coordinates, associated with 2 axes that themselves define 2 spatial dimensions... but the coordinates can be stored in whatever n-D array.
I like the definition of "grid mapping variable": I could use a similar variable, with no data, as a container for the attributes of my "axis variable"!

I know that in day-to-day life and discussions we don't make the effort to be precise (I don't) and that it is easy to overload the meaning of things, but I think that the CF document needs to be very precise and unambiguous, and cannot mix axes, coordinates, spatio-temporal dimensions and array dimensions.

/Sébastien

----- Original Message -----
From: "David Hassell" <david.hassell at ncas.ac.uk>
To: "Sebastien Villaume" <sebastien.villaume at ecmwf.int>
Cc: "CF Metadata" <cf-metadata at cgd.ucar.edu>, "Jonathan Gregory" <j.m.gregory at reading.ac.uk>
Sent: Friday, 7 April, 2017 08:37:20
Subject: Re: [CF-metadata] axis attribute

Dear Sébastien,

Please bear with me when I ask to go right back to the beginning! I am not
sure what the benefit is in labelling the dimensions as X or Y. In the
original tripolar case we have:

dimensions:
    i = 96 ;
    j = 73 ;
variables:
    float latitude(j, i) ;
        latitude:units = "degrees_north" ;
    float longitude(j, i) ;
        longitude:units = "degrees_east" ;
    float sit(j, i) ;
        sit:units = "m" ;
        sit:standard_name = "sea_ice_thickness" ;
        sit:coordinates = "latitude longitude" ;

There is nothing stopping anything from seeing that this is a 2-d array of
size i*j, and there is nothing stopping software subspacing the data by i
and j indices.
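
As a rough sketch of that index-based subspacing in Python with NumPy (the arrays are zero-filled placeholders with the shapes from the CDL above, and the chosen index ranges are arbitrary):

```python
import numpy as np

nj, ni = 73, 96  # sizes of the j and i dimensions in the CDL above

# Placeholder values only: real data would be read from the file.
latitude = np.zeros((nj, ni), dtype=np.float32)
longitude = np.zeros((nj, ni), dtype=np.float32)
sit = np.zeros((nj, ni), dtype=np.float32)

# Subspace by array indices alone; no "X"/"Y" label is consulted.
j_sel = slice(10, 20)
i_sel = slice(30, 50)

# The same index selection is applied to the data and to both 2-d coordinates.
sit_sub = sit[j_sel, i_sel]
lat_sub = latitude[j_sel, i_sel]
lon_sub = longitude[j_sel, i_sel]

print(sit_sub.shape, lat_sub.shape, lon_sub.shape)
```

The data variable and its two auxiliary coordinates stay aligned because they share the (j, i) dimensions.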

I don't think a plotting routine would benefit from knowing that the i
dimension was "X", because there are no 1-d coordinates it can use along
that dimension.

Many thanks and all the best,

David

On 6 April 2017 at 22:45, Sebastien Villaume <sebastien.villaume at ecmwf.int>
wrote:

> Dear Mark and Jonathan,
>
> thank you for your comments.
>
> @Mark:
> the short answer: you can put in principle whatever you want in that
> variable because in this case it is a dummy variable only there to hold the
> axis attribute. But please read the long explanation!
>
> the long, boring explanation:
> As I understand it, the CF convention does not recognize an axis as a valid
> object in its own right, as it does for "dimensions" and the various types
> of "variables". Instead, the convention seems to make it mandatory to attach
> a variable to it, which becomes a "coordinate" variable. Note that I say that
> it is the coordinate variable that is attached to the axis and not the opposite.
>
> From a mathematical point of view, it is perfectly possible to define an
> axis without a coordinate on it (arguably it is not that useful). The
> common case is that a 1-D array defines positions on that axis (the
> coordinate). Then your 1-D data points are positioned with the help of the
> coordinate, itself attached to the axis.
>
> If you have one more axis, you can define a new coordinate on it. This
> creates a 2-D space. Now you have the choice on how you represent your 2-D
> data points:
> if the dataset is totally irregular you will have a 1-D array of "n" data
> points associated with a 1-D array of "n" positions for the first dimension
> and a 1-D array of "n" positions for the second dimension. It works, it is
> still a 2-D dataset stored in a long one dimensional vector.
>
> Imagine that you realize that your dataset is not as irregular as you
> thought: it is in fact a regular grid! You identify that you only have i
> possible values of the first coordinate and j possible values of the
> second coordinate, and you also notice that i*j=n. Great, you can now
> represent your dataset with 2 coordinates of length i and j respectively,
> each associated with one of 2 axes x and y, and your data is now a 2-D array
> of size (i,j). You can position your data using the coordinates; it is mapped
> using the indices within each coordinate array. Now you have a 2-D spatial
> dataset stored in a 2-D array with 2 supporting 1-D spatial coordinates
> stored in one-dimensional vectors.
>
> Let's say now that you take this regular grid and you distort it... your
> regular grid is gone and you can no longer use i and j for partitioning!
> Really? Well no, nobody says that you cannot slice your "n" long vectors
> into i*j arrays! you could choose whatever you want for i and j as long as
> i*j=n. Of course if you choose (2)*(n/2) or (n/2)*(2), it is a bit useless,
> but you can also choose meaningful i and j because even if your grid became
> irregular, it is not random points, it is still a grid of size i*j . This
> is exactly my use case! And in that situation your coordinates can be
> arranged in arrays of size i*j. What I need is 2 axes and 2 coordinates of
> dimension 2 with lengths i and j. The catch here is that I have 2-D arrays
> to store one "spatial" dimension! It is another case of overlapped
> concepts, dimension is used transparently for the dimension of arrays,
> dimension of the geometrical space, and sometimes for the size of one of
> the dimensions of an array!!
>
> Anyway, I should be able to define my axes like this:
>
> int x;
>     x:axis = "X";
>     x:standard_name = "x_axis" ; // no standard name exists...
>     x:units = "1" ; // no units, it will come with the coordinate
> int y;
>     y:axis = "Y";
>     y:standard_name = "y_axis" ; // no standard name exists...
>     y:units = "1" ; // no units, it will come with the coordinate
> float longitude(j,i);
>     longitude:standard_name = "longitude" ;
>     longitude:units = "degrees" ;
>     longitude:positive = "east" ;
>     longitude:long_name = "longitude" ;
>     longitude:axis_mapping = "X" ;
> float latitude(j,i);
>     latitude:standard_name = "latitude" ;
>     latitude:units = "degrees" ;
>     latitude:positive = "north" ;
>     latitude:long_name = "latitude" ;
>     latitude:axis_mapping = "Y" ;
> float sit(j, i) ;
>     sit:units = "m" ;
>     sit:standard_name = "sea_ice_thickness" ;
>     sit:long_name = "Ice thickness" ;
>     sit:coordinates = "latitude longitude" ;
>
> Several comments:
> Notice how one could tell on which axis the coordinate should go using, for
> instance, an "axis_mapping" attribute. Not the "coordinates" attribute: that
> one should be used to tell the coordinates of my data variable!
> I find this approach clearer and more flexible as it can probably cater
> for any situation of axes, coordinates, etc.
>
> But because in CF one cannot create a bare axis, I follow the rules and
> create:
>
> double x(i);
>     x:axis = "X";
>     x:standard_name = "..." ; // not an axis anymore, give me a standard name
>     x:units = "1" ;
>     x:long_name = "i-index of mesh grid" ;
> double y(j);
>     y:axis = "Y";
>     y:standard_name = "..." ; // not an axis anymore, give me a standard name
>     y:units = "1" ;
>     y:long_name = "j-index of mesh grid" ;
>
> and I have the choice of what I put in those arrays since it is somehow
> artificial.
>
> I could populate the "primary" coordinates with 1 to i and 1 to j which
> would represent the indices and if I subset the grid, I then retain the
> information that the domain has been cropped because the indices left will
> not be 1 to i/j but n to m.
> I don't really like this but what can I do?
>
> If we follow this idea, it means introducing a clear concept of "axis"
> alongside the other types of variables, defining a new attribute to "attach"
> coordinates to axes, etc.
>
> Another solution, much less disturbing, would be to heavily modify the
> proper chapters in the CF document to:
> - completely decouple the concepts of "axis" and "coordinate": a
> coordinate is not an axis and vice versa.
> - completely decouple the concepts of spatio-temporal dimension, array
> dimension and the size of an array dimension
> - continue to use the "axis" attribute, but on n-D array coordinates: the
> array has n dimensions but the coordinate maps to 1 axis/spatial dimension
> only!
> - Whatever the dimensions of the array for the coordinate, all the values
> contained in the array must be mapped on one given axis, the one defined in
> axis attribute. For instance, a 2-D latitude only contains values that are
> latitudes and will only map on one axis.
> - In principle one could have in the same file several coordinates of
> possibly different "array" dimensions, different sizes and different units
> defined for one axis. This means that the attribute "axis=z", for instance,
> can appear more than once in the file. The only restriction I see is that
> 2 data variables can only be plotted simultaneously if all their
> coordinates share the same units (the coordinate mapped on one axis of the
> first data variable must have the same units as the coordinate mapped on
> the same axis for the other data variable). This allows 2 data variables
> defined on two different grids sharing the same units to be in the same file
> and plotted together.
> - X and Y should be clearly decoupled from longitude and latitude. X and Y
> are the axes, longitude and latitude are the coordinates!
>
>
> @Jonathan:
> I think the whole confusion here comes from the overlapping of concepts:
> axes and coordinates on one hand and dimension of arrays and spatial
> dimensions on the other hand. If the relevant chapters are rewritten
> carefully to separate axes from coordinates and array dimensions from
> spatio-temporal dimensions we are good, I think.
>
> @all: Reading more through the Trac tickets system, I noticed the nice
> Trac ticket 117 about "multiple" time axis. This is a nice example of
> mixing axes, coordinates, dimensions of arrays, the time dimension, etc!
>
>
> /Sébastien
>
> ----- Original Message -----
> From: "Jonathan Gregory" <j.m.gregory at reading.ac.uk>
> To: cf-metadata at cgd.ucar.edu
> Sent: Thursday, 6 April, 2017 16:49:56
> Subject: Re: [CF-metadata] axis attribute
>
> Dear Jim and Sebastien
>
> The original intention of axis was to label the independent variables as 1D
> xyzt axes of the data variables.  This can be deduced from other attributes,
> but it's more effort. It's partly a plotting hint, but also it's because you
> might reasonably want to tell software, "give me the z-axis coordinates", or
> "calculate a mean over the x-direction". The latter is often a zonal mean, but
> it isn't with a rotated-pole or tripolar grid, yet the operation is still
> performed sometimes.
>
> It's useful that you've pointed out the confusion of purpose. If it were
> regarded as an acceptable backwards-incompatibility, which I'm nervous about,
> I'd be happy if we returned "axis" to its original purpose of identifying 1D
> axes, and also for scalar coordinate variables (which are equivalent to axes
> of size one), and provided another attribute to label aux coords as
> horizontal.
>
> I agree that if we have 1D x and y, with 2D lat and lon, the 1D variables are
> the axes. That's consistent with the original purpose of the axis attribute.
>
> > I also find the units of latitude and longitude confusing: it looks like
> > it was a way to squeeze the direction of the coordinate inside the units. I
> > have the same observation for the time coordinate that has its origin in
> > the units!
>
> This convention was kept in CF for backwards-compatibility with COARDS. CF
> does not use units in any other case to identify the quantity or sense.
>
> > It was done correctly for z coordinate using "units" and "positive",
> > probably because there are many types of z coordinates with various origin
> > and directions, and no real consensus. I note however that often the origin
> > is not always clearly defined.
>
> The positive attribute was also kept for backwards-compatibility with COARDS.
> It has the advantage of being useful to identify the vertical axis, but this
> can also be done with axis="Z". CF standard names provide information which
> indicates the sign convention.
>
> If coordinate_index is confusing, I think standard_names containing x_index
> or y_index would be OK, provided we change the existing standard names
>   magnitude_of_derivative_of_position_wrt_x_coordinate_index
>   magnitude_of_derivative_of_position_wrt_y_coordinate_index
> to remove "_coordinate".
>
> Best wishes
>
> Jonathan
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>



-- 
David Hassell
National Centre for Atmospheric Science
Department of Meteorology, University of Reading,
Earley Gate, PO Box 243, Reading RG6 6BB
Tel: +44 118 378 5613
http://www.met.reading.ac.uk/


