[CF-metadata] example H4 (was Pre-proposal for "charset")

Jonathan Gregory j.m.gregory at reading.ac.uk
Thu Mar 2 12:26:57 MST 2017


Dear Bob et al.

Yes, this thread is currently not about your charset proposal, but about H4,
because of its being an example. My request for an example arises because I
guess that when CF metadata contains char arrays, it should already be clear
from their purpose (within CF) whether they are characters or strings.

However that doesn't seem to work in the case of H4, as you say. As David says,
> I think that such a variable contains spatial information for the
> implied instance dimension. Therefore I think that the "cf_role" variable
> should be an auxiliary coordinate variable and treated accordingly, thus
> removing the ambiguity that Jonathan points out.
... but it isn't. I recall discussing this point with Steve Hankin and John
Caron when ch 9 was being drafted, but I hadn't seen the full implications.
John argued that we shouldn't require all station variables to be listed in
the coordinates attribute, thus making them formally auxiliary coordinate
variables, in case there were lots of them. He argued that association by
sharing the same dimension is sufficient. Although it's not like the rest of
the CF convention, I would say, association in this way *is* sufficient, when
there *is* a station dimension. In this case there isn't!

What if you had, in the same file, two data variables, each containing a
timeseries from a single location, with no station dimension, so each
timeseries has just a time dimension? You can include some station identifier
with cf_role="timeseries_id", but there's no way to know which station it
identifies, so this is useless. As far as I know it's legal - or have I
overlooked something? If it's legal now, I would say this is the first case we've
had where the CF convention is actually defective - I mean, not just the text,
but the design of the convention - so we must change it.
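
To make the ambiguity concrete, here is a rough Python sketch. It is only an
illustration: the dictionaries stand in for netCDF variables, and the function
name linked_ids and the simplified layouts are made up for this example rather
than taken from CF or any library. It treats a cf_role variable as linked to a
data variable if it is listed in the coordinates attribute or shares a
dimension with it.

```python
# Hypothetical model: each variable is a dict with "dims" and "attrs".
# This is a sketch of the association rules discussed above, not CF itself.

def linked_ids(data_name, variables):
    """Return the cf_role variables that can be tied to one data variable.

    A link exists if the variable is named in the data variable's
    coordinates attribute, or if the two variables share a dimension.
    """
    data = variables[data_name]
    listed = data["attrs"].get("coordinates", "").split()
    found = []
    for name, var in variables.items():
        if "cf_role" not in var["attrs"]:
            continue  # only identifier variables are of interest here
        shares_dim = bool(set(var["dims"]) & set(data["dims"]))
        if name in listed or shares_dim:
            found.append(name)
    return found


# Simplified layout with a station (instance) dimension.
with_station_dim = {
    "station_name": {"dims": ("station", "name_strlen"),
                     "attrs": {"cf_role": "timeseries_id"}},
    "humidity": {"dims": ("station", "time"),
                 "attrs": {"coordinates": "lat lon alt"}},
}

# Simplified H4-like layout: no station dimension, and station_name is
# not listed in any coordinates attribute.
without_station_dim = {
    "station_name": {"dims": ("name_strlen",),
                     "attrs": {"cf_role": "timeseries_id"}},
    "humidity": {"dims": ("time",),
                 "attrs": {"coordinates": "time lat lon alt"}},
    "temp": {"dims": ("time",),
             "attrs": {"coordinates": "time lat lon alt"}},
}

print(linked_ids("humidity", with_station_dim))     # ['station_name']
print(linked_ids("humidity", without_station_dim))  # []
print(linked_ids("temp", without_station_dim))      # []
```

With a station dimension the identifier is found through the shared dimension;
in the H4-like layout nothing ties station_name to either data variable.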

It could be fixed by any of these methods:

* require there to be a station dimension if you want to include any station-
related variables in the file which aren't listed in the coordinates attribute.

* require all station variables to be listed in the coordinates attribute (if
there is no station dimension).

* require all data variables in the file to apply to the same station, so that
the file deals with only one location (if there is no station dimension).

Of these, I dislike the last one, because it puts a requirement on the file as
a whole, whereas most of CF deals with data variables, for which files are just
containers.
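
As a rough illustration of how these options might be checked, here is a
Python sketch. It assumes plain dictionaries in place of netCDF variables,
takes "station variables" to mean those with a cf_role attribute, and uses
hypothetical helper names (unlinked_station_vars, same_coordinates); none of
this is part of CF. The first helper covers the first two options. The second
is a crude stand-in for the third, since identical coordinates attributes only
suggest, and do not prove, a common location.

```python
import copy


def unlinked_station_vars(variables, data_names, station_dim=None):
    """Station variables that neither have the station dimension nor are
    listed in the coordinates attribute of any data variable."""
    listed = set()
    for d in data_names:
        listed.update(variables[d]["attrs"].get("coordinates", "").split())
    unlinked = []
    for name, var in variables.items():
        if "cf_role" not in var["attrs"]:
            continue  # not a station variable under this sketch's assumption
        if station_dim is not None and station_dim in var["dims"]:
            continue  # linked through the station dimension
        if name not in listed:
            unlinked.append(name)
    return unlinked


def same_coordinates(variables, data_names):
    """True if all data variables carry the same coordinates attribute
    (used here as a proxy for 'same station', which is an assumption)."""
    coord_sets = {
        frozenset(variables[d]["attrs"].get("coordinates", "").split())
        for d in data_names
    }
    return len(coord_sets) <= 1


# Simplified H4-like layout with no station dimension.
h4_like = {
    "station_name": {"dims": ("name_strlen",),
                     "attrs": {"cf_role": "timeseries_id"}},
    "humidity": {"dims": ("time",),
                 "attrs": {"coordinates": "time lat lon alt"}},
    "temp": {"dims": ("time",),
             "attrs": {"coordinates": "time lat lon alt"}},
}
data_names = ["humidity", "temp"]

print(unlinked_station_vars(h4_like, data_names))  # ['station_name']
print(same_coordinates(h4_like, data_names))       # True

# Listing station_name in each coordinates attribute removes the ambiguity.
fixed = copy.deepcopy(h4_like)
for d in data_names:
    fixed[d]["attrs"]["coordinates"] += " station_name"
print(unlinked_station_vars(fixed, data_names))    # []
```

The point of the last step is that the check is made per data variable, which
is why it fits the rest of CF better than a rule about the file as a whole.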

Best wishes

Jonathan


On Wed, Mar 01, 2017 at 08:39:10AM +0000, David Hassell wrote:
> Date: Wed, 1 Mar 2017 08:39:10 +0000
> From: David Hassell <david.hassell at ncas.ac.uk>
> To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
> CC: Jonathan Gregory <j.m.gregory at reading.ac.uk>, CF Metadata
>  <cf-metadata at cgd.ucar.edu>, "Signell, Richard" <rsignell at usgs.gov>
> Subject: Re: [CF-metadata] example Re: Pre-proposal for "charset"
> 
> Hello,
> 
> On the validity of H4 ...
> 
> The problem for me (in H4 and Bob's file) is that the "cf_role" variable is
> not listed by the coordinates attribute of any of the data variables to
> which it applies. But should it be? To my reading the conventions are a
> little vague. On one hand it says that *all* spatiotemporal coordinates
> should be listed in this fashion (section 9.5), but also doesn't say
> explicitly that the "cf_role" variable should be an auxiliary coordinate
> "station_info" variable in H4.)
> 
> This sounds like a defect to me. The conventions could perhaps replace
> "Where feasible a variable with the attribute cf_role should be included"
> "Where feasible *an auxiliary coordinate* variable with the attribute
> cf_role should be included".
> 
> All the best,
> 
> David
> 
> On 28 February 2017 at 21:22, Bob Simons - NOAA Federal <bob.simons at noaa.gov> wrote:
> 
> > [This has degenerated into a debate about whether a given file is a valid
> > CF DSG file, but I'll continue.]
> >
> > Perhaps I am misunderstanding this, but it is very hard for me to
> > interpret H4 as "defective", as if it were a minor error within an
> > otherwise valid CF DSG file.
> >
> > The description right above it says "When the intention of a data variable
> > is to contain only a single time series, *the preferred encoding* is a
> > special case of the multidimensional array representation." [emphasis mine]
> >
> > That reference to an encoding that is a special case of the
> > multidimensional array representation is almost certainly a reference to an
> > entire paragraph in section 9.2 which starts with
> > "If there is only a single feature to be stored in a data variable, there
> > is no need for an instance dimension and it is permitted to omit it." and
> > then discusses that.
> >
> > So the whole point of H4 is to give an example where there is an *implied*
> > time series dimension, not an actual one.  [I'm not happy about this type
> > of file either, but CF DSG seems to explicitly allow it, and even actively
> > encourage it. I'm just trying to follow the rules.]
> >
> > ---
> > And if H4 is defective, several groups that now make this type of file are
> > going to be surprised to hear it.
> >
> >
> > On Tue, Feb 28, 2017 at 12:16 PM, Jonathan Gregory <j.m.gregory at reading.ac.uk> wrote:
> >
> >> Dear Rich, Bob
> >>
> >> Rich asked
> >> > What is the difference that makes example CF H.4 okay, but not Bob's
> >> > example?
> >>
> >> That's a good question. You're right, I think example H4 has the same
> >> problem! I didn't read it carefully enough - sorry. The station_name:cf_role
> >> attribute in H4 says it's a station ID, but there's no association with the
> >> data variable. If you had more than one data variable in the file, as in Bob's
> >> example as he gave it originally, you couldn't tell which one it belonged to,
> >> so it can't be used as identification. In examples H6 and H7, however, where
> >> there are several timeseries, there is an instance (station) dimension, and the
> >> timeseries_id variable is station_name(station, name_strlen). So here it's
> >> clear that the station_name belongs to the stations, and you can infer that
> >> it's an array of strings, not a 2D character array.
> >>
> >> Therefore I think H4 and H5 are defective.
> >>
> >> In Bob's example, he has several data variables (five, I think) each of size
> >> 996, and a variable
> >>
> >>     char timeseries(timeseries=10);
> >>       :cf_role = "timeseries_id";
> >>       :long_name = "timeseries";
> >>
> >> I don't know what this refers to - that's my problem. Does it belong to any
> >> of the data variables? The dimension timeseries is not otherwise used.
> >>
> >> Best wishes
> >>
> >> Jonathan
> >>
> >>
> >> On Mon, Feb 27, 2017 at 02:41:49PM -0500, Signell, Richard wrote:
> >> > Date: Mon, 27 Feb 2017 14:41:49 -0500
> >> > From: "Signell, Richard" <rsignell at usgs.gov>
> >> > To: Jonathan Gregory <j.m.gregory at reading.ac.uk>
> >> > Subject: Re: [CF-metadata] Pre-proposal for "charset"
> >> >
> >> > On Mon, Feb 27, 2017 at 1:11 PM, Jonathan Gregory
> >> > <j.m.gregory at reading.ac.uk> wrote:
> >> > > Dear Bob
> >> > >
> >> > > That's right, there doesn't have to be an instance dimension. The problem with
> >> > > the file is that the variable you're concerned with (timeseries) isn't linked
> >> > > to any of the other variables, so its purpose is not clear.
> >> > >
> >> >
> >> > Jonathan,
> >> >
> >> >
> >> >
> >> > #1 CF Example H.4
> >> > http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#example-h.4
> >> >
> >> > dimensions:
> >> >       time = 100233 ;
> >> >       name_strlen = 23 ;
> >> >
> >> >    variables:
> >> >       float lon ;
> >> >           lon:standard_name = "longitude";
> >> >           lon:long_name = "station longitude";
> >> >           lon:units = "degrees_east";
> >> >       float lat ;
> >> >           lat:standard_name = "latitude";
> >> >           lat:long_name = "station latitude" ;
> >> >           lat:units = "degrees_north" ;
> >> >       float alt ;
> >> >           alt:long_name = "vertical distance above the surface" ;
> >> >           alt:standard_name = "height" ;
> >> >           alt:units = "m";
> >> >           alt:positive = "up";
> >> >           alt:axis = "Z";
> >> >       char station_name(name_strlen) ;
> >> >           station_name:long_name = "station name" ;
> >> >           station_name:cf_role = "timeseries_id";
> >> >
> >> >       double time(time) ;
> >> >           time:standard_name = "time";
> >> >           time:long_name = "time of measurement" ;
> >> >           time:units = "days since 1970-01-01 00:00:00" ;
> >> >           time:missing_value = -999.9;
> >> >       float humidity(time) ;
> >> >           humidity:standard_name = "specific_humidity" ;
> >> >           humidity:coordinates = "time lat lon alt" ;
> >> >           humidity:_FillValue = -999.9;
> >> >       float temp(time) ;
> >> >           temp:standard_name = "air_temperature" ;
> >> >           temp:units = "Celsius" ;
> >> >           temp:coordinates = "time lat lon alt" ;
> >> >           temp:_FillValue = -999.9;
> >> >
> >> >    attributes:
> >> >           :featureType = "timeSeries";
> >> >
> >> >
> >> > #2 Bob's example:
> >> >
> >> > netcdf summary_allTB2007.nc {
> >> >   dimensions:
> >> >     timeseries = 10;
> >> >     time = 996;
> >> >   variables:
> >> >     char timeseries(timeseries=10);
> >> >       :cf_role = "timeseries_id";
> >> >       :long_name = "timeseries";
> >> >
> >> >     double time(time=996);
> >> >       :units = "seconds since 1970-01-01T00:00:00Z";
> >> >       :standard_name = "time";
> >> >       :long_name = "time";
> >> >       :calendar = "gregorian";
> >> >       :axis = "T";
> >> >
> >> >     double latitude;
> >> >       :valid_min = -90.0; // double
> >> >       :valid_max = 90.0; // double
> >> >       :axis = "Y";
> >> >       :long_name = "latitude";
> >> >       :standard_name = "latitude";
> >> >       :units = "degrees_north";
> >> >
> >> >     double longitude;
> >> >       :valid_min = -180.0; // double
> >> >       :valid_max = 180.0; // double
> >> >       :axis = "X";
> >> >       :long_name = "longitude";
> >> >       :standard_name = "longitude";
> >> >       :units = "degrees_east";
> >> >
> >> >     double depth;
> >> >       :positive = "down";
> >> >       :axis = "Z";
> >> >       :valid_min = 0.0; // double
> >> >       :valid_max = 10971.0; // double
> >> >       :long_name = "depth";
> >> >       :standard_name = "depth";
> >> >       :units = "m";
> >> >
> >> >     char platform;
> >> >       :long_name = "MVCO ASIT";
> >> >
> >> >     char instrument;
> >> >       :long_name = "Imaging FlowCytobot";
> >> >
> >> >     double crs;
> >> >       :grid_mapping_name = "latitude_longitude";
> >> >       :longitude_of_prime_meridian = 0.0; // double
> >> >       :semi_major_axis = 6378137.0; // double
> >> >       :inverse_flattening = 298.257223563; // double
> >> >       :epsg_code = "EPSG:4326";
> >> >
> >> >     double Phaeocystis(time=996);
> >> >       :_FillValue = -9999.9; // double
> >> >       :long_name = "Phaeocystis";
> >> >       :standard_name = "Phaeocystis";
> >> >       :units = "1";
> >> >       :coordinates = "time depth latitude longitude";
> >> >       :grid_mapping = "crs";
> >> >       :platform = "platform";
> >> >       :instrument = "instrument";
> >> >
> >> >   // global attributes:
> >> >   :featureType = "timeSeries";
> >> >   :cdm_data_type = "TimeSeries";
> >> >   :Conventions = "CF-1.6";
> >> >   :summary = "Phytoplankton concentration by class derived from images collected by Imaging FlowCytobot\n";
> >> >   :institution = "WHOI";
> >> >   :title = "Phytoplankton concentration by class";
> >> > }
> >> >
> >> > --
> >> > Dr. Richard P. Signell   (508) 457-2229
> >> > USGS, 384 Woods Hole Rd.
> >> > Woods Hole, MA 02543-1598
> >>
> >
> >
> >
> > --
> > Sincerely,
> >
> > Bob Simons
> > IT Specialist
> > Environmental Research Division
> > NOAA Southwest Fisheries Science Center
> > 99 Pacific St., Suite 255A      (New!)
> > Monterey, CA 93940               (New!)
> > Phone: (831)333-9878            (New!)
> > Fax:   (831)648-8440
> > Email: bob.simons at noaa.gov
> >
> > The contents of this message are mine personally and
> > do not necessarily reflect any position of the
> > Government or the National Oceanic and Atmospheric Administration.
> > <>< <>< <>< <>< <>< <>< <>< <>< <><
> >
> >
> > _______________________________________________
> > CF-metadata mailing list
> > CF-metadata at cgd.ucar.edu
> > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >
> >
> 
> 
> -- 
> David Hassell
> National Centre for Atmospheric Science
> Department of Meteorology, University of Reading,
> Earley Gate, PO Box 243, Reading RG6 6BB
> Tel: +44 118 378 5613
> http://www.met.reading.ac.uk/


