[CF-metadata] example H4 (was Pre-proposal for "charset")

Jonathan Gregory j.m.gregory at reading.ac.uk
Fri Mar 3 07:25:18 MST 2017


Dear David

As Bob has separately pointed out, my last posting might be less clear than it
should be because the sentence starting "What if you had" was garbled. I think
H4 is not ambiguous if that's the entire contents of the file, but mostly the
CF examples don't imply that. Is there a rule that you can't have data vars
from more than one station in the file? If it's legal to do that, consider:

dimensions:
  time = 100233 ;
  name_strlen = 23 ;
variables:
  float lon1 ;
    lon1:standard_name = "longitude";
    lon1:long_name = "station longitude";
    lon1:units = "degrees_east";
  float lat1 ;
    lat1:standard_name = "latitude";
    lat1:long_name = "station latitude" ;
    lat1:units = "degrees_north" ;
  float lon2 ;
    lon2:standard_name = "longitude";
    lon2:long_name = "station longitude";
    lon2:units = "degrees_east";
  float lat2 ;
    lat2:standard_name = "latitude";
    lat2:long_name = "station latitude" ;
    lat2:units = "degrees_north" ;
  char station_name(name_strlen) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id";
  double time(time) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
    time:missing_value = -999.9;
  float temp1(time) ;
    temp1:standard_name = "air_temperature" ;
    temp1:units = "Celsius" ;
    temp1:coordinates = "time lat1 lon1" ;
    temp1:_FillValue = -999.9;
  float temp2(time) ;
    temp2:standard_name = "air_temperature" ;
    temp2:units = "Celsius" ;
    temp2:coordinates = "time lat2 lon2" ;
    temp2:_FillValue = -999.9;
attributes:
  :featureType = "timeSeries";

You cannot tell which of the locations is described by the station_name. That
is my objection to example H4.
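
To make the ambiguity concrete, here is a minimal sketch in Python. It assumes
the file above is represented as a plain dictionary rather than read from a
real netCDF file; the names `variables` and `linked_data_variables` are
hypothetical and not part of any CF library. It considers only the two kinds
of association discussed in this thread: being listed in the coordinates
attribute, or sharing a dimension with the data variable.

```python
# Hypothetical in-memory description of the file above:
# variable name -> (dimensions, attributes). Only the attributes that
# matter for the association are kept.
variables = {
    "lon1": ((), {"standard_name": "longitude"}),
    "lat1": ((), {"standard_name": "latitude"}),
    "lon2": ((), {"standard_name": "longitude"}),
    "lat2": ((), {"standard_name": "latitude"}),
    "station_name": (("name_strlen",), {"cf_role": "timeseries_id"}),
    "time": (("time",), {"standard_name": "time"}),
    "temp1": (("time",), {"standard_name": "air_temperature",
                          "coordinates": "time lat1 lon1"}),
    "temp2": (("time",), {"standard_name": "air_temperature",
                          "coordinates": "time lat2 lon2"}),
}


def linked_data_variables(variables, role_var):
    """Return the data variables explicitly tied to role_var, either via
    their coordinates attribute or via a shared dimension."""
    role_dims = set(variables[role_var][0])
    linked = []
    for name, (dims, attrs) in variables.items():
        # Simplifying assumption: a data variable is one that carries a
        # coordinates attribute.
        if "coordinates" not in attrs:
            continue
        listed = role_var in attrs["coordinates"].split()
        shares_dim = bool(role_dims & set(dims))
        if listed or shares_dim:
            linked.append(name)
    return linked


role_vars = [n for n, (_, a) in variables.items() if "cf_role" in a]
for role_var in role_vars:
    print(role_var, "->", linked_data_variables(variables, role_var))
# prints: station_name -> []
```

Neither temp1 nor temp2 is tied to station_name by either rule, so software
can only guess. If there were a station dimension shared by station_name and
the data variable, the same check would find the link.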

Best wishes

Jonathan


----- Forwarded message from David Hassell <david.hassell at ncas.ac.uk> -----

> Date: Fri, 3 Mar 2017 09:02:01 +0000
> From: David Hassell <david.hassell at ncas.ac.uk>
> To: Jonathan Gregory <j.m.gregory at reading.ac.uk>
> CC: Bob Simons - NOAA Federal <bob.simons at noaa.gov>, CF Metadata
> 	<cf-metadata at cgd.ucar.edu>, "Signell, Richard" <rsignell at usgs.gov>
> Subject: Re: example H4 (was Pre-proposal for "charset")
> 
> Hello,
> 
> I still think that H4 is OK: The featureType attribute is set and the
> station_name variable has the cf_role attribute, so I think that it is
> clear that it applies to all of the variables dimensioned "time" - just as
> clear as if there were an explicit station dimension. Even if the
> featureType were not set in the file (which is allowed for orthogonal
> multidimensional array representations) then the presence of cf_role
> attribute would still be enough, I think.
> 
> That said, I still support Jonathan's first suggestion:
> 
> > require there to be a station dimension if you want to include any
> > station-related variables in the file which aren't listed in the
> > coordinates attribute.
> 
> I am currently writing software to parse DSGs, and having to infer these
> relationships (rather than getting them directly from the coordinates
> attribute) is, for me, challenging the goal that the conventions make it
> "practical to write software" (section 1.1)!
> 
> All the best,
> 
> David
> 
> On 2 March 2017 at 19:26, Jonathan Gregory <j.m.gregory at reading.ac.uk>
> wrote:
> 
> > Dear Bob et al.
> >
> > Yes, this thread is currently not about your charset proposal, but about
> > H4, because of its being an example. My request for an example arises
> > because I guess that when CF metadata contains char arrays, it should
> > already be clear from their purpose (within CF) whether they are
> > characters or strings.
> >
> > However that doesn't seem to work in the case of H4, as you say. As
> > David says,
> > > I think that such a variable contains spatial information for the
> > > implied instance dimension. Therefore I think that the "cf_role" variable
> > > should be an auxiliary coordinate variable and treated accordingly, thus
> > > removing the ambiguity that Jonathan points out.
> > ... but it isn't. I recall discussing this point with Steve Hankin and John
> > Caron when ch 9 was being drafted, but I hadn't seen the full implications.
> > John argued that we shouldn't require all station variables to be listed in
> > the coordinates attribute, thus making them formally auxiliary coordinate
> > variables, in case there were lots of them. He argued that association by
> > sharing the same dimension is sufficient. Although it's not like the rest
> > of the CF convention, I would say, association in this way *is*
> > sufficient, when there *is* a station dimension. In this case there isn't!
> >
> > What if you had, in the same file, two data variables, containing a
> > timeseries
> > timeseries from a single locations, with no station dimension, so each
> > timeseries has just a time-dimension. You can include some station
> > identifier with cf_role="timeseries_id", but there's no way to know which
> > station it identifies, so this is useless. As far as I know it's legal - or
> > have I overlooked something? If it's legal now, I would say this is the
> > first case we've had where the CF convention is actually defective - I
> > mean, not just the text, but the design of the convention - so we must
> > change it.
> >
> > It could be fixed by any of these methods:
> >
> > * require there to be a station dimension if you want to include any
> > station-related variables in the file which aren't listed in the
> > coordinates attribute.
> >
> > * require all station variables to be listed in the coordinates attribute
> > (if there is no station dimension).
> >
> > * require all data variables in the file to apply to the same station, so
> > that the file deals with only one location (if there is no station
> > dimension).
> >
> > Of these, I dislike the last one, because it puts a requirement on the
> > file as a whole, whereas most of CF deals with data variables, for which
> > files are just containers.
> >
> > Best wishes
> >
> > Jonathan
> >
> >
> > On Wed, Mar 01, 2017 at 08:39:10AM +0000, David Hassell wrote:
> > > Date: Wed, 1 Mar 2017 08:39:10 +0000
> > > From: David Hassell <david.hassell at ncas.ac.uk>
> > > To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
> > > CC: Jonathan Gregory <j.m.gregory at reading.ac.uk>, CF Metadata
> > >  <cf-metadata at cgd.ucar.edu>, "Signell, Richard" <rsignell at usgs.gov>
> > > Subject: Re: [CF-metadata] example Re: Pre-proposal for "charset"
> > >
> > > Hello,
> > >
> > > On the validity of H4 ...
> > >
> > > The problem for me (in H4 and Bob's file) is that the "cf_role" variable
> > > is not listed by the coordinates attribute of any of the data variables
> > > to which it applies. But should it be? To my reading the conventions are a
> > > little vague. On one hand it says that *all* spatiotemporal coordinates
> > > should be listed in this fashion (section 9.5), but also doesn't say
> > > explicitly that the "cf_role" variable should be an auxiliary coordinate
> > > "station_info" variable in H4.)
> > >
> > > This sounds like a defect to me. The conventions could perhaps replace
> > > "Where feasible a variable with the attribute cf_role should be included"
> > > with "Where feasible *an auxiliary coordinate* variable with the
> > > attribute cf_role should be included".
> > >
> > > All the best,
> > >
> > > David
> > >
> > > On 28 February 2017 at 21:22, Bob Simons - NOAA Federal
> > > <bob.simons at noaa.gov> wrote:
> > >
> > > > [This has degenerated into a debate about whether a given file is a
> > > > valid CF DSG file, but I'll continue.]
> > > >
> > > > Perhaps I am misunderstanding this, but it is very hard for me to
> > > > interpret H4 as "defective", as if it were a minor error within an
> > > > otherwise valid CF DSG file.
> > > >
> > > > The description right above it says "When the intention of a data
> > > > variable is to contain only a single time series, *the preferred
> > > > encoding* is a special case of the multidimensional array
> > > > representation." [emphasis mine]
> > > >
> > > > That reference to an encoding that is a special case of the
> > > > multidimensional array representation is almost certainly a reference
> > > > to an entire paragraph in section 9.2 which starts with
> > > > "If there is only a single feature to be stored in a data variable,
> > > > there is no need for an instance dimension and it is permitted to omit
> > > > it." and then discusses that.
> > > >
> > > > So the whole point of H4 is to give an example where there is an
> > > > *implied* time series dimension, not an actual one.  [I'm not happy
> > > > about this type of file either, but CF DSG seems to explicitly allow it,
> > > > and even actively encourage it. I'm just trying to follow the rules.]
> > > >
> > > > ---
> > > > And if H4 is defective, several groups that now make this type of file
> > > > are going to be surprised to hear it.
> > > >
> > > >
> > > > On Tue, Feb 28, 2017 at 12:16 PM, Jonathan Gregory <
> > > > j.m.gregory at reading.ac.uk> wrote:
> > > >
> > > >> Dear Rich, Bob
> > > >>
> > > >> Rich asked
> > > >> > What is the difference that makes example CF H.4 okay, but not Bob's
> > > >> > example?
> > > >>
> > > >> That's a good question. You're right, I think example H4 has the same
> > > >> problem! I didn't read it carefully enough - sorry. The
> > > >> station_name:cf_role attribute in H4 says it's a station ID, but
> > > >> there's no association with the data variable. If you had more than
> > > >> one data variable in the file, as in Bob's example as he gave it
> > > >> originally, you couldn't tell which one it belonged to, so it can't be
> > > >> used as identification. In examples H6 and H7, however, where there
> > > >> are several timeseries, there is an instance (station) dimension, and
> > > >> the timeseries_id variable is station_name(station, name_strlen). So
> > > >> here it's clear that the station_name belongs to the stations, and you
> > > >> can infer that it's an array of strings, not a 2D character array.
> > > >>
> > > >> Therefore I think H4 and H5 are defective.
> > > >>
> > > >> In Bob's example, he has several data variables (five, I think) each
> > > >> of size 996, and a variable
> > > >>
> > > >>     char timeseries(timeseries=10);
> > > >>       :cf_role = "timeseries_id";
> > > >>       :long_name = "timeseries";
> > > >>
> > > >> I don't know what this refers to - that's my problem. Does it belong
> > > >> to any of the data variables? The dimension timeseries is not otherwise
> > > >> used.
> > > >>
> > > >> Best wishes
> > > >>
> > > >> Jonathan
> > > >>
> > > >>
> > > >> On Mon, Feb 27, 2017 at 02:41:49PM -0500, Signell, Richard wrote:
> > > >> > Date: Mon, 27 Feb 2017 14:41:49 -0500
> > > >> > From: "Signell, Richard" <rsignell at usgs.gov>
> > > >> > To: Jonathan Gregory <j.m.gregory at reading.ac.uk>
> > > >> > Subject: Re: [CF-metadata] Pre-proposal for "charset"
> > > >> >
> > > >> > On Mon, Feb 27, 2017 at 1:11 PM, Jonathan Gregory
> > > >> > <j.m.gregory at reading.ac.uk> wrote:
> > > >> > > Dear Bob
> > > >> > >
> > > >> > > That's right, there doesn't have to be an instance dimension. The
> > > >> > > problem with the file is that the variable you're concerned with
> > > >> > > (timeseries) isn't linked to any of the other variables, so its
> > > >> > > purpose is not clear.
> > > >> > >
> > > >> >
> > > >> > Jonathan,
> > > >> >
> > > >> >
> > > >> >
> > > >> > #1 CF Example H.4
> > > >> > http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#example-h.4
> > > >> >
> > > >> > dimensions:
> > > >> >       time = 100233 ;
> > > >> >       name_strlen = 23 ;
> > > >> >
> > > >> >    variables:
> > > >> >       float lon ;
> > > >> >           lon:standard_name = "longitude";
> > > >> >           lon:long_name = "station longitude";
> > > >> >           lon:units = "degrees_east";
> > > >> >       float lat ;
> > > >> >           lat:standard_name = "latitude";
> > > >> >           lat:long_name = "station latitude" ;
> > > >> >           lat:units = "degrees_north" ;
> > > >> >       float alt ;
> > > >> >           alt:long_name = "vertical distance above the surface" ;
> > > >> >           alt:standard_name = "height" ;
> > > >> >           alt:units = "m";
> > > >> >           alt:positive = "up";
> > > >> >           alt:axis = "Z";
> > > >> >       char station_name(name_strlen) ;
> > > >> >           station_name:long_name = "station name" ;
> > > >> >           station_name:cf_role = "timeseries_id";
> > > >> >
> > > >> >       double time(time) ;
> > > >> >           time:standard_name = "time";
> > > >> >           time:long_name = "time of measurement" ;
> > > >> >           time:units = "days since 1970-01-01 00:00:00" ;
> > > >> >           time:missing_value = -999.9;
> > > >> >       float humidity(time) ;
> > > >> >           humidity:standard_name = "specific_humidity" ;
> > > >> >           humidity:coordinates = "time lat lon alt" ;
> > > >> >           humidity:_FillValue = -999.9;
> > > >> >       float temp(time) ;
> > > >> >           temp:standard_name = "air_temperature" ;
> > > >> >           temp:units = "Celsius" ;
> > > >> >           temp:coordinates = "time lat lon alt" ;
> > > >> >           temp:_FillValue = -999.9;
> > > >> >
> > > >> >    attributes:
> > > >> >           :featureType = "timeSeries";
> > > >> >
> > > >> >
> > > >> > #2 Bob's example:
> > > >> >
> > > >> > netcdf summary_allTB2007.nc {
> > > >> >   dimensions:
> > > >> >     timeseries = 10;
> > > >> >     time = 996;
> > > >> >   variables:
> > > >> >     char timeseries(timeseries=10);
> > > >> >       :cf_role = "timeseries_id";
> > > >> >       :long_name = "timeseries";
> > > >> >
> > > >> >     double time(time=996);
> > > >> >       :units = "seconds since 1970-01-01T00:00:00Z";
> > > >> >       :standard_name = "time";
> > > >> >       :long_name = "time";
> > > >> >       :calendar = "gregorian";
> > > >> >       :axis = "T";
> > > >> >
> > > >> >     double latitude;
> > > >> >       :valid_min = -90.0; // double
> > > >> >       :valid_max = 90.0; // double
> > > >> >       :axis = "Y";
> > > >> >       :long_name = "latitude";
> > > >> >       :standard_name = "latitude";
> > > >> >       :units = "degrees_north";
> > > >> >
> > > >> >     double longitude;
> > > >> >       :valid_min = -180.0; // double
> > > >> >       :valid_max = 180.0; // double
> > > >> >       :axis = "X";
> > > >> >       :long_name = "longitude";
> > > >> >       :standard_name = "longitude";
> > > >> >       :units = "degrees_east";
> > > >> >
> > > >> >     double depth;
> > > >> >       :positive = "down";
> > > >> >       :axis = "Z";
> > > >> >       :valid_min = 0.0; // double
> > > >> >       :valid_max = 10971.0; // double
> > > >> >       :long_name = "depth";
> > > >> >       :standard_name = "depth";
> > > >> >       :units = "m";
> > > >> >
> > > >> >     char platform;
> > > >> >       :long_name = "MVCO ASIT";
> > > >> >
> > > >> >     char instrument;
> > > >> >       :long_name = "Imaging FlowCytobot";
> > > >> >
> > > >> >     double crs;
> > > >> >       :grid_mapping_name = "latitude_longitude";
> > > >> >       :longitude_of_prime_meridian = 0.0; // double
> > > >> >       :semi_major_axis = 6378137.0; // double
> > > >> >       :inverse_flattening = 298.257223563; // double
> > > >> >       :epsg_code = "EPSG:4326";
> > > >> >
> > > >> >     double Phaeocystis(time=996);
> > > >> >       :_FillValue = -9999.9; // double
> > > >> >       :long_name = "Phaeocystis";
> > > >> >       :standard_name = "Phaeocystis";
> > > >> >       :units = "1";
> > > >> >       :coordinates = "time depth latitude longitude";
> > > >> >       :grid_mapping = "crs";
> > > >> >       :platform = "platform";
> > > >> >       :instrument = "instrument";
> > > >> >
> > > >> >   // global attributes:
> > > >> >   :featureType = "timeSeries";
> > > >> >   :cdm_data_type = "TimeSeries";
> > > >> >   :Conventions = "CF-1.6";
> > > >> >   :summary = "Phytoplankton concentration by class derived from images collected by Imaging FlowCytobot\n";
> > > >> >   :institution = "WHOI";
> > > >> >   :title = "Phytoplankton concentration by class";
> > > >> > }
> > > >> >
> > > >> > --
> > > >> > Dr. Richard P. Signell   (508) 457-2229
> > > >> > USGS, 384 Woods Hole Rd.
> > > >> > Woods Hole, MA 02543-1598
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Sincerely,
> > > >
> > > > Bob Simons
> > > > IT Specialist
> > > > Environmental Research Division
> > > > NOAA Southwest Fisheries Science Center
> > > > 99 Pacific St., Suite 255A      (New!)
> > > > Monterey, CA 93940               (New!)
> > > > Phone: (831)333-9878            (New!)
> > > > Fax:   (831)648-8440
> > > > Email: bob.simons at noaa.gov
> > > >
> > > > The contents of this message are mine personally and
> > > > do not necessarily reflect any position of the
> > > > Government or the National Oceanic and Atmospheric Administration.
> > > > <>< <>< <>< <>< <>< <>< <>< <>< <><
> > > >
> > > >
> > > > _______________________________________________
> > > > CF-metadata mailing list
> > > > CF-metadata at cgd.ucar.edu
> > > > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> > > >
> > > >
> > >
> > >
> > > --
> > > David Hassell
> > > National Centre for Atmospheric Science
> > > Department of Meteorology, University of Reading,
> > > Earley Gate, PO Box 243, Reading RG6 6BB
> > > Tel: +44 118 378 5613
> > > http://www.met.reading.ac.uk/
> >
> 
> 
> 
> -- 
> David Hassell
> National Centre for Atmospheric Science
> Department of Meteorology, University of Reading,
> Earley Gate, PO Box 243, Reading RG6 6BB
> Tel: +44 118 378 5613
> http://www.met.reading.ac.uk/

----- End forwarded message -----


