[CF-metadata] Pre-proposal for "charset"

Karl Taylor taylor13 at llnl.gov
Thu Mar 9 14:44:25 MST 2017


Dear Bob and all,

I have not had time to follow this thread in detail, but a remark (in 
the most recent email) that seemed unnecessarily sarcastic caught my 
eye, and impelled me to look into what might have led to this dip in our 
normally courteous, respectful discourse.  As Chair of the CF Governance 
Panel, I feel obliged to remind everyone that the success (and fun!) of 
our endeavor depends on enthusiastic engagement, which is only 
discouraged if what ought to be earnest debate and substantive argument 
is disrupted (however briefly and unintentionally) by remarks that could 
be interpreted as being even slightly derogatory.  There are no "supreme 
authorities" here (although some are much more knowledgeable than 
others).  We progress by consensus, and only respectful contributions to 
the discussion can be tolerated.

Enough said about that.

Addressing only the part of the "pre-proposal" that suggests there is a 
need to explicitly distinguish strings from characters (not the part of 
the proposal that deals with the flavor of the 7 or 8 bit representation 
of characters):

1)  Note that the H.4 example being discussed was slightly modified (I 
think on 29th February 2016), and now includes "station_name" in the 
list of coordinates, thus explicitly linking it to the humidity and temp 
variables.  This along with the fact that station_name is *not* included 
as a dimension for these variables allows you to infer that this is a 
*single* station described by a character string of length 23, and not 
23 stations with single character i.d.'s.

2)  If the coordinate dimension is *required* in this case (and 
currently it may not be), then software should be able to unambiguously 
interpret things.  [This requirement was suggestion 2 (of 3), made by 
Jonathan in one of his earlier comments.]
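
To make the inference in points 1 and 2 concrete, here is a rough sketch of 
how software might apply it.  It is only an illustration under stated 
assumptions: the Var class and the functions classify_char_coord and collapse 
are hypothetical names invented for this example, not part of CF, the NUG, or 
any netCDF library, and the rule encoded is just the one described above (a 
char variable listed in "coordinates" whose trailing dimension is not shared 
with the data variable is read as a string).

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    """Hypothetical stand-in for the metadata of one netCDF variable."""
    name: str
    dtype: str                 # e.g. "char" or "float"
    dimensions: tuple          # dimension names, in order
    attrs: dict = field(default_factory=dict)


def classify_char_coord(data_var, coord_var):
    """Guess how a char variable listed in `coordinates` should be read."""
    if coord_var.dtype != "char":
        raise ValueError(f"{coord_var.name} is not a char variable")
    # The variable must be linked to the data variable via `coordinates`.
    if coord_var.name not in data_var.attrs.get("coordinates", "").split():
        return "unlinked"
    if not coord_var.dimensions:
        return "single char"
    shared = [d for d in coord_var.dimensions if d in data_var.dimensions]
    private = [d for d in coord_var.dimensions if d not in data_var.dimensions]
    if not private:
        # Every dimension is also a data dimension: one char per element.
        return "array of single chars"
    if private != [coord_var.dimensions[-1]]:
        # The unshared dimension is not simply the trailing one.
        return "ambiguous"
    # The trailing, unshared dimension is taken as the maximum string length.
    return "array of strings" if shared else "single string"


def collapse(chars):
    """Join 1-D single-byte chars into one string, dropping trailing padding."""
    return b"".join(chars).rstrip(b"\x00 ").decode("ascii")


# H.4-like layout: humidity(time), station_name(name_strlen)
humidity = Var("humidity", "float", ("time",),
               {"coordinates": "lat lon alt station_name"})
station_name = Var("station_name", "char", ("name_strlen",))
print(classify_char_coord(humidity, station_name))   # single string
```

If station_name instead had dimensions (station, name_strlen) and humidity 
had (station, time), the same function would return "array of strings"; a 
char variable dimensioned only by station would come back as "array of single 
chars".  The sketch relies on the linking through "coordinates" being present, 
which is why making it a requirement matters.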

best regards,
Karl



On 3/7/17 11:08 AM, Bob Simons - NOAA Federal wrote:
> Jonathan, I believe that you place an unreasonable burden on 
> general-purpose software readers of netcdf-3 files, which you expect 
> to include AI-like code which completely "understands" all possible CF 
> files, just so it can tell the difference between char variables meant 
> to be interpreted as chars and char variables meant to be interpreted 
> as strings (by collapsing the rightmost dimension). The supreme 
> authorities, you and David Hassell (your own employee?!), couldn't 
> even agree on whether H4 was a valid CF file. How can you then demand 
> that software do better?
>
> It is easy/trivial for software reading a netcdf-4 file (as defined in 
> NUG) to distinguish char variables and String variables. Why is it so 
> wrong to ask for the same ease/clarity with netcdf-3 files?
> Part of my effort here was to start dealing with the massive rift 
> between CF (which only covers netcdf-3 files) and NUG (which covers 
> netcdf-3 and netcdf-4 files). Isn't that a reasonable goal?
>
> And even if you ignore the issue of distinguishing chars from strings, 
> there is still no attribute in CF to specify the character set for 
> char scalars and char arrays that are to be interpreted as chars.
> You can't say "_Encoding" because the default for _Encoding is 
> "UTF-8", which is not a valid option for char scalars and char arrays 
> because it may span multiple chars. The list of valid character sets 
> for char scalars and char arrays (in netcdf-3 and netcdf-4 files) must 
> be different from the list of valid _Encodings for strings. A 
> different attribute, e.g., charset, is needed for chars (as opposed to 
> strings) in netcdf-3 and netcdf-4 files.
>
>
>
> On Tue, Mar 7, 2017 at 9:03 AM, Jonathan Gregory 
> <j.m.gregory at reading.ac.uk> wrote:
>
>     Dear Chris
>
>     > We need to be "clear" about what we mean by "the intent is clear". I think
>     > that much of the point of CF is to be as explicit as possible, -- i.e. the
>     > reader of a CF file should not have to know anything about how given data
>     > tends to be used in order to determine what data type an array should be
>     > (or what shape it should be).
>
>     Yes, I agree with that. However, if you're reading a CF file, you aren't
>     just reading plain variables. If you're using/writing software which knows
>     how to interpret the file following the CF convention, it should know what
>     the "intent" is, in a CF context, of each of the variables of interest.
>     For example, you know that an auxiliary coordinate variable of char data must
>     be a vector of strings, and the trailing or only dimension is the max string
>     length. If you came across this variable when scanning all the variables in
>     a netCDF file, with no interest in CF, you wouldn't know that it was an array
>     of strings, but if you are using it as a CF aux coord var, you do know that,
>     so I don't think any further signal is needed - it would be redundant.
>
>     Best wishes
>
>     Jonathan
>
>     ----- Forwarded message from Chris Barker <chris.barker at noaa.gov> -----
>
>     > Date: Mon, 6 Mar 2017 11:16:35 -0800
>     > From: Chris Barker <chris.barker at noaa.gov>
>     > To: Jonathan Gregory <j.m.gregory at reading.ac.uk>
>     > CC: "cf-metadata at cgd.ucar.edu" <cf-metadata at cgd.ucar.edu>
>     > Subject: Re: [CF-metadata] Pre-proposal for "charset"
>     >
>     > On Mon, Mar 6, 2017 at 9:47 AM, Jonathan Gregory
>     > <j.m.gregory at reading.ac.uk> wrote:
>     >
>     > > Yes, we can reopen the ticket. I think the _Encoding for char is a good
>     > > idea, especially if it's an NUG convention.
>     >
>     >
>     > so let's do that part at least.
>     >
>     > > > Are there any files out in the wild that DO use ND arrays of NC_CHAR that
>     > > > are not intended to be interpreted as a (N-1)D array of Strings?
>     > >
>     > > That is the question. In particular, since this is the CF convention we're
>     > > talking about, are there any char arrays which are part of CF,
>     >
>     >
>     > indeed.
>     >
>     >
>     > > where the
>     > > intent is not clear?
>     > >
>     > We need to be "clear" about what we mean by "the intent is clear". I think
>     > that much of the point of CF is to be as explicit as possible, -- i.e. the
>     > reader of a CF file should not have to know anything about how given data
>     > tends to be used in order to determine what data type an array should be
>     > (or what shape it should be).
>     >
>     > I say this as an author of sometimes generic tools -- the tool should be
>     > able to read the file, and produce the appropriate native array for the
>     > task at hand, without knowing something like: "ahh, this is the ID of an
>     > Acme-ocean-widget -- those use char IDs -- so this must be a char" --
>     > Humans can do that -- software can't (not easily anyway!)
>     >
>     > And clearly specifying whether a char array is a char array or a string
>     > array will better unify netcdf3 and netcdf4.
>     >
>     > netcdf4 can be explicit about it -- netcdf3 can't -- so it'd be nice if CF
>     > could fill that gap.
>     >
>     > Now that I think about it, this really should be a netcdf convention --
>     > like _FillValue, but this is a CF list....
>     >
>     > -CHB
>     >
>     > --
>     >
>     > Christopher Barker, Ph.D.
>     > Oceanographer
>     >
>     > Emergency Response Division
>     > NOAA/NOS/OR&R (206) 526-6959   voice
>     > 7600 Sand Point Way NE (206) 526-6329   fax
>     > Seattle, WA  98115 (206) 526-6317   main reception
>     >
>     > Chris.Barker at noaa.gov
>
>     ----- End forwarded message -----
>     _______________________________________________
>     CF-metadata mailing list
>     CF-metadata at cgd.ucar.edu
>     http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
>
>
>
> -- 
> Sincerely,
>
> Bob Simons
> IT Specialist
> Environmental Research Division
> NOAA Southwest Fisheries Science Center
> 99 Pacific St., Suite 255A      (New!)
> Monterey, CA 93940               (New!)
> Phone: (831)333-9878            (New!)
> Fax:   (831)648-8440
> Email: bob.simons at noaa.gov
>
> The contents of this message are mine personally and
> do not necessarily reflect any position of the
> Government or the National Oceanic and Atmospheric Administration.
> <>< <>< <>< <>< <>< <>< <>< <>< <><
>
>
>
