[CF-metadata] Pre-proposal for "charset"

Bob Simons - NOAA Federal bob.simons at noaa.gov
Tue Mar 7 12:08:13 MST 2017


Jonathan, I believe that you place an unreasonable burden on
general-purpose software that reads netcdf-3 files, which you expect to
include AI-like code that completely "understands" all possible CF files,
just so it can tell the difference between char variables meant to be
interpreted as chars and char variables meant to be interpreted as strings
(by collapsing the rightmost dimension). The supreme authorities, you and
David Hassell (your own employee?!), couldn't even agree on whether H4 was
a valid CF file. How can you then demand that software do better?
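To make the two readings concrete, here is a minimal Python sketch. It assumes a netCDF-3 char variable has already been read into nested lists of one-byte `bytes` values with NUL padding. The function names `collapse_rightmost` and `is_cf_aux_coord` and the dict-of-attributes layout are hypothetical, not part of any netCDF library. Collapsing is mechanical. Deciding *whether* to collapse is the hard part, and here that decision rests on the CF rule Jonathan cites further down: a char variable used as an auxiliary coordinate variable (named in some variable's `coordinates` attribute) is an array of strings.

```python
def collapse_rightmost(chars, encoding="utf-8"):
    """Collapse the rightmost (string-length) dimension of a nested list of
    single-byte values into strings: shape (n, strlen) -> n strings."""
    if chars and isinstance(chars[0], bytes):
        # Innermost row: join the bytes and drop trailing NUL padding.
        return b"".join(chars).rstrip(b"\x00").decode(encoding)
    return [collapse_rightmost(sub, encoding) for sub in chars]


def is_cf_aux_coord(name, attrs_by_var):
    """True if `name` is listed in any variable's `coordinates` attribute,
    i.e. it plays the CF auxiliary-coordinate role."""
    return any(name in attrs.get("coordinates", "").split()
               for attrs in attrs_by_var.values())


# The same 2 x 3 block of bytes admits two readings.
data = [[b"a", b"b", b"\x00"], [b"x", b"y", b"z"]]
as_chars = data                        # six individual chars
as_strings = collapse_rightmost(data)  # ['ab', 'xyz']

attrs_by_var = {"temp": {"coordinates": "station_name"},
                "station_name": {}, "flag": {}}
print(is_cf_aux_coord("station_name", attrs_by_var))  # True
print(is_cf_aux_coord("flag", attrs_by_var))          # False: no CF role, no signal
```

For `flag`, nothing in the file says which reading is intended; that is the gap being argued about.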

It is easy/trivial for software reading a netcdf-4 file (as defined in NUG)
to distinguish char variables from String variables. Why is it so wrong to
ask for the same ease/clarity with netcdf-3 files?

Part of my effort here was to start dealing with the massive rift between
CF (which only covers netcdf-3 files) and NUG (which covers netcdf-3 and
netcdf-4 files). Isn't that a reasonable goal?
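A minimal Python sketch of what "the same ease" could look like for netCDF-3, under a stated assumption: purely for illustration, a char variable carrying the `charset` attribute proposed here is read as individual chars, one carrying `_Encoding` is read as strings, and anything else is reported as ambiguous rather than guessed. That rule, the `CharReading` enum and `classify_char_variable` are hypothetical, not part of CF, NUG or any netCDF library. In netCDF-4 the same decision needs no attributes, because NC_CHAR and NC_STRING are distinct types.

```python
from enum import Enum


class CharReading(Enum):
    CHARS = "chars"          # treat each element as one character
    STRINGS = "strings"      # collapse the rightmost dimension into strings
    AMBIGUOUS = "ambiguous"  # no explicit signal; a generic tool must not guess


def classify_char_variable(attrs: dict) -> CharReading:
    """Decide how to read a netCDF-3 char variable from its attributes alone.

    Hypothetical rule for illustration: `charset` marks chars,
    `_Encoding` marks strings.
    """
    has_charset = "charset" in attrs
    has_encoding = "_Encoding" in attrs
    if has_charset and not has_encoding:
        return CharReading.CHARS
    if has_encoding and not has_charset:
        return CharReading.STRINGS
    # Neither attribute (or, contradictorily, both): nothing explicit to act on.
    return CharReading.AMBIGUOUS


print(classify_char_variable({"charset": "ISO-8859-1"}))  # CharReading.CHARS
print(classify_char_variable({"_Encoding": "UTF-8"}))     # CharReading.STRINGS
print(classify_char_variable({}))                         # CharReading.AMBIGUOUS
```

The point of the design is that the function never inspects variable names, roles or data values; a generic tool can apply it without understanding CF.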

And even if you ignore the issue of distinguishing chars from strings,
there is still no attribute in CF to specify the character set for char
scalars and char arrays that are to be interpreted as chars.
You can't use _Encoding for this, because the default for _Encoding is
"UTF-8", which is not a valid option for char scalars and char arrays:
a single UTF-8 character may span multiple chars (bytes). The list of valid
character sets for char scalars and char arrays (in netcdf-3 and netcdf-4
files) must therefore differ from the list of valid _Encodings for strings.
A different attribute, e.g., charset, is needed for chars (as opposed to
strings) in netcdf-3 and netcdf-4 files.
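The multi-byte problem is easy to demonstrate with Python's built-in codecs. The codec names used here (`iso-8859-1`, `cp1252`, `utf-8`) are Python's spellings, chosen only for illustration; they are not a list of values any convention has adopted, and `fits_in_one_char` is a hypothetical helper. A character fits in one netCDF char element only if it encodes to exactly one byte.

```python
def fits_in_one_char(character: str, charset: str) -> bool:
    """True if `character` encodes to exactly one byte, i.e. one netCDF
    char element, in the given character set."""
    try:
        return len(character.encode(charset)) == 1
    except UnicodeEncodeError:
        return False  # not representable in this charset at all


for cs in ("iso-8859-1", "cp1252", "utf-8"):
    print(cs, fits_in_one_char("é", cs), fits_in_one_char("€", cs))
# iso-8859-1 True False
# cp1252 True True
# utf-8 False False

# Storing UTF-8 "é" in a char array takes two elements,
# and neither element decodes as a character on its own.
elements = [bytes([b]) for b in "é".encode("utf-8")]  # [b'\xc3', b'\xa9']
for e in elements:
    try:
        e.decode("utf-8")
    except UnicodeDecodeError:
        print(e, "is not a complete UTF-8 character")
```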



On Tue, Mar 7, 2017 at 9:03 AM, Jonathan Gregory <j.m.gregory at reading.ac.uk>
wrote:

> Dear Chris
>
> > We need to be "clear" about what we mean by "the intent is clear". I think
> > that much of the point of CF is to be as explicit as possible -- i.e. the
> > reader of a CF file should not have to know anything about how given data
> > tends to be used in order to determine what data type an array should be
> > (or what shape it should be).
>
> Yes, I agree with that. However, if you're reading a CF file, you aren't
> just reading plain variables. If you're using/writing software which knows
> how to interpret the file following the CF convention, it should know what
> the "intent" is, in a CF context, of each of the variables of interest.
> For example, you know that an auxiliary coordinate variable of char data must
> be a vector of strings, and the trailing or only dimension is the max string
> length. If you came across this variable when scanning all the variables in
> a netCDF file, with no interest in CF, you wouldn't know that it was an array
> of strings, but if you are using it as a CF aux coord var, you do know that,
> so I don't think any further signal is needed - it would be redundant.
>
> Best wishes
>
> Jonathan
>
> ----- Forwarded message from Chris Barker <chris.barker at noaa.gov> -----
>
> > Date: Mon, 6 Mar 2017 11:16:35 -0800
> > From: Chris Barker <chris.barker at noaa.gov>
> > To: Jonathan Gregory <j.m.gregory at reading.ac.uk>
> > CC: "cf-metadata at cgd.ucar.edu" <cf-metadata at cgd.ucar.edu>
> > Subject: Re: [CF-metadata] Pre-proposal for "charset"
> >
> > On Mon, Mar 6, 2017 at 9:47 AM, Jonathan Gregory <j.m.gregory at reading.ac.uk>
> > wrote:
> >
> > > Yes, we can reopen the ticket. I think the _Encoding for char is a good
> > > idea, especially if it's an NUG convention.
> >
> >
> > so let's do that part at least.
> >
> > > > Are there any files out in the wild that DO use ND arrays of NC_CHAR that
> > > > are not intended to be interpreted as a (N-1)D array of Strings?
> > >
> > > That is the question. In particular, since this is the CF convention
> > > we're talking about, are there any char arrays which are part of CF,
> >
> >
> > indeed.
> >
> >
> > > where the
> > > intent is not clear?
> > >
> > We need to be "clear" about what we mean by "the intent is clear". I think
> > that much of the point of CF is to be as explicit as possible -- i.e. the
> > reader of a CF file should not have to know anything about how given data
> > tends to be used in order to determine what data type an array should be
> > (or what shape it should be).
> >
> > I say this as an author of (sometimes generic) tools -- the tool should be
> > able to read the file and produce the appropriate native array for the
> > task at hand, without knowing something like: "ahh, this is the ID of an
> > Acme-ocean-widget -- those use char IDs -- so this must be a char" --
> > Humans can do that -- software can't (not easily anyway!)
> >
> > And clearly specifying whether a char array is a char array or a string
> > array will better unify netcdf3 and netcdf4.
> >
> > netcdf4 can be explicit about it -- netcdf3 can't -- so it'd be nice if CF
> > could fill that gap.
> >
> > Now that I think about it, this really should be a netcdf convention --
> > like _FillValue, but this is a CF list....
> >
> > -CHB
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR&R            (206) 526-6959   voice
> > 7600 Sand Point Way NE   (206) 526-6329   fax
> > Seattle, WA  98115       (206) 526-6317   main reception
> >
> > Chris.Barker at noaa.gov
>
> ----- End forwarded message -----
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>



-- 
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940               (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: bob.simons at noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><