[CF-metadata] Pre-proposal for "charset"

Bob Simons - NOAA Federal bob.simons at noaa.gov
Wed Feb 22 10:58:11 MST 2017


I agree that writing rules for determining if the chars should be
interpreted as chars or a String based on the dimensions would be very
difficult. I don't intend to even try. I just want to add data_type as a
strongly recommended way to specify how the chars should be interpreted
(chars vs string) and charset to specify the character encoding.
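
As a minimal sketch of how reading software might use the two attributes,
assuming Python: the helper name interpret_chars, the plain dict standing in
for the variable's attributes, and the fallback charset are all hypothetical
and not part of CF or of the proposal.

```python
def interpret_chars(raw, attrs):
    """Interpret a 1-D char array (given as bytes) using the proposed attributes.

    `attrs` is a plain dict standing in for the variable's netCDF attributes.
    Returns one str for data_type="string", a list of 1-char strs for "char".
    """
    # Falling back to ISO-8859-1 is an assumption made for this sketch only.
    charset = attrs.get("charset", "ISO-8859-1")
    data_type = attrs.get("data_type")
    if data_type is None:
        # Without the attribute the interpretation stays ambiguous.
        raise ValueError("ambiguous: data_type is not set")
    if data_type.lower() == "string":
        # One string; trailing NUL padding is dropped before decoding.
        return raw.rstrip(b"\x00").decode(charset)
    if data_type.lower() == "char":
        # Independent characters; this only works for single-byte charsets.
        return [bytes([b]).decode(charset) for b in raw]
    raise ValueError(f"unknown data_type: {data_type!r}")


raw = b"station_01"  # made-up contents for a char[10] variable
as_string = interpret_chars(raw, {"data_type": "string", "charset": "ISO-8859-1"})
as_chars = interpret_chars(raw, {"data_type": "char", "charset": "ISO-8859-1"})
```

With data_type="string" the ten bytes come back as one value; with
data_type="char" they come back as ten separate values.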

On Wed, Feb 22, 2017 at 1:01 AM, David Hassell <david.hassell at ncas.ac.uk>
wrote:

> Hello Bob,
>
> Thanks for the example. I fully agree that this case is hard for humans as
> well as computers. Since the timeseries variable has the same name as the
> timeseries dimension (and therefore also identifies it as a size 10
> coordinate variable) it makes it very hard, even with the hint given by the
> cf_role attribute. There are examples elsewhere in CF, of hard things made
> easier, such as the axis attribute for coordinates, and I think that this
> is a good case for another one.
>
> The default interpretation of char arrays (section 2.2) would need
> tightening up, of course. It might be enough to say that an array of chars
> may be interpreted as strings if "the dimensions imply it", or if the
> data_type is set to string.
>
> It might, however, be hard to write down rules for "if the dimensions
> imply it". You could start with "if it is not a data variable and its
> trailing dimension is not spanned by the data variable which requires it", but
> life gets more complicated when you start with a DSG file, which may
> contain a char variable that spans a dimension which a data variable spans
> only implicitly (e.g. in your example the timeseries variable spans the
> implied instance dimension of size 1).
>
> Thanks,
>
> David
>
> On 21 February 2017 at 23:31, Bob Simons - NOAA Federal <
> bob.simons at noaa.gov> wrote:
>
>> You requested a sample file which demonstrates the need for a "data_type"
>> attribute for char variables to distinguish Strings from true chars.
>>
>> Here is a file that I was just given which is a good example.
>> It is a valid CF DSG file.
>> The cf_role=timeseries_id variable appears as char[10].
>> So just looking at that variable: is it one string (with 10 characters),
>> or an array with 10 values (each a char)?
>> Yes, a human can think about the whole file and come to a conclusion of
>> which it "must" be, but that is very, very hard or impossible for a
>> computer program to figure out (I could be wrong).
>>
>> But if the timeseries variable had the proposed attribute
>>   :data_type="string"
>> it would be trivial for the software to know that this variable should be
>> interpreted as one string (not 10 separate chars).
>>
>> I hope that was what you were looking for. If not, please tell me why not
>> and I'll find another example.
>>
>> netcdf summary_allTB2007.nc {
>>   dimensions:
>>     timeseries = 10;
>>     time = 996;
>>   variables:
>>     char timeseries(timeseries=10);
>>       :cf_role = "timeseries_id";
>>       :long_name = "timeseries";
>>
>>     double time(time=996);
>>       :units = "seconds since 1970-01-01T00:00:00Z";
>>       :standard_name = "time";
>>       :long_name = "time";
>>       :calendar = "gregorian";
>>       :axis = "T";
>>
>>     double latitude;
>>       :valid_min = -90.0; // double
>>       :valid_max = 90.0; // double
>>       :axis = "Y";
>>       :long_name = "latitude";
>>       :standard_name = "latitude";
>>       :units = "degrees_north";
>>
>>     double longitude;
>>       :valid_min = -180.0; // double
>>       :valid_max = 180.0; // double
>>       :axis = "X";
>>       :long_name = "longitude";
>>       :standard_name = "longitude";
>>       :units = "degrees_east";
>>
>>     double depth;
>>       :positive = "down";
>>       :axis = "Z";
>>       :valid_min = 0.0; // double
>>       :valid_max = 10971.0; // double
>>       :long_name = "depth";
>>       :standard_name = "depth";
>>       :units = "m";
>>
>>     char platform;
>>       :long_name = "MVCO ASIT";
>>
>>     char instrument;
>>       :long_name = "Imaging FlowCytobot";
>>
>>     double crs;
>>       :grid_mapping_name = "latitude_longitude";
>>       :longitude_of_prime_meridian = 0.0; // double
>>       :semi_major_axis = 6378137.0; // double
>>       :inverse_flattening = 298.257223563; // double
>>       :epsg_code = "EPSG:4326";
>>
>>     double Asterionellopsis(time=996);
>>       :_FillValue = -9999.9; // double
>>       :long_name = "Asterionellopsis";
>>       :standard_name = "Asterionellopsis";
>>       :units = "1";
>>       :coordinates = "time depth latitude longitude";
>>       :grid_mapping = "crs";
>>       :platform = "platform";
>>       :instrument = "instrument";
>>
>>     double Cerataulina(time=996);
>>       :_FillValue = -9999.9; // double
>>       :long_name = "Cerataulina";
>>       :standard_name = "Cerataulina";
>>       :units = "1";
>>       :coordinates = "time depth latitude longitude";
>>       :grid_mapping = "crs";
>>       :platform = "platform";
>>       :instrument = "instrument";
>>
>>     double Ceratium(time=996);
>>       :_FillValue = -9999.9; // double
>>       :long_name = "Ceratium";
>>       :standard_name = "Ceratium";
>>       :units = "1";
>>       :coordinates = "time depth latitude longitude";
>>       :grid_mapping = "crs";
>>       :platform = "platform";
>>       :instrument = "instrument";
>>
>>     double Chaetoceros(time=996);
>>       :_FillValue = -9999.9; // double
>>       :long_name = "Chaetoceros";
>>       :standard_name = "Chaetoceros";
>>       :units = "1";
>>       :coordinates = "time depth latitude longitude";
>>       :grid_mapping = "crs";
>>       :platform = "platform";
>>       :instrument = "instrument";
>>
>>     double Corethron(time=996);
>>       :_FillValue = -9999.9; // double
>>       :long_name = "Corethron";
>>       :standard_name = "Corethron";
>>       :units = "1";
>>       :coordinates = "time depth latitude longitude";
>>       :grid_mapping = "crs";
>>       :platform = "platform";
>>       :instrument = "instrument";
>>
>>     double Coscinodiscus(time=996);
>>       :_FillValue = -9999.9; // double
>>       :long_name = "Coscinodiscus";
>>       :standard_name = "Coscinodiscus";
>>       :units = "1";
>>       :coordinates = "time depth latitude longitude";
>>       :grid_mapping = "crs";
>>       :platform = "platform";
>>       :instrument = "instrument";
>>
>>   // global attributes:
>>   :featureType = "timeSeries";
>>   :Conventions = "CF-1.6";
>>   :institution = "Obfuscated";
>>   :title = "Obfuscated";
>>  data:
>> }
>>
>>
>>> Date: Fri, 17 Feb 2017 17:46:45 +0000
>>> From: Jonathan Gregory <j.m.gregory at reading.ac.uk>
>>> To: cf-metadata at cgd.ucar.edu
>>> Subject: Re: [CF-metadata] Pre-proposal for "charset"
>>> Message-ID: <20170217174645.GA9244 at met.reading.ac.uk>
>>> Content-Type: text/plain; charset=us-ascii
>>>
>>> Dear Bob
>>>
>>> I agree that sometimes char data is characters and sometimes strings,
>>> and one
>>> can't tell which it is without knowing the intended use of the array
>>> concerned.
>>> When you do know the role of this array e.g. as a quality flag data
>>> variable,
>>> or a string-valued auxiliary coordinate variable, then you know also
>>> whether
>>> it's a string or an array of characters. Can you give an example where
>>> one
>>> needs to know how a char array should be interpreted but you *don't*
>>> know what
>>> its purpose is within the CF-netCDF file?
>>>
>>> Best wishes
>>>
>>> Jonathan
>>>
>>> ----- Forwarded message from Bob Simons - NOAA Federal <
>>> bob.simons at noaa.gov> -----
>>>
>>> > Date: Wed, 8 Feb 2017 10:00:32 -0800
>>> > From: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
>>> > To: CF Metadata <CF-metadata at cgd.ucar.edu>
>>> > Subject: Re: [CF-metadata] Pre-proposal for "charset"
>>> >
>>> > I think my original pre-proposal has a significant flaw and needs to be
>>> > revised.
>>> > The problem is: charset needs to be specifiable for all char arrays,
>>> > regardless of whether the values should be interpreted as Strings or
>>> > individual chars.
>>> >
>>> > I see two basic solutions:
>>> >
>>> > 1) Two attributes, but a given variable would only use one of them. The
>>> > first part of the attribute name specifies the data type:
>>> >   char_charset = "ISO-8859-1";   //identifies a char variable using
>>> > ISO-8859-1
>>> > or
>>> >   string_charset = "ISO-8859-1";   //identifies a String variable using
>>> > ISO-8859-1
>>> >
>>> > 2) Two attributes that would both be specified for every char/String
>>> > variable, e.g.,
>>> >   charset = "ISO-8859-1";
>>> >   data_type = "String";             //or "char"
>>> >
>>> > In either case, the charsets allowed for char (not String) data must be
>>> > restricted to a single code page (e.g., "ISO-8859-1") because other
>>> encodings
>>> > (e.g., "UTF-8") need multiple bytes for some characters.
>>> >
>>> > ---
>>> > I have a slight preference for (2), because it is cleaner and might be
>>> better
>>> > in the future (I don't know the implications for nc4 and CF2).
>>> >
>>> > Thoughts? Votes?
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Feb 6, 2017 at 3:08 PM, Bob Simons - NOAA Federal <
>>> > bob.simons at noaa.gov> wrote:
>>> >
>>> > > Before I make a formal CF proposal for a "charset" attribute, I
>>> would like
>>> > > to get comments and suggestions from all of you.
>>> > >
>>> > > This is a proposal to solve the problem of distinguishing strings
>>> from
>>> > > arrays of characters and the problem of identifying the string's
>>> character
>>> > > encoding. Presumably, it would be appended to section 2.2.
>>> > >
>>> > > An example of actual need is: Many/most current uses of
>>> multidimensional
>>> > > char arrays are intended to be interpreted as Strings. But some
>>> files,
>>> > > e.g., Argo profile float profiles, have single char data that are
>>> stored in
>>> > > char arrays.
>>> > >
>>> > > Another example, while most nc files just use 7-bit ASCII characters
>>> in
>>> > > strings, some use 8-bit characters. Some such files appear to use
>>> > > charset=Windows-1252, others use Mac OS Roman, others use
>>> ISO-8859-1, but
>>> > > the charset is not specified and there is currently no official
>>> CF way
>>> > > to specify it.
>>> > >
>>> > > Another advantage of this proposal is that it provides a way to
>>> support
>>> > > Unicode (and thus all of the world's languages) via the UTF-8
>>> encoding
>>> > > which is useful as we increasingly work with people from non-US,
>>> > > non-European countries.
>>> > >
>>> > > A possible extension of this is to allow a few special additional
>>> > > pseudo-charset names:
>>> > > * "HTML" - the chars are to be interpreted as an array of Strings
>>> with
>>> > > HTML content, using the ISO-8859-1 charset. Non-ISO-8859-1 characters must be
>>> encoded
>>> > > using the &#d; format where d is the decimal number of a Unicode
>>> character.
>>> > > * "XML" - the chars are to be interpreted as an array of Strings
>>> with
>>> > > XML content, using the ISO-8859-1 charset. Non-ISO-8859-1 characters
>>> must
>>> > > be encoded using the &#d; format where d is the decimal number of a
>>> Unicode
>>> > > character.
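
A minimal sketch of the &#d; idea, assuming Python: the function names are
hypothetical, and the sketch leans on the standard "xmlcharrefreplace" error
handler, which writes any character outside the target charset as a decimal
numeric reference. A real writer would also have to escape literal "&" and
"<" characters, which this sketch does not attempt.

```python
import html


def to_pseudo_charset(text):
    """Encode text as ISO-8859-1 bytes; other characters become &#d; references."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def from_pseudo_charset(raw):
    """Decode ISO-8859-1 bytes and resolve character references back to text."""
    return html.unescape(raw.decode("iso-8859-1"))


encoded = to_pseudo_charset("5 €")      # the Euro sign is not in ISO-8859-1
decoded = from_pseudo_charset(encoded)  # round-trips to the original text
latin1 = to_pseudo_charset("café")      # é is in ISO-8859-1, so it stays one byte
```

Here the Euro sign is stored as the seven ASCII bytes "&#8364;", while "é"
stays a single byte (E9).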
>>> > >
>>> > > Thank you for considering this.
>>> > >
>>> > >
>>> > > --- The Actual Pre-Proposal
>>> > > Use the "charset" attribute to indicate that a multidimensional
>>> > > char array should be interpreted as an array of Strings,
>>> > > not an array of individual characters.
>>> > > The value of "charset" also serves to specify the character set
>>> > > used to encode the strings
>>> > > and must be the name of one of the 8-bit encodings
>>> > > (since CF chars are 8-bits) listed at
>>> > > http://www.iana.org/assignments/character-sets/character-sets.xhtml
>>> .
>>> > > Charset names are case-insensitive.
>>> > > The only charsets which are recommended are "ISO-8859-1" and "UTF-8".
>>> > > For backwards compatibility, if "charset" is not defined,
>>> > > it remains ambiguous whether a char array should be interpreted as
>>> > > holding an array of individual characters or an array of Strings.
>>> > >
>>> > >
>>> > > --- An Example: Encoding three Strings: "It", "Book", and "5 €".
>>> > > The Unicode code point for the Euro symbol is 20AC (in hexadecimal),
>>> > > which is 8364 (in decimal).
>>> > > The Euro symbol is encoded in UTF-8 as 3 bytes: E2 82 AC (in
>>> hexadecimal).
>>> > > So a file would store these strings in a char array as:
>>> > >   dimensions
>>> > >     words = 3;
>>> > >     strLen = 5;
>>> > >   char myWords[words][strLen] = "It[0][0][0]", "Book[0]", "5 [E2][82][AC]";
>>> > >     charset = "UTF-8";
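
The byte layout in this example can be reproduced with a short sketch,
assuming Python; pack_strings and unpack_strings are hypothetical helper
names, and NUL padding is taken from the [0] bytes shown in the example.

```python
def pack_strings(strings, str_len, charset="UTF-8"):
    """Encode each string and pad it with NUL bytes to exactly str_len bytes."""
    rows = []
    for s in strings:
        encoded = s.encode(charset)
        if len(encoded) > str_len:
            # The limit is in bytes, not characters, for multi-byte charsets.
            raise ValueError(f"{s!r} needs {len(encoded)} bytes, limit is {str_len}")
        rows.append(encoded.ljust(str_len, b"\x00"))
    return rows


def unpack_strings(rows, charset="UTF-8"):
    """Strip the trailing NUL padding from each row and decode it."""
    return [row.rstrip(b"\x00").decode(charset) for row in rows]


rows = pack_strings(["It", "Book", "5 €"], str_len=5)
words = unpack_strings(rows)
```

Note that "5 €" is three characters but five bytes in UTF-8, so strLen has to
be sized in bytes.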
>>> > >
>>> > >
>>> > > --
>>> > > Sincerely,
>>> > >
>>> > > Bob Simons
>>> > > IT Specialist
>>> > > Environmental Research Division
>>> > > NOAA Southwest Fisheries Science Center
>>> > > 99 Pacific St., Suite 255A      (New!)
>>> > > Monterey, CA 93940               (New!)
>>> > > Phone: (831)333-9878            (New!)
>>> > > Fax:   (831)648-8440
>>> > > Email: bob.simons at noaa.gov
>>> > >
>>> > > The contents of this message are mine personally and
>>> > > do not necessarily reflect any position of the
>>> > > Government or the National Oceanic and Atmospheric Administration.
>>> > > <>< <>< <>< <>< <>< <>< <>< <>< <><
>>> > >
>>> > >
>>> >
>>> >
>>> > --
>>> > Sincerely,
>>> >
>>> > Bob Simons
>>> > IT Specialist
>>> > Environmental Research Division
>>> > NOAA Southwest Fisheries Science Center
>>> > 99 Pacific St., Suite 255A      (New!)
>>> > Monterey, CA 93940               (New!)
>>> > Phone: (831)333-9878            (New!)
>>> > Fax:   (831)648-8440
>>> > Email: bob.simons at noaa.gov
>>> >
>>> > The contents of this message are mine personally and
>>> > do not necessarily reflect any position of the
>>> > Government or the National Oceanic and Atmospheric Administration.
>>> > <>< <>< <>< <>< <>< <>< <>< <>< <><
>>>
>>> > _______________________________________________
>>> > CF-metadata mailing list
>>> > CF-metadata at cgd.ucar.edu
>>> > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>>
>>>
>>> ----- End forwarded message -----
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Fri, 17 Feb 2017 13:22:29 -0600
>>> From: David Blodgett <dblodgett at usgs.gov>
>>> To: CF Metadata <cf-metadata at cgd.ucar.edu>
>>> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
>>>         for Simple Features
>>> Message-ID: <D53CABCC-2BEF-4D8C-8551-93A3D967B7CB at usgs.gov>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> All,
>>>
>>> I haven't heard much follow up, but here's a doodle to coordinate a
>>> phone conversation about this. I think we have west-coast US participants
>>> and EU participants, so I chose times mid to late morning for me (midwest
>>> US).
>>>
>>> http://doodle.com/poll/eikarnt35tdm7igd
>>>
>>> Will make a call once a few people have expressed interest and we have a
>>> clear day/time.
>>>
>>> Regards,
>>>
>>> - Dave
>>>
>>> > On Feb 6, 2017, at 11:29 AM, David Blodgett <dblodgett at usgs.gov>
>>> wrote:
>>> >
>>> > Dear CF,
>>> >
>>> > I want to follow up on the conversation here with an alternative
>>> approach suggested off list primarily between Jonathan and me. For this, I'm
>>> going to focus on use cases satisfied and simplification of the proposal
>>> allowed by not supporting those use cases. The changes below are largely
>>> driven by a desire to better align this proposal with the technical details
>>> of the prior art that is CF.
>>> >
>>> > If we:
>>> > 1) don't support node sharing, we can remove the complication of node
>>> - coordinate indexing / indirection, simplifying the proposal pretty
>>> significantly.
>>> > 2) don't use "break values" to indicate the separation between
>>> multi-part geometries and polygon holes, we end up with a data model with
>>> an extra dimension, but the NetCDF dimensions align with the natural
>>> dimensions of the data.
>>> > 3) use "count" instead of a "start pointer" approach, we are better
>>> aligned with the existing DSG contiguous ragged array approach.
>>> >
>>> > Coming back to the three directions we could take this proposal from
>>> my cover letter on February 2nd.
>>> >> Direct use of Well-Known Text (WKT). In this approach, well known
>>> text strings would be encoded using character arrays following a contiguous
>>> ragged array approach to index the character array by geometry (or instance
>>> in DSG parlance).
>>> >> Implement the WKT approach using a NetCDF binary array. In this
>>> approach, well known text separators (brackets, commas and spaces) for
>>> multipoint, multiline, multipolygon, and polygon holes, would be encoded as
>>> break type separator values like -1 for multiparts and -2 for holes.
>>> >> Implement the fundamental dimensions of geometry data in NetCDF. In
>>> this approach, additional dimensions and variables along those dimensions
>>> would be introduced to represent geometries, geometry parts, geometry
>>> nodes, and unique (potentially shared) coordinate locations for nodes to
>>> reference.
>>> > The alternative I'm outlining here moves in the direction of 3. We had
>>> originally discounted it because it becomes very verbose and seems overly
>>> complicated if support for coordinate sharing is a requirement. If the
>>> three simplifications described above are used, then the third approach
>>> seems more tenable.
>>> >
>>> > Jonathan has also suggested that: (these are in reaction to the CDL in
>>> my letter from February 2nd)
>>> > 1) Rename geom_coordinates as node_coordinates, for consistency with
>>> UGRID.
>>> > 2) Omit node_dimension. This is redundant, since the dimension can be
>>> found by
>>> > examining the node coordinate variables.
>>> > 3) Prescribe numerous "codes" and assumptions in the specification
>>> instead of letting them be described with attribute values.
>>> > 4) It would be more consistent with CF and UGRID to use a single
>>> container variable to hang all the topology/geometry information from.
>>> >
>>> > Which I, personally, am happy to accept if others don't object.
>>> >
>>> > A couple other suggestions from Jonathan I want to discuss a bit more:
>>> > 1) Rename geometry as topology and geom_type as topology_type.
>>> >       While I'd be open to something other than geom, topology is odd.
>>> If this is really "node_collection_topology_type" I guess I could be
>>> convinced, but would be curious how people react to this. (Especially in
>>> relation to UGRID)
>>> > 2) This extension is more appropriate as an extension to the concept
>>> of cell bounds than the addition of a complex time-invariant type of
>>> discrete sampling geometry.
>>> >       Having just re-read the cell bounds chapter, I think it would
>>> over complicate the cell bounds to include this material. My basic issue
>>> here is that these geometries do not necessarily have a reference location.
>>> They are, rather, first order entities that need to be treated as such.
>>> That said, it makes sense that these geometries are not necessarily a good
>>> fit for the original intent of Discrete Sampling Geometries. Jonathan
>>> suggested they may belong in their own chapter, which may be a good
>>> alternative? My suggested CDL below might lead us in the direction of this
>>> being a special type of auxiliary coordinate variable.
>>> >
>>> > This alternative starts to look like the CDL pasted below.
>>> >
>>> > Note that the issue of coordinates is sticking out like a sore thumb.
>>> Below, I've attempted to reconcile Jonathan's ideas regarding coordinates
>>> with my thoughts about how these geometries are "first order entities" that
>>> don't have a single representative x and y. The spatial coordinates can be
>>> said to reside in the system of geometries described in the "sf" container
>>> variable? I realize this goes against the idea of coordinates a bit, but I
>>> think it is holding with the spirit of the attribute?
>>> >
>>> > Finally, I'm glad to continue answering questions and debating things
>>> via the list to a point, but I think it would be in our interest to arrange
>>> a telecon to discuss this stuff further with a list of interested parties.
>>> Feel free to follow up on list, but for decision making, let's not let this
>>> rabbit hole go too deep. I'll plan on letting this and the other recent
>>> action on this proposal settle with people for a week or two then start to
>>> bring together a conference call (or calls depending on time zones). Please
>>> respond to me off list if you are interested in being part of a call to
>>> discuss.
>>> >
>>> > Regards,
>>> >
>>> > - Dave
>>> >
>>> > netcdf multipolygon_example {
>>> > dimensions:
>>> >  node = 47 ;
>>> >  part = 9 ;
>>> >  instance = 3 ;
>>> >  time = 5 ;
>>> >  strlen = 5 ;
>>> > variables:
>>> >  char instance_name(instance, strlen) ;
>>> >    instance_name:cf_role = "timeseries_id" ;
>>> >  double someVariable(instance) ;
>>> >    someVariable:long_name = "a variable describing a single-valued
>>> attribute of a polygon" ;
>>> >    someVariable:coordinates = "sf" ; // or "instance_name"?
>>> >  int time(time) ;
>>> >    time:units = "days since 2000-01-01" ;
>>> >  double someData(instance, time) ;
>>> >    someData:coordinates = "time sf" ; // or "time instance_name"?
>>> >    someData:featureType = "timeSeries" ;
>>> >    someData:geometry="sf";
>>> >  int sf; // containing variable -- datatype irrelevant because no data
>>> >    sf:geom_type = "multipolygon" ; // could be node_topology_type?
>>> >    sf:node_count_variable="node_count";
>>> >    sf:node_coordinates = "x y" ;
>>> >    sf:part_count = "part_node_count" ;
>>> >    sf:part_type = "part_type" ; // Not required unless polygons with
>>> holes present.
>>> >    sf:outer_ring_order = "anticlockwise" ; // not required if written
>>> in spec?
>>> >    sf:closure_convention = "last_node_equals_first" ; // not required
>>> if written in spec?
>>> >    sf:outer_type_code = 0 ; // not required if written in spec?
>>> >    sf:inner_type_code = 1 ; // not required if written in spec?
>>> >  int node_count(instance);
>>> >    node_count:long_name = "count of coordinates in each instance
>>> geometry" ;
>>> >  int part_node_count(part) ;
>>> >    part_node_count:long_name = "count of coordinates in each geometry
>>> part" ;
>>> >  int part_type(part) ;
>>> >    part_type:long_name = "type of each geometry part" ;
>>> >  double x(node) ;
>>> >    x:units = "degrees_east" ;
>>> >    x:standard_name = "longitude" ; // or projection_x_coordinate
>>> >    x:cf_role = "geometry_x_node" ;
>>> >  double y(node) ;
>>> >    y:units = "degrees_north" ;
>>> >    y:standard_name = "latitude" ; // or projection_y_coordinate
>>> >    y:cf_role = "geometry_y_node" ;
>>> > // global attributes:
>>> >     :Conventions = "CF-1.8" ;
>>> >
>>> > data:
>>> >
>>> >  instance_name =
>>> >   "flash",
>>> >   "bang",
>>> >   "pow" ;
>>> >
>>> >  someVariable = 1, 2, 3 ;
>>> >
>>> >  time = 1, 2, 3, 4, 5 ;
>>> >
>>> >  someData =
>>> >   1, 2, 3, 4, 5,
>>> >   1, 2, 3, 4, 5,
>>> >   1, 2, 3, 4, 5 ;
>>> >
>>> >  node_count = 25, 15, 7 ;
>>> >
>>> >  part_node_count = 5, 4, 4, 4, 4, 8, 6, 8, 4 ;
>>> >
>>> >  part_type = 0, 1, 1, 1, 0, 0, 0, 1, 0 ;
>>> >
>>> >  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9,
>>> 7,
>>> >     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45,
>>> -20, -30, -20, -20, -30, 30,
>>> >     45, 10, 30, 25, 50, 30, 25 ;
>>> >
>>> >  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25,
>>> 25, 29,
>>> >     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20,
>>> -35, -20, -15, -25, -20, 20,
>>> >     40, 40, 20, 5, 10, 15, 5 ;
>>> > }
>>> >
>>> >
>>> >
>>> >> On Feb 4, 2017, at 8:07 AM, David Blodgett <dblodgett at usgs.gov
>>> <mailto:dblodgett at usgs.gov>> wrote:
>>> >>
>>> >> Dear Chris,
>>> >>
>>> >> Thanks for your thorough treatment of these issues. We have gone
>>> through a similar thought process to arrive at the proposal we came up
>>> with. I'll answer as briefly as I can.
>>> >>
>>> >> 1) how would you translate between netcdf geometries and, say geo
>>> JSON?
>>> >>
>>> >> The thinking is that node coordinate sharing is optional. If the
>>> writer wants to check or already knows that nodes share coordinates, then
>>> it's possible. Otherwise, it doesn't have to be used. I've always felt that
>>> this was important, but maybe not critical for a core NetCDF-CF data model.
>>> Some offline conversation has led to an example that does not use it that
>>> may be a good alternative, more on that later.
>>> >>
>>> >> 2) Break Values
>>> >>
>>> >> You really do have to hold your nose on the break values. The issue
>>> is that you have to store that information somehow and it is almost worse
>>> to create new variables to store the multi-part and hole/not hole
>>> information. The alternative approach that's forming up as mentioned above
>>> does break the information out into additional variables but simplifies
>>> things otherwise. In that case it doesn't feel overly complex to me... so
>>> stay tuned for more on this front.
>>> >>
>>> >> 3) Ragged Indexing
>>> >>
>>> >> Your thought process follows ours exactly. The key is that you either
>>> have to create the "pointer" array as a first order of business or loop
>>> over the counts ad nauseam. I'm actually leaning toward the counts for two
>>> reasons. First, the counts approach is already in CF so is a natural fit
>>> and will be familiar to developers in this space. Second, the issue of 0 vs
>>> 1 indexing is annoying. In our proposal, we settled on 0 indexing because
>>> it aligns with the idea of an offset, but it is still annoying and some
>>> applications would always have to adjust that pointer array as a first
>>> order of business.
>>> >>
>>> >> On to Bob's comments.
>>> >>
>>> >> Regarding aligning with other data models / encodings, I guess this
>>> needs to be unpacked a bit.
>>> >>
>>> >> 1) In this setting, simple features is a data model, not an encoding.
>>> An encoding can implement part or all of a data model as is needed by the
>>> use case(s) at hand. There is no problem with partial implementations; you
>>> still get interoperability for the intended use cases.
>>> >> 2) Attempting to align with other encoding standards (UGRID and
>>> NetCDF-CF are the primary ones here) is simply to keep the implementation
>>> patterns similar and familiar. This may be a fool's errand, but is
>>> presumably good for adoptability and consistency.
>>> >> So, I don't see a problem with implementing important simple features
>>> types in a way that aligns with the way the existing community standards
>>> work.
>>> >>
>>> >> I don't see this as ignoring existing standards at all. There is no
>>> open community standard for binary encoding of geometries and related data
>>> that passes the CF requirements of human readability and self-description.
>>> We are adopting the appropriate data model and suggesting a new encoding
>>> that will solve a lot of problems in the environmental modeling space.
>>> >>
>>> >> As we've discussed before, your "different approach" sounds great,
>>> but seems like an exercise for a future effort that doesn't attempt to
>>> align with CF 1.7. Maybe what you suggest is a path forward for variable
>>> length arrays in the CF 2.0 "vision in the mist", but I don't see it as a
>>> tenable solution for CF 1.*.
>>> >>
>>> >> Best Regards,
>>> >>
>>> >> - Dave
>>> >>
>>> >>
>>> >>> On Feb 3, 2017, at 3:31 PM, Chris Barker <chris.barker at noaa.gov
>>> <mailto:chris.barker at noaa.gov>> wrote:
>>> >>>
>>> >>> a few thoughts. First, I think there are three core "issues" that
>>> need to be resolved:
>>> >>>
>>> >>> 1) Coordinate indexing (indirection)
>>> >>>
>>> >>> the question of whether you have an array of "vertices" that the
>>> geometry types index into to get their data:
>>> >>>
>>> >>> Advantages:
>>> >>>  - if a number of geometries share a lot of vertices, it can be more
>>> efficient
>>> >>>  - the relationship between geometries that share vertices (i.e.
>>> polygons that share a boundary) etc. is well defined. You don't need to
>>> check for closeness, and maybe have a tolerance, etc.
>>> >>>
>>> >>> These were absolutely critical for UGRID for example -- a UGRID mesh
>>> is a "single thing", NOT a collection of polygons that happen to share some
>>> vertices.
>>> >>>
>>> >>> Disadvantages:
>>> >>>  -  if the geometries do not share many vertices, it is less
>>> efficient.
>>> >>>  -  there are additional code complications in "getting" the
>>> vertices of the given geometry
>>> >>>  - it does not match the OGC data model.
>>> >>>
>>> >>> My 0.02 -- given my use cases, I tend to want the advantages -- but
>>> I don't know that that's a typical use case. And I think it's a really good
>>> idea to keep with the OGC data model where possible -- i.e. be able to
>>> translate from netcdf to, say, geoJSON as losslessly as possible. Given
>>> that, I think it's probably a better idea not to have the indirection.
>>> >>>
>>> >>> However (to equivocate) perhaps the types of information people are
>>> likely to want to store in netcdf are a subset of what the OGC standards
>>> are designed for -- and for those use-cases, maybe shared vertices are
>>> critical.
>>> >>>
>>> >>> One way to think about it -- how would you translate between netcdf
>>> geometries and, say geo JSON:
>>> >>>   - nc => geojson would lose the shared index info.
>>> >>>   - geojson => nc -- would you try to reconstruct the shared
>>> vertices?? I'm thinking that would be a bit dangerous in the general case,
>>> because you are adding information that you don't know is true -- are these
>>> a shared vertex or two that just happen to be at the same location?
>>> >>>
>>> >>> 2) Break values
>>> >>>
>>> >>> I don't really like break values as an approach, but with netcdf any
>>> option will be ugly one way or another. So keeping with the WKT approach
>>> makes sense to me. Either way you'll need custom code to unpack it. (BTW --
>>> what does WellKnownBinary do?)
>>> >>>
>>> >>> 3) Ragged indexing
>>> >>>
>>> >>> There are two "natural" ways to represent a ragged array:
>>> >>>
>>> >>> (a) store the length of each "row"
>>> >>> (b) store the index to the beginning (or end) of each "row"
>>> >>>
>>> >>> CF already uses (a). However, working with it, I'm pretty convinced
>>> that it's the "wrong" choice:
>>> >>>
>>> >>> If you want to know how long a given row is, that is really easy
>>> with (a), and almost as easy with (b) (involves two indexes and a
>>> subtraction)
>>> >>>
>>> >>> However, if you want to extract a particular row: (b) makes this
>>> really easy -- you simply access the slice of the array you want. with (a)
>>> you need to loop through the entire "length_of_rows" array (up to the row
>>> of interest) and add up the values to find the slice you need. not a huge
>>> issue, but it is an issue. In fact, in my code to read ragged arrays in
>>> netcdf, the first thing I do is pre-compute the index-to-each-row, so I can
>>> then use that to access individual rows for future access -- if you are
>>> accessing via OpenDAP -- that's particularly helpful.
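
The pre-computation described here, turning per-row counts (a) into start
indexes (b), can be sketched as follows, assuming Python. The function names
are hypothetical; the counts are the node_count values from the CDL example
elsewhere in this thread, and the flat array is placeholder data.

```python
from itertools import accumulate


def counts_to_offsets(counts):
    """Return 0-based start offsets for each row, plus one final end offset."""
    return [0, *accumulate(counts)]


def get_row(flat, offsets, i):
    """Slice row i out of the flat array using two adjacent offsets."""
    return flat[offsets[i]:offsets[i + 1]]


counts = [25, 15, 7]               # approach (a): length of each row
offsets = counts_to_offsets(counts)  # approach (b): index of each row start
flat = list(range(sum(counts)))    # placeholder values, one per node
row1 = get_row(flat, offsets, 1)   # second row, found without looping over counts
```

Once the offsets exist, the length of row i is offsets[i + 1] - offsets[i],
which is the "two indexes and a subtraction" mentioned above.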
>>> >>>
>>> >>> So -- (b) is clearly (to me) the "best" way to do it -- but is it
>>> worth introducing a second way to handle ragged arrays in CF? I would think
>>> yes, but that would be offset if:
>>> >>>
>>> >>>  - There is a bunch of existing library code that transparently
>>> handles ragged arrays in netcdf (does netcdfJava have something? I'm pretty
>>> sure Python doesn't -- certainly not in netCDF4)
>>> >>>
>>> >>>  - That that existing lib code would be advantageous to leverage for
>>> code reading features: I suspect that there will have to be enough custom
>>> code that the ragged array bits are going to be the least of it.
>>> >>>
>>> >>> So I'm for the "new" way of representing ragged arrays
>>> >>>
>>> >>> -CHB
>>> >>>
>>> >>>
>>> >>> On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal <
>>> bob.simons at noaa.gov> wrote:
>>> >>> Then, isn't this proposal just the first step in the creation of a
>>> new model and a new encoding of Simple Features, one that is "align[ed] ...
>>> with as many other encoding standards in this space as is practical"? In
>>> other words, yet another standard for Simple Features?
>>> >>>
>>> >>> If so, it seems risky to me to take just the first (easy?) step "to
>>> support the use cases that have a compelling need today" and not solve the
>>> entire problem. I know the CF way is to just solve real, current needs, but
>>> in this case it seems to risk a head slap moment in the future when we
>>> realize that, in order to deal with some new simple feature variant, we
>>> should have done things differently from the beginning?
>>> >>>
>>> >>> And it seems odd to reject existing standards that have been so
>>> painstakingly hammered out, in favor of starting the process all over
>>> again.  We follow existing standards for other things (e.g., IEEE-754 for
>>> representing floating point numbers in binary files), why can't we follow
>>> an existing Simple Features standard?
>>> >>>
>>> >>> ---
>>> >>> Rather than just be a naysayer, let me suggest a very different
>>> alternative:
>>> >>>
>>> >>> There are several projects in the CF realm (e.g., this Simple
>>> Features project, Discrete Sampling Geometry (DSG), true variable-length
>>> Strings, ugrid(?)) which share a common underlying problem: how to deal
>>> with variable-length multidimensional arrays: a[b][c], where the length of
>>> the c dimension may be different for different b indices.
>>> >>> DSG solved this (5 different ways!), but only for DSG.
>>> >>> The Simple Features proposal seeks to solve the problem for Simple
>>> Features.
>>> >>> We still have no support for Unicode variable-length Strings.
>>> >>>
>>> >>> Instead of continuing to solve the variable-length problem a
>>> different way every time we confront it, shouldn't we solve it once, with
>>> one small addition to the standard, and then use that solution repeatedly?
>>> >>> The solution could be a simple variant of one of the DSG solutions,
>>> but generalized so that it could be used in different situations.
>>> >>> An encoding standard and built-in support for variable-length data
>>> arrays in netcdf-java/c would solve a lot of problems, now and in the
>>> future.
>>> >>> Some work on this is already done: I think the netcdf-java API
>>> already supports variable-length arrays when reading netcdf-4 files.
>>> >>> For Simple Features, the problem would reduce to: store the feature
>>> (using some specified existing standard like WKT or WKB) in a
>>> variable-length array.
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Fri, Feb 3, 2017 at 9:07 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
>>> >>> Date: Fri, 3 Feb 2017 11:07:00 -0600
>>> >>> From: David Blodgett <dblodgett at usgs.gov>
>>> >>> To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
>>> >>> Cc: CF Metadata <cf-metadata at cgd.ucar.edu>
>>> >>> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
>>> >>>         for Simple Features
>>> >>> Message-ID: <8EE85E65-2815-4720-90FC-13C72D3C7952 at usgs.gov>
>>> >>> Content-Type: text/plain; charset="utf-8"
>>> >>>
>>> >>> Dear Bob,
>>> >>>
>>> >>> I'll just take these in line.
>>> >>>
>>> >>> 1) noted. We have been trying to figure out what to do with the
>>> point featureType and I think leaving it more or less alone is a viable
>>> path forward.
>>> >>>
>>> >>> 2) This is not an exact replica of WKT, but rather a similar
>>> approach to WKT. As I stated, we have followed the ISO simple features data
>>> model and well known text feature types in concept, but have not used the
>>> same standardization formalisms. We aren't advocating for supporting "all
>>> of" any standard but are rather attempting to support the use cases that
>>> have a compelling need today while aligning this with as many other
>>> encoding standards in this space as is practical. Hopefully that answers
>>> your question, sorry if it's vague.
>>> >>>
>>> >>> 3) The google doc linked in my response contains the encoding we are
>>> proposing as a starting point for conversation: <http://goo.gl/Kq9ASq>. I
>>> want to stress, as a starting point for discussion. I expect that this
>>> proposal will change drastically before we're done.
>>> >>>
>>> >>> 4) We absolutely envision tools doing what you say: converting between
>>> standard spatial formats and NetCDF-CF geometries. We intend to introduce
>>> an R and a Python implementation that does exactly as you say along with
>>> whatever form this standard takes in the end. R and Python were chosen as
>>> the team that brought this together are familiar with those two languages;
>>> additional implementations would be more than welcome.
>>> >>>
>>> >>> 5) We do include a "geometry" featureType similar to the "point"
>>> featureType. Thus our difficulty with what to do with the "point"
>>> featureType. You are correct, there are lots of non timeSeries applications
>>> to be solved and this proposal does intend to support them (within the
>>> existing DSG constructs).
>>> >>>
>>> >>> Thanks for your questions, hopefully my answers close some gaps for
>>> you.
>>> >>>
>>> >>> - Dave
>>> >>>
>>> >>> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <
>>> bob.simons at noaa.gov> wrote:
>>> >>> >
>>> >>> > 1) There is a vague comment in the proposal about possibly
>>> changing the point featureType. Please don't, unless the changes don't
>>> affect current uses of Point. There are already 1000's of files that use
>>> it. If this new system offers an alternative, then fine, it's an
>>> alternative. One of the most important and useful features of a good
>>> standard is backwards compatibility.
>>> >>> >
>>> >>> > 2) You advocate "Implement the WKT approach using a NetCDF binary
>>> array." Is this system then an exact encoding of WKT, neither a subset nor
>>> a superset?  "Simple Features" are often not simple.
>>> >>> > If it is WKT (or something else), what is the standard you are
>>> following to describe the Simple Features (e.g.,  ISO/IEC 13249-3:2016 and
>>> ISO 19162:2015)?
>>> >>> > Does your proposal deviate in any way from the standard's
>>> capabilities?
>>> >>> > Do you advocate following the entire WKT standard, e.g.,
>>> supporting all the feature types that WKT supports?
>>> >>> >
>>> >>> > 3) Since you are not using the WKT encoding, but creating your
>>> own, where is the definition of the encoding system you are using?
>>> >>> >
>>> >>> > 4) This is a little out of CF scope, but:
>>> >>> > Do you envision tools, notably, netcdf-c/java, having a writer
>>> function that takes in WKT and encodes the information in a file, and
>>> having a reader function that reads the file and returns WKT? Or is it your
>>> plan that the encoding/ decoding is left to the user?
>>> >>> >
>>> >>> > 5) This proposal is for "Simple Features plus Time Series" (my
>>> phrase not yours). But aren't there lots of other uses of Simple Features?
>>> Will there be other proposals in the future for "Simple Features plus X"
>>> and "Simple Features plus Y"? If so, will CF eventually become a massive
>>> document where Simple Features are defined over and over again, but in
>>> different contexts? If so, wouldn't a better solution be to deal with
>>> Simple Features separately (as Postgres does by making a geometric data
>>> type?), and then add "Simple Features plus Time Series" as the first use of
>>> it?
>>> >>> >
>>> >>> > Thanks for answering these questions.
>>> >>> > Please forgive me if I missed parts of your proposal that answer
>>> these questions.
>>> >>> >
>>> >>> >
>>> >>> > On Thu, Feb 2, 2017 at 5:57 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
>>> >>> > Date: Thu, 2 Feb 2017 07:57:36 -0600
>>> >>> > From: David Blodgett <dblodgett at usgs.gov>
>>> >>> > To: <cf-metadata at cgd.ucar.edu>
>>> >>> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for
>>> >>> >         Simple Features
>>> >>> > Message-ID: <224C2828-7212-449F-8C2C-97D903F6BE1E at usgs.gov>
>>> >>> > Content-Type: text/plain; charset="utf-8"
>>> >>> >
>>> >>> > Dear CF Community,
>>> >>> >
>>> >>> > We are pleased to submit this proposal for your consideration and
>>> review. The cover letter we've prepared below provides some background and
>>> explanation for the proposed approach. The google doc here
>>> <http://goo.gl/Kq9ASq> is an excerpt of the CF specification with
>>> track changes turned on. Permissions for the document allow any google user
>>> to comment, so feel free to comment and ask questions in line.
>>> >>> >
>>> >>> > Note that I'm sharing this with you with one issue unresolved.
>>> What to do with the point featureType? Our draft suggests that it is part
>>> of a new geometry featureType, but it could be that we leave it alone and
>>> introduce a geometry featureType. This may be a minor point of discussion,
>>> but we need to be clear that this is an issue that still needs to be
>>> resolved in the proposal.
>>> >>> >
>>> >>> > Thank you for your time and consideration.
>>> >>> >
>>> >>> > Best Regards,
>>> >>> >
>>> >>> > David Blodgett, Tim Whiteaker, and Ben Koziol
>>> >>> >
>>> >>> > Proposed Extension to NetCDF-CF for Simple Geometries
>>> >>> >
>>> >>> > Preface
>>> >>> >
>>> >>> > The proposed addition to NetCDF-CF introduced below is inspired by
>>> a pre-existing data model governed by OGC and ISO as ISO 19125-1. More
>>> information on Simple Features may be found here:
>>> <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of
>>> the authors, it is consistent with ISO 19125-1 but has not been specified
>>> using the formalisms of OGC or ISO. Language used attempts to hold true to
>>> NetCDF-CF semantics while not conflicting with the existing standards
>>> baseline. While this proposal does not support the entire scope of the
>>> simple features ecosystem, it does support the core data types in most
>>> common use around the community.
>>> >>> >
>>> >>> > The other existing standard to mention is the UGRID convention
>>> <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors
>>> have experience reading and writing UGRID and have designed the proposed
>>> structure in a way that is inspired by and consistent with it.
>>> >>> >
>>> >>> > Terms and Definitions
>>> >>> >
>>> >>> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for
>>> Geographic information - Simple feature access - Part 1: Common
>>> architecture <http://www.opengeospatial.org/standards/sfa>.)
>>> >>> >
>>> >>> > Feature: Abstraction of real world phenomena - typically a
>>> geospatial abstraction with associated descriptive attributes.
>>> >>> > Simple Feature: A feature with all geometric attributes described
>>> piecewise by straight line or planar interpolation between point sets.
>>> >>> > Geometry (geometric complex): A set of disjoint geometric
>>> primitives - one or more points, lines, or polygons that form the spatial
>>> representation of a feature.
>>> >>> > Introduction
>>> >>> >
>>> >>> > Discrete Sampling Geometries (DSGs) handle data from one (or a
>>> collection of) timeSeries (point), Trajectory, Profile, TrajectoryProfile
>>> or timeSeriesProfile geometries. Measurements are from a point (timeSeries
>>> and Profile) or points along a trajectory. In this proposal, we reuse the
>>> core DSG timeSeries type which provides support for basic time series use
>>> cases e.g., a timeSeries which is measured (or modeled) at a given point.
>>> >>> >
>>> >>> > Changes to Existing CF Specification
>>> >>> >
>>> >>> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions
>>> and variables into two types - instance and element
>>> <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>.
>>> Instance refers to individual points, trajectories, profiles, etc. These
>>> would sometimes be referred to as features given that they are identified
>>> entities that can have associated attributes and be related to other
>>> entities. Element dimensions describe temporal or other dimensions to
>>> describe data on a per-instance basis. This proposal extends the DSG
>>> timeSeries featureType
>>> <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types>
>>> such that the geospatial coordinates of
>>> the instances can be point, multi-point, line, multi-line, polygon, or
>>> multi-polygon geometries. Rather than overload the DSG contiguous ragged array
>>> encoding, designed with timeseries in mind, a geometry ragged array
>>> encoding is introduced in a new section 9.3.5. See this google doc for
>>> specific proposed changes: <http://goo.gl/Kq9ASq>.
>>> >>> >
>>> >>> > Motivation
>>> >>> >
>>> >>> > DSGs have no system to define a geometry (polyline, polygon, etc.,
>>> other than point) and an association with a time series that applies over
>>> that entire geometry e.g., The expected rainfall in this watershed polygon
>>> for some period of time is 10 mm. As suggested in the last paragraph of
>>> section 9.1, current practice is to assign a representative point or just
>>> use an ID and forgo spatial information within a NetCDF-CF file. In order
>>> to satisfy a number of environmental modeling use cases, we need a way to
>>> encode a geometry (point, line, polygon, multi-point, multi-line, or
>>> multi-polygon) that is the static spatial feature representation to which
>>> one or more timeSeries can be associated. In this proposal, we provide an
>>> encoding to define collections of simple feature geometries. It interfaces
>>> cleanly with the existing DSG specification, enabling DSGs and Simple
>>> Geometries to be used concurrently.
>>> >>> >
>>> >>> > Looking Forward
>>> >>> >
>>> >>> > This proposal is a compromise solution that attempts to stay
>>> consistent with CF ideals and fit within the structure of the existing
>>> specification with minimal disruption. Line and polygon data types often
>>> require variable length arrays. Development of this proposal has brought to
>>> light the need for a general abstraction for variable length arrays in
>>> NetCDF-CF. Such a general abstraction would necessarily be reusable for
>>> character arrays, ragged arrays of time series, and ragged arrays of
>>> geometry nodes, as well as any other ragged data structures that may come
>>> up in the future. This proposal does not introduce such a general ragged
>>> array abstraction but does not preclude such a development in the future.
>>> >>> >
>>> >>> > Three Alternative Approaches
>>> >>> >
>>> >>> > Respecting the human readability ideal of NetCDF-CF, the
>>> development of this proposal started from a human readable format for
>>> geometries known as Well Known Text
>>> <https://en.wikipedia.org/wiki/Well-known_text>. We considered three
>>> high level design approaches while developing this proposal.
>>> >>> >
>>> >>> > 1) Direct use of Well-Known Text (WKT). In this approach, well known
>>> text strings would be encoded using character arrays following a contiguous
>>> ragged array approach to index the character array by geometry (or instance
>>> in DSG parlance).
>>> >>> > 2) Implement the WKT approach using a NetCDF binary array. In this
>>> approach, well known text separators (brackets, commas and spaces) for
>>> multipoint, multiline, multipolygon, and polygon holes, would be encoded as
>>> break type separator values like -1 for multiparts and -2 for holes.
>>> >>> > 3) Implement the fundamental dimensions of geometry data in NetCDF.
>>> In this approach, additional dimensions and variables along those
>>> dimensions would be introduced to represent geometries, geometry parts,
>>> geometry nodes, and unique (potentially shared) coordinate locations for
>>> nodes to reference.
>>> >>> >
>>> >>> > Selected Approach
>>> >>> >
>>> >>> > The first approach was seen as too opaque to stay true to the CF
>>> ideal of complete self-description. The third approach seemed needlessly
>>> verbose and difficult to implement. The second approach was selected for
>>> the following reasons:
>>> >>> >
>>> >>> >  - The second approach is at least as human-readable as the
>>> third.
>>> >>> >  - Use of break values keeps geometries relatively atomic.
>>> >>> >  - It will be familiar to developers who are familiar with the WKT
>>> geometry format.
>>> >>> >  - Character arrays, which are needed for options one and three, are
>>> cumbersome to use in some programming languages in common use with NetCDF.
>>> >>> >  - Break values replace the need for extraneous variables related to
>>> multi-part and polygon holes (interiors). Multi-part geometries are
>>> generally an exception and excessive instrumentation to support them should
>>> be discounted.
>>> >>> >
>>> >>> > Example: Representation of WKT-Style Polygons in a NetCDF-3
>>> timeSeries featureType
>>> >>> >
>>> >>> > Below is sample CDL demonstrating how polygons are encoded in
>>> NetCDF-3 using a contiguous ragged array-like encoding. There are three
>>> details to note in the example below.
>>> >>> >
>>> >>> > 1) The attribute contiguous_ragged_dimension, whose value is the name
>>> of a dimension in the file.
>>> >>> > 2) The geom_coordinates attribute with a value containing a space
>>> separated string of variable names.
>>> >>> > 3) The cf_role geometry_x_node and geometry_y_node.
>>> >>> >
>>> >>> > These three attributes form a system to fully describe collections
>>> of multi-polygon feature geometries. Any variable that has the
>>> contiguous_ragged_dimension attribute contains integers that indicate the
>>> 0-indexed starting position of each geometry along the instance dimension.
>>> Any variable that uses the dimension referenced in the
>>> contiguous_ragged_dimension attribute can be interpreted using the values
>>> in the variable containing the contiguous_ragged_dimension attribute. The
>>> variables referenced in the geom_coordinates attribute describe spatial
>>> coordinates of geometries. These variables can also be identified by the
>>> cf_roles geometry_x_node and geometry_y_node. Note that the example below
>>> also includes a mechanism to handle multi-polygon features that also
>>> contain holes.
>>> >>> >
>>> >>> > netcdf multipolygon_example {
>>> >>> > dimensions:
>>> >>> >   node = 47 ;
>>> >>> >   indices = 55 ;
>>> >>> >   instance = 3 ;
>>> >>> >   time = 5 ;
>>> >>> >   strlen = 5 ;
>>> >>> > variables:
>>> >>> >   char instance_name(instance, strlen) ;
>>> >>> >     instance_name:cf_role = "timeseries_id" ;
>>> >>> >   int coordinate_index(indices) ;
>>> >>> >     coordinate_index:geom_type = "multipolygon" ;
>>> >>> >     coordinate_index:geom_coordinates = "x y" ;
>>> >>> >     coordinate_index:multipart_break_value = -1 ;
>>> >>> >     coordinate_index:hole_break_value = -2 ;
>>> >>> >     coordinate_index:outer_ring_order = "anticlockwise" ;
>>> >>> >     coordinate_index:closure_convention = "last_node_equals_first" ;
>>> >>> >   int coordinate_index_start(instance) ;
>>> >>> >     coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
>>> >>> >     coordinate_index_start:contiguous_ragged_dimension = "indices" ;
>>> >>> >   double x(node) ;
>>> >>> >     x:units = "degrees_east" ;
>>> >>> >     x:standard_name = "longitude" ; // or projection_x_coordinate
>>> >>> >     x:cf_role = "geometry_x_node" ;
>>> >>> >   double y(node) ;
>>> >>> >     y:units = "degrees_north" ;
>>> >>> >     y:standard_name = "latitude" ; // or projection_y_coordinate
>>> >>> >     y:cf_role = "geometry_y_node" ;
>>> >>> >   double someVariable(instance) ;
>>> >>> >     someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>>> >>> >   int time(time) ;
>>> >>> >     time:units = "days since 2000-01-01" ;
>>> >>> >   double someData(instance, time) ;
>>> >>> >     someData:coordinates = "time x y" ;
>>> >>> >     someData:featureType = "timeSeries" ;
>>> >>> > // global attributes:
>>> >>> >     :Conventions = "CF-1.8" ;
>>> >>> >
>>> >>> > data:
>>> >>> >
>>> >>> >  instance_name =
>>> >>> >   "flash",
>>> >>> >   "bang",
>>> >>> >   "pow" ;
>>> >>> >
>>> >>> >  coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
>>> >>> >     -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31, 32, 33,
>>> >>> >     34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
>>> >>> >
>>> >>> >  coordinate_index_start = 0, 30, 46 ;
>>> >>> >
>>> >>> >  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>>> >>> >     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
>>> >>> >     45, 10, 30, 25, 50, 30, 25 ;
>>> >>> >
>>> >>> >  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
>>> >>> >     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
>>> >>> >     40, 40, 20, 5, 10, 15, 5 ;
>>> >>> >
>>> >>> >  someVariable = 1, 2, 3 ;
>>> >>> >
>>> >>> >  time = 1, 2, 3, 4, 5 ;
>>> >>> >
>>> >>> >  someData =
>>> >>> >   1, 2, 3, 4, 5,
>>> >>> >   1, 2, 3, 4, 5,
>>> >>> >   1, 2, 3, 4, 5 ;
>>> >>> > }
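
As a rough illustration of how a start-index variable such as coordinate_index_start could be used, here is a minimal Python sketch. It assumes the start values are 0-indexed positions into the flat index array and that the last instance runs to the end of that array. split_by_start and the small arrays are hypothetical and are not taken from the CDL above.

```python
def split_by_start(index_values, starts):
    """Split a flat index array into one list per instance.

    `starts` holds the 0-indexed position at which each instance begins;
    the last instance is assumed to run to the end of `index_values`.
    """
    ends = list(starts[1:]) + [len(index_values)]
    return [index_values[s:e] for s, e in zip(starts, ends)]


# Hypothetical data, much smaller than the CDL example: two instances.
# The first instance has two parts separated by a break value of -1.
coordinate_index = [0, 1, 2, 3, -1, 4, 5, 6, 7, 8, 9, 10, 11]
coordinate_index_start = [0, 9]

per_instance = split_by_start(coordinate_index, coordinate_index_start)
```

Because only start positions are stored, each instance is a single slice and no running sum over row lengths is needed.
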
>>> >>> >
>>> >>> > How To Interpret
>>> >>> >
>>> >>> > Starting from the timeSeries variables:
>>> >>> >
>>> >>> > 1) See CF-1.8 conventions.
>>> >>> > 2) See the timeSeries featureType.
>>> >>> > 3) Find the timeseries_id cf_role.
>>> >>> > 4) Find the coordinates attribute of data variables.
>>> >>> > 5) See that the variables indicated by the coordinates attribute have
>>> a cf_role geometry_x_node and geometry_y_node to determine that these are
>>> geometries according to this new specification.
>>> >>> > 6) Find the coordinate index variable with geom_coordinates that
>>> point to the nodes.
>>> >>> > 7) Find the variable with contiguous_ragged_dimension pointing to the
>>> dimension of the coordinate index variable to determine how to index into
>>> the coordinate index.
>>> >>> > 8) Iterate over polygons, parsing out geometries using the contiguous
>>> ragged start variable and coordinate index variable to interpret the
>>> coordinate data variables.
>>> >>> >
>>> >>> > Or, without reference to timeSeries:
>>> >>> >
>>> >>> > 1) See CF-1.8 conventions.
>>> >>> > 2) See the geom_type of multipolygon.
>>> >>> > 3) Find the variable with a contiguous_ragged_dimension matching the
>>> coordinate index variable's dimension.
>>> >>> > 4) See the geom_coordinates of x y.
>>> >>> > 5) Using the contiguous ragged start variable found in 3 and the
>>> coordinate index variable found in 2, geometries can be parsed out of the
>>> coordinate index variable and parsed using the hole and break values in it.
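
The final parsing step can be sketched roughly as follows, assuming break values of -1 (new polygon part) and -2 (start of a hole) as in the CDL example. decode_geometry and the small x, y and indices arrays are made up for illustration and are not part of the proposal.

```python
MULTIPART_BREAK = -1   # value of multipart_break_value in the CDL example
HOLE_BREAK = -2        # value of hole_break_value in the CDL example


def decode_geometry(indices, x, y):
    """Turn one instance's index values into polygons of rings.

    Returns a list of polygons; each polygon is a list of rings (outer ring
    first, then any holes); each ring is a list of (x, y) tuples.
    """
    polygons = [[[]]]                      # one polygon holding one empty ring
    for i in indices:
        if i == MULTIPART_BREAK:
            polygons.append([[]])          # start a new polygon
        elif i == HOLE_BREAK:
            polygons[-1].append([])        # start a hole in the current polygon
        else:
            polygons[-1][-1].append((x[i], y[i]))
    return polygons


# Hypothetical instance: a square with a triangular hole, then a lone triangle.
x = [0, 4, 4, 0, 0, 1, 2, 1, 1, 5, 6, 5, 5]
y = [0, 0, 4, 4, 0, 1, 1, 2, 1, 5, 5, 6, 5]
indices = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -1, 9, 10, 11, 12]

geometry = decode_geometry(indices, x, y)
```

Each ring repeats its first node at the end, following the last_node_equals_first closure convention used in the CDL example.
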
>>> >>> >
>>> >>> > -------------- next part --------------
>>> >>> > An HTML attachment was scrubbed...
>>> >>> > URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170202/4ce5b42f/attachment.html>
>>> >>> >
>>> >>> > ------------------------------
>>> >>> >
>>> >>> > Subject: Digest Footer
>>> >>> >
>>> >>> > _______________________________________________
>>> >>> > CF-metadata mailing list
>>> >>> > CF-metadata at cgd.ucar.edu
>>> >>> > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>> >>> >
>>> >>> >
>>> >>> > ------------------------------
>>> >>> >
>>> >>> > End of CF-metadata Digest, Vol 166, Issue 3
>>> >>> > *******************************************
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Sincerely,
>>> >>> >
>>> >>> > Bob Simons
>>> >>> > IT Specialist
>>> >>> > Environmental Research Division
>>> >>> > NOAA Southwest Fisheries Science Center
>>> >>> > 99 Pacific St., Suite 255A      (New!)
>>> >>> > Monterey, CA 93940               (New!)
>>> >>> > Phone: (831)333-9878            (New!)
>>> >>> > Fax:   (831)648-8440
>>> >>> > Email: bob.simons at noaa.gov
>>> >>> >
>>> >>> > The contents of this message are mine personally and
>>> >>> > do not necessarily reflect any position of the
>>> >>> > Government or the National Oceanic and Atmospheric Administration.
>>> >>> > <>< <>< <>< <>< <>< <>< <>< <>< <><
>>> >>> >
>>> >>>
>>> >>> -------------- next part --------------
>>> >>> An HTML attachment was scrubbed...
>>> >>> URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170203/4ff55def/attachment.html>
>>> >>>
>>> >>> ------------------------------
>>> >>>
>>> >>> End of CF-metadata Digest, Vol 166, Issue 5
>>> >>> *******************************************
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>>
>>> >>> Christopher Barker, Ph.D.
>>> >>> Oceanographer
>>> >>>
>>> >>> Emergency Response Division
>>> >>> NOAA/NOS/OR&R            (206) 526-6959   voice
>>> >>> 7600 Sand Point Way NE   (206) 526-6329   fax
>>> >>> Seattle, WA  98115       (206) 526-6317   main reception
>>> >>>
>>> >>> Chris.Barker at noaa.gov <mailto:Chris.Barker at noaa.gov>
>>> >>
>>> >
>>>
>>> -------------- next part --------------
>>> An HTML attachment was scrubbed...
>>> URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170217/b548709a/attachment.html>
>>>
>>> ------------------------------
>>>
>>> End of CF-metadata Digest, Vol 166, Issue 15
>>> ********************************************
>>>
>>
>>
>>
>>
>>
>
>
> --
> David Hassell
> National Centre for Atmospheric Science
> Department of Meteorology, University of Reading,
> Earley Gate, PO Box 243, Reading RG6 6BB
> Tel: +44 118 378 5613
> http://www.met.reading.ac.uk/
>



-- 
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940               (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: bob.simons at noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170222/0c68e411/attachment-0001.html>

