[CF-metadata] Extension of Discrete Sampling Geometries for Simple Features

David Blodgett dblodgett at usgs.gov
Fri Feb 3 10:07:00 MST 2017

Dear Bob,

I’ll just take these in line.

1) noted. We have been trying to figure out what to do with the point featureType and I think leaving it more or less alone is a viable path forward. 

2) This is not an exact replica of WKT, but rather a similar approach to WKT. As I stated, we have followed the ISO simple features data model and well known text feature types in concept, but have not used the same standardization formalisms. We aren’t advocating for supporting “all of” any standard but are rather attempting to support the use cases that have a compelling need today while aligning this with as many other encoding standards in this space as is practical. Hopefully that answers your question, sorry if it’s vague.

3) The Google doc linked in my response contains the encoding we are proposing as a starting point for conversation: http://goo.gl/Kq9ASq I want to stress that it is only a starting point for discussion; I expect that this proposal will change drastically before we’re done.

4) We absolutely envision tools doing what you describe: converting between standard spatial formats and NetCDF-CF geometries. We intend to introduce R and Python implementations that do exactly that, alongside whatever form this standard takes in the end. R and Python were chosen because the team that brought this together is familiar with those two languages; additional implementations would be more than welcome.

5) We do include a “geometry” featureType similar to the “point” featureType, hence our difficulty with what to do with the “point” featureType. You are correct that there are lots of non-timeSeries applications to be solved, and this proposal does intend to support them (within the existing DSG constructs).
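
As a rough illustration of the kind of conversion mentioned in 4), here is a minimal Python sketch that writes one multipolygon out as WKT. It is not the planned R or Python implementation; the function name to_wkt and the nested-list input (polygons, then rings, then (x, y) pairs) are assumptions made for this example only.

```python
def to_wkt(parts):
    """Serialize one multipolygon as a WKT string.

    parts: list of polygons; each polygon is a list of rings (exterior ring
    first, then any holes); each ring is a list of (x, y) tuples.
    This input layout is a hypothetical in-memory form, not part of the proposal.
    """
    def ring_text(ring):
        return "(" + ", ".join(f"{px} {py}" for px, py in ring) + ")"

    polygons = (
        "(" + ", ".join(ring_text(ring) for ring in rings) + ")"
        for rings in parts
    )
    return "MULTIPOLYGON (" + ", ".join(polygons) + ")"


example = [
    [   # first polygon: a square with one triangular hole
        [(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)],
        [(1, 1), (10, 5), (19, 1), (1, 1)],
    ],
    [   # second polygon: a triangle without holes
        [(5, 25), (9, 25), (7, 29), (5, 25)],
    ],
]

wkt = to_wkt(example)
print(wkt)
```

The reverse direction (WKT to arrays) would need a WKT parser, which an existing geometry library could supply; that part is left out here.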

Thanks for your questions, hopefully my answers close some gaps for you.

- Dave

> On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <bob.simons at noaa.gov> wrote:
> 1) There is a vague comment in the proposal about possibly changing the point featureType. Please don't, unless the changes don't affect current uses of Point. There are already 1000's of files that use it. If this new system offers an alternative, then fine, it's an alternative. One of the most important and useful features of a good standard is backwards compatibility. 
> 2) You advocate "Implement the WKT approach using a NetCDF binary array." Is this system then an exact encoding of WKT, neither a subset nor a superset?  "Simple Features" are often not simple. 
> If it is WKT (or something else), what is the standard you are following to describe the Simple Features (e.g.,  ISO/IEC 13249-3:2016 and ISO 19162:2015)?
> Does your proposal deviate in any way from the standard's capabilities?
> Do you advocate following the entire WKT standard, e.g., supporting all the feature types that WKT supports?
> 3) Since you are not using the WKT encoding, but creating your own, where is the definition of the encoding system you are using? 
> 4) This is a little out of CF scope, but:
> Do you envision tools, notably netcdf-c/java, having a writer function that takes in WKT and encodes the information in a file, and a reader function that reads the file and returns WKT? Or is it your plan that the encoding/decoding is left to the user?
> 5) This proposal is for "Simple Features plus Time Series" (my phrase not yours). But aren't there lots of other uses of Simple Features? Will there be other proposals in the future for "Simple Features plus X" and "Simple Features plus Y"? If so, will CF eventually become a massive document where Simple Features are defined over and over again, but in different contexts? If so, wouldn't a better solution be to deal with Simple Features separately (as Postgres does by making a geometric data type?), and then add "Simple Features plus Time Series" as the first use of it?
> Thanks for answering these questions.
> Please forgive me if I missed parts of your proposal that answer these questions. 
> On Thu, Feb 2, 2017 at 5:57 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
> Date: Thu, 2 Feb 2017 07:57:36 -0600
> From: David Blodgett <dblodgett at usgs.gov>
> To: <cf-metadata at cgd.ucar.edu>
> Subject: [CF-metadata] Extension of Discrete Sampling Geometries for Simple Features
> Message-ID: <224C2828-7212-449F-8C2C-97D903F6BE1E at usgs.gov>
> Content-Type: text/plain; charset="utf-8"
> Dear CF Community,
> We are pleased to submit this proposal for your consideration and review. The cover letter we've prepared below provides some background and explanation for the proposed approach. The Google doc here <http://goo.gl/Kq9ASq> is an excerpt of the CF specification with track changes turned on. Permissions for the document allow any Google user to comment, so feel free to comment and ask questions in line.
> Note that I'm sharing this with you with one issue unresolved. What to do with the point featureType? Our draft suggests that it is part of a new geometry featureType, but it could be that we leave it alone and introduce a geometry featureType. This may be a minor point of discussion, but we need to be clear that this is an issue that still needs to be resolved in the proposal.
> Thank you for your time and consideration.
> Best Regards,
> David Blodgett, Tim Whiteaker, and Ben Koziol
> Proposed Extension to NetCDF-CF for Simple Geometries
> Preface
> The proposed addition to NetCDF-CF introduced below is inspired by a pre-existing data model governed by OGC and ISO as ISO 19125-1. More information on Simple Features may be found here: <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the authors, it is consistent with ISO 19125-1 but has not been specified using the formalisms of OGC or ISO. The language used attempts to hold true to NetCDF-CF semantics while not conflicting with the existing standards baseline. While this proposal does not support the entire scope of the simple features ecosystem, it does support the core data types in most common use around the community.
> The other existing standard to mention is the UGRID convention <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors have experience reading and writing UGRID and have designed the proposed structure in a way that is inspired by and consistent with it.
> Terms and Definitions
> (Taken from OGC 06-103r4 OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture <http://www.opengeospatial.org/standards/sfa>.)
> Feature: Abstraction of real world phenomena - typically a geospatial abstraction with associated descriptive attributes.
> Simple Feature: A feature with all geometric attributes described piecewise by straight line or planar interpolation between point sets.
> Geometry (geometric complex): A set of disjoint geometric primitives - one or more points, lines, or polygons that form the spatial representation of a feature.
> Introduction
> Discrete Sampling Geometries (DSGs) handle data from one (or a collection of) timeSeries (point), trajectory, profile, trajectoryProfile, or timeSeriesProfile geometries. Measurements are from a point (timeSeries and profile) or points along a trajectory. In this proposal, we reuse the core DSG timeSeries type, which provides support for basic time series use cases, e.g., a timeSeries which is measured (or modeled) at a given point.
> Changes to Existing CF Specification
> In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and variables into two types: instance and element <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>. Instance refers to individual points, trajectories, profiles, etc. These would sometimes be referred to as features given that they are identified entities that can have associated attributes and be related to other entities. Element dimensions describe temporal or other dimensions to describe data on a per-instance basis. This proposal extends the DSG timeSeries featureType <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types> such that the geospatial coordinates of the instances can be point, multi-point, line, multi-line, polygon, or multi-polygon geometries. Rather than overload the DSG contiguous ragged array encoding, designed with time series in mind, a geometry ragged array encoding is introduced in a new section 9.3.5. See this Google doc for specific proposed changes: <http://goo.gl/Kq9ASq>
> Motivation
> DSGs have no system to define a geometry other than a point (polyline, polygon, etc.) and associate it with a time series that applies over that entire geometry, e.g., the expected rainfall in this watershed polygon for some period of time is 10 mm. As suggested in the last paragraph of section 9.1, current practice is to assign a representative point or just use an ID and forgo spatial information within a NetCDF-CF file. In order to satisfy a number of environmental modeling use cases, we need a way to encode a geometry (point, line, polygon, multi-point, multi-line, or multi-polygon) that is the static spatial feature representation to which one or more timeSeries can be associated. In this proposal, we provide an encoding to define collections of simple feature geometries. It interfaces cleanly with the existing DSG specification, enabling DSGs and Simple Geometries to be used concurrently.
> Looking Forward
> This proposal is a compromise solution that attempts to stay consistent with CF ideals and fit within the structure of the existing specification with minimal disruption. Line and polygon data types often require variable length arrays. Development of this proposal has brought to light the need for a general abstraction for variable length arrays in NetCDF-CF. Such a general abstraction would necessarily be reusable for character arrays, ragged arrays of time series, and ragged arrays of geometry nodes, as well as any other ragged data structures that may come up in the future. This proposal does not introduce such a general ragged array abstraction but does not preclude such a development in the future.
> Three Alternative Approaches
> Respecting the human readability ideal of NetCDF-CF, the development of this proposal started from a human readable format for geometries known as Well Known Text <https://en.wikipedia.org/wiki/Well-known_text>. We considered three high level design approaches while developing this proposal.
> 1) Direct use of Well-Known Text (WKT). In this approach, well known text strings would be encoded using character arrays following a contiguous ragged array approach to index the character array by geometry (or instance in DSG parlance).
> 2) Implement the WKT approach using a NetCDF binary array. In this approach, well known text separators (brackets, commas and spaces) for multipoint, multiline, multipolygon, and polygon holes, would be encoded as break type separator values like -1 for multiparts and -2 for holes.
> 3) Implement the fundamental dimensions of geometry data in NetCDF. In this approach, additional dimensions and variables along those dimensions would be introduced to represent geometries, geometry parts, geometry nodes, and unique (potentially shared) coordinate locations for nodes to reference.
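
To make the second approach concrete, the following Python sketch flattens one multipolygon into node arrays plus an integer index that uses -1 between parts and -2 before each hole. The function name encode_multipolygon and the nested-list input are hypothetical; the proposal itself does not define any API.

```python
MULTIPART_BREAK = -1  # separates polygon parts (value used in the proposal's example)
HOLE_BREAK = -2       # precedes each interior ring (hole)


def encode_multipolygon(parts):
    """Flatten one multipolygon into x, y node lists and a break-coded index.

    parts: list of polygons; each polygon is a list of rings (exterior ring
    first, then holes); each ring is a list of (x, y) tuples.
    Every node is written once; sharing of nodes is not attempted here.
    """
    x, y, index = [], [], []
    for part_number, rings in enumerate(parts):
        if part_number > 0:
            index.append(MULTIPART_BREAK)
        for ring_number, ring in enumerate(rings):
            if ring_number > 0:
                index.append(HOLE_BREAK)
            for px, py in ring:
                index.append(len(x))  # position of the node about to be stored
                x.append(px)
                y.append(py)
    return x, y, index


square_with_hole = [
    [(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)],  # exterior ring
    [(1, 1), (10, 5), (19, 1), (1, 1)],            # hole
]
triangle = [[(5, 25), (9, 25), (7, 29), (5, 25)]]

x, y, index = encode_multipolygon([square_with_hole, triangle])
print(index)
```

For this input the index comes out as 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -1, 9, 10, 11, 12, which has the same shape as the start of the coordinate_index variable in the CDL example further down.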
> Selected Approach
> The first approach was seen as too opaque to stay true to the CF ideal of complete self-description. The third approach seemed needlessly verbose and difficult to implement. The second approach was selected for the following reasons:
> The second approach is just as or more human-readable than the third.
> Use of break values keeps geometries relatively atomic.
> Will be familiar to developers who are familiar with the WKT geometry format.
> Character arrays, which are needed for options one and three, are cumbersome to use in some programming languages in common use with NetCDF.
> Break values replace the need for extraneous variables related to multi-part and polygon holes (interiors). Multi-part geometries are generally an exception and excessive instrumentation to support them should be discounted.
> Example: Representation of WKT-Style Polygons in a NetCDF-3 timeSeries featureType
> Below is sample CDL demonstrating how polygons are encoded in NetCDF-3 using a contiguous ragged array-like encoding. There are three details to note in the example below.
> The attribute contiguous_ragged_dimension with value of a dimension in the file.
> The geom_coordinates attribute with a value containing a space separated string of variable names.
> The cf_role geometry_x_node and geometry_y_node.
> These three attributes form a system to fully describe collections of multi-polygon feature geometries. Any variable that has the contiguous_ragged_dimension attribute contains one integer per instance: the 0-indexed starting position of that instance's geometry along the dimension named by the attribute. Any variable that uses the dimension referenced in the contiguous_ragged_dimension attribute can be interpreted using the values in the variable containing the contiguous_ragged_dimension attribute. The variables referenced in the geom_coordinates attribute describe spatial coordinates of geometries. These variables can also be identified by the cf_roles geometry_x_node and geometry_y_node. Note that the example below also includes a mechanism to handle multi-polygon features that also contain holes.
> netcdf multipolygon_example {
> dimensions:
>   node = 47 ;
>   indices = 55 ;
>   instance = 3 ;
>   time = 5 ;
>   strlen = 5 ;
> variables:
>   char instance_name(instance, strlen) ;
>     instance_name:cf_role = "timeseries_id" ;
>   int coordinate_index(indices) ;
>     coordinate_index:geom_type = "multipolygon" ;
>     coordinate_index:geom_coordinates = "x y" ;
>     coordinate_index:multipart_break_value = -1 ;
>     coordinate_index:hole_break_value = -2 ;
>     coordinate_index:outer_ring_order = "anticlockwise" ;
>     coordinate_index:closure_convention = "last_node_equals_first" ;
>   int coordinate_index_start(instance) ;
>     coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
>     coordinate_index_start:contiguous_ragged_dimension = "indices" ;
>   double x(node) ;
>     x:units = "degrees_east" ;
>     x:standard_name = "longitude" ; // or projection_x_coordinate
>     x:cf_role = "geometry_x_node" ;
>   double y(node) ;
>     y:units = "degrees_north" ;
>     y:standard_name = "latitude" ; // or projection_y_coordinate
>     y:cf_role = "geometry_y_node" ;
>   double someVariable(instance) ;
>     someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>   int time(time) ;
>     time:units = "days since 2000-01-01" ;
>   double someData(instance, time) ;
>     someData:coordinates = "time x y" ;
>     someData:featureType = "timeSeries" ;
> // global attributes:
>     :Conventions = "CF-1.8" ;
> data:
>  instance_name =
>   "flash",
>   "bang",
>   "pow" ;
>  coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
>     -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31, 32, 33,
>     34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
>  coordinate_index_start = 0, 30, 46 ;
>  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
>     45, 10, 30, 25, 50, 30, 25 ;
>  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
>     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
>     40, 40, 20, 5, 10, 15, 5 ;
>  someVariable = 1, 2, 3 ;
>  time = 1, 2, 3, 4, 5 ;
>  someData =
>   1, 2, 3, 4, 5,
>   1, 2, 3, 4, 5,
>   1, 2, 3, 4, 5 ;
> }
> How To Interpret
> Starting from the timeSeries variables:
> 1) See CF-1.8 conventions.
> 2) See the timeSeries featureType.
> 3) Find the timeseries_id cf_role.
> 4) Find the coordinates attribute of data variables.
> 5) See that the variables indicated by the coordinates attribute have the cf_roles geometry_x_node and geometry_y_node to determine that these are geometries according to this new specification.
> 6) Find the coordinate index variable with geom_coordinates that point to the nodes.
> 7) Find the variable with contiguous_ragged_dimension pointing to the dimension of the coordinate index variable to determine how to index into the coordinate index.
> 8) Iterate over polygons, parsing out geometries using the contiguous ragged start variable and coordinate index variable to interpret the coordinate data variables.
> Or, without reference to timeSeries:
> 1) See CF-1.8 conventions.
> 2) See the geom_type of multipolygon.
> 3) Find the variable with a contiguous_ragged_dimension matching the coordinate index variable's dimension.
> 4) See the geom_coordinates of x y.
> 5) Using the contiguous ragged start variable found in 3 and the coordinate index variable found in 2, geometries can be parsed out of the coordinate index variable using the hole and break values in it.
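
The second list of steps can be sketched in Python as follows. The arrays are copied from the CDL example; the function name decode and the nested-list result (instances, then polygon parts, then rings of (x, y) pairs) are assumptions made for illustration, not part of the proposal.

```python
MULTIPART_BREAK, HOLE_BREAK = -1, -2  # break values from the CDL attributes

coordinate_index = [
    0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
    -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31, 32, 33,
    34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46,
]
coordinate_index_start = [0, 30, 46]
x = [
    0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
    5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
    45, 10, 30, 25, 50, 30, 25,
]
y = [
    0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
    25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
    40, 40, 20, 5, 10, 15, 5,
]
instance_name = ["flash", "bang", "pow"]


def decode(index, starts, xs, ys):
    """Return one entry per instance: a list of parts, each a list of rings."""
    geometries = []
    stops = list(starts[1:]) + [len(index)]  # each slice ends where the next starts
    for start, stop in zip(starts, stops):
        parts = [[[]]]  # first part, holding its (still empty) exterior ring
        for i in index[start:stop]:
            if i == MULTIPART_BREAK:
                parts.append([[]])      # new polygon part with a new exterior ring
            elif i == HOLE_BREAK:
                parts[-1].append([])    # new interior ring in the current part
            else:
                parts[-1][-1].append((xs[i], ys[i]))
        geometries.append(parts)
    return geometries


geoms = decode(coordinate_index, coordinate_index_start, x, y)
for name, parts in zip(instance_name, geoms):
    holes = sum(len(rings) - 1 for rings in parts)
    print(name, len(parts), "parts,", holes, "holes")
```

Run as written, this reports three parts and three holes for flash, two parts and one hole for bang, and two parts and no holes for pow. Note that no break value appears at an instance boundary; the start variable alone marks it.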
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> -- 
> Sincerely,
> Bob Simons
> IT Specialist
> Environmental Research Division
> NOAA Southwest Fisheries Science Center 
> 99 Pacific St., Suite 255A      (New!)
> Monterey, CA 93940               (New!) 
> Phone: (831)333-9878            (New!)
> Fax:   (831)648-8440
> Email: bob.simons at noaa.gov
> The contents of this message are mine personally and 
> do not necessarily reflect any position of the 
> Government or the National Oceanic and Atmospheric Administration.
> <>< <>< <>< <>< <>< <>< <>< <>< <>< 
