[CF-metadata] CF-metadata Digest, Vol 166, Issue 5

Bob Simons - NOAA Federal bob.simons at noaa.gov
Fri Feb 3 12:41:09 MST 2017


Then, isn't this proposal just the first step in the creation of a new
model and a new encoding of Simple Features, one that is "align[ed] ...
with as many other encoding standards in this space as is practical"? In
other words, yet another standard for Simple Features?

If so, it seems risky to me to take just the first (easy?) step "to support
the use cases that have a compelling need today" and not solve the entire
problem. I know the CF way is to just solve real, current needs, but in
this case it seems to risk a head slap moment in the future when we realize
that, in order to deal with some new simple feature variant, we should have
done things differently from the beginning.

And it seems odd to reject existing standards that have been so
painstakingly hammered out, in favor of starting the process all over
again. We follow existing standards for other things (e.g., IEEE 754 for
representing floating-point numbers in binary files), so why can't we follow
an existing Simple Features standard?

---
Rather than just be a naysayer, let me suggest a very different alternative:

There are several projects in the CF realm (e.g., this Simple Features
project, Discrete Sampling Geometry (DSG), true variable-length Strings,
ugrid(?)) which share a common underlying problem: how to deal with
variable-length multidimensional arrays: a[b][c], where the length of the c
dimension may be different for different b indices.
DSG solved this (5 different ways!), but only for DSG.
The Simple Features proposal seeks to solve the problem for Simple Features.
We still have no support for Unicode variable-length Strings.

Instead of continuing to solve the variable-length problem a different way
every time we confront it, shouldn't we solve it once, with one small
addition to the standard, and then use that solution repeatedly?
The solution could be a simple variant of one of the DSG solutions, but
generalized so that it could be used in different situations.
An encoding standard and built-in support for variable-length data arrays
in netcdf-java/c would solve a lot of problems, now and in the future.
Some work on this is already done: I think the netcdf-java API already
supports variable-length arrays when reading netcdf-4 files.
For Simple Features, the problem would reduce to: store the feature (using
some specified existing standard like WKT or WKB) in a variable-length
array.
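
To make the idea concrete, here is a minimal sketch of a generalized
"flat array plus per-row counts" layout, loosely modeled on the DSG
contiguous ragged array. It is only an illustration: the names pack_ragged
and unpack_ragged are hypothetical (they are not part of netcdf-java/c or
CF), and the WKT strings are made-up sample features.

```python
# Sketch only: a[b][c] with a different length c for each b, stored as
# one flat byte array plus a count per row (hypothetical helper names).

def pack_ragged(rows):
    """Pack variable-length strings into (flat_bytes, counts)."""
    flat = bytearray()
    counts = []
    for row in rows:
        data = row.encode("utf-8")  # UTF-8, so Unicode strings also work
        flat.extend(data)
        counts.append(len(data))    # length of this row in bytes
    return bytes(flat), counts


def unpack_ragged(flat, counts):
    """Recover the original strings from (flat_bytes, counts)."""
    rows = []
    start = 0
    for n in counts:
        rows.append(flat[start:start + n].decode("utf-8"))
        start += n
    return rows


# Made-up sample features, stored as WKT text.
features = [
    "POINT (30 10)",
    "LINESTRING (30 10, 10 30, 40 40)",
    "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
]
flat, counts = pack_ragged(features)
restored = unpack_ragged(flat, counts)
```

In a file, flat and counts would be two ordinary variables (one along a
"byte" dimension, one along the "feature" dimension), and the same pair of
variables could just as well hold Unicode Strings or time series.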





On Fri, Feb 3, 2017 at 9:07 AM, <cf-metadata-request at cgd.ucar.edu> wrote:

> Date: Fri, 3 Feb 2017 11:07:00 -0600
> From: David Blodgett <dblodgett at usgs.gov>
> To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
> Cc: CF Metadata <cf-metadata at cgd.ucar.edu>
> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
>         for Simple Features
> Message-ID: <8EE85E65-2815-4720-90FC-13C72D3C7952 at usgs.gov>
> Content-Type: text/plain; charset="utf-8"
>
> Dear Bob,
>
> I'll just take these in line.
>
> 1) noted. We have been trying to figure out what to do with the point
> featureType and I think leaving it more or less alone is a viable path
> forward.
>
> 2) This is not an exact replica of WKT, but rather a similar approach to
> WKT. As I stated, we have followed the ISO simple features data model and
> well known text feature types in concept, but have not used the same
> standardization formalisms. We aren't advocating for supporting "all of"
> any standard but are rather attempting to support the use cases that have a
> compelling need today while aligning this with as many other encoding
> standards in this space as is practical. Hopefully that answers your
> question, sorry if it's vague.
>
> 3) The google doc linked in my response contains the encoding we are
> proposing as a starting point for conversation: http://goo.gl/Kq9ASq
> I want to stress, as a starting point for discussion. I expect that this
> proposal will change drastically before we're done.
>
> 4) Absolutely envision tools doing what you say, convert to/from standard
> spatial formats and NetCDF-CF geometries. We intend to introduce an R and a
> Python implementation that does exactly as you say along with whatever form
> this standard takes in the end. R and Python were chosen as the team that
> brought this together are familiar with those two languages, additional
> implementations would be more than welcome.
>
> 5) We do include a "geometry" featureType similar to the "point"
> featureType. Thus our difficulty with what to do with the "point"
> featureType. You are correct, there are lots of non timeSeries applications
> to be solved and this proposal does intend to support them (within the
> existing DSG constructs).
>
> Thanks for your questions, hopefully my answers close some gaps for you.
>
> - Dave
>
> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <
> bob.simons at noaa.gov> wrote:
> >
> > 1) There is a vague comment in the proposal about possibly changing the
> point featureType. Please don't, unless the changes don't affect current
> uses of Point. There are already 1000's of files that use it. If this new
> system offers an alternative, then fine, it's an alternative. One of the
> most important and useful features of a good standard is backwards
> compatibility.
> >
> > 2) You advocate "Implement the WKT approach using a NetCDF binary
> array." Is this system then an exact encoding of WKT, neither a subset nor
> a superset?  "Simple Features" are often not simple.
> > If it is WKT (or something else), what is the standard you are following
> to describe the Simple Features (e.g.,  ISO/IEC 13249-3:2016 and ISO
> 19162:2015)?
> > Does your proposal deviate in any way from the standard's capabilities?
> > Do you advocate following the entire WKT standard, e.g., supporting all
> the feature types that WKT supports?
> >
> > 3) Since you are not using the WKT encoding, but creating your own,
> where is the definition of the encoding system you are using?
> >
> > 4) This is a little out of CF scope, but:
> > Do you envision tools, notably, netcdf-c/java, having a writer function
> that takes in WKT and encodes the information in a file, and having a
> reader function that reads the file and returns WKT? Or is it your plan
> that the encoding/ decoding is left to the user?
> >
> > 5) This proposal is for "Simple Features plus Time Series" (my phrase
> not yours). But aren't there lots of other uses of Simple Features? Will
> there be other proposals in the future for "Simple Features plus X" and
> "Simple Features plus Y"? If so, will CF eventually become a massive
> document where Simple Features are defined over and over again, but in
> different contexts? If so, wouldn't a better solution be to deal with
> Simple Features separately (as Postgres does by making a geometric data
> type?), and then add "Simple Features plus Time Series" as the first use of
> it?
> >
> > Thanks for answering these questions.
> > Please forgive me if I missed parts of your proposal that answer these
> questions.
> >
> >
> > On Thu, Feb 2, 2017 at 5:57 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
> > Date: Thu, 2 Feb 2017 07:57:36 -0600
> > From: David Blodgett <dblodgett at usgs.gov>
> > To: <cf-metadata at cgd.ucar.edu>
> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for
> >         Simple Features
> > Message-ID: <224C2828-7212-449F-8C2C-97D903F6BE1E at usgs.gov>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Dear CF Community,
> >
> > We are pleased to submit this proposal for your consideration and
> review. The cover letter we've prepared below provides some background and
> explanation for the proposed approach. The google doc here
> <http://goo.gl/Kq9ASq> is an excerpt of the CF
> specification with track changes turned on. Permissions for the document
> allow any google user to comment, so feel free to comment and ask questions
> in line.
> >
> > Note that I'm sharing this with you with one issue unresolved. What to
> do with the point featureType? Our draft suggests that it is part of a new
> geometry featureType, but it could be that we leave it alone and introduce
> a geometry featureType. This may be a minor point of discussion, but we
> need to be clear that this is an issue that still needs to be resolved in
> the proposal.
> >
> > Thank you for your time and consideration.
> >
> > Best Regards,
> >
> > David Blodgett, Tim Whiteaker, and Ben Koziol
> >
> > Proposed Extension to NetCDF-CF for Simple Geometries
> >
> > Preface
> >
> > The proposed addition to NetCDF-CF introduced below is inspired by a
> pre-existing data model governed by OGC and ISO as ISO 19125-1. More
> information on Simple Features may be found here:
> <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the
> authors, it is consistent with ISO 19125-1 but has not been specified
> using the formalisms of OGC or ISO. Language used attempts to hold true to
> NetCDF-CF semantics while not conflicting with the existing standards
> baseline. While this proposal does not support the entire scope of the
> simple features ecosystem, it does
> support the core data types in most common use around the community.
> >
> > The other existing standard to mention is the UGRID convention
> <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors have
> experience reading and writing UGRID and have designed the proposed
> structure in a way that is inspired by and consistent with it.
> >
> > Terms and Definitions
> >
> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for
> Geographic information - Simple feature access - Part 1: Common
> architecture <http://www.opengeospatial.org/standards/sfa>.)
> >
> > Feature: Abstraction of real world phenomena - typically a geospatial
> abstraction with associated descriptive attributes.
> >
> > Simple Feature: A feature with all geometric attributes described
> piecewise by straight line or planar interpolation between point sets.
> >
> > Geometry (geometric complex): A set of disjoint geometric primitives -
> one or more points, lines, or polygons that form the spatial representation
> of a feature.
> >
> > Introduction
> >
> > Discrete Sampling Geometries (DSGs) handle data from one (or a
> collection of) timeSeries (point), Trajectory, Profile, TrajectoryProfile
> or timeSeriesProfile geometries. Measurements are from a point (timeSeries
> and Profile) or points along a trajectory. In this proposal, we reuse the
> core DSG timeSeries type which provides support for basic time series use
> cases, e.g., a timeSeries which is measured (or modeled) at a given point.
> >
> > Changes to Existing CF Specification
> >
> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and
> variables into two types: instance and element
> <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>.
> Instance refers to individual points, trajectories, profiles, etc. These
> would sometimes be referred to as features given that they are identified
> entities that can have associated attributes and be related to other
> entities. Element dimensions describe temporal or other dimensions to
> describe data on a per-instance basis. This proposal extends the DSG
> timeSeries featureType
> <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types>
> such that the geospatial coordinates
> of the instances can be point, multi-point, line, multi-line, polygon, or
> multi-polygon geometries. Rather than overload the DSG contiguous ragged
> array encoding, designed with timeseries in mind, a geometry ragged array
> encoding is introduced in a new section 9.3.5. See this google doc for
> specific proposed changes: <http://goo.gl/Kq9ASq>
> >
> > Motivation
> >
> > DSGs have no system to define a geometry (polyline, polygon, etc., other
> than point) and an association with a time series that applies over that
> entire geometry, e.g., the expected rainfall in this watershed polygon for
> some period of time is 10 mm. As suggested in the last paragraph of section
> 9.1, current practice is to assign a representative point or just use an ID
> and forgo spatial information within a NetCDF-CF file. In order to satisfy
> a number of environmental modeling use cases, we need a way to encode a
> geometry (point, line, polygon, multi-point, multi-line, or multi-polygon)
> that is the static spatial feature representation to which one or more
> timeSeries can be associated. In this proposal, we provide an encoding to
> define collections of simple feature geometries. It interfaces cleanly with
> the existing DSG specification, enabling DSGs and Simple Geometries to be
> used concurrently.
> >
> > Looking Forward
> >
> > This proposal is a compromise solution that attempts to stay consistent
> with CF ideals and fit within the structure of the existing specification
> with minimal disruption. Line and polygon data types often require variable
> length arrays. Development of this proposal has brought to light the need
> for a general abstraction for variable length arrays in NetCDF-CF. Such a
> general abstraction would necessarily be reusable for character arrays,
> ragged arrays of time series, and ragged arrays of geometry nodes, as well
> as any other ragged data structures that may come up in the future. This
> proposal does not introduce such a general ragged array abstraction but
> does not preclude such a development in the future.
> >
> > Three Alternative Approaches
> >
> > Respecting the human readability ideal of NetCDF-CF, the development of
> this proposal started from a human readable format for geometries known as
> Well Known Text <https://en.wikipedia.org/wiki/Well-known_text>. We
> considered three high-level design approaches while developing this proposal.
> >
> > 1) Direct use of Well-Known Text (WKT). In this approach, well known text
> strings would be encoded using character arrays following a contiguous
> ragged array approach to index the character array by geometry (or instance
> in DSG parlance).
> >
> > 2) Implement the WKT approach using a NetCDF binary array. In this
> approach, well known text separators (brackets, commas and spaces) for
> multipoint, multiline, multipolygon, and polygon holes, would be encoded as
> break type separator values like -1 for multiparts and -2 for holes.
> >
> > 3) Implement the fundamental dimensions of geometry data in NetCDF. In this
> approach, additional dimensions and variables along those dimensions would
> be introduced to represent geometries, geometry parts, geometry nodes, and
> unique (potentially shared) coordinate locations for nodes to reference.
> >
> > Selected Approach
> >
> > The first approach was seen as too opaque to stay true to the CF ideal
> of complete self-description. The third approach seemed needlessly verbose
> and difficult to implement. The second approach was selected for the
> following reasons:
> >
> > - The second approach is just as or more human-readable than the third.
> > - Use of break values keeps geometries relatively atomic.
> > - It will be familiar to developers who know the WKT geometry format.
> > - Character arrays, which are needed for options one and three, are
> cumbersome to use in some programming languages in common use with NetCDF.
> > - Break values replace the need for extraneous variables related to
> multi-part and polygon holes (interiors). Multi-part geometries are
> generally an exception and excessive instrumentation to support them should
> be discounted.
> >
> > Example: Representation of WKT-Style Polygons in a NetCDF-3
> timeSeries featureType
> >
> > Below is sample CDL demonstrating how polygons are encoded in NetCDF-3
> using a contiguous ragged array-like encoding. There are three details to
> note in the example below.
> >
> > 1) The attribute contiguous_ragged_dimension with value of a dimension in
> the file.
> > 2) The geom_coordinates attribute with a value containing a space separated
> string of variable names.
> > 3) The cf_role geometry_x_node and geometry_y_node.
> >
> > These three attributes form a system to fully describe collections of
> multi-polygon feature geometries. Any variable that has the
> contiguous_ragged_dimension attribute contains integers that indicate the
> 0-indexed starting position of each geometry along the instance dimension.
> Any variable that uses the dimension referenced in the
> contiguous_ragged_dimension attribute can be interpreted using the values
> in the variable containing the contiguous_ragged_dimension attribute. The
> variables referenced in the geom_coordinates attribute describe spatial
> coordinates of geometries. These variables can also be identified by the
> cf_roles geometry_x_node and geometry_y_node. Note that the example below
> also includes a mechanism to handle multi-polygon features that also
> contain holes.
> >
> > netcdf multipolygon_example {
> > dimensions:
> >   node = 47 ;
> >   indices = 55 ;
> >   instance = 3 ;
> >   time = 5 ;
> >   strlen = 5 ;
> > variables:
> >   char instance_name(instance, strlen) ;
> >     instance_name:cf_role = "timeseries_id" ;
> >   int coordinate_index(indices) ;
> >     coordinate_index:geom_type = "multipolygon" ;
> >     coordinate_index:geom_coordinates = "x y" ;
> >     coordinate_index:multipart_break_value = -1 ;
> >     coordinate_index:hole_break_value = -2 ;
> >     coordinate_index:outer_ring_order = "anticlockwise" ;
> >     coordinate_index:closure_convention = "last_node_equals_first" ;
> >   int coordinate_index_start(instance) ;
> >     coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
> >     coordinate_index_start:contiguous_ragged_dimension = "indices" ;
> >   double x(node) ;
> >     x:units = "degrees_east" ;
> >     x:standard_name = "longitude" ; // or projection_x_coordinate
> >     x:cf_role = "geometry_x_node" ;
> >   double y(node) ;
> >     y:units = "degrees_north" ;
> >     y:standard_name = "latitude" ; // or projection_y_coordinate
> >     y:cf_role = "geometry_y_node" ;
> >   double someVariable(instance) ;
> >     someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
> >   int time(time) ;
> >     time:units = "days since 2000-01-01" ;
> >   double someData(instance, time) ;
> >     someData:coordinates = "time x y" ;
> >     someData:featureType = "timeSeries" ;
> > // global attributes:
> >     :Conventions = "CF-1.8" ;
> >
> > data:
> >
> >  instance_name =
> >   "flash",
> >   "bang",
> >   "pow" ;
> >
> >  coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
> >     -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31, 32, 33,
> >     34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
> >
> >  coordinate_index_start = 0, 30, 46 ;
> >
> >  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
> >     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
> >     45, 10, 30, 25, 50, 30, 25 ;
> >
> >  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
> >     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
> >     40, 40, 20, 5, 10, 15, 5 ;
> >
> >  someVariable = 1, 2, 3 ;
> >
> >  time = 1, 2, 3, 4, 5 ;
> >
> >  someData =
> >   1, 2, 3, 4, 5,
> >   1, 2, 3, 4, 5,
> >   1, 2, 3, 4, 5 ;
> > }
> >
> > How To Interpret
> >
> > Starting from the timeSeries variables:
> >
> > 1) See CF-1.8 conventions.
> > 2) See the timeSeries featureType.
> > 3) Find the timeseries_id cf_role.
> > 4) Find the coordinates attribute of data variables.
> > 5) See that the variables indicated by the coordinates attribute have a
> cf_role geometry_x_node and geometry_y_node to determine that these are
> geometries according to this new specification.
> > 6) Find the coordinate index variable with geom_coordinates that point to
> the nodes.
> > 7) Find the variable with contiguous_ragged_dimension pointing to the
> dimension of the coordinate index variable to determine how to index into
> the coordinate index.
> > 8) Iterate over polygons, parsing out geometries using the contiguous
> ragged start variable and coordinate index variable to interpret the
> coordinate data variables.
> >
> > Or, without reference to timeSeries:
> >
> > 1) See CF-1.8 conventions.
> > 2) See the geom_type of multipolygon.
> > 3) Find the variable with a contiguous_ragged_dimension matching the
> coordinate index variable's dimension.
> > 4) See the geom_coordinates of x y.
> > 5) Using the contiguous ragged start variable found in 3 and the coordinate
> index variable found in 2, geometries can be parsed out of the coordinate
> index variable and parsed using the hole and break values in it.



-- 
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940               (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: bob.simons at noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><