[CF-metadata] Extension of Discrete Sampling Geometries for Simple Features

David Blodgett dblodgett at usgs.gov
Thu Feb 2 06:57:36 MST 2017


Dear CF Community,

We are pleased to submit this proposal for your consideration and review. The cover letter below provides some background and explanation for the proposed approach. The Google Doc here <http://goo.gl/Kq9ASq> is an excerpt of the CF specification with track changes turned on. Permissions for the document allow any Google user to comment, so feel free to comment and ask questions inline.

Note that I'm sharing this with one issue unresolved: what to do with the point featureType. Our draft suggests that it becomes part of a new geometry featureType, but we could instead leave it alone and introduce a separate geometry featureType. This may be a minor point of discussion, but we want to be clear that it still needs to be resolved in the proposal.

Thank you for your time and consideration.

Best Regards,

David Blodgett, Tim Whiteaker, and Ben Koziol

Proposed Extension to NetCDF-CF for Simple Geometries

Preface

The proposed addition to NetCDF-CF introduced below is inspired by a pre-existing data model governed by OGC and ISO as ISO 19125-1. More information on Simple Features may be found here: <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the authors, the proposal is consistent with ISO 19125-1 but has not been specified using the formalisms of OGC or ISO. The language used attempts to hold true to NetCDF-CF semantics while not conflicting with the existing standards baseline. While this proposal does not support the entire scope of the simple features ecosystem, it does support the core data types in most common use around the community.

The other existing standard to mention is the UGRID convention <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors have experience reading and writing UGRID and have designed the proposed structure in a way that is inspired by and consistent with it.

Terms and Definitions

(Taken from OGC 06-103r4 OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture <http://www.opengeospatial.org/standards/sfa>.)

Feature: Abstraction of real world phenomena - typically a geospatial abstraction with associated descriptive attributes.
Simple Feature: A feature with all geometric attributes described piecewise by straight line or planar interpolation between point sets.
Geometry (geometric complex): A set of disjoint geometric primitives - one or more points, lines, or polygons that form the spatial representation of a feature.

Introduction

Discrete Sampling Geometries (DSGs) handle data from one (or a collection of) timeSeries (point), Trajectory, Profile, TrajectoryProfile or timeSeriesProfile geometries. Measurements are from a point (timeSeries and Profile) or points along a trajectory. In this proposal, we reuse the core DSG timeSeries type, which provides support for basic time series use cases, e.g., a timeSeries which is measured (or modeled) at a given point.

Changes to Existing CF Specification

In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and variables into two types, instance and element <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>. Instance refers to individual points, trajectories, profiles, etc. These are sometimes referred to as features, given that they are identified entities that can have associated attributes and be related to other entities. Element dimensions are the temporal or other dimensions along which data are described on a per-instance basis. This proposal extends the DSG timeSeries featureType <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types> such that the geospatial coordinates of the instances can be point, multi-point, line, multi-line, polygon, or multi-polygon geometries. Rather than overload the DSG contiguous ragged array encoding, which was designed with time series in mind, a geometry ragged array encoding is introduced in a new section 9.3.5. See this Google Doc for the specific proposed changes: <http://goo.gl/Kq9ASq>

Motivation

DSGs have no system to define a geometry other than a point (polyline, polygon, etc.) and to associate it with a time series that applies over that entire geometry, e.g., "the expected rainfall in this watershed polygon for some period of time is 10 mm." As suggested in the last paragraph of section 9.1, current practice is to assign a representative point or just use an ID and forgo spatial information within a NetCDF-CF file. In order to satisfy a number of environmental modeling use cases, we need a way to encode a geometry (point, line, polygon, multi-point, multi-line, or multi-polygon) that is the static spatial feature representation to which one or more timeSeries can be associated. In this proposal, we provide an encoding to define collections of simple feature geometries. It interfaces cleanly with the existing DSG specification, enabling DSGs and Simple Geometries to be used concurrently.

Looking Forward

This proposal is a compromise solution that attempts to stay consistent with CF ideals and fit within the structure of the existing specification with minimal disruption. Line and polygon data types often require variable length arrays. Development of this proposal has brought to light the need for a general abstraction for variable length arrays in NetCDF-CF. Such a general abstraction would necessarily be reusable for character arrays, ragged arrays of time series, and ragged arrays of geometry nodes, as well as any other ragged data structures that may come up in the future. This proposal does not introduce such a general ragged array abstraction but does not preclude such a development in the future.

Three Alternative Approaches

Respecting the human readability ideal of NetCDF-CF, the development of this proposal started from a human-readable format for geometries known as Well-Known Text (WKT) <https://en.wikipedia.org/wiki/Well-known_text>. We considered three high-level design approaches while developing this proposal.

1. Direct use of Well-Known Text (WKT). In this approach, well-known text strings would be encoded using character arrays following a contiguous ragged array approach to index the character array by geometry (or instance in DSG parlance).
2. Implement the WKT approach using a NetCDF binary array. In this approach, well-known text separators (brackets, commas and spaces) for multipoint, multiline, multipolygon, and polygon holes would be encoded as break type separator values like -1 for multiparts and -2 for holes.
3. Implement the fundamental dimensions of geometry data in NetCDF. In this approach, additional dimensions and variables along those dimensions would be introduced to represent geometries, geometry parts, geometry nodes, and unique (potentially shared) coordinate locations for nodes to reference.

Selected Approach

The first approach was seen as too opaque to stay true to the CF ideal of complete self-description. The third approach seemed needlessly verbose and difficult to implement. The second approach was selected for the following reasons:

The second approach is at least as human-readable as the third.
Use of break values keeps geometries relatively atomic.
It will be familiar to developers who know the WKT geometry format.
Character arrays, which are needed for options one and three, are cumbersome to use in some programming languages in common use with NetCDF.
Break values replace the need for extraneous variables related to multi-part geometries and polygon holes (interiors). Multi-part geometries are generally an exception, and excessive instrumentation to support them should be avoided.
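
As a rough illustration of the second approach, the sketch below flattens a WKT-style multipolygon into a node-index array using break values. It assumes the -1 (multipart) and -2 (hole) values mentioned above; the function name encode_multipolygon and the nested-list input layout are hypothetical and are not part of the proposal.

```python
MULTIPART_BREAK = -1  # assumed: separates the polygons of one multipolygon
HOLE_BREAK = -2       # assumed: separates a ring from the hole that follows it


def encode_multipolygon(polygons, first_node=0):
    """Flatten one multipolygon into a node-index list with break values.

    `polygons` is a list of polygons; each polygon is a list of rings
    (exterior ring first, then holes); each ring is a list of (x, y) nodes.
    Returns (index, xs, ys), where xs and ys are the flat node coordinates.
    """
    index, xs, ys = [], [], []
    node = first_node
    for p, rings in enumerate(polygons):
        if p > 0:
            index.append(MULTIPART_BREAK)
        for r, ring in enumerate(rings):
            if r > 0:
                index.append(HOLE_BREAK)
            for x, y in ring:
                index.append(node)
                xs.append(x)
                ys.append(y)
                node += 1
    return index, xs, ys


# WKT equivalent:
# MULTIPOLYGON (((0 0, 20 0, 20 20, 0 20, 0 0), (1 1, 10 5, 19 1, 1 1)),
#               ((5 25, 9 25, 7 29, 5 25)))
polygons = [
    [[(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)],
     [(1, 1), (10, 5), (19, 1), (1, 1)]],
    [[(5, 25), (9, 25), (7, 29), (5, 25)]],
]
index, xs, ys = encode_multipolygon(polygons)
print(index)
# [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -1, 9, 10, 11, 12]
```

The brackets and commas of the WKT string collapse into two integer sentinels, which is why no character arrays or extra part/hole variables are needed.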

Example: Representation of WKT-Style Polygons in a NetCDF-3 timeSeries featureType

Below is sample CDL demonstrating how polygons are encoded in NetCDF-3 using a contiguous ragged array-like encoding. There are three details to note in the example below.

The attribute contiguous_ragged_dimension with value of a dimension in the file.
The geom_coordinates attribute with a value containing a space separated string of variable names.
The cf_role geometry_x_node and geometry_y_node.

These three attributes form a system to fully describe collections of multi-polygon feature geometries. Any variable that has the contiguous_ragged_dimension attribute contains one integer per instance, giving the 0-indexed starting position of that instance's geometry along the referenced dimension. Any variable that uses the dimension referenced in the contiguous_ragged_dimension attribute can be interpreted using the values in the variable containing that attribute. The variables referenced in the geom_coordinates attribute describe the spatial coordinates of geometries. These variables can also be identified by the cf_role values geometry_x_node and geometry_y_node. Note that the example below also includes a mechanism to handle multi-polygon features that also contain holes.

netcdf multipolygon_example {
dimensions:
  node = 47 ;
  indices = 55 ;
  instance = 3 ;
  time = 5 ;
  strlen = 5 ;
variables:
  char instance_name(instance, strlen) ;
    instance_name:cf_role = "timeseries_id" ;
  int coordinate_index(indices) ;
    coordinate_index:geom_type = "multipolygon" ;
    coordinate_index:geom_coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    coordinate_index:outer_ring_order = "anticlockwise" ;
    coordinate_index:closure_convention = "last_node_equals_first" ;
  int coordinate_index_start(instance) ;
    coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
    coordinate_index_start:contiguous_ragged_dimension = "indices" ;
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ; // or projection_x_coordinate
    x:cf_role = "geometry_x_node" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ; // or projection_y_coordinate
    y:cf_role = "geometry_y_node" ;
  double someVariable(instance) ;
    someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
  int time(time) ;
    time:units = "days since 2000-01-01" ;
  double someData(instance, time) ;
    someData:coordinates = "time x y" ;
    someData:featureType = "timeSeries" ;
// global attributes:
    :Conventions = "CF-1.8" ;

data:

 instance_name =
  "flash",
  "bang",
  "pow" ;

 coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
    -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31, 32, 33,
    34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;

 coordinate_index_start = 0, 30, 46 ;

 x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
    5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30, 
    45, 10, 30, 25, 50, 30, 25 ;

 y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 
    25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
    40, 40, 20, 5, 10, 15, 5 ;

 someVariable = 1, 2, 3 ;

 time = 1, 2, 3, 4, 5 ;

 someData =
  1, 2, 3, 4, 5,
  1, 2, 3, 4, 5,
  1, 2, 3, 4, 5 ;
}
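
The outer_ring_order and closure_convention attributes in the CDL above can be checked against the sample data with a short sketch like the one below. It uses the shoelace formula on the first exterior ring (nodes 0 to 4) and the first hole (nodes 5 to 8). The helper names are hypothetical, and the observation that the hole winds clockwise comes from the sample data only; the text here does not state it as a requirement.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring of (x, y) nodes.

    Positive for anticlockwise rings, negative for clockwise rings.
    Assumes the ring repeats its first node as its last node.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_closed(ring):
    """True if the ring follows the last_node_equals_first convention."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# Taken from the x and y data above.
outer = [(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)]  # nodes 0-4
hole = [(1, 1), (10, 5), (19, 1), (1, 1)]             # nodes 5-8

print(is_closed(outer), signed_area(outer))  # True 400.0 (anticlockwise)
print(is_closed(hole), signed_area(hole))    # True -36.0 (clockwise)
```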

How To Interpret

Starting from the timeSeries variables:

1. See CF-1.8 conventions.
2. See the timeSeries featureType.
3. Find the timeseries_id cf_role.
4. Find the coordinates attribute of data variables.
5. See that the variables indicated by the coordinates attribute have the cf_role geometry_x_node and geometry_y_node, which identifies them as geometries according to this new specification.
6. Find the coordinate index variable whose geom_coordinates attribute points to the nodes.
7. Find the variable with contiguous_ragged_dimension pointing to the dimension of the coordinate index variable to determine how to index into the coordinate index.
8. Iterate over polygons, parsing out geometries using the contiguous ragged start variable and coordinate index variable to interpret the coordinate data variables.

Or, without reference to timeSeries:

1. See CF-1.8 conventions.
2. See the geom_type of multipolygon.
3. Find the variable with a contiguous_ragged_dimension matching the coordinate index variable's dimension.
4. See the geom_coordinates of x y.
5. Using the contiguous ragged start variable found in step 3 and the coordinate index variable found in step 2, split the coordinate index variable into one geometry per instance, then split each geometry into parts and holes using the multipart and hole break values.
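
The final parsing step can be sketched in plain Python as follows. The arrays are copied from the data section of the CDL example rather than read from a file, and the function names split_instances and parse_geometry are hypothetical. This is one possible reading of the encoding, not a reference implementation.

```python
MULTIPART_BREAK = -1  # coordinate_index:multipart_break_value
HOLE_BREAK = -2       # coordinate_index:hole_break_value

coordinate_index = [
    0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
    -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31, 32, 33,
    34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46]
coordinate_index_start = [0, 30, 46]
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20,
     -30, -20, -20, -30, 30, 45, 10, 30, 25, 50, 30, 25]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35,
     -20, -15, -25, -20, 20, 40, 40, 20, 5, 10, 15, 5]


def split_instances(index, starts):
    """Slice the flat index into one list per instance (contiguous ragged array)."""
    stops = starts[1:] + [len(index)]
    return [index[a:b] for a, b in zip(starts, stops)]


def parse_geometry(indices, xs, ys):
    """Return a list of polygons; each polygon is a list of rings of (x, y) nodes.

    The first ring of each polygon is its exterior; later rings are holes.
    """
    polygons = [[[]]]
    for i in indices:
        if i == MULTIPART_BREAK:
            polygons.append([[]])     # start a new polygon part
        elif i == HOLE_BREAK:
            polygons[-1].append([])   # start a new hole in the current polygon
        else:
            polygons[-1][-1].append((xs[i], ys[i]))
    return polygons


geometries = [parse_geometry(part, x, y)
              for part in split_instances(coordinate_index, coordinate_index_start)]

# Report the number of holes in each polygon part of each instance.
for name, geom in zip(["flash", "bang", "pow"], geometries):
    print(name, [len(rings) - 1 for rings in geom])
# flash [3, 0, 0]
# bang [0, 1]
# pow [0, 0]
```

Each instance comes back as a nested list that maps directly onto a WKT MULTIPOLYGON: polygons, then rings, then nodes.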
