[CF-metadata] Feedback requested on proposed CF Simple Geometries

Chris Barker chris.barker at noaa.gov
Tue Jan 31 10:52:52 MST 2017


A couple quick comments:

I think we're close here, so that's good. I'm not that clear on where there
are decisions left to be made, but I'll highlight two:

...

> Your aim is to
> describe the network alone.
>
...

>  a collection of timeseries is stored as a
> data variable with a single dimension of time and a single dimension of
> space.
>

I don't see a conflict here -- if you can describe the network (geometry)
then you can associate data with it (UGRID uses indexes into cells, nodes,
etc.; this should be equally applicable).

> You would like to have SOMETHING alone in the file, just to
> describe the network itself. CF doesn't do this at present (domain without
> data),


Doesn't a set of coordinate variables essentially do that? i.e. you can
define a rectangular grid -- even if there is no data on it. And you can
certainly do that with UGRID, which is another standard, but I don't think
it conflicts with CF.


> Taking your previous comments into account (I'll come back to them below),
> as
> a modified version of what I suggested before, here's a possible way to
> handle
> this case, for a small number (3) of linestrings:
>

That looks good to me, I think...


>
>   data:
>     SOMETHING=2, 4, 3;
>     lon=0, 1,  0, -1, -2, -3,  2, 3, 4;
>     lat=51, 52,  51, 50, 50, 49,  55, 55, 56;
>

I'm confused about what this is.

> These simple geometries can be regarded as a more complex alternative to
> cells
> bounds - each timeseries has a complicated geometry of nodes and lines, but
> logically it's still a single "cell".


yup.


> For the sake of applications which can
> read CF but don't understand simple geometries, it might be a good idea in
> addition to provide a "representative" location for each timeseries, as
> representive_lat(station) and representative_lon(station), which could for
> instance be the mean of the node coordinates for each geometry.


We do that in UGRID, too -- I think it's even required (and called
coordinates, actually). It may make little sense with complex geometries,
but it can be handy.
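
To make the "representative location" idea concrete, here is a minimal
sketch that takes the mean of the node coordinates for each geometry. It
reuses the values from the quoted example (SOMETHING is assumed to hold the
node count of each linestring); the function name representative_points is
hypothetical and not part of any proposal. Note that a plain mean of
longitudes would give a misleading result for a geometry that crosses the
dateline.

```python
# Values copied from the quoted example; counts is the SOMENTHING-style
# variable (assumed here to be the number of nodes in each linestring).
counts = [2, 4, 3]
lon = [0, 1, 0, -1, -2, -3, 2, 3, 4]
lat = [51, 52, 51, 50, 50, 49, 55, 55, 56]


def representative_points(counts, lon, lat):
    """Return one (mean_lon, mean_lat) pair per geometry (hypothetical helper)."""
    points = []
    start = 0
    for n in counts:
        end = start + n
        points.append((sum(lon[start:end]) / n, sum(lat[start:end]) / n))
        start = end
    return points


rep = representative_points(counts, lon, lat)
for i, (rlon, rlat) in enumerate(rep):
    print(i, rlon, rlat)
```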

> You propose the index variable in order for the convention to be like
> > ugrid. However this still seems to me to be an unnecessary complexity and
> > use of space if you aren’t going to have many shared nodes.



> To be frank, I'm not convinced by either argument. Regarding the first, in
> your
> example you don't reuse any points at all. Can you give an example where
> there
> is a lot of reuse?


The stream network example would be a good one. Also things like political
boundaries -- they tend to be complex polygons with shared vertices.


> Regarding the second, I agree that it is a nuisance and
> unreliable to have to make comparisons with tolerance between
> floating-point
> numbers to determine equality. However, when you write a file, I suppose
> you
> can and would write exactly the same numbers for the coordinates of a node
> if
> it appears several times, wouldn't you? Thus the coincidence of nodes can
> be
> tested by *exact* equality of coordinates - no tolerance needed.
>

You still don't know for sure if the vertices are the SAME or if they just
happen to be the same.

This is a tough one -- the "normal" GIS data model does not have shared
nodes (that I know of) so perhaps we should follow that. But this lack of
shared nodes is actually a substantial pain for GIS systems and uses --
there is a lot of complex "snapping" that needs to be done.  So I'm on the
fence about this -- I'm pretty convinced shared nodes are a better model,
but if we want to interact seamlessly with other GIS formats, we may be
better off matching that data model.
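
A small sketch of the difference between the two data models, using two
made-up unit squares that share an edge. All names here (node_lon,
node_lat, polygons) are illustrative assumptions, not taken from UGRID or
from the proposal. With an index-based layout the sharing is explicit; with
duplicated coordinates, exact equality only shows that two vertices
coincide, not whether they were meant to be the same node.

```python
# Indexed layout: each node is stored once, geometries refer to nodes by index.
node_lon = [0.0, 1.0, 1.0, 0.0, 2.0, 2.0]
node_lat = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
polygons = [[0, 1, 2, 3], [1, 4, 5, 2]]  # two squares sharing the edge 1-2

# Shared nodes are simply the indexes that appear in both polygons.
shared_nodes = sorted(set(polygons[0]) & set(polygons[1]))

# Duplicated-coordinate layout: every polygon carries its own copies.
flat = [[(node_lon[i], node_lat[i]) for i in poly] for poly in polygons]

# Exact comparison finds coincident vertices, but cannot tell us whether
# they are the same node or merely happen to have equal coordinates.
coincident = sorted(set(flat[0]) & set(flat[1]))

print(shared_nodes)
print(coincident)
```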

> In my example above, I assumed the polygons have no holes in them, so I've
> omitted the inside/outside information. If needed, this information could
> also
> be an attribute e.g. SOMETHING:inout="OIIIOOOOIOO", with as many elements
> as
> there are polygons in total. Thinking again about it, I wonder whether this
> information is really needed. If you draw all the polygons, isn't it
> apparent
> which ones are inside anyway? When would you use this information?
>

It's not always clear. If there is a hole in a polygon, you can figure it
out, but if there is a lake in a land polygon, and an island in the lake,
then it gets pretty tricky.

I think shapefiles use clockwise vs anti-clockwise to indicate
inside-outside, but IIUC, they are pretty limited with nested polygons, too.
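
For reference, winding order can be tested with the signed area from the
shoelace formula. This is only a sketch: it assumes planar x/y coordinates
(not great-circle edges on the sphere), and the function names are
hypothetical. A positive signed area means the ring is anti-clockwise, a
negative one means clockwise.

```python
def signed_area(ring):
    """Shoelace formula: positive for anti-clockwise rings, negative for clockwise."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_clockwise(ring):
    return signed_area(ring) < 0


square_ccw = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_cw = list(reversed(square_ccw))

print(signed_area(square_ccw), is_clockwise(square_ccw))
print(signed_area(square_cw), is_clockwise(square_cw))
```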


> My scheme avoids the use of break values, which you're not very keen on
> yourselves, it sounds like.


I don't like break values either.


> You wrote > - It is more difficult to extract a single geometry using this
> approach.  It's not hard, though, and the same comment would apply to the
> CF
> contiguous ragged array representation.


Yes -- you can represent a ragged array by either specifying the
start-index of each "row", or by specifying the size of each row. CF
specifies the size of each row. I think that's a worse way to do it --
it's similar if you are looping through from the start, but much harder to
get an arbitrary row in the middle -- but I've gone with the CF way for
other stuff [1] because it's better not to have two ways to do the same
thing. So we might as well stick with it here, too.
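
A minimal sketch of the point about counts versus start indexes, using the
values from the quoted example (SOMETHING is assumed to be the per-geometry
node count). To pull out an arbitrary row from count-based storage you
first need the running sum of the counts; once you have that, it is the
start-index representation. The helper name get_geometry is hypothetical.

```python
from itertools import accumulate

# Values copied from the quoted example above.
counts = [2, 4, 3]
lon = [0, 1, 0, -1, -2, -3, 2, 3, 4]
lat = [51, 52, 51, 50, 50, 49, 55, 55, 56]

# Start index of each row = sum of the counts of all preceding rows.
starts = [0] + list(accumulate(counts))[:-1]


def get_geometry(i):
    """Return the (lon, lat) nodes of geometry i (hypothetical helper)."""
    begin = starts[i]
    end = begin + counts[i]
    return list(zip(lon[begin:end], lat[begin:end]))


print(starts)
print(get_geometry(1))
```

Computing starts once costs a single pass over the counts, after which any
row can be fetched directly.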

-CHB

[1] a netcdf format for particle tracking model output:

https://github.com/NOAA-ORR-ERD/nc_particles



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov