[CF-metadata] high sample rate (seismic) data conventions

Seth McGinnis mcginnis at ucar.edu
Fri Apr 7 16:43:49 MDT 2017

Hi Jonathan,

I would interpret the CF stance as being that the value in having
explicit coordinate variables and other ancillary data to accompany the
data outweighs the cost of increased storage.

There are some cases where CF bends away from that for the sake of
practicality (see, e.g., the discussion about external file references
for cell_measures in CMIP5), but overall, my sense is that the community
feels that it's better to have things explicitly written out in the file
than it is to provide them implicitly via a formula to calculate them.
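The two representations being weighed can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CF library code or any seismic format's API: the function names, the 100 Hz rate, and the start time are hypothetical. The implicit form keeps only a start time and sampling rate, as seismic formats do. The explicit form materializes one timestamp per sample, paired with a CF-style `units` string of the form "seconds since ...".

```python
from __future__ import annotations

from datetime import datetime


def implicit_header(start: datetime, rate_hz: float, n_samples: int) -> dict:
    """Hypothetical seismic-style metadata: start time and rate, no time vector."""
    return {"start": start, "rate_hz": rate_hz, "n_samples": n_samples}


def explicit_time_coordinate(header: dict) -> tuple[list[float], str]:
    """Materialize one timestamp per sample plus a CF-style units string.

    Each offset is computed as i / rate rather than by repeated addition,
    so floating-point error does not accumulate along a long series.
    """
    rate = header["rate_hz"]
    times = [i / rate for i in range(header["n_samples"])]
    units = "seconds since " + header["start"].strftime("%Y-%m-%d %H:%M:%S")
    return times, units


header = implicit_header(datetime(2017, 4, 7), rate_hz=100.0, n_samples=5)
times, units = explicit_time_coordinate(header)
print(units)  # seconds since 2017-04-07 00:00:00
print(times)  # [0.0, 0.01, 0.02, 0.03, 0.04]
```

The header stays at three values no matter how long the series is, while the explicit coordinate grows with every sample; that growth is the storage cost being traded for having the times written out.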

Based on my personal experiences, I think this is the right approach.
(In fact, I take it even further: I prefer to avoid data compression
entirely and to keep like data with like as much as possible, rather
than splitting big files into smaller pieces.)

I have endured far, far more suffering and toil from (a) trying to
figure out what's wrong with a file that violates some implicit
assumption (like "there are never gaps in the time coordinate") and (b)
dealing with the complications of various tactics for keeping file sizes
small than I ever have from storing and working with very large files.
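The "never gaps in the time coordinate" assumption shows concretely why explicit coordinates help. A minimal sketch, assuming a hypothetical `find_gaps` helper and a tolerance chosen only for illustration: with an explicit time coordinate, a gap is just a step longer than one sample period, whereas a start-time-plus-rate formula cannot express a gap at all and silently shifts every later sample.

```python
from __future__ import annotations


def find_gaps(times: list[float], rate_hz: float, tol: float = 0.5) -> list[tuple[int, float]]:
    """Return (index, missing_seconds) for each step longer than one sample period.

    A step counts as a gap when it exceeds (1 + tol) sample periods; the tolerance
    absorbs floating-point noise and small timing jitter.
    """
    period = 1.0 / rate_hz
    gaps = []
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        if step > (1.0 + tol) * period:
            gaps.append((i, step - period))
    return gaps


# Explicit 100 Hz coordinate in which the samples at 0.03 s and 0.04 s are missing.
times = [0.00, 0.01, 0.02, 0.05, 0.06]
print(find_gaps(times, rate_hz=100.0))  # one gap before index 3, about 0.02 s long

# The implicit formula assumes no gaps, so it places the last sample at 0.04 s
# instead of 0.06 s; the error is invisible unless the real times are stored.
implied = [i / 100.0 for i in range(len(times))]
print(round(times[-1] - implied[-1], 6))  # 0.02
```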

YMMV, of course.  What are your data volumes like?  I'm working at the
terabyte scale, and as long as my file sizes stay under a few dozen GB,
I don't really even bother thinking about anything that affects the file
size by less than an order of magnitude.
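Whether an explicit time coordinate crosses that order-of-magnitude threshold is easy to estimate. A back-of-envelope sketch, assuming a hypothetical single channel sampled at 100 Hz for one day with 4-byte samples; the timestamp widths (4 or 8 bytes) are also assumptions, not figures from this thread. Storing one timestamp per sample multiplies the uncompressed size by 1 + (time width / sample width), so by 2 or 3 here: noticeable, but below a factor of ten.

```python
def channel_bytes(rate_hz: float, seconds: float, sample_bytes: int, time_bytes: int) -> tuple:
    """Return (data_bytes, time_coordinate_bytes) for one uncompressed channel."""
    n_samples = round(rate_hz * seconds)
    return n_samples * sample_bytes, n_samples * time_bytes


SECONDS_PER_DAY = 86_400

for time_bytes in (4, 8):  # e.g. a 4-byte integer sample offset vs. an 8-byte double
    data, time_coord = channel_bytes(100.0, SECONDS_PER_DAY, sample_bytes=4, time_bytes=time_bytes)
    growth = (data + time_coord) / data
    print(f"{time_bytes}-byte times: data {data / 1e6:.2f} MB, "
          f"time coordinate {time_coord / 1e6:.2f} MB, growth x{growth:.1f}")
# Prints growth x2.0 for 4-byte times and x3.0 for 8-byte times (data 34.56 MB in both).
```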


Seth McGinnis


On 4/7/17 9:55 AM, Maccarthy, Jonathan K wrote:
> Hi all,
> I’m curious about the suitability of CF metadata conventions for
> seismic sensor data.  I’ve done a bit of searching, but can’t find
> any mention of how CF conventions would store high sample-rate
> sensor data.  I do see descriptions of time series conventions, where
> hourly or daily sensor data samples are stored along with their
> timestamps, but storing individual timestamps for each sample of a
> high sample rate sensor would unnecessarily double the storage.
> Seismic formats typically don’t store time vectors, but instead just
> store vectors of samples with an associated start time and sampling
> rate.
> Could someone please point me towards a discussion or existing
> conventions on this topic?  Any help or suggestion is appreciated.
> Best, Jon
