[CF-metadata] high sample rate (seismic) data conventions

Jim Biard jbiard at cicsnc.org
Fri Apr 14 09:10:51 MDT 2017


Jonathan,

There is an associated convention, the Attribute Convention for Data 
Discovery (ACDD) 
<http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-3> 
that defines file-level attributes detailing things like 
time_coverage_resolution. The issue of high-rate time data arises with 
satellite data as well. The files often end up with a time variable 
containing lower-rate start times for groups of measurements. Some also 
have a second coordinate variable containing fixed relative times that 
are intended to be interpreted as offsets to the start times, but this 
is not a standardized practice.
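
As a rough illustration of that start-time-plus-offset pattern, here is a minimal Python sketch. It assumes NumPy is available; the function name, variable names and sample values are hypothetical and not taken from any particular satellite product.

```python
import numpy as np

def expand_times(start_times, offsets):
    """Combine per-group start times with fixed within-group offsets.

    start_times: shape (n_groups,), seconds since some reference epoch
    offsets:     shape (n_per_group,), seconds relative to each group start
    Returns an array of shape (n_groups, n_per_group) of sample times.
    """
    start_times = np.asarray(start_times, dtype=np.float64)
    offsets = np.asarray(offsets, dtype=np.float64)
    # Broadcasting adds every offset to every group start time.
    return start_times[:, np.newaxis] + offsets[np.newaxis, :]

# Hypothetical values: three groups starting one second apart,
# each holding four samples spaced 0.25 s apart.
scan_start = np.array([0.0, 1.0, 2.0])
sample_offset = np.arange(4) * 0.25
times = expand_times(scan_start, sample_offset)
```

Only the two short vectors need to be stored; the full time array is rebuilt on read.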

Grace and peace,

Jim


On 4/10/17 4:33 PM, Maccarthy, Jonathan K wrote:
> Seth & Roy,
>
> Technically CF-compliant but “unconventional” is probably not the way 
> to go, as I’d miss out on the tools that use the convention, which is 
> the point of using the standard.  I think I just needed someone to 
> help me navigate the CF documents, as they’re rather dense and 
> unfamiliar:-)  Originally, I came across this EarthCube page 
> (https://www.earthcube.org/group/advancing-netcdf-cf) about expanding 
> CF conventions, and I thought I'd read that a seismologist was 
> involved.  Seismology formats are a zoo, so I’m always on the hunt for 
> a well-documented standard, especially one with a community already 
> behind it:-)
>
> Thanks again, all!
>
> Best,
> Jon
>
>> On Apr 10, 2017, at 11:54 AM, Seth McGinnis <mcginnis at ucar.edu> wrote:
>>
>> Hi Jonathan,
>>
>> Oh, climate model outputs are also supposed to have a uniform sample
>> rate for the whole time series -- emphasis on *SUPPOSED TO*.  To my
>> dismay, I have encountered multiple cases where something went wrong
>> with the generation of the data files, resulting in missing or repeated
>> or weirdly-spaced timesteps, and sorting out the resulting problems is
>> how I came to appreciate the value of the explicit coordinate...
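
A check for the missing, repeated or oddly spaced timesteps described above could be sketched in Python roughly as follows. This assumes NumPy and a time coordinate already read into an array of seconds; the function name and the tolerance are hypothetical choices.

```python
import numpy as np

def find_irregular_steps(times, expected_step, rtol=1e-6):
    """Return indices i where times[i+1] - times[i] differs from expected_step."""
    times = np.asarray(times, dtype=np.float64)
    steps = np.diff(times)
    # A repeated timestep gives a step of 0; a gap gives a step that is too large.
    bad = ~np.isclose(steps, expected_step, rtol=rtol, atol=0.0)
    return np.flatnonzero(bad)

# Hypothetical series with one repeated value (2) and one gap (3 -> 5).
irregular = find_irregular_steps([0, 1, 2, 2, 3, 5, 6], expected_step=1.0)
```

A check like this is only possible when the time coordinate is stored explicitly.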
>>
>> As far as I know, you are correct that CF does not have a standardized
>> way to represent a coordinate solely in terms of a formula without
>> reference to a corresponding coordinate variable.
>>
>> However, that doesn't mean you couldn't do it and still have the file be
>> CF-compliant.  As far as I am aware (and somebody correct me if I'm
>> wrong), coordinate variables are not actually mandatory.
>>
>> So if, for reasons of feasibility, you found it necessary to do
>> something like the following, I believe that strictly speaking it would
>> be not just allowed but fully CF-compliant:
>>
>> dimensions:
>>  time = UNLIMITED; // (1892160000 currently)
>> variables:
>>  double acceleration(time);
>>    acceleration:long_name = "ground acceleration";
>>    acceleration:units = "m s-2";
>>    acceleration:start_time = "2017-01-01 00:00:00.01667";
>>    acceleration:sampling_rate = "60 Hz";
>> data:
>>    acceleration = 1.324145e-6, ...
>>
>>
>> I actually have some files without any coordinate variables sitting
>> around from the intermediate stage of some processing I did; I checked
>> one with Rosalyn Hatcher's cf-checker, and it didn't complain, so I
>> think it is technically legal.  It's kind of a letter-of-the-law rather
>> than spirit-of-the-law thing, but it's at least theoretically compliant.
>> Up to you whether that would count as sufficiently suitable for your
>> use case.
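
For a reader of such a file, the time coordinate would have to be rebuilt from the start_time and sampling_rate attributes. A minimal Python sketch of that step, assuming NumPy, an integer sampling rate in Hz, and a start time already parsed out of the attribute string (the function name is hypothetical):

```python
import numpy as np

def implicit_times(start, sampling_rate_hz, n_samples):
    """Return datetime64[ns] timestamps for uniformly spaced samples."""
    start = np.datetime64(start, "ns")
    idx = np.arange(n_samples, dtype=np.int64)
    # Integer nanosecond offsets avoid accumulating floating-point error.
    # Assumes an integer rate; offsets are truncated to whole nanoseconds.
    offsets_ns = (idx * 1_000_000_000) // int(sampling_rate_hz)
    return start + offsets_ns.astype("timedelta64[ns]")

# First two seconds of a 60 Hz series starting at the example start time.
times = implicit_times("2017-01-01T00:00:00.01667", 60, 120)
```

Tools that expect a CF time coordinate variable would not do this on their own, so the reconstruction would live in the application code.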
>>
>> Cheers,
>>
>> --Seth
>>
>>
>>
>> On 4/10/17 10:54 AM, Maccarthy, Jonathan K wrote:
>>> Hi Seth,
>>>
>>> Thanks for the very helpful response.  I can understand the argument for
>>> explicit coordinates, as opposed to using formulae; I think it solves
>>> several problems.  The assumption of a uniform sample rate for the
>>> length of a continuous time series is deeply engrained in most seismic
>>> software, however.  Changing that assumption may lead to other problems
>>> (but maybe not!).  Data volumes for a single channel can be 40-100
>>> 4-byte samples per second, which is something like 5-12 GB per channel
>>> per year uncompressed.  Commonly, dozens of channels are used at once,
>>> though some of them may share time coordinates.  It sounds like this
>>> use-case is similar in volume to what you've used, and may be worth
>>> trying out.
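
The volume figures above can be checked with a few lines of Python. This sketch assumes a 365-day year and decimal gigabytes; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year

def bytes_per_year(sample_rate_hz, bytes_per_sample=4):
    """Uncompressed bytes for one channel sampled continuously for a year."""
    return sample_rate_hz * bytes_per_sample * SECONDS_PER_YEAR

for rate in (40, 100):
    gigabytes = bytes_per_year(rate) / 1e9
    print(f"{rate} Hz: {gigabytes:.1f} GB per channel per year")
```

This gives about 5.0 GB at 40 Hz and 12.6 GB at 100 Hz, consistent with the 5-12 GB estimate.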
>>>
>>> Just to be clear, however, would I be correct in saying that CF has no
>>> accepted way of representing the data as I've described?
>>>
>>> Thanks again,
>>> Jonathan
>>>
>>>> On Apr 7, 2017, at 4:43 PM, Seth McGinnis <mcginnis at ucar.edu> wrote:
>>>>
>>>> Hi Jonathan,
>>>>
>>>> I would interpret the CF stance as being that the value in having
>>>> explicit coordinate variables and other ancillary data to accompany the
>>>> data outweighs the cost of increased storage.
>>>>
>>>> There are some cases where CF bends away from that for the sake of
>>>> practicality (see, e.g., the discussion about external file references
>>>> for cell_bounds in CMIP5), but overall, my sense is that the community
>>>> feels that it's better to have things explicitly written out in the file
>>>> than it is to provide them implicitly via a formula to calculate them.
>>>>
>>>> Based on my personal experiences, I think this is the right approach.
>>>> (In fact, I take it even further: I prefer to avoid data compression
>>>> entirely and to keep like data with like as much as possible, rather
>>>> than splitting big files into smaller pieces.)
>>>>
>>>> I have endured far, far more suffering and toil from (a) trying to
>>>> figure out what's wrong with a file that violates some implicit
>>>> assumption (like "there are never gaps in the time coordinate") and (b)
>>>> dealing with the complications of various tactics for keeping file sizes
>>>> small than I ever have from storing and working with very large files.
>>>>
>>>> YMMV, of course.  What are your data volumes like?  I'm working at the
>>>> terabyte scale, and as long as my file sizes stay under a few dozen GB,
>>>> I don't really even bother thinking about anything that affects the file
>>>> size by less than an order of magnitude.
>>>>
>>>> Cheers,
>>>>
>>>> Seth McGinnis
>>>>
>>>> ----
>>>> NARCCAP / NA-CORDEX Data Manager
>>>> RISC - IMAGe - CISL - NCAR
>>>> ----
>>>>
>>>>
>>>> On 4/7/17 9:55 AM, Maccarthy, Jonathan K wrote:
>>>>> Hi all,
>>>>>
>>>>> I’m curious about the suitability of CF metadata conventions for
>>>>> seismic sensor data.  I’ve done a bit of searching, but can’t find
>>>>> any mention of how CF conventions would store high sample-rate
>>>>> sensor data.  I do see descriptions of time series conventions, where
>>>>> hourly or daily sensor data samples are stored along with their
>>>>> timestamps, but storing individual timestamps for each sample of a
>>>>> high sample rate sensor would unnecessarily double the storage.
>>>>> Seismic formats typically don’t store time vectors, but instead just
>>>>> store vectors of samples with an associated start time and sampling
>>>>> rate.
>>>>>
>>>>> Could someone please point me towards a discussion or existing
>>>>> conventions on this topic?  Any help or suggestion is appreciated.
>>>>>
>>>>> Best,
>>>>> Jon
>>>>> _______________________________________________
>>>>> CF-metadata mailing list
>>>>> CF-metadata at cgd.ucar.edu
>>>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>>>>

-- 
*Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA’s National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: jbiard at cicsnc.org
o: +1 828 271 4900


