[CF-metadata] Recording "day of year on which something happens"
j.d.blower at reading.ac.uk
Tue Mar 21 10:15:09 MDT 2017
Many thanks indeed for all the replies to this thread. It seems that there are a few issues here that lots of people would like to resolve! I’m going to try to summarise my take on the discussions here. I’m aware I’m probably missing some important contributions – apologies if so, I’m having trouble keeping up with all the replies.
1. Our specific use case is to record the day of the year on which a given number of “growing degree days” (GDDs) is reached within the year. There are a few ways of calculating GDDs, as this page explains, but a GDD total is essentially a time integral of temperature above some base temperature. I’m not sure how best to record the exact method used, since I don’t think there are snappy names for the various options that could be captured in a standard name. I guess that we could use a fairly generic standard name (see point 4 below) and use documentation to explain the exact derivation we used.
2. The NetCDF file might record days of year for several “threshold” values of GDD, so I would probably use a dimension to hold all the possible thresholds. (So we can find the day of the year on which 100, 200, 1000 GDDs were reached, etc.)
3. The variable itself would probably be the day of the year, expressed as an integer between 1 (January 1st) and 366. I note the helpful warnings not to refer to this as a “Julian Day” (I’ve made that mistake before!) and I also note Nan’s comment that the Navy might regard day of year as a fractional quantity (so noon on Jan 1st is “1.5”). But I think it’s simplest just to regard the day of the year as an integer number, starting at 1. Doing something else would probably surprise the users (mainly crop growers).
4. So a reasonable standard name, following the precedents cited by Antonio, might be “day_of_year_when_growing_degree_days_exceeds_threshold”. I would imagine that this would be unitless? A comment attribute could contain more information about the method used.
5. Alternatively, we could record the date on which the degree days exceed the threshold by using a time variable (with units of “days since X”), meaning that each year’s measurements would contain a different range of data values (year 1 would be [1:365], year 2 would be [366:730], etc.). However, this would make it a little harder for users to compare measurements from year to year, which would be a very common phenological use case, so I would tend to prefer the “day of year” option. (I think Nan made a point essentially agreeing with this.)
6. We need a dimension representing the year of measurement. Nothing finer than yearly precision is meaningful here. It seems that the “CF way” of doing this would be to use a “nominal date” within the year of, say, 1st July and record time_bounds that span the length of each year, as Jim suggested. But, to be honest, it would be simpler to record the year as a plain number with no finer precision implied (so the axis values would simply be the integers 2015, 2016, 2017, etc.). This would also make it much easier for users to “slice out” the year(s) they are interested in using simple scripts, without first needing to figure out what the “nominal date” within the year is. Thoughts? (The question of time precision must have come up before!)
7. Finally, there’s the question of cell_methods for the time axis bounds (if we use them). Jonathan suggested that “if it’s not ‘point’ it should be ‘sum’”. But in this case, although GDD is a kind of sum, the quantity we are expressing (day of year) is not a sum – it’s the day on which the sum reached a certain value. So I’m still not sure which cell_method would be appropriate.
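To make the proposed day-of-year variable concrete, here is a minimal Python sketch of how the day of year for each GDD threshold might be derived from daily temperatures. It assumes the simple averaging method (daily GDD = max(0, (Tmax + Tmin)/2 - Tbase)) with a hypothetical base temperature of 10 °C; this is only one of the several GDD variants, and the function names are illustrative, not part of any CF tooling. A threshold that is never reached within the year is returned as None, which in a file would correspond to a fill value.

```python
def daily_gdd(tmax, tmin, t_base=10.0):
    """Daily GDD by the simple averaging method (an assumption; other variants exist)."""
    return max(0.0, (tmax + tmin) / 2.0 - t_base)


def day_of_year_reaching(thresholds, tmax_series, tmin_series, t_base=10.0):
    """Map each GDD threshold to the first day of year (1..366) on which the
    running total reaches it, or None if it is never reached that year.

    tmax_series[i] and tmin_series[i] are the daily values for day of year i + 1.
    """
    result = {t: None for t in thresholds}
    total = 0.0
    for i, (tx, tn) in enumerate(zip(tmax_series, tmin_series)):
        total += daily_gdd(tx, tn, t_base)
        for t in thresholds:
            # Using ">=" ("reaches"); whether to use ">" instead is a convention choice.
            if result[t] is None and total >= t:
                result[t] = i + 1  # day 1 is January 1st
    return result


# Synthetic year: every day contributes (20 + 10) / 2 - 10 = 5 GDD.
tmax = [20.0] * 365
tmin = [10.0] * 365
print(day_of_year_reaching([100, 200, 1000, 2000], tmax, tmin))
# {100: 20, 200: 40, 1000: 200, 2000: None}
```

Each threshold maps naturally onto one index of the threshold dimension, so a year of results fills one row of a (year, threshold) array of integer days of year.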
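The trade-off between a “days since X” time coordinate, a day-of-year value and a year axis can also be sketched in Python. This assumes a hypothetical reference time of “days since 2015-01-01” and the standard (Gregorian) calendar; the nominal point of 1 July and bounds running from 1 January to the following 1 January follow Jim’s suggestion. The helper names are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical reference time, i.e. units = "days since 2015-01-01"
EPOCH = date(2015, 1, 1)


def days_since_epoch(d):
    """Time value of date d in days since EPOCH."""
    return (d - EPOCH).days


def year_coordinate(year):
    """Nominal time value (1 July) plus bounds spanning the whole calendar year."""
    value = days_since_epoch(date(year, 7, 1))
    bounds = (days_since_epoch(date(year, 1, 1)),
              days_since_epoch(date(year + 1, 1, 1)))
    return value, bounds


def year_of(time_value):
    """Recover the calendar year from a time value (needed before slicing by year)."""
    return (EPOCH + timedelta(days=time_value)).year


def day_of_year(d):
    """Day of year: 1 = 1 January, 366 = 31 December in a leap year."""
    return d.timetuple().tm_yday


# The same calendar date has very different "days since" values in
# consecutive years, but near-identical day-of-year values.
for y in (2015, 2016):
    d = date(y, 5, 10)
    print(y, days_since_epoch(d), day_of_year(d), year_coordinate(y))
# 2015 129 130 (181, (0, 365))
# 2016 495 131 (547, (365, 731))
```

Note that recovering the year from a CF-style time value needs a date conversion such as `year_of`, whereas an integer year axis can be compared directly. The output also shows a quirk of the day-of-year representation: the leap year shifts 10 May from day 130 to day 131.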
Apologies for the long email, but thanks for listening, and any comments would be much appreciated.