# [CF-metadata] Recording "day of year on which something happens"

Hollis, Dan dan.hollis at metoffice.gov.uk
Mon Mar 27 07:45:32 MDT 2017

```Hi all,

(Several of last week's postings only appeared in my inbox on Friday and were not in chronological order of when they were sent - apologies if I've missed something or am repeating something already posted by others.)

A couple of thoughts:

Regarding the quantity itself, my feeling is that "day of year" is better than "days since YYYY-MM-DD" (as noted by others, the former makes intercomparisons between different years that little bit easier). However, I'm not sure how that works for a period that spans two calendar years, e.g. dates of first frost and last frost for the period 1st July to 30th June. If we use "day of year" then the first frost might have values in the range 270-330 in the first calendar year, while the last frost might have values of 100-150 in the second calendar year. Would it be obvious to a user how to interpret these values?

So, how about using "day of period" instead? By this I mean the number of days since the start of the bounds for the time coordinate, counting the start date as day 1. In my example: 1 = 1st July, 2 = 2nd July, ..., 365 = 30th June. However, if another user defined their period of interest to be 1st Aug to 1st May (I think Jim said he used this definition) then 1 = 1st Aug, 2 = 2nd Aug, ..., 274 = 1st May. This approach would be completely flexible, i.e. you could handle any period of interest and you could even store values for periods that span several calendar years without ambiguity. If you need the actual date of the event (e.g. to allow you to compare Jim's data with mine) then you could add the "day of period" value, minus one, to the bounds start. The other benefit would be that higher values would always mean chronologically later events (unlike "day of year", where the last frost has lower values than the first frost despite occurring later in the period).

Secondly, regarding cell methods, it occurs to me that there are several existing examples where the cell methods describe what happens before and after a threshold is applied but they do not describe the effect of the threshold itself. For example, I believe that valid cell methods for "number_of_days_with_air_temperature_above_threshold" could be:

"time: minimum within days   time: sum over days"

The fact that a time series of daily minimum temperatures (created by the first part of the cell methods) is magically transformed into a series of 0s and 1s (which can then be summed according to the second part of the cell methods) is implicit in the standard_name (and its associated definition and scalar coordinate variable). The transformation itself is _not_ captured by the cell methods.

Following this approach for 'first frost', we could imagine standard names like:
"day_number_of_air_temperature_below_threshold"

and cell methods something like:
"time: minimum within days   time: minimum over days"

The fact that the time series of daily minimum temperatures is 'magically' transformed into a series of day numbers (but only when the min temp is below the threshold) is implicit in the standard name (and is deliberately not captured by the cell methods).

'last frost' could use the same standard name but slightly modified cell methods:
"time: minimum within days   time: maximum over days"

I'm not certain, but maybe this could also be applied to things like degree days i.e. the cell methods would capture what happened before and after, but they would not capture the actual transformation from daily temperature to degree days.

Hope these ideas are useful.

Regards,

Dan

Dan Hollis   Climatologist
Met Office   Hadley Centre   FitzRoy Road   Exeter   Devon   EX1 3PB   United Kingdom
Tel: +44 (0)1392 884535   Mob: +44 (0)7342058682   Fax: +44 (0)1392 885681
E-mail: dan.hollis at metoffice.gov.uk   Website: http://www.metoffice.gov.uk
For UK climate and past weather information, visit http://www.metoffice.gov.uk/climate

-----Original Message-----
Sent: 22 March 2017 12:47
To: Jon Blower; cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] Recording "day of year on which something happens"

Dear all,

1. I think that it is essential to also capture the temperature threshold used to calculate the GDD. The Wikipedia article suggests that 10 degC is most common (or even standard), but in my experience 5 degC is common, and this is what ETCCDI and ET-SCI are using in their definitions. And it seems that 5 degC is used in the scientific paper cited in the Wikipedia article. Hence, a CF standard name should somehow be able to capture which threshold temperature is used.

2. The Wikipedia article mentions that "maximum temperature is usually capped at 30 °C", but to my understanding this relates to the simplified calculation that uses the diurnal midrange, (Tmax+Tmin)/2, as a 'proxy' for daily mean temperature. And it is not clear what is meant by "usually" in this context. For example, the ETCCDI and ET-SCI definitions do not impose this upper limit, and their definitions are very well established and have, for example, been used in several rounds of IPCC assessments. Now, there might be (are) alternative definitions out there, some tweaked for specific purposes that involve all sorts of complexities. But if we are going to work towards defining some standard names I think it would be good to begin with the well-established and widely disseminated definitions, cf. the AMS glossary http://glossary.ametsoc.org/wiki/Growing_degree-day.

3. Another aspect to keep in mind is that with modern (for the last couple of decades...) measurement equipment it is possible to get higher temporal resolution than daily mean temperature (proper, or estimated as diurnal mid-range). Hence there is also the very similar concept of growing degree hours, cf. http://glossary.ametsoc.org/wiki/Growing_degree-hour

Preferably, a standard name should be agnostic to the temporal resolution of the input data.

4. I guess that the cell methods, and units, will become clearer once the standard name has been teased out.

Jon, I would be interested to learn what your project colleagues are using.

Finally, to refocus on the original topic of this thread (which is much broader than growing degree day threshold dates): we should not forget that there are many other use cases for a CF mechanism to record the first/last/etc. timing of an event in relation to a reference time.

Kind regards,
Lars

-----Original Message-----
Sent: 22 March 2017 10:34
Subject: Re: [CF-metadata] Recording "day of year on which something happens"

Dear all,

Thanks again for the helpful replies to my last summary email. I’ll pick up on the points here:

1. I realise that my use of the word “threshold” may have been confusing in this context. I was following the precedent set by previous standard names. The variable would record the day of the year (or growing season) on which the “threshold” number of degree days is attained. The possible values of this threshold are stored in a coordinate variable. This is very different from the “threshold” temperature that is used in the calculation of the “growing degree day” parameter itself.

2. I’m not at all an expert here, but my understanding is that there are various possible ways to calculate GDDs. Lars has helpfully pointed out that ET-SCI and ETCCDI definitions exist, and I’ll pass these on to the project team – maybe that’s what the team are using. But anyway, I’m not totally sure that “integral_of_air_temperature_excess_wrt_time” is strictly accurate in all cases, since it’s not always simply a question of integrating some “delta-T” over time. The Wikipedia article points out some ways in which GDD is not a strict integral (e.g. in some cases it is considered that there is a maximum number of GDDs that can be meaningfully attained in a day).

3. David correctly pointed out the “Northern Hemisphere chauvinism” in my proposal. Our project is focused on Europe, but it is quite correct to consider how the same approach might apply to the southern hemisphere growing season.

4. I’m still not convinced about using “sum” in cell_methods. This might be appropriate if the variable in question were GDD, but the variable is actually a _time_ at which we reach a certain number of GDD. We are not summing time, so I’m not sure that using “sum” is right. Happy to be corrected on this though, maybe I’ve misunderstood the intention.

I think I need to discuss these issues within the project to work out exactly what we’re actually going to be recording – I’ll do this and report back our thoughts.

Many thanks again,
Jon

_______________________________________________