[CF-metadata] Weighted means and cell_methods in the CMIP6 data request
martin.juckes at stfc.ac.uk
Mon Nov 16 08:43:19 MST 2015
There are a number of variables in the CMIP6 data request which are requested as weighted means, such as the age of snow as a time mean weighted by mass of snow. We also have area-weighted means, as in the monthly mean temperature of sea ice weighted by sea ice area. The latter can be handled well enough with the existing cell_methods syntax: "time: mean where sea_ice". For the former, the CMIP5 approach was to have a comment indicating that weighting should be used. As this is a reasonably common operation, it would be nice to have something more explicit in the cell_methods attribute.
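To make the distinction concrete, here is a minimal sketch of a mass-weighted time mean next to a simple time mean. The function name and the daily values are invented for illustration; they do not come from the data request.

```python
def weighted_time_mean(values, weights):
    """Return sum(w * x) / sum(w), or None when the total weight is zero."""
    total_weight = sum(weights)
    if total_weight == 0:
        return None  # no snow at any time step: the weighted mean is undefined
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical daily values: snow age (days) and snow mass (kg m-2).
snow_age = [1.0, 2.0, 3.0, 4.0]
snow_mass = [10.0, 10.0, 0.0, 0.0]

simple_mean = sum(snow_age) / len(snow_age)
mass_weighted_mean = weighted_time_mean(snow_age, snow_mass)

print(simple_mean, mass_weighted_mean)
```

The two results differ because days with no snow contribute nothing to the weighted mean, which is exactly the information a bare "time: mean" does not convey.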
I can think of several possibilities:
(1) add a "weighted-mean" method, and leave it up to the data provider to give additional information. This would at least alert the user that they need to look for additional information. This would be an improvement. In the present convention "time: mean" can mean either a simple mean or a weighted mean. By adding the "weighted-mean" option we would be able to stipulate that "time: mean" only be used for non-weighted means, and this would reduce an existing ambiguity.
(2) add a "weighted-by: <variable name>" option in the cell_methods comment statement, similar to the "interval: ..." clause, e.g. "time: mean (weighted-by: snw)". This would give more information, but if the comment is considered optional, it does not remove the ambiguity that "time: mean" can apply to either a weighted or an unweighted mean. Making the comment obligatory for weighted means would blur the status of the comment. There is also the problem that, since the variable "snw" is not going to be in the file, the information remains incomplete.
(3) add a weighted clause, e.g. "time: mean [where .... ] weighted snm". The main problem here is that parsing cell_methods is already complicated, and this would add to that difficulty, though only in a small incremental way.
(4) as (1), but with an additional requirement that the dimension over which the weighted mean is being taken carry information about the weighting. The information could be attached as a specified attribute, "weighting". Because the weighting variable will generally be at a higher frequency than the weighted mean we are trying to describe, it will not be sensible to include it in the file, so this attribute will at most provide a clue about the provenance. For example, it might be of the form "<variable name> [(<optional comment>)]".
e.g. 'weighting: snw (daily snow mass --- archived in the "day" MIP table)'.
The last option appears the cleanest to me, as it does not change the grammar of the cell_methods string and adds additional information to the relevant dimension in a fairly self-explanatory way.
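As a rough sketch of what a reader of the file would have to do under options (2) and (4), the following parses a single cell_methods entry and a "weighting" attribute value. Neither the "weighted-by" comment nor the "weighting" attribute exists in the CF conventions; both are the proposals above, and the function names and regular expressions are my own assumptions. The cell_methods pattern handles only one "dimension: method (comment)" entry, not the full grammar.

```python
import re

# One "dimension: method (comment)" entry; the full cell_methods grammar is not handled.
ENTRY = re.compile(
    r"^\s*(?P<dim>\w+):\s*(?P<method>[\w-]+)\s*(?:\((?P<comment>[^)]*)\))?\s*$"
)
# Proposed option (2) clause inside the comment (hypothetical syntax).
WEIGHTED_BY = re.compile(r"weighted-by:\s*(\w+)")
# Proposed option (4) attribute value: "<variable name> [(<optional comment>)]".
WEIGHTING = re.compile(r"^\s*(?P<name>\w+)\s*(?:\((?P<comment>.*)\))?\s*$")


def parse_cell_method(entry):
    """Split one cell_methods entry into dimension, method and weighting variable."""
    match = ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognised cell_methods entry: {entry!r}")
    weighted = WEIGHTED_BY.search(match["comment"] or "")
    return {
        "dimension": match["dim"],
        "method": match["method"],
        "weighted_by": weighted.group(1) if weighted else None,
    }


def parse_weighting(value):
    """Split a 'weighting' attribute value into (variable name, optional comment)."""
    match = WEIGHTING.match(value)
    if match is None:
        raise ValueError(f"unrecognised weighting attribute: {value!r}")
    return match["name"], match["comment"]


# Option (2): the weighting lives in the cell_methods comment.
option2 = parse_cell_method("time: mean (weighted-by: snw)")
# Without the comment, nothing distinguishes a weighted from an unweighted mean.
plain = parse_cell_method("time: mean")
# Option (4): the method flags the weighting, the dimension attribute names it.
option4_method = parse_cell_method("time: weighted-mean")
option4_attr = parse_weighting(
    'snw (daily snow mass --- archived in the "day" MIP table)'
)
```

Under option (4) the cell_methods entry needs no new clause, only a new method name, and the weighting variable is read from a separate attribute.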
Perhaps this has been discussed before? Any other thoughts?