[CF-metadata] CF and multi-forecast system ensemble data

Francisco Doblas-Reyes Francisco.Doblas-Reyes at ecmwf.int
Thu Oct 12 07:27:01 MDT 2006


Hi,

The EU-funded ENSEMBLES project is generating a large set of 
seasonal-to-decadal (s2d) multi-forecast system ensemble hindcasts. 
Multi-forecast systems include multi-models and perturbed-parameter 
ensembles. The ENSEMBLES s2d hindcasts mimic the European multi-model 
seasonal ensemble operational forecasts. The data are written in GRIB, 
but we intend to improve their dissemination by also making the data 
available in NetCDF format.

We found that the current standard names do not allow us to describe the 
structure of multi-forecast system ensemble forecasts using the CF 
convention. Therefore, we would like to propose some additional CF 
standard names to avoid ambiguities when coding multi-forecast system 
ensemble data:

1) experiment_identifier (STRING). The producing centre is responsible 
for assigning unique experiment identifiers for the different 
experiments created, and should (ideally) provide documentation of each 
experiment. It is possible for common experiment identifiers to be 
agreed between different centres, if they are carrying out a common 
experiment or creating a multi-model forecast system. There is, however, 
no a priori guarantee that identical identifiers from different centres 
refer to scientifically equivalent experiments.
2) originating_centre (STRING). Institution with scientific 
responsibility for the forecast system.
3) forecast_system_version_number (INTEGER, units=1). This number should 
be used to distinguish between different prediction systems used by the 
same institution. For instance, the Met Office will have to choose a 
system number for the GloSea model and a different one for the DePreSys 
system (both based on the HadCM3 coupled model). It is assigned by the 
producing centre and gives scientific details of the models used. A 
table online should provide the corresponding information.
4) forecast_method_number (INTEGER, units=1). This variable 
distinguishes forecasts made with the same underlying forecasting 
system, but where variations have been introduced such that the 
different integrations have different properties, most importantly 
different climate drift. An example is given by the several members of a 
perturbed parameter ensemble forecast, which should share the 
"forecast_system_version_number" but have different values of the 
"forecast_method_number". As for "forecast_system_version_number", a 
table online should provide the corresponding information.
5) ensemble_member_number (INTEGER, units=1). This variable distinguishes 
different integrations made with the same origin, experiment identifier, 
method and system number, created using initial-condition perturbations, 
which form a homogeneous and a priori statistically indistinguishable 
ensemble.

A single multi-forecast system experiment includes data from multiple 
forecast systems, either from a single centre or from several. 
Variables 1-4 form a natural tuple to define a particular homogeneous 
multi-forecast system ensemble forecast. The ensemble is then spanned by 
the ensemble_member_number variable. For instance, a multi-model 
ensemble forecast or a perturbed-parameter ensemble is made up of a 
collection of such tuples.
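
As a rough illustration of the tuple described above, the following 
Python sketch groups integrations by the four identifying values and 
lets ensemble_member_number span each group. The class name, the helper 
function and all sample values (experiment label, centre name, numbers) 
are hypothetical and are not part of the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastSystemKey:
    """Hypothetical container for proposed variables 1-4."""
    experiment_identifier: str
    originating_centre: str
    forecast_system_version_number: int
    forecast_method_number: int


def group_members(records):
    """Map each key to the sorted list of its ensemble_member_number values."""
    groups = defaultdict(list)
    for key, member in records:
        groups[key].append(member)
    return {key: sorted(members) for key, members in groups.items()}


# Illustrative values only: two methods of one system, three members each.
records = [
    (ForecastSystemKey("exp1", "centre_a", 1, method), member)
    for method in (0, 1)
    for member in (2, 0, 1)
]
ensembles = group_members(records)
for key, members in ensembles.items():
    print(key.forecast_method_number, members)
```

Making the key immutable and hashable means that two integrations 
belong to the same homogeneous ensemble exactly when all four values 
agree, which is the property the tuple is meant to express.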

Although not actually needed for distribution and archive purposes, 
additional variables with the same dimension as the variable 
experiment_identifier are also suggested:

1) original_distributor (STRING). Centre with responsibility for 
distribution of data, i.e. the centre that first made the data publicly 
available, and to which queries about data integrity should be sent.
2) production_status (STRING). Takes the value "operational", "research" 
or a user-defined project identifier. The value "research" should be used 
for general research at a specific centre, while project_id should be 
used for specified international research projects.
3) sst_specification (STRING). It describes the use of the SSTs in the 
specific experiment and can take values such as "coupled", "observed", 
"predicted", "persisted anomaly" or "persisted absolute".
4) real_time (CHARACTER). It takes the values "true" or "false", 
according to whether or not the forecast was made in real time. It is an 
attribute of the individual forecasts.
5) archive_date (INTEGER, units=days from specific date). Describes when 
the data were archived or published. The aim is to provide an approximate 
timestamp, to easily distinguish between recent experiments and much 
older ones. Also, in the case that data need to be corrected in a 
globally distributed data system, the archive_date could be used to 
distinguish between the older, original data and the newer, corrected 
data. This is an attribute of the individual forecast.
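
A minimal Python sketch of how archive_date might be encoded and used to 
tell original from corrected data is given below. The reference date is 
an assumption made for the example (the proposal only says "days from 
specific date"), and the function names are hypothetical.

```python
from datetime import date

# Assumed reference date; the proposal only says "days from specific date".
REFERENCE_DATE = date(1950, 1, 1)


def encode_archive_date(archived_on, reference=REFERENCE_DATE):
    """Return the integer number of days between reference and archived_on."""
    return (archived_on - reference).days


def is_corrected_version(candidate_days, original_days):
    """A later archive_date marks the newer (e.g. corrected) copy of the data."""
    return candidate_days > original_days


# Illustrative dates only.
original = encode_archive_date(date(2006, 10, 12))
corrected = encode_archive_date(date(2006, 11, 1))
print(original, corrected, is_corrected_version(corrected, original))
```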

Some relevant issues for the encoding of multi-forecast system ensemble 
data are as follows:

- We use the variables "forecast_period" and "forecast_reference_time" 
as independent time variables that define the two time axes of a 
forecast dataset with several start dates, i.e. both "forecast_period" 
and "forecast_reference_time" are multivalued. We believe that 
"forecast_period" cannot have time units referenced to a specific date 
as "forecast_reference_time" does. Otherwise the file could contain 
forecasts with the same verifying date but produced from different 
start dates (and, hence, intrinsically different). An alternative would 
be to introduce an index dimension and define two one-dimensional 
auxiliary time coordinate variables with this dimension, as suggested by 
Jonathan Gregory in the thread "file with both run time and forecast 
(valid) time coordinates". Any thoughts about this?

- Although "realization" is an existing standard name to handle 
ensembles, it can be used to identify either a 
forecast_system_version_number (a member of a multi-model ensemble from 
the same institution), a forecast_method_number (a member of a 
perturbed-parameter ensemble) or an ensemble_member_number (a member of 
an initial-condition ensemble). This is problematic, as a multi-forecast 
system ensemble dataset might have to use those three variables 
dimensioned independently. Therefore, we suggest using these three 
variables instead to distinguish the elements of an experiment 
carried out with a multi-forecast system.
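
To illustrate the first point above, the following Python sketch 
computes the verifying time for every combination of start date and 
lead time, and lists the dates reached from more than one start date. 
The start dates, lead times and function name are hypothetical values 
chosen for the example.

```python
from datetime import datetime, timedelta
from itertools import product

# Illustrative start dates and lead times (hypothetical values).
forecast_reference_time = [datetime(2000, 5, 1), datetime(2000, 6, 1)]
# Lead times are relative durations, not dates tied to a reference date.
forecast_period = [timedelta(days=31), timedelta(days=62)]


def verifying_times(reference_times, periods):
    """Return {(start_index, lead_index): verifying datetime} for every pair."""
    return {
        (i, j): reference_times[i] + periods[j]
        for i, j in product(range(len(reference_times)), range(len(periods)))
    }


valid = verifying_times(forecast_reference_time, forecast_period)

# Collect index pairs that verify on the same date from different starts.
by_date = {}
for index, when in valid.items():
    by_date.setdefault(when, []).append(index)
clashes = {when: idx for when, idx in by_date.items() if len(idx) > 1}
print(clashes)
```

In this example one verifying date is shared by two different 
(start date, lead time) pairs, which is why a single time axis of 
verifying dates would not be enough to tell the forecasts apart.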

The proposed names take account of established practice at operational 
centres and usual practice in the research community working on climate 
variability at different time scales. These names are part of a wider 
proposal to unambiguously define the appropriate metadata for 
multi-forecast system ensembles, which is itself based upon a more 
general proposal under discussion by WCRP. The proposal is available from:
http://www.ecmwf.int/research/EU_projects/ENSEMBLES/data/index.html

The names and data structure suggested in this message are likely to be 
relevant for other operational multi-forecast system ensemble forecast 
activities such as EUROSIP or TIGGE.

Apologies for the long message.
Best regards,
Paco
-- 
________________________________________

Francisco J. Doblas-Reyes
European Centre for Medium-Range
Weather Forecasting (ECMWF)
Shinfield Park, RG2 9AX
Reading, UK

Tel: +44 (0)118 9499 655
Fax: +44 (0)118 9869 450
f.doblas-reyes at ecmwf.int
_______________________________________

