[CF-metadata] original_ensemble_size

Hedley, Mark mark.hedley at metoffice.gov.uk
Mon Aug 17 07:33:30 MDT 2015


many thanks to all the contributors to date on this discussion

I have prepared a trac ticket, based on my latest considered opinion following the discussions here

http://cf-trac.llnl.gov/trac/ticket/142

please consider further comments on this ticket, if appropriate

thank you
mark

________________________________
From: CF-metadata [cf-metadata-bounces at cgd.ucar.edu] on behalf of Hedley, Mark [mark.hedley at metoffice.gov.uk]
Sent: 03 August 2015 17:10
To: Jim Biard; cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] original_ensemble_size

Hello Jim

I like your thinking on this topic.  I agree with you that this feels like metadata about the coordinate and that the coordinate could carry further information about the ensemble.

A new coordinate attribute named ensemble_size, which is limited in scope to being used on a coordinate with a standard_name of realization sounds like a neat and simple solution to me

If this approach is interesting, then I think that your suggestion of a new coordinate type in Chapter 5 is a good one.  This would provide some nice consistency with spatial and temporal coordinates and give scope for future work in more detailed descriptions of ensembles.

At the moment I think there is some value in separating the description of an ensemble dimension from the description of statistical processes performed with respect to an ensemble dimension.  This leads me away from using cell_methods to meet my use case.

many thanks
mark

________________________________
From: Jim Biard [jbiard at cicsnc.org]
Sent: 29 July 2015 16:02
To: Hedley, Mark; cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] original_ensemble_size

Mark,

It seems to me that we have quite a few examples of coordinate variables that have extra attributes that further define the contents. Time coordinate variables, for example, have the calendar attribute. There are many standard names that direct the developer to specify an attribute (almost always comment or flag_values/flag_meanings attributes) that further defines the contents. Any variable can validly have attributes associated with it.

If it is too troubling for a standard name definition to call for a specific attribute name that is not mentioned in the CF Conventions document, the definition could instead call for putting a string of the form 'ensemble size N' in a comment attribute. Or it could call for putting the ensemble size in a comment section of the cell_methods attribute on the data variable, as Jonathan and Karl suggested. Jonathan and Karl's suggestions imply a change to the conventions, since they propose new standardized cell method comment names.

In all of these cases the information will be available to a human who reads a file dump, but none of them makes the information immediately available to software automation. The addition to the cell_methods attribute grammar is likely the least intrusive way to make it something that people can write general software for. The downside to this approach is that the information is not held with the coordinate variable, which is the most natural place for it.
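To illustrate the automation point, here is a minimal Python sketch of what software would need to do to recover the size from a free-text comment of the form 'ensemble size N'. The function name and the sample comment strings are hypothetical; nothing here is defined by CF.

```python
import re

# Pattern for the free-text form 'ensemble size N' mentioned above.
# Any other wording in the comment will not be recognised.
ENSEMBLE_SIZE_RE = re.compile(r"\bensemble size (\d+)\b")

def ensemble_size_from_comment(comment):
    """Return N from a comment containing 'ensemble size N', or None."""
    match = ENSEMBLE_SIZE_RE.search(comment)
    return int(match.group(1)) if match else None

print(ensemble_size_from_comment("ensemble size 23"))        # 23
print(ensemble_size_from_comment("Ensemble of 23 members"))  # None
```

The second call shows the weakness: the comment is readable to a human, but a small change in wording defeats the parser.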

Another alternative is to add a new section to Chapter 5 that defines an ensemble or sample pool coordinate type (or whatever name you prefer). It may be worth the extra trouble to go ahead and give it formal recognition instead of trying to work it into existing forms that, in my opinion, don't fit it too well. I appreciate the desire to find the least intrusive way to modify the conventions, but we can end up painting ourselves into corners in the process.

Grace and peace,

Jim

On 7/29/15 3:24 AM, Hedley, Mark wrote:
Hello Jim

this is a really neat alternative approach

I agree that the information about the ensemble_size is closely related to the realization coordinate and less closely related to the data variable, so this method encapsulates the metadata nicely.

Whilst the solution is elegant, I cannot see a previous example of a coordinate variable within CF defining extra attributes, so I'm a bit wary that this approach will require a change to the conventions document, not just a new standard_name.

Is there a neat way to use CF to provide metadata about a coordinate, rather than about a data variable?

I think it's well worth considering, but it may be a path of some resistance

many thanks
mark

________________________________
From: CF-metadata [cf-metadata-bounces at cgd.ucar.edu] on behalf of Jim Biard [jbiard at cicsnc.org]
Sent: 23 July 2015 13:11
To: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] original_ensemble_size

Hi.

It seems to me that you would want a coordinate variable with the standard name 'realization' (whether scalar or multi-valued) and give it an attribute with the name 'ensemble_size'. You can store the realization number in the variable and the ensemble size in the attribute.
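A minimal sketch of this idea in Python, using plain dictionaries as a stand-in for netCDF variables. The attribute name 'ensemble_size' is the proposal under discussion, not part of the published conventions, and the helper names and values are illustrative only.

```python
# Stand-in for a 'realization' coordinate variable: values plus attributes.
# Only 3 members are present, but the attribute records the full ensemble size.
realization = {
    "dims": ("realization",),
    "data": [0, 5, 11],
    "attrs": {"standard_name": "realization", "ensemble_size": 23},
}

def select_member(coord, index):
    """Return a scalar coordinate for one member, keeping all attributes."""
    return {"dims": (), "data": coord["data"][index],
            "attrs": dict(coord["attrs"])}

def describe(scalar_coord):
    """Format a scalar realization coordinate as 'member x of n'."""
    return "member {} of {}".format(scalar_coord["data"],
                                    scalar_coord["attrs"]["ensemble_size"])

member = select_member(realization, 1)
print(describe(member))  # member 5 of 23
```

Because the size travels with the coordinate's attributes, it survives whether the coordinate is multi-valued or has been reduced to a scalar.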

Grace and peace,

Jim

On 7/23/15 6:11 AM, Hedley, Mark wrote:
I use the 'coordinates' attribute on my data variable, referencing the scalar 'ensemble_size' variable, thus defining this ensemble_size as a scalar coordinate variable for the temperature dataset
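As a rough sketch of that association, with plain Python dictionaries standing in for a netCDF file: the data variable's 'coordinates' attribute is a blank-separated list of variable names, and software follows those names to find the scalar values. The variable names and values are illustrative, and 'ensemble_size' is the standard name proposed in this thread rather than an adopted one.

```python
# Stand-in for a file holding one ensemble member of a temperature field.
dataset = {
    "temperature": {
        "dims": ("latitude", "longitude"),
        "attrs": {"standard_name": "air_temperature",
                  "coordinates": "realization ensemble_size"},
    },
    "realization": {"dims": (), "data": 5,
                    "attrs": {"standard_name": "realization"}},
    "ensemble_size": {"dims": (), "data": 10,
                      "attrs": {"standard_name": "ensemble_size"}},
}

def scalar_coordinates(ds, var_name):
    """Return {name: value} for scalar variables named in 'coordinates'."""
    names = ds[var_name]["attrs"].get("coordinates", "").split()
    return {n: ds[n]["data"] for n in names if ds[n]["dims"] == ()}

print(scalar_coordinates(dataset, "temperature"))
# {'realization': 5, 'ensemble_size': 10}
```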

mark

________________________________
From: CF-metadata [cf-metadata-bounces at cgd.ucar.edu] on behalf of Karl Taylor [taylor13 at llnl.gov]
Sent: 22 July 2015 22:53
Cc: CF Metadata List
Subject: Re: [CF-metadata] original_ensemble_size

Hi all,

I'm still curious about something:

Suppose we have the temperature field stored from one member of an ensemble of size 10. We want to make the size of the ensemble known to the user. We store 10 as a scalar variable with standard name "ensemble_size", but how does that scalar get associated with our temperature variable (other than by being stored in the same file)?

cheers,
Karl

On 7/22/15 1:59 AM, Hedley, Mark wrote:
Hello John, Karl et al

I'm not sure I agree with John's last statement. I think that an ensemble is a defined collection of members, so my need is the need for ensemble size to be defined explicitly.
The distinction that not all members may be present characterises the need for this metadata descriptor, rather than just using the dimension size of realization, which does not meet my requirement.

On reflection, I think that I prefer Karl's name of 'ensemble_size'

To restate my use case, I have a data set from an ensemble, where there is a coordinate variable called 'realization'.  Let's say there are 23 members, so this dimension is size 23.

I want to reference the number of members in the ensemble, whilst sub-setting the data variable in various ways.

The suggestion is to add a scalar coordinate to my original dataset, which contains the number of members in the ensemble.  Then any sub-setting operation will retain this coordinate, and I will always be able to state that this member is member 0 of 23, 5 of 23 etc

One requirement I have is to slice this variable to produce a 2D data array with two 1D coordinate variables, latitude and longitude, and with all other coordinates as scalars.
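A small Python sketch of that slicing requirement, again with dictionaries standing in for the data set. The structure ('dim_coords', 'scalar_coords'), the function name and the numbers are all made up for illustration; the point is only that the scalar ensemble_size is carried through the slice unchanged.

```python
def slice_member(cube, index):
    """Select one realization; the realization coordinate becomes a scalar."""
    return {
        "data": cube["data"][index],  # 2D: [latitude][longitude]
        "dim_coords": {k: v for k, v in cube["dim_coords"].items()
                       if k != "realization"},
        # Existing scalar coordinates are retained; realization joins them.
        "scalar_coords": dict(cube["scalar_coords"],
                              realization=cube["dim_coords"]["realization"][index]),
    }

cube = {
    # 3 members present, each a 2 x 2 latitude/longitude grid.
    "data": [[[280.0, 281.0], [282.0, 283.0]],
             [[279.5, 280.5], [281.5, 282.5]],
             [[281.0, 282.0], [283.0, 284.0]]],
    "dim_coords": {"realization": [0, 5, 11],
                   "latitude": [50.0, 51.0],
                   "longitude": [0.0, 1.0]},
    "scalar_coords": {"ensemble_size": 23},
}

field = slice_member(cube, 1)
scalars = field["scalar_coords"]
print("member {} of {}".format(scalars["realization"],
                               scalars["ensemble_size"]))  # member 5 of 23
```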

If it is reasonable to talk about an ensemble as a defined collection of members, then I agree with Karl, that a standard_name of 'ensemble_size' fits the bill.  The description fits my use case nicely

many thanks
mark


________________________________
From: CF-metadata [cf-metadata-bounces at cgd.ucar.edu] on behalf of John Graybeal [jbgraybeal at mindspring.com]
Sent: 22 July 2015 05:52
To: Karl Taylor
Cc: CF Metadata List
Subject: Re: [CF-metadata] original_ensemble_size

Karl,

To my understanding (then and now), the use case is explicitly not what your definition describes. The entire point of the request was to provide a label that was clearly distinguished from the typical concept of ensemble size.

John



On Jul 21, 2015, at 16:36, Karl Taylor <taylor13 at llnl.gov> wrote:

Dear all,

I wonder if the following might also meet requirements of the use case:

name: ensemble_size

description: The number of member realizations in an ensemble.  This name provides context for any specific realization, which might not be co-located with the other members of the ensemble.

Karl

On 7/20/15 9:49 PM, John Graybeal wrote:
To save others the lookup, the use case phrasing that Mark signed on to was this: "In my use case, the whole ensemble is not present, I only have a subset of the members. I have a metadata element telling me how many members there were at the time the ensemble was created, which I would like to encode."  The entire thread is titled 'realization | x of n', but it is pretty, umm, rich with detail.

The last email before discussion went silent appears to be mine:

Modified to fit Mark's use case, I think suitable text is:

name: original_ensemble_size

description: The number of member realizations in the originally constituted ensemble. This provides context for any specific realization, for example orienting a member relative to its original group (even if the group is no longer intact).

This does not mention forecasting, preserves the origination concept, and gives a bit of context, without constraining the application. It could even be an ensemble of observations, or cat videos, or ... you get the idea.

I will let someone else provide the example of how that is associated with the variable, it will be more authoritative!

John


On Jul 20, 2015, at 14:42, Karl Taylor <taylor13 at llnl.gov> wrote:

Hi Mark,

I didn't quite understand how the standard name gets associated with a variable (containing 1 or more realizations from the ensemble).   Someone said it was through a scalar coordinate variable, but I don't see how the ensemble member is a function of the ensemble size, so why would this be appropriate?

Could you supply an example?

Also, I didn't follow why "original" was included in "original ensemble size".  Surely, you wouldn't report this number unless you thought the ensemble size was pretty much set and wouldn't change.  In that case there shouldn't be a need for a "modified ensemble size", so wouldn't "ensemble size" suffice?

thanks,
Karl


On 7/20/15 9:24 AM, Hedley, Mark wrote:
Hello CF

Late last year we had a discussion about storing

original_ensemble_size

in a CF file
http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/thread.html#57756

There were a few options discussed, with John Graybeal making the suggestion

original_ensemble_size
description: The number of members constituting an ensemble.


for a new standard_name definition, which seemed to fit the case very well

It does not seem to have been adopted into the standard names list as yet.

Please may this name and definition be adopted, or the reasons for not adopting it be detailed here?

thank you
mark





_______________________________________________
CF-metadata mailing list
CF-metadata at cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

--
Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
formerly NOAA’s National Climatic Data Center
151 Patton Ave, Asheville, NC 28801
e: jbiard at cicsnc.org
o: +1 828 271 4900