[CF-metadata] point observation data in CF 1.4

Christopher Barker Chris.Barker at noaa.gov
Tue Nov 2 14:45:26 MDT 2010

On 11/2/10 6:09 AM, Wright, Bruce wrote:
> Sorry for a late follow-up (and once again breaking the thread), but
> below is some feedback from our guys running the particle trajectory
> models at the Met Office, which I think highlight the difficulties
> storing particle trajectories efficiently.

Thanks for the comments -- this supports some conclusions I had 
been coming to:

> In a long (multi-year) air quality or risk assessment run, the total
> number of particles followed could be a thousand times the maximum
> number existing at any one time ...That suggests that
> padding out arrays to the total number of particles is not a sensible
> option.

Agreed, I've decided that that's not the way to go.

> ... in
> that it links particles arbitrarily according to whether they reuse the
> same space).

right -- that really isn't an option -- yes the storage space can be 
re-used, but it wouldn't mean that a given space in the array meant 

> An alternative is, at each time, to store the particle data and for this
> to include a particle id, without attempting to link particles at
> different times.

I think this is the way to go. In fact, I think the particle ID could be 
optional -- some applications don't keep an ID, and most post processing 
doesn't care about the ID. However, an ID could be handy for linking 
particle properties that are constant over time but vary among 
particles, rather than storing the property over and over again.
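
As a rough illustration of that last point, here is a minimal Python sketch (not part of any CF convention). It assumes a hypothetical per-particle table keyed by ID plus per-time-step records that carry only the ID and the time-varying values; all names and numbers are invented for illustration.

```python
# Hypothetical example data: names and values are illustrative only.
# Properties that are constant for a particle are stored once, keyed by ID.
particle_mass = {101: 2.5, 102: 0.8, 103: 1.2}

# Each time step stores only the ID and the values that change over time.
time_steps = [
    [(101, 10.0, 50.0), (102, 11.0, 51.0)],  # step 0: (id, lon, lat)
    [(102, 11.5, 51.2), (103, 9.0, 49.5)],   # step 1: 101 removed, 103 created
]


def masses_at(step):
    """Return the mass of every particle present at the given time step."""
    return [particle_mass[pid] for pid, _lon, _lat in time_steps[step]]
```

The mass is looked up through the ID when needed, so it is never repeated in the per-time-step records.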

> However retrieving a trajectory is then difficult as
> will have to search each time for the particle id required.

Yes, it would. My thought is that this is an OK price to pay. In models 
that create and destroy particles, the trajectory of an individual 
particle is generally not of interest. Far more common is wanting to 
know about the collection of particles at a given time, so that's what 
should be easy to extract.
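
One way this could look, sketched in plain Python under assumed names: all particle records are concatenated in time order into flat arrays, and a separate array holds the particle count for each time step. The variable names (`particle_id`, `longitude`, `particle_count`) and the data are hypothetical, not taken from any existing convention.

```python
from itertools import accumulate

# Hypothetical flat storage: every particle record from every time step,
# concatenated in time order (names and values are illustrative only).
particle_id = [101, 102, 102, 103, 103]
longitude = [10.0, 11.0, 11.5, 9.0, 9.4]

# Number of particles present at each time step.
particle_count = [2, 2, 1]

# Start offset of each time step within the flat arrays.
offsets = [0] + list(accumulate(particle_count))


def particles_at(step):
    """Return (id, longitude) pairs for all particles at one time step."""
    start, stop = offsets[step], offsets[step + 1]
    return list(zip(particle_id[start:stop], longitude[start:stop]))
```

With this layout, pulling out everything at one time is a single contiguous slice, and no padding is needed when the number of particles changes between steps.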

> Storing
> start and end time for each particle id would help, but restoring a
> complete trajectory would still be inefficient. One can think of ways
> round this: in a computer language one would have an array for each
> particle id giving the indices in each time slice corresponding to the
> particle (these arrays could be offset relative to the particle start
> time so they would not have to be very long), and then an array of such
> structures, one for each particle id. Can NetCDF do that?

Maybe, but the data can be re-constructed, so I wouldn't bother. Yes, it 
would require reading the whole file for one particle's trajectory, but I 
don't think that's a common use case -- am I wrong? are folks likely to 
want to extract a particular particle's trajectory from a big data set?
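
For reference, the reconstruction could be sketched roughly as below. This is a minimal Python illustration with a hypothetical `trajectory` helper and invented data, and it assumes an ID appears at most once per time step.

```python
def trajectory(steps, wanted_id):
    """Collect (step_index, lon, lat) for one particle by scanning every step."""
    path = []
    for step_index, records in enumerate(steps):
        for pid, lon, lat in records:
            if pid == wanted_id:
                path.append((step_index, lon, lat))
                break  # assumption: an ID appears at most once per time step
    return path


# Hypothetical data: each time step is a list of (id, lon, lat) records.
steps = [
    [(101, 10.0, 50.0), (102, 11.0, 51.0)],
    [(102, 11.5, 51.2), (103, 9.0, 49.5)],
    [(103, 9.4, 49.7)],
]
```

Every time step has to be visited, which is the cost discussed above; the scan is simple, though, and needs no extra index stored in the file.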

> To make things more difficult it might also be useful to store
> trajectories with different length time-steps for different
> trajectories.

So some particles are using a larger time step than others? This gets a 
bit ugly yes, and I can't think of a use case either. I suppose it's 
possible that a model could use smaller time steps for particles that 
are in regions with faster-changing or more complex current fields, but 
does any model do this? If so, I'd imagine it would be a sub-timestep 
process (like the intermediate results in an R-K integrator), and you 
wouldn't need/want to store the smaller steps anyway.

> For very long runs, one would probably not want to be forced to store
> everything in one very large file.

yup. I don't think that's hard to accommodate.

> I think it would be acceptable to have more than one format for storing
> data with different methods being efficient for different retrieval
> types, together with (slow) utilities for converting between these
> formats. Indeed that might be preferable if it enables things to be kept
> simple conceptually.

Maybe, but it seems that we can get one that fits the needs of everyone 
that has spoken up here, so that's a reasonable start.


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
