[CF-metadata] a different (but perhaps unoriginal) approach to standard name construction

John Graybeal graybeal at mbari.org
Tue Nov 4 09:55:42 MST 2008


I love the list of classifiers and hope that discussion can continue.  
Having also tried to come up with a pervasive system for standard  
names (both in CF and in other contexts) over the years, here are some  
observations.

Naming Effort: It appears CF standard names were originally Much More  
about coming up with the right name, and partially partitioning useful  
characteristics, than about a precise definition.  This reflects the  
original community needs, I think; as community needs for precision
have grown, so has attention to the definition. But Jonathan is spot-on:
getting a name that reflects both the meaning AND community usage
has been the challenge. While it frustrates name proposers, it  
provides great comfort to users.

Normalization and uniqueness: If I understand the proposal correctly,  
it calls for tracking all the orthogonal classifiers as possible  
components of the standard name. ('These independent bits of  
information could be automatically assembled together to create the  
"standard name".') Is this any different from a database key  
construction from multiple independent columns of data? Each unique  
combination of the n components makes another possible name, and the  
meaning is encoded into the name itself. Exclusion of a component from  
the name means all values are accepted in that axis.
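
A minimal sketch of the database-key analogy follows. The category
names, their order, and the "key=value" spelling are illustrative
assumptions, not part of the proposal; the point is only that the name
is assembled from independent columns and that an omitted column acts
as a wildcard.

```python
# Hypothetical categories in a fixed order (not taken from the proposal).
CATEGORIES = ("quantity", "substance", "medium", "surface")


def build_name(**components: str) -> str:
    """Join the supplied components in category order, skipping omitted ones."""
    unknown = set(components) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return "__".join(
        f"{cat}={components[cat]}" for cat in CATEGORIES if cat in components
    )


def matches(query: dict, record: dict) -> bool:
    """A category omitted from the query accepts every value on that axis."""
    return all(record.get(cat) == value for cat, value in query.items())


name = build_name(quantity="temperature", medium="sea_water")
broad = matches({"quantity": "temperature"},
                {"quantity": "temperature", "medium": "air"})
narrow = matches({"quantity": "temperature", "medium": "sea_water"},
                 {"quantity": "temperature", "medium": "air"})
```

Here the broad query matches because it says nothing about the medium,
while the narrow one does not.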

Length and Complexity: It will be a Very Long standard name in many  
cases. No technical limitations, probably, but social reaction to  
these long names will be poor at best. (And will depend on some  
particularly clever way to indicate omitted categories when  
constructing the name.) Of course, more common cases will usually be  
shorter, but people won't always put in the relevant categories, or  
won't realize they are relevant. ("Oh, c'mon, everyone knows that  
*has* to be over water.") Like filling out metadata, detail will be  
avoided during name creation, for better and for worse.

Unique Identifiers for Resources: I agree with Benno: CF absolutely  
should have a separate resource identifier on the web for (a) all the  
existing and historical standard names, and (b) any name you come up
with in this system.  (I am separately engaged in creating and serving
identifiers for vocabulary terms, so of course I would feel that way.   
We just now have a service that can provide this; I just started  
pursuing its application for/with CF.)  As an aside, this proposal may  
be a case where using opaque codes as the identifier, and the standard  
name as a label string, offers improved value to users.

Unique Identifiers for Data Set Variables: This was proposed as a
solution "to identify with a single standard name, closely related  
variables that one might want to store in a single array". I  
discourage using standard names as "the unique names for a data set",  
because there will always be a category for differentiating variables  
that isn't available in the standard convention. (primary vs secondary  
instrument, first/second/third installed sensor, clean/dirty, and on  
and on). Standard names should be used to describe each variable, not  
name it.
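
To illustrate the describe-versus-name distinction, here is a small
sketch. The data set, the variable names, and the "sensor" attribute
are hypothetical; only the two standard names are existing CF names.
Variable names stay local and unique, while the standard name is an
attribute that several variables may share.

```python
from collections import defaultdict

# Hypothetical data set: keys are local variable names, values are attributes.
variables = {
    "temp_primary": {"standard_name": "sea_water_temperature",
                     "sensor": "primary"},
    "temp_secondary": {"standard_name": "sea_water_temperature",
                       "sensor": "secondary"},
    "salinity": {"standard_name": "sea_water_salinity",
                 "sensor": "primary"},
}


def group_by_standard_name(variables: dict) -> dict:
    """Map each standard name to the variables it describes."""
    groups = defaultdict(list)
    for var_name, attrs in variables.items():
        groups[attrs["standard_name"]].append(var_name)
    return dict(groups)


groups = group_by_standard_name(variables)
```

Two variables share one standard name here, and the distinction
between them (primary versus secondary sensor) lives outside the
standard name.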

Defining Similarity: For a variable mapping exercise, we considered  
what makes one thing the 'same as' something else. The answer is (of  
course) 'it depends'. The great advantage of this proposed approach is  
that it 'normalizes' the distinctions into the separate categories, so  
the user can evaluate the match much more directly for his or her own  
needs. But be aware that it will move the discussions of similarity  
and difference into the next layer of semantic detail ("does 'body of  
water' include underground streams?" and so on).

Central Catalog: If the rules are deterministic, and every category  
has a controlled vocabulary, you don't need a single list of what  
names (i.e., combinations of categories) are approved; any possible
combination of category terms is legal, right? This is fortunate, as  
the number of proposed names may indeed grow very large very quickly,  
and people will often just construct the names without bothering to  
submit them. You also don't need definitions; the definition is the  
compilation of all the displayed components in that name. (If it  
*isn't* the same as the aggregation, then there is by definition  
another axis of interest that needs to be turned into a category, or  
you will have 2 standard names that look the same but have different  
meanings.)  So this is really a system for creating a single-label  
categorization scheme across multiple axes; no catalog is strictly  
needed for the naming convention to work.
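
The no-catalog argument can be sketched as follows, assuming a tiny,
made-up controlled vocabulary per category (none of these terms or
categories come from the proposal). Legality is checked against the
vocabularies alone, and the number of possible names follows from
their sizes.

```python
from math import prod

# Hypothetical controlled vocabularies, one per category.
VOCAB = {
    "quantity": {"temperature", "salinity", "speed"},
    "medium": {"air", "sea_water"},
    "surface": {"land", "water"},
}


def is_legal(components: dict) -> bool:
    """Legal if every component used comes from its category's vocabulary."""
    return all(
        cat in VOCAB and value in VOCAB[cat]
        for cat, value in components.items()
    )


def count_possible_names() -> int:
    """Each category is omitted or takes one term; the empty name is excluded."""
    return prod(len(terms) + 1 for terms in VOCAB.values()) - 1


ok = is_legal({"quantity": "temperature", "medium": "sea_water"})
bad = is_legal({"quantity": "pressure"})
total = count_possible_names()
```

Even with three categories and seven terms in total, this toy
vocabulary allows 35 distinct names, which suggests how quickly a
realistic set would grow.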

Semantics and Ontologies: With this proposal, we are much further into
creating classification systems for all concepts relevant to CF names  
(as opposed to conceptually linking the existing CF concepts, which is  
slightly different). I think this is inevitably a direction to be  
taken by someone -- witness the Plasmo work -- but it turns the  
process into something very much like other knowledge classification  
efforts in the semantic community. That isn't a pro or a con, just an  
observation. There are lessons to be learned and tools to be reused  
from work that has gone before. In that regard, I would love to hear
of existing vocabularies (formal or informal) for each of these
categories, particularly the first two. (Can we start a
wiki page for this info somewhere?)

In summary, I love this idea in principle, but think we can expect a  
stately progression toward seeing it in action. It serves a different  
need and audience than Standard Names, and so perhaps should be  
considered and developed separately, not necessarily as a replacement  
for them.

John

--------------
John Graybeal   <mailto:graybeal at mbari.org>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org


