[CF-metadata] A Pre-Proposal for Additional standard_names for String Variables

Bob Simons - NOAA Federal bob.simons at noaa.gov
Fri Feb 10 13:18:10 MST 2017


As you all know, the vast majority of standard_names are for numeric
variables and have an associated "Canonical Units".
However, there are some existing standard_names for string variables
(e.g., area_type, institution, land_cover, land_cover_lccs,
platform_id, platform_name, region, sensor_band_identifier, source,
and surface_cover). They do not have associated Canonical Units.

If the "data_type=string" and "charset" attributes are accepted by CF
and thus we can clearly identify String variables and know their
character encoding, I would like to propose that we add several
additional standard_names that identify/describe the String variables
in the same way that other standard_names describe numeric variables.

I want to get your comments and suggestions before I formally propose them.

Here are the possible additional String standard_names,
with definitions and [comments in brackets]. I have current
needs/uses for almost all of these (most exceptions are noted below),
e.g., I have a tabular dataset where each row has information
about a different project at NOAA.

  Each String specifies a single Digital Object Identifier.
  Each String specifies a single email address.
  Each String specifies a single,
  voice (thus not including fax numbers),
  international (thus starting with +countryCode)
  phone number.
  The E.164 format is required:
    +countryCode subscriberNumberIncludingAreaCode
  e.g., "+1 202 456 1111" (The White House!)
  Spaces between the country code, the area code, the prefix,
  and the number are strongly encouraged but not required.
  Parentheses and dashes are discouraged.
  Each String specifies a single URI.
  Each String specifies a complete, single URL.
  It must start with a "scheme" (http:// , https:// , ftp:// , etc.).
  [It would be possible in the future to add related
  standard_names by appending a specific subtype,
  e.g., url_project_webpage, url_iso19115_2, url_image
  if there is a need and if people think it's a good idea.]
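
As a rough sketch of how software might check the phone number and
URL definitions above, the Python below tests a String against each
rule. The function names are hypothetical and not part of any
proposal. The phone check is deliberately strict: it rejects the
discouraged parentheses and dashes outright, and it assumes the
E.164 limit of at most 15 digits. The URL check assumes "complete"
means a scheme followed by "//" and a host.

```python
import re
from urllib.parse import urlsplit

# "+", a country code that does not start with 0, then digit groups
# optionally separated by single spaces (hypothetical strict reading).
_PHONE_RE = re.compile(r"^\+[1-9][0-9]*(?: [0-9]+)*$")


def looks_like_phone_number(value: str) -> bool:
    """Return True if value is '+countryCode subscriberNumber' with optional spaces."""
    if not _PHONE_RE.match(value):
        return False
    digit_count = sum(ch in "0123456789" for ch in value)
    return digit_count <= 15  # E.164 allows at most 15 digits


def looks_like_complete_url(value: str) -> bool:
    """Return True if value starts with a scheme and names a host."""
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc)


print(looks_like_phone_number("+1 202 456 1111"))
print(looks_like_complete_url("https://www.noaa.gov/"))
```

A less strict reader could strip parentheses and dashes before
matching instead of rejecting them.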

  Each String specifies a complete HTML document.
  [I am not sure about this one.
  I admit I don't have a current use case, but I think it is
  important to distinguish a complete HTML document from a snippet.]
  Each String is a snippet of text using HTML markup
  which describes something [e.g., a project, a buoy,
  the condition of a beached whale, ...]
  Each String is a snippet of text using HTML markup
  tags that isn't a complete HTML document.
  This is to be used for html snippets whenever there isn't
  a suitable, more specific variant,
  e.g., html_description
  [I'm open to words other than "snippet".]
  Each String is JSON-text: a JSON object, array, number, string,
  or one of the following three literal names: false, null, true.
  See http://www.rfc-editor.org/rfc/rfc7159.txt
  Each String is GeoJSON, as specified by
  Each String specifies a complete WKT geometry
  as specified in the ISO/IEC 13249-3:2016 standard,
  "Information technology – Database languages – SQL multimedia
  and application packages – Part 3: Spatial" (SQL/MM).
  [If additional variants need to be specified in the future,
  we can append _*subtype*, e.g., wkt_geometry_iso13249_3_2016.
  NOTE that the use of wkt_geometry with a String variable
  (a multidimensional char with a charset attribute) doesn't
  preclude other methods of storing geometries.]
  Each String specifies a WKT CRS as specified by
  ISO 19162:2015, "Geographic information – Well-known text
  representation of coordinate reference systems".
  Each String specifies a complete XML document.
  Use this only if there isn't a suitable, more specific variant,
  e.g., xml_iso19115_2.
  [I am not sure about this one.
  I admit I don't have a current use case, but I think it is
  important to distinguish a complete XML document from a snippet.]
  Each String specifies a complete ISO 19115-2 / ISO 19139 XML document.
  [Ted Habermann: does this make your day? :-) ]
  Each String specifies a complete ISO 19115-1 XML document.
  [My need for this is not immediate, but I know it is coming.]
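
For the JSON-text definition in the list above, a minimal sketch of
a validity check in Python might look like the following. The
function name is hypothetical. Python's json module accepts NaN and
Infinity by default, which are not among the values the definition
lists, so this sketch rejects them explicitly.

```python
import json


def _reject_constant(name):
    # Called by json.loads for "NaN", "Infinity" and "-Infinity".
    raise ValueError(f"{name} is not a JSON value")


def is_json_text(value: str) -> bool:
    """Return True if value parses as an object, array, number, string,
    or one of the literals false, null, true."""
    try:
        json.loads(value, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_json_text('{"project": "ERDDAP", "ids": [1, 2]}'))
print(is_json_text("{project: ERDDAP}"))
```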

Additional Comments

These are somewhat different than the current standard_names.
Here is the reasoning behind them:

As with existing standard_names, the goal was short, human-readable
names which follow the CF naming convention.

syntax_meaning -
Although MIME types are too general for our purposes and only
apply to entire documents, I like their use of type/subtype
(although I used '_' as the separator instead of '/')
and I like that the "type" prefix can serve a software-related function
(e.g., all standard_names above that start with "xml" indicate
that the content can be parsed with an XML parser).
So when relevant, the proposed standard_names specify syntax
and meaning, using the format *syntax_meaning*, e.g., xml_iso19115_2.
Interestingly, I think the actual ISO 19115-2 document just
specifies the meaning/content, while ISO 19139 specifies the XML
representation of that content, so it is a good example of the
need for *syntax_meaning* notation.
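
To illustrate the software-related use of the "type" prefix, here is
a small Python sketch that splits a standard_name at its first
underscore and picks a parser from the syntax part. The function
names and the parser table are assumptions made for illustration,
and names without a known syntax prefix (e.g., platform_name) are
simply returned as plain text.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from the syntax prefix to a parser function.
PARSERS_BY_SYNTAX = {
    "xml": ET.fromstring,
    "json": json.loads,
}


def split_standard_name(standard_name: str):
    """Split 'syntax_meaning' into (syntax, meaning) at the first underscore."""
    syntax, _, meaning = standard_name.partition("_")
    return syntax, meaning


def parse_by_syntax(standard_name: str, value: str):
    """Parse value with the parser implied by the prefix, else return it unchanged."""
    syntax, _ = split_standard_name(standard_name)
    parser = PARSERS_BY_SYNTAX.get(syntax)
    if parser is None:
        return value  # treat as plain text
    return parser(value)


print(split_standard_name("xml_iso19115_2"))
print(parse_by_syntax("xml_iso19115_2", "<metadata><title/></metadata>").tag)
```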

text_plain -
I didn't include anything like "text_plain" because that is,
in a practical sense, the default for Strings, and because it
is implied by more specific standard_names like existing
platform_name, region, source.

single vs. plural -
For many standard_names, I specified that each String specify
a single item. I'm open to allowing multiple values if the
separator is specified in the standard_names definition.

Thank you for considering these names.


Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940               (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: bob.simons at noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><