[CF-metadata] A Pre-Proposal for Additional standard_names for String Variables

Bob Simons - NOAA Federal bob.simons at noaa.gov
Fri Feb 10 13:18:10 MST 2017


Background:

As you all know, the vast majority of standard_names are for numeric
variables and have an associated "Canonical Units".
However, there are some existing standard_names for string variables
(e.g., area_type, institution, land_cover, land_cover_lccs,
platform_id, platform_name, region, sensor_band_identifier, source,
and surface_cover). They do not have associated Canonical Units.

If the "data_type=string" and "charset" attributes are accepted by CF
and thus we can clearly identify String variables and know their
character encoding, I would like to propose that we add several
additional standard_names that identify/describe the String variables
in the same way that other standard_names describe numeric variables.
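To illustrate what "clearly identify String variables and know their character encoding" could mean for reading software, here is a minimal Python sketch. It assumes a String variable is stored as a 2-D char array (one fixed-length row per string) padded with NUL bytes, and that the charset value is a codec name Python recognizes (e.g., "UTF-8"). The function name decode_string_variable is hypothetical and is not part of any CF or netCDF API.

```python
from typing import List


def decode_string_variable(rows: List[bytes], charset: str) -> List[str]:
    """Decode each fixed-length row of a char array to one Python str.

    Assumes NUL-padded rows and a byte-oriented charset such as UTF-8 or
    ISO-8859-1, where a NUL byte can only be padding (not true of UTF-16).
    """
    return [row.rstrip(b"\x00").decode(charset) for row in rows]


# A hypothetical char variable of shape (2, 16) with charset="UTF-8".
rows = [
    b"+1 202 456 1111\x00",
    b"caf\xc3\xa9" + b"\x00" * 11,
]
strings = decode_string_variable(rows, "UTF-8")
# strings == ['+1 202 456 1111', 'café']
```

Without the charset attribute, the second row would be ambiguous: the same bytes decode to different text under UTF-8 and ISO-8859-1.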

I want to get your comments and suggestions before I formally propose them.

Here are the possible additional String standard_names,
with definitions and [comments in brackets]. I have current
needs/uses for almost all of these (most exceptions are noted below),
e.g., I have a tabular dataset where each row has information
about a different project at NOAA.

doi
  Each String specifies a single Digital Object Identifier.
email_address
  Each String specifies a single email address.
phone_number
  Each String specifies a single,
  voice (thus not including fax numbers),
  international (thus starting with +countryCode)
  phone number.
  The E.164 format is required:
    +countryCode subscriberNumberIncludingAreaCode
  e.g., "+1 202 456 1111" (The White House!)
  https://en.wikipedia.org/wiki/E.164
  Spaces between the country code, the area code, the prefix,
  and the number are strongly encouraged but not required.
  Parentheses and dashes are discouraged.
uri
  Each String specifies a single URI.
url
  Each String specifies a complete, single URL.
  It must start with a "scheme" (http:// , https:// , ftp:// , etc.).
  [It would be possible in the future to add related
  standard_names by appending a specific subtype,
  e.g., url_project_webpage, url_iso19115_2, url_image
  if there is a need and if people think it's a good idea.]
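As a rough illustration of how software could check values against the phone_number and url definitions above, here is a minimal Python sketch. The function names are hypothetical, and the rules are one possible reading of the draft definitions, not an agreed CF check. The phone check removes spaces (and tolerates the discouraged parentheses and dashes) before applying E.164's limit of at most 15 digits. The URL check additionally assumes a network location (the "//host" part), which the http, https, and ftp examples have but, e.g., mailto: URLs do not.

```python
import re
from urllib.parse import urlsplit

# E.164: "+", then a country code that does not start with 0,
# then at most 15 digits in total (country code included).
_E164 = re.compile(r"\+[1-9][0-9]{1,14}")

# Characters removed before the digit check: spaces (encouraged) and the
# discouraged-but-tolerated parentheses and dashes.
_SEPARATORS = str.maketrans("", "", " ()-")


def is_phone_number(s: str) -> bool:
    """True if s looks like '+countryCode subscriberNumberIncludingAreaCode'."""
    return _E164.fullmatch(s.translate(_SEPARATORS)) is not None


def is_url(s: str) -> bool:
    """True if s is a single, complete URL that starts with a scheme."""
    if any(c.isspace() for c in s):
        return False  # whitespace suggests several values or stray text
    try:
        parts = urlsplit(s)
    except ValueError:  # e.g. a malformed IPv6 host such as "http://[::1"
        return False
    # Assumption: require both a scheme (http, https, ftp, ...) and a host.
    return bool(parts.scheme) and bool(parts.netloc)


print(is_phone_number("+1 202 456 1111"))  # True
print(is_phone_number("(831)333-9878"))    # False: no +countryCode
print(is_url("https://en.wikipedia.org/wiki/E.164"))  # True
print(is_url("en.wikipedia.org/wiki/E.164"))          # False: no scheme
```

Note that a US-style number such as (831)333-9878 is rejected only because it lacks the +countryCode that the definition requires; written as +1 831 333 9878 it passes.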

html_document
  Each String specifies a complete HTML document.
  [I am not sure about this one.
  I admit I don't have a current use case, but I think it is
  important to distinguish a complete HTML document from a snippet.]
html_description
  Each String is a snippet of text using HTML markup
  which describes something [e.g., a project, a buoy,
  the condition of a beached whale, ...]
html_snippet
  Each String is a snippet of text using HTML markup
  tags that isn't a complete HTML document.
  This is to be used for HTML snippets whenever there isn't
  a suitable, more specific variant,
  e.g., html_description
  [I'm open to words other than "snippet".]
json
  Each String is JSON-text: a JSON object, array, number, string,
  or one of the following three literal names: false, null, true.
  See http://www.rfc-editor.org/rfc/rfc7159.txt
json_geojson
  Each String is GeoJSON, as specified by
  https://tools.ietf.org/html/rfc7946
wkt_geometry
  Each String specifies a complete WKT geometry
  as specified in the ISO/IEC 13249-3:2016 standard,
  "Information technology – Database languages – SQL multimedia
  and application packages – Part 3: Spatial" (SQL/MM).
  [If additional variants need to be specified in the future,
  we can append _*subtype*, e.g., wkt_geometry_iso13249_3_2016.
  NOTE that the use of wkt_geometry with a String variable
  (a multidimensional char variable with a charset attribute) doesn't
  preclude other methods of storing geometries.]
wkt_crs
  Each String specifies a WKT CRS as specified by
  ISO 19162:2015, "Geographic information – Well-known text
  representation of coordinate reference systems".
xml_document
  Each String specifies a complete XML document.
  Use this only if there isn't a suitable, more specific variant,
  e.g., xml_iso19115_2.
  [I am not sure about this one.
  I admit I don't have a current use case, but I think it is
  important to distinguish a complete XML document from a snippet.]
xml_iso19115_2
  Each String specifies a complete ISO 19115-2 / ISO 19139 XML
  document.
  [Ted Habermann: does this make your day? :-) ]
xml_iso19115_1
  Each String specifies a complete ISO 19115-1 XML document.
  [My need for this is not immediate, but I know it is coming.]
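To make the JSON variants and the complete-document vs. snippet distinction concrete, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the checks are deliberately shallow: is_geojson only tests that the top-level "type" member is one of the nine types defined in RFC 7946 (it does not validate coordinates), and is_complete_xml_document only tests well-formedness with a single root element.

```python
import json
import xml.etree.ElementTree as ET

# The nine "type" values that RFC 7946 defines for GeoJSON objects.
GEOJSON_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
    "Feature", "FeatureCollection",
}


def _reject_constant(name: str) -> None:
    # Python's json module accepts NaN/Infinity by default; RFC 7159 does not.
    raise ValueError(f"not valid JSON: {name}")


def _parse_json(s: str):
    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads(s, parse_constant=_reject_constant)


def is_json(s: str) -> bool:
    """True if s is JSON-text: object, array, number, string, or literal."""
    try:
        _parse_json(s)
    except ValueError:
        return False
    return True


def is_geojson(s: str) -> bool:
    """Shallow check: a JSON object whose "type" is a GeoJSON type."""
    try:
        obj = _parse_json(s)
    except ValueError:
        return False
    t = obj.get("type") if isinstance(obj, dict) else None
    return isinstance(t, str) and t in GEOJSON_TYPES


def is_complete_xml_document(s: str) -> bool:
    """True if s is well-formed XML with a single root element.

    A snippet such as "<p>a</p><p>b</p>" fails ("junk after document
    element"). ElementTree is not hardened against maliciously
    constructed XML, so untrusted input needs a safer parser.
    """
    try:
        ET.fromstring(s)
    except ET.ParseError:
        return False
    return True
```

The same split would apply to html_document vs. html_snippet, although HTML parsers are usually lenient enough that a well-formedness test like the XML one is not directly available.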


Additional Comments

These are somewhat different from the current standard_names.
Here is the reasoning behind them:

As with existing standard_names, the goal was short, human-readable
names which follow the CF naming convention.

syntax_meaning -
Although MIME types are too general for our purposes and only
apply to entire documents, I like their use of type/subtype
(although I used '_' as the separator instead of '/')
and I like that the "type" prefix can serve a software-related function
(e.g., all standard_names above that start with "xml" indicate
that the content can be parsed with an XML parser).
So when relevant, the proposed standard_names specify syntax
and meaning, using the format *syntax_meaning*, e.g., xml_iso19115_2.
Interestingly, I think the actual ISO 19115-2 document just
specifies the meaning/content, while ISO 19139 specifies the XML
representation of that content, so it is a good example of the
need for *syntax_meaning* notation.
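The software-related role of the syntax prefix could be sketched as below (Python, standard library only). The PARSERS table and the function names are hypothetical. Splitting on the first '_' is only meaningful for names known to follow the *syntax_meaning* pattern (it would give "land" for land_cover), and prefixes with no entry, such as wkt or html, simply return None.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, Optional

# Hypothetical table: syntax prefix -> a standard-library parser for it.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "json": json.loads,
    "xml": ET.fromstring,
}


def syntax_of(standard_name: str) -> str:
    """Return the syntax part of a syntax_meaning name ('xml_iso19115_2' -> 'xml')."""
    return standard_name.split("_", 1)[0]


def parser_for(standard_name: str) -> Optional[Callable[[str], Any]]:
    """Return a parser for the name's syntax prefix, or None if there is none."""
    return PARSERS.get(syntax_of(standard_name))


root = parser_for("xml_iso19115_2")("<doc><title>Example</title></doc>")
print(root.find("title").text)  # Example
print(parser_for("wkt_crs"))    # None: no WKT parser in this table
```

Keying parsers on the prefix means that a new variant such as xml_iso19115_1 is recognized as XML without any new code, which is the practical benefit of the type/subtype idea borrowed from MIME types.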

text_plain -
I didn't include anything like "text_plain" because that is,
in a practical sense, the default for Strings, and because it
is implied by more specific standard_names like existing
platform_name, region, source.

single vs. plural -
For many standard_names, I specified that each String specify
a single item. I'm open to allowing multiple values if the
separator is specified in the standard_names definition.

Thank you for considering these names.

-- 
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940               (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: bob.simons at noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><

