[CF-metadata] Pre-proposal for "charset"
chris.barker at noaa.gov
Mon Feb 27 18:07:32 MST 2017
On Wed, Feb 22, 2017 at 12:08 PM, Bob Simons - NOAA Federal <
bob.simons at noaa.gov> wrote:
> I do like ISO-8859-1, because
> * It is compatible with ASCII for chars 0-127, which is all that ASCII
> * Any variable that has just 7bit ASCII chars can be labelled
> * It is the most commonly used single-page 8bit charset for supporting the
> European languages.
> * It is widely used and supported.
All good. And I don't know if this is only the Python implementation, but
at least in Python, 8859-1 can read ANY binary data, and it round-trips
through a "proper" unicode object to get the same bytes back.
I.e., if the data are not 8859-1, or are malformed for some reason, the
8859-1 decoder will not raise an error on any input, and if you re-encode it,
you'll get back the same bytes you started with. Really nice property.
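A quick check of that property, sketched with Python's built-in codecs ("latin-1" is Python's alias for ISO-8859-1). It works because ISO-8859-1 maps every byte value 0x00-0xFF to the code point with the same number, so no input is rejected. The helper names are just for illustration. UTF-8 is shown for contrast, since it does reject malformed input:

```python
def latin1_roundtrip(raw: bytes) -> bool:
    """Decode arbitrary bytes as ISO-8859-1, re-encode them, and compare."""
    # Never raises: ISO-8859-1 maps each byte 0x00-0xFF to the code point
    # with the same value (U+0000 to U+00FF).
    text = raw.decode("latin-1")
    return text.encode("latin-1") == raw


def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


all_bytes = bytes(range(256))        # every possible byte value
print(latin1_roundtrip(all_bytes))   # True
print(is_valid_utf8(all_bytes))      # False: the run 0x80..0xFF is not well-formed UTF-8
```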
> I do like UTF-8 because it is the only charset that supports full Unicode
> (all UTF-16/UCS-4/UTF-32 characters) in an 8bit encoding (since that is all
> we have for characters in netcdf-3 files: 8bit chars).
Again, I think this is a non-issue: UTF-32 uses 4 bytes per character, i.e. 4
chars per code point. No reason you couldn't put UTF-32 encoded data in a
char array (C programmers do it all the time :-) )
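To make that concrete, here is a minimal sketch of UTF-32 text stored one byte per element, the way an 8-bit netCDF-3 char variable would hold it. The helper names are hypothetical, and little-endian UTF-32 without a byte-order mark ("utf-32-le") is an assumption; a reader would need to be told both the encoding and the byte order:

```python
def to_char_array(text: str, encoding: str = "utf-32-le") -> list[bytes]:
    """Encode text and split it into single-byte 'chars', as an 8-bit
    char variable would store them (one byte per element)."""
    encoded = text.encode(encoding)
    return [encoded[i:i + 1] for i in range(len(encoded))]


def from_char_array(chars: list[bytes], encoding: str = "utf-32-le") -> str:
    """Join the single-byte chars back together and decode them."""
    return b"".join(chars).decode(encoding)


sample = "Z\u00fcrich \U0001F30A"        # includes a code point outside the BMP
chars = to_char_array(sample)
print(len(sample), len(chars))           # 8 32: four chars per code point
print(from_char_array(chars) == sample)  # True
```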
> And it is incredibly widely used and supported in software.
All the rest of your reasons are good -- UTF-8 is the best choice.
> So my proposal is: charset can specify any single-page (8bit) character
> set, but the two recommended charsets would be "ISO-8859-1" (for most
> simple cases) and "UTF-8" (for harder cases / full Unicode).
Sounds good, though part of me wants to say that "ISO-8859-1" and "UTF-8"
should be the only options!
(darn those legacy files!)
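As a rough sketch of how a reader might honor the proposed attribute: the helper below is hypothetical, assumes the "charset" value is a name Python's codec registry recognizes, and assumes trailing NUL bytes are padding rather than part of the string. It decodes with whatever charset is given (legacy files included) and also reports whether that charset is one of the two recommended ones:

```python
import codecs

# Canonical Python codec names for the two recommended charsets.
RECOMMENDED = {"iso8859-1", "utf-8"}


def decode_char_variable(raw: bytes, charset: str) -> tuple[str, bool]:
    """Hypothetical helper: decode a char variable's bytes using the value
    of its proposed "charset" attribute.

    Returns the decoded text and whether the charset is recommended.
    Raises LookupError if Python has no codec by that name.
    """
    codec = codecs.lookup(charset)  # normalizes "ISO-8859-1", "latin-1", ...
    # Assumption: trailing NUL bytes are padding, not part of the string.
    text = raw.rstrip(b"\x00").decode(codec.name)
    return text, codec.name in RECOMMENDED


print(decode_char_variable(b"Z\xfcrich\x00\x00", "ISO-8859-1"))  # ('Zürich', True)
```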
Also, I don't think you can call UTF-8 an 8bit character set: it uses 8-bit
units, but a single code point takes anywhere from 1 to 4 of them.
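A small illustration of that variable width, using Python's built-in UTF-8 codec (the helper name is just for illustration):

```python
def utf8_width(ch: str) -> int:
    """Number of bytes (8-bit chars) UTF-8 uses for a single code point."""
    return len(ch.encode("utf-8"))


for ch in ["A", "\u00e9", "\u20ac", "\U0001F30A"]:
    # Prints: U+0041 1, U+00E9 2, U+20AC 3, U+1F30A 4
    print(f"U+{ord(ch):04X}", utf8_width(ch))
```

So a fixed-length char dimension bounds the number of bytes, not the number of characters, when the data are UTF-8.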
I'd also like the word "encoding" to be used instead of "character set"
wherever possible. "charset" comes from, and still implies, a 1-byte-per-character
encoding. But that's really a nitpick.
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov