[CF-metadata] Add new integer types to CF?

Charlie Zender zender at uci.edu
Wed Sep 20 20:07:49 MDT 2017


Two weeks ago I suggested adding new numeric atomic types to CF.
There were two substantive responses:

1. Mary Jo Brodzik objected to this:

"Use of unsigned types to hold packed data is not permitted since
they are incapable of representing negative numbers."

That statement is certainly objectionable because it is false.
Both scale_factor and add_offset can be negative, so a positive
or negative value can be packed into an unsigned integer.
I can't think of any good reason to prohibit packing into unsigned.
I concur with Mary Jo and think CF 1.8 should allow it.
I no longer think Section 8.1 needs any modification.
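
To make the point about packing concrete, here is a minimal sketch in Python. It assumes the usual unpacking relation, unpacked = packed * scale_factor + add_offset. The particular scale_factor and add_offset values, and the helper names pack and unpack, are illustrative assumptions and not part of any CF text.

```python
# Illustrative values only (assumptions, not from the CF conventions):
# a negative add_offset lets an unsigned byte (0..255) span -40.0..87.5.
SCALE_FACTOR = 0.5
ADD_OFFSET = -40.0


def pack(value):
    """Pack a physical value into the unsigned byte range 0..255."""
    packed = round((value - ADD_OFFSET) / SCALE_FACTOR)
    if not 0 <= packed <= 255:
        raise ValueError("value does not fit in an unsigned byte")
    return packed


def unpack(packed):
    """Recover the physical value: packed * scale_factor + add_offset."""
    return packed * SCALE_FACTOR + ADD_OFFSET


if __name__ == "__main__":
    for physical in (-40.0, -12.5, 0.0, 87.5):
        stored = pack(physical)
        print(physical, "->", stored, "->", unpack(stored))
```

Every stored value here is non-negative, yet the unpacked values cover a range that includes negative numbers.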

2. Chris Barker questioned this proposed text:

"One byte numeric data should be stored using the byte or unsigned
byte data type. It is possible to treat the byte type as unsigned by
using the NUG convention of indicating the unsigned range using the
valid_min, valid_max, or valid_range attributes."

In Chris' words, "So is there an unsigned byte type or not? if so, why
bother with the text about valid-min, etc....?"

That text on unsigned bytes is copied directly from Section 2.2.
It defines and legitimizes the current practice of indicating that
a value _stored_ as a signed byte (which is all that CF and netCDF
CDF1 and CDF2 formats allow) should be _interpreted_ as an unsigned
byte when the attributes valid_min and brethren say so.
It seems to me that the text, though awkward, must be retained so
that CF continues to permit the current practice for backwards
compatibility. The CDF5 and netCDF4 binary formats both support
unsigned bytes as atomic types, so there will be no _need_ to store
unsigned bytes as signed bytes in those two formats. However, the
current text that defines the current "unsigned workaround" must
be included in CF 1.8 to continue to allow CDF1 and CDF2 formats
to represent unsigned bytes. Otherwise we would be requiring all
CF 1.8-compliant datasets with unsigned bytes to be stored in
CDF5 or netCDF4. This minor proposal (new atomic types) is intended
to provide new flexibility, not to force users of unsigned bytes to
migrate from CDF1/2 to CDF5/netCDF4. Perhaps the wording could be
improved but I thought it most straightforward to copy the text
directly from the existing CF. I think the draft text I originally
suggested for section 2.2 (datatypes) is appropriate.
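
The "unsigned workaround" described above can be sketched roughly as follows in Python. The function names and the detection rule (treating a valid_max or valid_range upper bound above 127 as the signal for an unsigned range) are assumptions made for illustration. Real readers may apply the NUG convention differently.

```python
def indicates_unsigned(attrs):
    """Hypothetical check: does valid_max or valid_range exceed 127?

    Assumption: an upper bound above the signed byte maximum is taken
    as the hint that the stored byte should be read as unsigned.
    """
    upper_bounds = []
    if "valid_max" in attrs:
        upper_bounds.append(attrs["valid_max"])
    if "valid_range" in attrs:
        upper_bounds.append(attrs["valid_range"][1])
    return any(bound > 127 for bound in upper_bounds)


def interpret_byte(stored, attrs):
    """Interpret a value stored as a signed byte (-128..127).

    Returns the unsigned reading (0..255) when the attributes say so,
    otherwise returns the stored signed value unchanged.
    """
    if not -128 <= stored <= 127:
        raise ValueError("not a signed byte value")
    if indicates_unsigned(attrs):
        return stored & 0xFF  # reinterpret the same 8 bits as unsigned
    return stored


if __name__ == "__main__":
    print(interpret_byte(-1, {"valid_range": (0, 255)}))
    print(interpret_byte(-1, {}))
```

The stored bits do not change in either case. Only the interpretation differs, which is why the workaround remains usable in CDF1 and CDF2 files.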

In light of these points, my revised suggested draft language
to support new numeric atomic types would fit completely in Section
2.2 (datatypes) with no changes to Section 8.1 (Packing).

The current CF 1.8 draft reads (Section 2.2):

"The netCDF data types char, byte, short, int, float or real, and
double are all acceptable. The char type is not intended for numeric
data. One byte numeric data should be stored using the byte data
type. All integer types are treated by the netCDF interface as
signed. It is possible to treat the byte type as unsigned by using the
NUG convention of indicating the unsigned range using the valid_min,
valid_max, or valid_range attributes."

I suggest replacing that text with something like:

"The netCDF data types char, byte, unsigned byte, short, unsigned
short, int, unsigned int, int64, unsigned int64, float or real,
and double are all acceptable. The char type is not intended for
numeric data. One byte numeric data should be stored using the byte
or unsigned byte data type. It is possible to treat the byte type as
unsigned by using the NUG convention of indicating the unsigned range
using the valid_min, valid_max, or valid_range attributes. The
convention explicitly distinguishes between signed and unsigned
integer types only where necessary. Unless otherwise noted, int is
interchangeable with unsigned int, int64, and unsigned int64 in this
convention, including examples and appendices. Similarly short is
interchangeable with unsigned short, and byte with unsigned byte."

Unsigned,
Charlie
-- 
Charlie Zender, Earth System Sci. & Computer Sci.
University of California, Irvine 949-891-2429 )'(


