-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding nc_def_var_quantize()/nc_inq_var_quantize() - second attempt #2088
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the draft PR, which looks to be in really good shape. It always bemuses me that the netCDF API is so elegant, yet conceals so much fine detail under-the-hood. This is probably a good chance to comment while things are in flux.
int quantize_mode; /**< Quantize mode. NC_NOQUANTIZE is 0, and means no quantization. */
int nsd; /**< Number of significant digits if quantization is used, 0 if not. */
Using nsd == 0 for no quantization is consistent with the existing API, e.g., DEFLATE level=0 indicates uncompressed data. However, nsd=0 seems inverted since the smaller the NSD the more quantization is performed. One alternative would be to use something like nsd=7 (for float) or nsd=15 (for double) to indicate no quantization. That would certainly be messier than nsd=0, though it would preserve the meaning of NSD. Thus I think your current implementation is better from a software engineering standpoint, though possibly more unclear to an end-user who might wonder how NSD can be 0.
Also I favor quantize_bitgroom_number_of_significant_digits
or _quantize_bitgroom_number_of_significant_digits
as Ed mentioned in an email to me, since those are more extensible to possible future quantization modes.
The mode and nsd arguments are integers, so that gives the option to use -1 (negative one) |
@czender would nsd=0 be a valid setting if I were not using it to turn off quantize? We already have an easy way to turn quantize off, which is to pass a mode flag of NC_NOQUANTIZE (which is 0). The only reason to have nsd = 0 turn off quantize was to be consisten with the deflate call. If we are not going to be consistent with deflate in this regard, then I would suggest that we use the quantize_mode to turn quantize off, and return NC_EINVAL for any invalid nsd value. If we change the name of the attribute to _quantize_bitgroom_number_of_significant_digits will you propose that as your CF convention? Will NCO switch to the new attribute name? Because, IIRC, you are now using a different name with NCO. |
@edwardhartnett nsd=0 is only valid theoretically, e.g., in the sense that 7 = 5 to 0 significant digits. However, when given NSD=0, the NCO BG implementation will complain that NSD must be a positive integer. As to your next question, I'm unsure what it means to "turn quantize off". That seems to be an implementation concept inherited from lossless compression. Once a number has been quantized, it always remains quantized. Help me understand how you envision the netCDF implementation working. Will the
Yes
Yes, NCO will adopt whatever convention CF agrees on in the end. For the time-being, I will change it to |
OK, so 0 is not a valid nsd value. Quantize can only be turned off before any data are written or the variable created in the HDF5 file, just like deflate. So it can be turned on and off as many times as the user pleases, until the first enddef is called. At that point, it is written in stone, and can never be changed. Once the variable is actually created as a dataset in the HDF5 file, attempts to change quantize will result in the appropriately named error code NC_ELATEDEF. So quantize can not be changed on any var that has any data, or turned on after any data have been written. It can only be set at file create time, before anything is written to disk. Quantize is on a variable by variable basis. So the quantization attribute will only be on the variables that have quantization. The attributes are variable attributes, not NC_GLOBAL. Does NCO use a global attribute and apply it to all variables? |
NCO writes the quantization attribute on a variable-by-variable basis. |
re: PR Unidata#2088 Primary changes: * Add NCZarr-specific quantize function to the dispatch table. * Copy quantize code from libhdf5 * Add quantize invocation to zvar.c * Add support for _QuantizeBitgroomNumberOfSignificantDigits to ncgen. * Copy quantize test from nc_test4 to nczarr_tests. Remove some parts that are not relevant to NCZarr. Other Changes: * Break zsync.c into zsync.c (writing) and zload.c (reading). * Clean up the fill value handling (many changes) * Disable atexit() under Windows * Move ncjson to libdispatch * Add documentation of differences between netcdf-4 and NCZarr, especially WRT fill value. * Some mingw fixes * Remove some cruft * Cleanup the handling of scalars
re: PR Unidata#2088 re: PR Unidata#2130 replaces: Unidata#2140 Changes: * Add NCZarr-specific quantize functions to the dispatch table. * Copy (modified) quantize code from libhdf5 to NCZarr * Add quantize invocation to zvar.c * Add support for _QuantizeBitgroomNumberOfSignificantDigits and _QuantizeGranularBitgroomNumberOfSignificantDigits to ncgen. * Modify nc_test4/tst_quantize.c to allow it to be used both for hdf5 and for nczarr. * Make dap4 properly handle quantize functions in dispatch table. * Add quantize attribute support to ncgen. Other changes: * Caught and fixed some S3 problems * Fixed some nczarr fillvalue problems. * Fixed some nczarr cache problems. * Cleanup some flaws in libdispatch/dinfermodel.c * Allow byterange requests to S3 be readable by dinfermodel.c/check_file_type * Remove the libnczarr ztracedispatch code (big change).
Adding nc_def_var_quantize()/nc_inq_var_quantize() as discussed in recent meeting.
This is a second try, after #2081. In this version I add the inq function directly to the dispatch table, as suggested by @DennisHeimbigner , instead of extending inq_var_all(). This leads to a cleaner implementation, with fewer changes in the other dispatch layers.
This is now complete and fully documented and tested. Comments and questions welcome.
Fixes #1548
Fortran changes can be found in Unidata/netcdf-fortran#304.
I have issue open to determine where this should be documented in the language-neutral docs: Unidata/netcdf#46.