Adding nc_def_var_quantize()/nc_inq_var_quantize() - second attempt #2088

Merged
merged 57 commits into from
Oct 1, 2021
Changes from 18 commits
9a18689
getting ready for next try at quantization code
edwardhartnett Aug 24, 2021
dabe008
further preparation for try 2 at quantizing
edwardhartnett Aug 24, 2021
d475e1f
further preparation for try 2 at quantizing
edwardhartnett Aug 24, 2021
3202b8b
adding quantize functions to all the dispatch tables
edwardhartnett Aug 24, 2021
d6d9825
now qunatizing with inq function in dispatch table
edwardhartnett Aug 24, 2021
74c4b9d
fixed version numbers
edwardhartnett Aug 24, 2021
f3435da
merged configure.ac and CMakeLists.txt with changes from master branch
edwardhartnett Aug 24, 2021
d053418
merged nc4hdf.c with changes from master branch
edwardhartnett Aug 24, 2021
24ed2a4
fixed comment
edwardhartnett Aug 24, 2021
233ddfb
further development
edwardhartnett Aug 24, 2021
148706b
now reading quantize attribute to get settings
edwardhartnett Aug 24, 2021
c9eca4b
moving function
edwardhartnett Aug 24, 2021
0f26083
perparing to apply bitgroom algorithm
edwardhartnett Aug 25, 2021
a02faa0
more testing of qunatize setting
edwardhartnett Aug 25, 2021
b2c0bb9
more quantize testing
edwardhartnett Aug 25, 2021
4ac7fa9
more quantize testing
edwardhartnett Aug 25, 2021
ee788d6
more quantize testing
edwardhartnett Aug 25, 2021
c655488
more quantize testing
edwardhartnett Aug 25, 2021
539578d
more tests for quantization
edwardhartnett Aug 26, 2021
c609a17
more quantize testing
edwardhartnett Aug 26, 2021
0265953
more quantize testing
edwardhartnett Aug 26, 2021
2b3d2c1
changed name of attribute to _quantize_bitgroom_number_of_significant…
edwardhartnett Aug 26, 2021
d29436c
improved doxygen documenation
edwardhartnett Aug 26, 2021
4f96fcc
improved doxygen documenation
edwardhartnett Aug 26, 2021
8f3da3f
change name of att to _QuantizeBitgroomNumberOfSignificantDigits
edwardhartnett Aug 27, 2021
2db4311
Merge branch 'master' into ejh_quantize_2
edwardhartnett Aug 27, 2021
eabbd68
bitgroom working for floats
edwardhartnett Aug 29, 2021
bec26aa
more quantize testing
edwardhartnett Aug 29, 2021
f8b7296
cleanup quantize code
edwardhartnett Aug 29, 2021
5aa429c
whitespace cleanup
edwardhartnett Aug 29, 2021
ed60a16
moving quantize to its own function
edwardhartnett Aug 29, 2021
d3e725b
attempting to fix ncdap test on appvayor
edwardhartnett Aug 30, 2021
229e101
quantize now working for NC_DOUBLE
edwardhartnett Aug 30, 2021
f5e2926
testing of quantize with scalars
edwardhartnett Aug 30, 2021
5d1aa2a
added more documentation, also started on test code for type conversion
edwardhartnett Aug 30, 2021
1e6ad09
type conversion with quantize between float and double
edwardhartnett Aug 30, 2021
d7b4b94
undid suggested change to ncdap_test/CMakeLists.txt
edwardhartnett Aug 30, 2021
bb40936
more testing with type conversion
edwardhartnett Aug 31, 2021
f809aad
testing with fill values
edwardhartnett Aug 31, 2021
e3c8be8
testing with fill values
edwardhartnett Aug 31, 2021
4cd4aff
testing with fill values
edwardhartnett Aug 31, 2021
684f73c
merged master
edwardhartnett Sep 1, 2021
3e056f4
more tests
edwardhartnett Sep 1, 2021
30448b4
merged
edwardhartnett Sep 1, 2021
ae3b083
turned off failing quantize test
edwardhartnett Sep 1, 2021
d2656ba
code clean up
edwardhartnett Sep 1, 2021
e2570c3
refactored quantize code
edwardhartnett Sep 1, 2021
09defc5
more tests for quantize
edwardhartnett Sep 2, 2021
f880a63
added parallel I/O quantize test
edwardhartnett Sep 2, 2021
18aebd9
added parallel I/O quantize test
edwardhartnett Sep 2, 2021
0ce4637
Merge branch 'main' into ejh_quantize_2
edwardhartnett Sep 7, 2021
7943172
improving benchmark program
edwardhartnett Sep 7, 2021
db72457
changed makefile to allow tst_gfs_data_1 to pick up libz from LD_LIBR…
edwardhartnett Sep 8, 2021
9cc39fe
changed makefile to make benchmark bm_file work properly with zlib-ng
edwardhartnett Sep 9, 2021
e8587b5
changed name of tst_gfs_data_1.c to tst_compress_par.c
edwardhartnett Sep 9, 2021
7806ded
tinker with data algorithm for tst_compress_par.c
edwardhartnett Sep 9, 2021
5200477
now nsd of 0 is NC_EINVAL for nc_def_var_quantize()
edwardhartnett Sep 10, 2021
6 changes: 5 additions & 1 deletion CMakeLists.txt
@@ -37,7 +37,7 @@ SET(PACKAGE_VERSION ${VERSION})

# Version of the dispatch table. This must match the value in
# configure.ac.
-SET(NC_DISPATCH_VERSION 3)
+SET(NC_DISPATCH_VERSION 4)

# Get system configuration, Use it to determine osname, os release, cpu. These
# will be used when committing to CDash.
@@ -1412,6 +1412,9 @@ ENDIF()
# Always enable DISKLESS
OPTION(ENABLE_DISKLESS "Enable in-memory files" ON)

# Always enable quantization.
OPTION(ENABLE_QUANTIZE "Enable variable quantization" ON)

# By default, MSVC has a stack size of 1000000.
# Allow a user to override this.
IF(MSVC)
@@ -2188,6 +2191,7 @@ is_enabled(ENABLE_NCZARR HAS_NCZARR)
is_enabled(ENABLE_NCZARR_S3_TESTS DO_NCZARR_S3_TESTS)
is_enabled(ENABLE_MULTIFILTERS HAS_MULTIFILTERS)
is_enabled(ENABLE_NCZARR_ZIP DO_NCZARR_ZIP_TESTS)
is_enabled(ENABLE_QUANTIZE HAS_QUANTIZE)

# Generate file from template.
CONFIGURE_FILE("${CMAKE_CURRENT_SOURCE_DIR}/libnetcdf.settings.in"
3 changes: 2 additions & 1 deletion configure.ac
@@ -1649,6 +1649,7 @@ AC_SUBST(HAS_NCZARR,[$enable_nczarr])
AC_SUBST(DO_NCZARR_S3_TESTS,[$enable_nczarr_s3_tests])
AC_SUBST(HAS_MULTIFILTERS,[$has_multifilters])
AC_SUBST(DO_NCZARR_ZIP_TESTS,[$enable_nczarr_zip])
AC_SUBST([HAS_QUANTIZE],[yes])

# Include some specifics for netcdf on windows.
#AH_VERBATIM([_WIN32_STRICMP],
@@ -1728,7 +1729,7 @@ AX_SET_META([NC_HAS_MULTIFILTERS],[$has_multifilters],[yes])
# dispatch table to submit. If this is changed, make sure the value in
# CMakeLists.txt also changes to match.

-AC_SUBST([NC_DISPATCH_VERSION], [3])
+AC_SUBST([NC_DISPATCH_VERSION], [4])
AC_DEFINE_UNQUOTED([NC_DISPATCH_VERSION], [${NC_DISPATCH_VERSION}], [Dispatch table version.])

#####
4 changes: 4 additions & 0 deletions include/hdf5internal.h
@@ -200,6 +200,10 @@ int NC4_hdf5_addfilter(NC_VAR_INFO_T* var, unsigned int id, size_t nparams, cons
int NC4_hdf5_filter_freelist(NC_VAR_INFO_T* var);
int NC4_hdf5_find_missing_filter(NC_VAR_INFO_T* var, unsigned int* idp);

/* Add an attribute to the attribute list. */
int nc4_put_att(NC_GRP_INFO_T* grp, int varid, const char *name, nc_type file_type,
size_t len, const void *data, nc_type mem_type, int force);

/* Support functions for provenance info (defined in nc4hdf.c) */
extern int NC4_hdf5get_libversion(unsigned*,unsigned*,unsigned*);/*libsrc4/nc4hdf.c*/
extern int NC4_hdf5get_superblock(struct NC_FILE_INFO*, int*);/*libsrc4/nc4hdf.c*/
6 changes: 6 additions & 0 deletions include/nc4dispatch.h
@@ -251,6 +251,12 @@ extern "C" {
EXTERNL int
NC4_inq_var_filter_info(int ncid, int varid, unsigned int id, size_t* nparams, unsigned int* params);

EXTERNL int
NC4_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);

EXTERNL int
NC4_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);

#if defined(__cplusplus)
}
#endif
7 changes: 5 additions & 2 deletions include/nc4internal.h
@@ -204,6 +204,8 @@ typedef struct NC_VAR_INFO
size_t chunk_cache_size; /**< Size in bytes of the var chunk cache. */
size_t chunk_cache_nelems; /**< Number of slots in var chunk cache. */
float chunk_cache_preemption; /**< Chunk cache preemption policy. */
int quantize_mode; /**< Quantize mode. NC_NOQUANTIZE is 0, and means no quantization. */
int nsd; /**< Number of significant digits if quantization is used, 0 if not. */
void *format_var_info; /**< Pointer to any binary format info. */
void* filters; /**< Record of the list of filters to be applied to var data; format dependent */
} NC_VAR_INFO_T;
@@ -341,8 +343,9 @@ extern int NC4_lookup_atomic_type(const char *name, nc_type* idp, size_t *sizep)
/* These functions convert between netcdf and HDF5 types. */
extern int nc4_get_typelen_mem(NC_FILE_INFO_T *h5, nc_type xtype, size_t *len);
extern int nc4_convert_type(const void *src, void *dest, const nc_type src_type,
-const nc_type dest_type, const size_t len, int *range_error,
-const void *fill_value, int strict_nc3);
+const nc_type dest_type, const size_t len, int *range_error,
+const void *fill_value, int strict_nc3, int quantize_mode,
+int nsd);

/* These functions do HDF5 things. */
extern int nc4_reopen_dataset(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *var);
25 changes: 25 additions & 0 deletions include/netcdf.h
@@ -324,6 +324,21 @@ there. */
#define NC_MIN_DEFLATE_LEVEL 0 /**< Minimum deflate level. */
#define NC_MAX_DEFLATE_LEVEL 9 /**< Maximum deflate level. */

#define NC_NOQUANTIZE 0 /**< No quantization in use. */
#define NC_QUANTIZE_BITGROOM 1 /**< Use bitgroom quantization. */

/** When quantization is used for a variable, an attribute of this
* name is added. */
#define NC_QUANTIZE_ATT_NAME "number_of_significant_digits"

/** For quantization, the maximum allowed number of significant
* digits for float. */
#define NC_QUANTIZE_MAX_FLOAT_NSD (7)

/** For quantization, the maximum allowed number of significant
* digits for double. */
#define NC_QUANTIZE_MAX_DOUBLE_NSD (15)
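
The limits above, combined with the documented rule that only NC_FLOAT and NC_DOUBLE variables can be quantized and the later commit that makes an NSD of 0 an error, suggest a validation step along the following lines. This is a minimal sketch, not the code in this PR: every `sk_`/`SK_` name is hypothetical, the type and error codes are local stand-ins rather than the library's values, and the choice to accept a "no quantization" request without checking `nsd` is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch only; the real library
 * defines its own type codes, modes, and error values. */
enum sk_type { SK_INT, SK_FLOAT, SK_DOUBLE };
enum { SK_NOERR = 0, SK_EINVAL = -1 };
enum { SK_NOQUANTIZE = 0, SK_QUANTIZE_BITGROOM = 1 };
enum { SK_MAX_FLOAT_NSD = 7, SK_MAX_DOUBLE_NSD = 15 };

/* Per-variable settings, loosely modelled on the two new fields
 * (quantize_mode, nsd) added to NC_VAR_INFO_T. */
struct sk_var {
    enum sk_type type;
    int quantize_mode;
    int nsd;
};

/* Validate and store quantization settings for one variable.
 * On error the variable is left unchanged. */
static int
sk_set_quantize(struct sk_var *var, int mode, int nsd)
{
    int max_nsd;

    assert(var != NULL);

    /* Assumption: switching quantization off needs no NSD check. */
    if (mode == SK_NOQUANTIZE) {
        var->quantize_mode = SK_NOQUANTIZE;
        var->nsd = 0;
        return SK_NOERR;
    }

    /* BitGroom is the only quantization mode described so far. */
    if (mode != SK_QUANTIZE_BITGROOM)
        return SK_EINVAL;

    /* Only floating-point variables can be quantized. */
    if (var->type == SK_FLOAT)
        max_nsd = SK_MAX_FLOAT_NSD;
    else if (var->type == SK_DOUBLE)
        max_nsd = SK_MAX_DOUBLE_NSD;
    else
        return SK_EINVAL;

    /* NSD must be 1..7 for float and 1..15 for double. */
    if (nsd < 1 || nsd > max_nsd)
        return SK_EINVAL;

    var->quantize_mode = mode;
    var->nsd = nsd;
    return SK_NOERR;
}
```

Rejecting bad input before touching the variable keeps a failed call from leaving half-updated settings behind.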

/** The netcdf version 3 functions all return integer error status.
* These are the possible values, in addition to certain values from
* the system errno.h.
@@ -854,6 +869,16 @@ nc_get_varm(int ncid, int varid, const size_t *startp,

/* Extra netcdf-4 stuff. */

/* Set quantization settings for a variable. Quantizing data improves
* later compression. Must be called after nc_def_var and before
* nc_enddef. */
EXTERNL int
nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);

/* Find out quantization settings of a var. */
EXTERNL int
nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);

/* Set compression settings for a variable. Lower is faster, higher is
* better. Must be called after nc_def_var and before nc_enddef. */
EXTERNL int
7 changes: 6 additions & 1 deletion include/netcdf_dispatch.h.in
@@ -147,6 +147,9 @@ struct NC_Dispatch
/* Version 3 Replace filteractions with more specific functions */
int (*inq_var_filter_ids)(int ncid, int varid, size_t* nfilters, unsigned int* filterids);
int (*inq_var_filter_info)(int ncid, int varid, unsigned int id, size_t* nparams, unsigned int* params);
/* Version 4 Add quantization. */
int (*def_var_quantize)(int ncid, int varid, int quantize_mode, int nsd);
int (*inq_var_quantize)(int ncid, int varid, int *quantize_modep, int *nsdp);
};

#if defined(__cplusplus)
@@ -223,7 +226,9 @@ extern "C" {
EXTERNL int NC_NOTNC4_inq_typeids(int, int *, int *);
EXTERNL int NC_NOTNC4_inq_user_type(int, nc_type, char *, size_t *,
nc_type *, size_t *, int *);

EXTERNL int NC_NOTNC4_def_var_quantize(int, int, int, int);
EXTERNL int NC_NOTNC4_inq_var_quantize(int, int, int *, int *);

/* These functions are for dispatch layers that don't implement
* the enhanced model, but want to succeed anyway.
* They return NC_NOERR plus properly set the out parameters.
1 change: 1 addition & 0 deletions include/netcdf_meta.h.in
@@ -62,5 +62,6 @@
#define NC_HAS_PAR_FILTERS @NC_HAS_PAR_FILTERS@ /* Parallel I/O with filter support. */
#define NC_HAS_NCZARR @NC_HAS_NCZARR@
#define NC_HAS_MULTIFILTERS @NC_HAS_MULTIFILTERS@
#define NC_HAS_QUANTIZE @NC_HAS_QUANTIZE@

#endif
3 changes: 3 additions & 0 deletions libdap2/ncd2dispatch.c
@@ -178,6 +178,9 @@ NCD2_get_var_chunk_cache,
NC_NOOP_inq_var_filter_ids,
NC_NOOP_inq_var_filter_info,

NC_NOTNC4_def_var_quantize,
NC_NOTNC4_inq_var_quantize,

};

const NC_Dispatch* NCD2_dispatch_table = NULL; /* moved here from ddispatch.c */
3 changes: 3 additions & 0 deletions libdap4/ncd4dispatch.c
@@ -974,4 +974,7 @@ NCD4_get_var_chunk_cache,

NC_NOTNC4_inq_var_filter_ids,
NC_NOTNC4_inq_var_filter_info,

NC_NOTNC4_def_var_quantize,
NC_NOTNC4_inq_var_quantize,
};
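
The dispatch-table changes in this PR follow one pattern: two function pointers are appended to the end of the table, the HDF5 table points them at real implementations, and formats without quantization support point them at stubs that return an error. The sketch below illustrates that pattern in isolation. It is not the library's code: all `sk_`/`SK_` names and error values are hypothetical, and the "implementation" only stores the settings in a static variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes for this sketch only. */
enum { SK_NOERR = 0, SK_ENOTNC4 = -1 };

/* A cut-down dispatch table. New entries go at the end, and the
 * version number is bumped so mismatched tables can be detected. */
struct sk_dispatch {
    int version;
    int (*def_var_quantize)(int ncid, int varid, int quantize_mode, int nsd);
    int (*inq_var_quantize)(int ncid, int varid, int *quantize_modep, int *nsdp);
};

/* Settings of the single variable this toy "HDF5" layer knows about. */
static int sk_mode, sk_nsd;

static int
sk_hdf5_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd)
{
    (void)ncid; (void)varid;
    sk_mode = quantize_mode;
    sk_nsd = nsd;
    return SK_NOERR;
}

static int
sk_hdf5_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp)
{
    (void)ncid; (void)varid;
    /* Out-parameters are optional: NULL pointers are ignored. */
    if (quantize_modep) *quantize_modep = sk_mode;
    if (nsdp) *nsdp = sk_nsd;
    return SK_NOERR;
}

/* Stubs for formats that do not implement quantization. */
static int
sk_notnc4_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd)
{
    (void)ncid; (void)varid; (void)quantize_mode; (void)nsd;
    return SK_ENOTNC4;
}

static int
sk_notnc4_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp)
{
    (void)ncid; (void)varid; (void)quantize_modep; (void)nsdp;
    return SK_ENOTNC4;
}

static const struct sk_dispatch sk_hdf5_table = {
    4, sk_hdf5_def_var_quantize, sk_hdf5_inq_var_quantize
};

static const struct sk_dispatch sk_dap_table = {
    4, sk_notnc4_def_var_quantize, sk_notnc4_inq_var_quantize
};

/* Public entry point: forwards to whichever table the file uses. */
static int
sk_def_var_quantize(const struct sk_dispatch *table, int ncid, int varid,
                    int quantize_mode, int nsd)
{
    assert(table != NULL && table->def_var_quantize != NULL);
    return table->def_var_quantize(ncid, varid, quantize_mode, nsd);
}
```

Because every table must fill every slot, adding the two pointers forces each format to state explicitly whether it supports quantization, which is why the DAP2 and DAP4 tables are touched here even though they only gain stubs.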
35 changes: 35 additions & 0 deletions libdispatch/dnotnc4.c
@@ -19,6 +19,41 @@
#include "ncdispatch.h"
#include "nc4internal.h"

/**
* @internal Not implemented in some dispatch tables
*
* @param ncid Ignored.
* @param varid Ignored.
* @param quantize_mode Ignored.
* @param nsd Ignored.
*
* @return ::NC_ENOTNC4 Not implemented for a dispatch table
* @author Ed Hartnett
*/
int
NC_NOTNC4_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd)
{
return NC_ENOTNC4;
}

/**
* @internal Not implemented in some dispatch tables
*
* @param ncid Ignored.
* @param varid Ignored.
* @param quantize_modep Ignored.
* @param nsdp Ignored.
*
* @return ::NC_ENOTNC4 Not implemented for a dispatch table
* @author Ed Hartnett
*/
int
NC_NOTNC4_inq_var_quantize(int ncid, int varid, int *quantize_modep,
int *nsdp)
{
return NC_ENOTNC4;
}

/**
* @internal Not implemented in some dispatch tables
*
63 changes: 63 additions & 0 deletions libdispatch/dvar.c
@@ -461,6 +461,69 @@ nc_def_var_deflate(int ncid, int varid, int shuffle, int deflate, int deflate_le
return ncp->dispatch->def_var_deflate(ncid,varid,shuffle,deflate,deflate_level);
}

/**
Turn on quantization for a variable.

The data are quantized by setting unneeded bits alternately to
1/0, so that they may compress well. Quantization is lossy (data
are irretrievably altered), and it improves the compression ratio
provided by a subsequent lossless compression filter. Quantization
alone will not reduce the size of the data - lossless compression
like zlib must also be used (see nc_def_var_deflate()).

This data quantization uses the BitGroom algorithm. A notable
feature of BitGroom is that the data it processes remain in IEEE 754
format after quantization. Therefore the BitGroom algorithm does
nothing when data are read.

Quantization is only available for variables of type NC_FLOAT or
NC_DOUBLE. Attempts to set quantization for other variable
types return an error (NC_EINVAL).

Quantization is not applied to values equal to the value of the
_FillValue attribute, if any.

For more information about quantization and the bitgroom filter, see

Zender, C. S. (2016), Bit Grooming: Statistically accurate
precision-preserving quantization with compression, evaluated in
the netCDF Operators (NCO, v4.4.8+), Geosci. Model Dev., 9,
3199-3211, doi:10.5194/gmd-9-3199-2016. Retrieved on Sep 21, 2020
from
https://www.researchgate.net/publication/301575383_Bit_Grooming_Statistically_accurate_precision-preserving_quantization_with_compression_evaluated_in_the_netCDF_Operators_NCO_v448.

@param ncid File ID.
@param varid Variable ID. NC_GLOBAL is not a valid varid, and may
not be used.
@param quantize_mode An integer flag specifying the quantization
used. Currently NC_QUANTIZE_BITGROOM is the only available setting.
@param nsd Number of significant digits to retain. Allowed single- and
double-precision NSDs are 1-7 and 1-15, respectively.

@return ::NC_NOERR No error.
@return ::NC_EGLOBAL Can't use ::NC_GLOBAL with this function.
@return ::NC_EBADID Bad ncid.
@return ::NC_ENOTVAR Invalid variable ID.
@return ::NC_ENOTNC4 Attempting netcdf-4 operation on file that is
not netCDF-4/HDF5.
@return ::NC_ESTRICTNC3 Attempting netcdf-4 operation on strict nc3
netcdf-4 file.
@return ::NC_ELATEDEF Too late to change settings for this variable.
@return ::NC_EINVAL Invalid input, such as an unsupported variable
type or an NSD outside the allowed range.
@author Charlie Zender, Ed Hartnett
*/
int
nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd)
{
NC* ncp;
int stat = NC_check_id(ncid,&ncp);
if(stat != NC_NOERR) return stat;

/* Using NC_GLOBAL is illegal. */
if (varid == NC_GLOBAL) return NC_EGLOBAL;
return ncp->dispatch->def_var_quantize(ncid,varid,quantize_mode,nsd);
}
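
The doc comment above describes BitGroom as setting unneeded mantissa bits alternately to 0 and 1 while leaving fill values untouched. The following sketch shows one way that idea could look for single-precision data. It is an illustration only, not the implementation in this PR: the `sk_` names are hypothetical, and the rule for how many mantissa bits to keep (about log2(10) ≈ 3.322 bits per decimal digit, rounded up, plus one guard bit) is an assumption. The sketch does not special-case NaN or infinity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_FLT_MANTISSA_BITS 23 /* explicit mantissa bits in IEEE 754 binary32 */

/* Mantissa bits to keep for nsd decimal digits (assumed rule):
 * ceil(nsd * 3.322) + 1, computed in integer arithmetic. */
static int
sk_bits_to_keep(int nsd)
{
    assert(nsd >= 1 && nsd <= 7);
    return (nsd * 3322 + 999) / 1000 + 1;
}

/* Quantize n floats in place. Even-indexed elements have their
 * trailing mantissa bits cleared ("shaved"); odd-indexed non-zero
 * elements have them set. Elements equal to *fill are skipped;
 * pass NULL if there is no fill value. */
static void
sk_bitgroom_float(float *data, size_t n, int nsd, const float *fill)
{
    int drop = SK_FLT_MANTISSA_BITS - sk_bits_to_keep(nsd);
    uint32_t mask_zero, mask_one;
    size_t i;

    if (drop <= 0)
        return; /* every mantissa bit is needed; nothing to discard */

    mask_zero = ~(uint32_t)0 << drop; /* keeps sign, exponent, top bits */
    mask_one = ~mask_zero;            /* the trailing bits only */

    for (i = 0; i < n; i++) {
        uint32_t bits;

        if (fill && data[i] == *fill)
            continue; /* never alter fill values */

        /* memcpy is the portable way to view a float's bit pattern. */
        memcpy(&bits, &data[i], sizeof bits);
        if (i % 2 == 0)
            bits &= mask_zero;
        else if (data[i] != 0.0f)
            bits |= mask_one; /* leave exact zeros as zeros */
        memcpy(&data[i], &bits, sizeof bits);
    }
}
```

Alternating between clearing and setting the discarded bits is what keeps the mean of the data from drifting in one direction, and because the result is still an ordinary IEEE 754 value, nothing has to be undone on read. The size reduction only appears once a lossless filter such as deflate is applied to the resulting runs of identical trailing bits.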

/**
Set checksum for a var.

30 changes: 30 additions & 0 deletions libdispatch/dvarinq.c
@@ -527,6 +527,36 @@ nc_inq_var_fill(int ncid, int varid, int *no_fill, void *fill_valuep)
);
}

/** @ingroup variables
* Learn whether BitGroom quantization is on for a variable, and, if so,
* the NSD setting.
*
* @param ncid File ID.
* @param varid Variable ID. Must not be NC_GLOBAL.
* @param quantize_modep Pointer that gets a 0 if BitGroom is not in
* use for this var, and a 1 if it is. Ignored if NULL.
* @param nsdp Pointer that gets the NSD setting (from 1 to 15), if
* BitGroom is in use. Ignored if NULL.
*
* @return 0 for success, error code otherwise.
* @author Charlie Zender, Ed Hartnett
*/
int
nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp)
{
NC* ncp;
int stat = NC_check_id(ncid,&ncp);

if(stat != NC_NOERR) return stat;
TRACE(nc_inq_var_quantize);

/* Using NC_GLOBAL is illegal. */
if (varid == NC_GLOBAL) return NC_EGLOBAL;

return ncp->dispatch->inq_var_quantize(ncid, varid,
quantize_modep, nsdp);
}

/** \ingroup variables
Find the endianness of a variable.

2 changes: 1 addition & 1 deletion libhdf4/hdf4var.c
@@ -88,7 +88,7 @@ NC_HDF4_get_vara(int ncid, int varid, const size_t *startp,
if (var->type_info->hdr.id != memtype)
{
if ((retval = nc4_convert_type(data, ip, var->type_info->hdr.id, memtype, nelem,
-&range_error, NULL, 0)))
+&range_error, NULL, 0, NC_NOQUANTIZE, 0)))
return retval;
free(data);
if (range_error)
3 changes: 2 additions & 1 deletion libhdf5/hdf5attr.c
@@ -716,7 +716,8 @@ nc4_put_att(NC_GRP_INFO_T* grp, int varid, const char *name, nc_type file_type,
/* Data types are like religions, in that one can convert. */
if ((retval = nc4_convert_type(data, att->data, mem_type, file_type,
len, &range_error, NULL,
-(h5->cmode & NC_CLASSIC_MODEL))))
+(h5->cmode & NC_CLASSIC_MODEL),
+NC_NOQUANTIZE, 0)))
BAIL(retval);
}
}
4 changes: 4 additions & 0 deletions libhdf5/hdf5dispatch.c
@@ -105,6 +105,10 @@ static const NC_Dispatch HDF5_dispatcher = {

NC4_hdf5_inq_var_filter_ids,
NC4_hdf5_inq_var_filter_info,

NC4_def_var_quantize,
NC4_inq_var_quantize,

};

const NC_Dispatch* HDF5_dispatch_table = NULL; /* moved here from ddispatch.c */