Provide a way to do bit grooming before compression #1548

edwardhartnett · 2019-11-21T20:51:16Z

In the NOAA GFS data files they are doing something neat to get better compression. They are zeroing out a bunch of bits they don't care about in the float.

For this to work, all we need to know is the number of bits. So we could have:

nc_def_var_smoothing(ncid, varid, nbits);

to turn this on.

For some code from Jeff Whitaker see #1543. (They call this "quantizing" and maybe that's a better name. I'm not sure if "smoothing" is correct.)

This is part of #1545

czender · 2019-12-02T18:58:08Z

I support adding lossy compression to netCDF. However, bit smoothing is sub-optimal because it always rounds down. Bit Grooming alternately rounds up (to 1s) and down (to 0s) which reduces the statistical biases of rounding by orders of magnitude. See the comparisons in:

Zender, C. S. (2016), Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+), Geosci. Model Dev., 9, 3199-3211, doi:10.5194/gmd-9-3199-2016.
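
A minimal sketch of the difference between shaving, setting, and grooming, assuming IEEE 754 single-precision floats. The names `bit_shave`, `bit_set`, `bit_groom`, and the `keep_bits` parameter are illustrative only; they are not taken from NCO or netCDF, and leaving 0.0 untouched when setting bits is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Explicit mantissa bits in an IEEE 754 single-precision float. */
#define FLT_MANT_BITS 23

/* Reinterpret float <-> uint32_t via memcpy to avoid aliasing problems. */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Mask selecting the low mantissa bits that are discarded. */
static uint32_t low_mask(int keep_bits)
{
    assert(keep_bits >= 1 && keep_bits <= FLT_MANT_BITS);
    return (UINT32_C(1) << (FLT_MANT_BITS - keep_bits)) - 1;
}

/* Bit shaving: zero the discarded bits, so the magnitude never grows. */
static float bit_shave(float f, int keep_bits)
{
    return bits_to_float(float_to_bits(f) & ~low_mask(keep_bits));
}

/* Bit setting: set the discarded bits to 1, so the magnitude never shrinks.
 * Exact zero is left alone in this sketch so that 0.0 stays 0.0. */
static float bit_set(float f, int keep_bits)
{
    uint32_t u = float_to_bits(f);
    if (u != 0)
        u |= low_mask(keep_bits);
    return bits_to_float(u);
}

/* Bit grooming: alternate shave (even index) and set (odd index), so the
 * rounding errors of neighbouring elements tend to cancel. */
static void bit_groom(float *data, size_t n, int keep_bits)
{
    for (size_t i = 0; i < n; i++)
        data[i] = (i % 2 == 0) ? bit_shave(data[i], keep_bits)
                               : bit_set(data[i], keep_bits);
}
```

With `keep_bits = 10`, every positive input comes back from `bit_shave` no larger than it went in and from `bit_set` no smaller, which is the one-sided bias that alternating the two is meant to average out.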

edwardhartnett · 2019-12-03T15:58:49Z

@czender excellent point! And great reference.

I propose to add this to netcdf-c after the AGU. I am thinking of the following functions:

nc_def_var_quantize(ncid, varid, nbits)
nc_inq_var_quantize(ncid, varid, &nbits_in)

These would only apply to NC_DOUBLE and NC_FLOAT; calling with any other type will return an error.

nc_def_var_quantize() will add an attribute _netcdf_nbits=12 for example.

Some things to note:

The savings in file size can be enormous.
NOAA GFS is already doing this; other model teams also are or will be. This will be popular.
This is fully backward compatible (as not all our new compression features will be).

czender · 2019-12-03T16:09:14Z

You are welcome to use the NCO code lines 576--860 (not all are necessary) of https://github.com/nco/nco/tree/master/src/nco/nco_ppc.c

edwardhartnett · 2019-12-03T16:11:53Z

@czender I was already planning to steal your code. ;-)

(With full attribution of course. We're classy that way on the netCDF project. We steal your code, but we do say "thank you"). ;-)

WardF · 2019-12-03T16:22:58Z

Tagging in @lesserwhirls and @DennisHeimbigner to this discussion, so that we can consider the potential impact on NetCDF java as well. Adding the FORTRAN and C++ interfaces shouldn’t be a problem.

edwardhartnett · 2019-12-03T16:28:44Z

This will have no impact on Java, because the files can be read in the normal way.

If netCDF-Java would also like to add this feature, that is of course possible. I would not hold back the rest of the netCDF community while waiting for Java.

I will add these calls to the Fortran APIs as well; just forgot to mention it, but of course we want both the F77 and F90 APIs to fully reflect the C API.

edwardhartnett · 2019-12-03T16:47:24Z

Also would welcome input from @jswhit and @gsjaardema on this issue...

lesserwhirls · 2019-12-03T16:58:43Z

I assume we’re talking about something like this?

https://www.unidata.ucar.edu/blogs/developer/en/entry/compression_by_bit_shaving

We don’t have a way of doing this (writing) via the netCDF-Java API currently, but would be easy to add (have code in our tests). Could also tie in to whatever the C library ends up doing since we use the C library for writing. Reading won’t be an issue.

One question though - what would a variable look like? How would nbits be encoded? Would it be exposed as an attribute (so a need to update the NUG)?

No technical concerns from a netCDF-Java point of view, but thank you @WardF for even asking the question. Surprises are never fun :-)

edwardhartnett · 2019-12-03T17:29:38Z

@lesserwhirls this is part of a larger discussion about netcdf-c and compression. See #1545

I was going to ask if netcdf-java is now using the C library. You say for writing only? Why is that? Why not read as well as write?

WardF · 2019-12-03T18:47:53Z

@lesserwhirls no problem, I don't want to (potentially) add to your roadmap/workload without at least waving in your direction first ;).

No particular hurry on this; when lossy compression has come up in meetings or at conferences, the scientists have expressed their... hesitancy to truncate any of their data, even when it objectively contains no information. The lack of people asking for it makes it a relatively low priority internally, but adding support through a PR contribution is effort I can justify ;). Anyways, back to prepping for AGU. Thanks all!

lesserwhirls · 2019-12-03T19:57:28Z

> @lesserwhirls this is part of a larger discussion about netcdf-c and compression. See #1545
>
> I was going to ask if netcdf-java is now using the C library. You say for writing only? Why is that? Why not read as well as write?

Note: this was all done before my time, but here is my take.

Writing because, I think, the HDF documentation was too unclear regarding how writing is to be done (for example, chunking, VLEN, I think?) to support all of the features needed to write netCDF-4. We'd need to rope in John Caron for the details.

In terms of reading, we do support reading through the C library, but it's not thread safe (we have to thread lock on the Java side to ensure we're ok there). That's a pretty big bummer. We've enabled reading through C on the server side (TDS), and performance of the server was...well, we don't do that, and never recommend it to users outside of super rare debugging cases (once). John wrote native HDF5 read support, so he utilized that to support reading netCDF-4. We do run into some issues in making sure the Java library and the C library see things in a similar way (Enums, for example), but that's helpful for everyone as it exposes areas in need of clarification in the netCDF data model documentation (not always, but sometimes).

Also (my opinion), it's really hard to recommend an on-disk data format for archive purposes that relies on a single implementation for reading. That's one of the reasons why I loved the netCDF-3 on-disk formats before my time at Unidata.

edwardhartnett · 2019-12-03T20:00:59Z

@lesserwhirls thanks for the clarifications. Thread-safety is something Dennis has been working on, I believe.

The classic formats are all still alive and well, for those who don't want to depend on HDF5. That said, HDF5 is an official NASA format that is guaranteed support based on the large amount of very expensive Earth science data NASA owns in HDF5. ;-)

lesserwhirls · 2019-12-03T20:51:03Z

I don't doubt the support one bit, and my comments should not be taken as a ding (just in case it came across that way), but knowing that a separate read implementation can be done is certainly a plus :-)

DennisHeimbigner · 2019-12-04T17:05:44Z

I think this functionality could be handled with the current library
since quant can be packaged as just another filter.

edwardhartnett · 2019-12-04T17:10:32Z

I agree with Elena, although a general purpose filter handling ability is great, it is too complex for most users. As with HDF5, I believe the most common compression methods should be specified in the API, including this one.

czender · 2019-12-04T17:41:53Z

As implemented in NCO, Bit Grooming exposes to the user the number of significant digits, then uses an internal table to convert this to the minimum number of bits to quantize while preserving the requested number of significant digits. Bits are finer-grained and thus more flexible, yet less familiar to scientists than significant digits. The underlying bit-masking implementation is the same. NCO chose to expose digits rather than bits because there is something satisfying about knowing how many significant digits the quantization is guaranteed to preserve. netCDF is lower-level than NCO so perhaps bits are the better quantity to expose through the API. In any case, I hope netCDF makes an intentional decision that considers this aspect of usability before deciding on the API. This is just another example of how user-friendliness and flexibility often lead to different outcomes.
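
To make the digits-versus-bits trade-off concrete, here is a rough sketch of a digits-to-bits lookup for single precision. The table values are simply ceil(NSD × log2(10)), capped at the 23 explicit mantissa bits; this is an assumption for illustration and is not NCO's actual table, which may differ (for example by adding guard bits). The names `nsd_to_keep_bits` and `nsd_to_discard_bits` are hypothetical.

```c
#include <assert.h>

/* Explicit mantissa bits in an IEEE 754 single-precision float. */
#define FLT_MANT_BITS 23
/* Largest NSD this sketch accepts for single precision. */
#define FLT_MAX_NSD 7

/* Hypothetical table: ceil(nsd * log2(10)) for nsd = 1..7, capped at 23.
 * log2(10) is about 3.32, so each decimal digit costs a little over 3 bits.
 * Index 0 is unused. */
static const int nsd_to_bits_flt[FLT_MAX_NSD + 1] = {0, 4, 7, 10, 14, 17, 20, 23};

/* Mantissa bits to keep for a requested number of significant digits. */
static int nsd_to_keep_bits(int nsd)
{
    assert(nsd >= 1 && nsd <= FLT_MAX_NSD);
    return nsd_to_bits_flt[nsd];
}

/* Mantissa bits that can be quantized away for that request. */
static int nsd_to_discard_bits(int nsd)
{
    return FLT_MANT_BITS - nsd_to_keep_bits(nsd);
}
```

Under these assumptions NSD=3 keeps 10 bits and frees 13 for quantization, while NSD=7 frees none, which shows why an API in bits is finer-grained than one in digits.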

lesserwhirls · 2019-12-04T19:41:04Z

> As implemented in NCO, Bit Grooming exposes to the user the number of significant digits, then uses an internal table to convert this to the minimum number of bits to quantize while preserving the requested number of significant digits. Bits are finer-grained and thus more flexible, yet less familiar to scientists than significant digits. The underlying bit-masking implementation is the same. NCO chose to expose digits rather than bits because there is something satisfying about knowing how many significant digits the quantization is guaranteed to preserve. netCDF is lower-level than NCO so perhaps bits are the better quantity to expose through the API. In any case, I hope netCDF makes an intentional decision that considers this aspect of usability before deciding on the API. This is just another example of how user-friendliness and flexibility often lead to different outcomes.

100% agree. I think setting the significant digits is the right call, although if the API could support both, that'd be even better.

@czender - does NCO add any metadata to variables that use Bit Grooming to indicate that the process has been done?

czender · 2019-12-05T03:31:35Z

Yes, see http://nco.sf.net/nco.html#number_of_significant_digits and/or the Bit Grooming paper referenced above.

gsjaardema · 2019-12-05T16:57:21Z

@edwardhartnett Sorry for not responding earlier, but I think this would be a great capability.

edwardhartnett · 2019-12-05T17:09:02Z

Setting significant digits is an interesting take on this. I had not considered it but am certainly not opposed. I agree that for users, significant digits is much more understandable than significant bits. I will study Charlie's paper and then get back with a proposed implementation, after the AGU next week.

lesserwhirls · 2019-12-06T19:58:58Z

> Yes, see http://nco.sf.net/nco.html#number_of_significant_digits and/or the Bit Grooming paper referenced above.

Excellent - thank you @czender! number_of_significant_digits looks good to me. Maybe also an attribute for retained bits (perhaps as a less visible "special attribute")? At any rate, we would want to make sure this gets added to the NUG.

edwardhartnett · 2019-12-06T20:35:57Z

I don't think we want to make this a special attribute (i.e. one that is hidden from the user by the netCDF API).

There's a problem adding to the list of special attributes: earlier versions of netCDF will not know to hide that attribute. So there will be a different number of attributes in the file, depending on library version used to read it.

I believe we can simply add a new attribute, visible to users, but starting with an underscore. Since users only get this attribute when they explicitly call nc_def_var_grooming(), everyone will know what to expect and all versions of the library will be able to read and understand the file (even versions that don't have the nc_def_var_grooming() function!)

Absolutely this would be documented in the NUG. The CF Conventions gang might also be interested in the definition of this attribute.

edhartnett · 2019-12-10T16:04:34Z

At the AGU I have been chatting with people about this and there is a great deal of interest.

Stephan Siemen, Head of Development for ECMWF, says that compression, including lossy compression, is of great interest for ECMWF data sets, both model output and data from the Copernicus Sentinel satellites.

So we have strong interest from NOAA for their main operational forecasting system (GFS), from NCAR's biggest climate modeling group (CESM), and from the ECMWF.

czender · 2019-12-10T18:16:05Z

That's great to hear. I'll be at the EarthCube booth (#510) from 1-2 today (Tuesday) if you would like to chat in person.

WardF · 2019-12-10T19:24:00Z

I'll drop by as well, I'd be interested in this conversation. Hope the conference is going well for everybody. @czender I tried to stop by your poster earlier but, predictably, it was pretty well mobbed.

edwardhartnett · 2019-12-17T22:30:24Z

It was good to discuss this with @czender and @WardF at the AGU! ;-)

I have studied Charlie's paper - it is excellent and does a great job answering all the detailed questions and comparing the different approaches. I agree with Charlie that bit grooming can be a very valuable lossy compression technique, and the fact that NOAA is already using a similar algorithm (and absolutely depends on it to get the required compression) is a strong indication that this is a good feature.

I have a paper in which I am outlining all the proposals for new compression in the C/Fortran APIs. Anyone interested in reviewing or collaboration should shoot me an email.

edwardhartnett · 2019-12-17T22:32:55Z

@czender quick question: you support specifying either NSD (number of significant digits) or DSD (decimal significant digits).

Why both? It would seem that NSD would be sufficient. Am I missing something?

czender · 2019-12-17T22:52:49Z

NSD is better suited for science. DSD is more familiar to some laypeople who think of significance as how many digits before or after the decimal point. The section of the paper that discusses 800 pound gorillas attempts to convey the possible DSD use-cases. I personally would always use NSD and never use DSD, but I wrote the software for generic purposes that others may find useful.

edwardhartnett · 2021-06-26T04:01:55Z

Bit-grooming is the only one I know of.

All the other filters are compression/decompression, so needed for both reading and writing.

DennisHeimbigner · 2021-06-26T04:03:38Z

At this point we need an implementation of this
as an HDF5 custom filter. That way I can incorporate it into both libhdf5
and libnczarr.

edwardhartnett · 2021-06-26T04:09:07Z

@DennisHeimbigner we have a working bit-groom filter here: https://github.com/ccr/ccr/blob/master/hdf5_plugins/BITGROOM/src/H5Zbitgroom.c

I can do the netcdf/HDF5 implementation.

We need to consult closely with @czender - he has some upgrades in mind for the filter that we should take on board.

If this is implemented in netcdf-c, then we would remove it from CCR.

DennisHeimbigner · 2021-06-26T04:13:37Z

I would prefer to add this as a built-in filter so that we do not need to extend the API.
Also, that would make it easier to use in ncgen and nczarr.

edwardhartnett · 2021-06-26T04:16:07Z

We should still extend the API to make this easy for users to use...

DennisHeimbigner · 2021-06-26T04:32:26Z

I leave that question for discussion: new API or use existing def_var_filter?

edwardhartnett · 2021-06-26T05:41:56Z

I think filters are not the answer.

If we implement this as a filter, we still have all the compatibility problems: such data cannot be read by older versions of netcdf-c, cannot be read by netcdf-java, and cannot be read by HDF5 (without installing the filter, even though it is not doing a thing for readers).

I was thinking this should be implemented in that giant function that does type conversion of data. As the data are converted (or even if the data are not converted) the bit grooming algorithm can be applied.

In this way we get netcdf-java for free, as well as all older versions of netcdf-c and native HDF5. I suspect @lesserwhirls and @dopplershift would prefer it if we did not break netcdf-java reads with this feature, and there is no reason we need to.

If we just forget about filters in this case, it becomes a lot simpler for everyone...

DennisHeimbigner · 2021-06-26T19:10:01Z

If you do it in the write functions (which are not chunk aware) then you need
to make sure that the algorithm is idempotent because it could end up being
applied to the same data multiple times. Do we know if that is true for this algorithm?
Also, when you say "java" do you mean the pure java read-only implementation
or the mixed implementation using java+netcdf-c?

dopplershift · 2021-06-26T20:33:21Z

I believe @edhartnett means if it's implemented as a library feature, the pure Java implementation gets to read these files with no code modifications.

If it's left as a filter, netCDF-Java would have to at least recognize and ignore the filter to read the data.

edwardhartnett · 2021-06-26T21:06:50Z

@DennisHeimbigner how can it be applied to the same data multiple times? (Unless the user re-writes the data?) In the case of HDF5 it would be applied before the data are chunked, and the chunking would not affect the results.

@dopplershift is correct - if we make it part of the library, not just netcdf-java, then all existing versions of netcdf-c will be able to read the data for free. When I say netcdf-java I mean the native Java read-only implementation, which is very important. (However, it would also be readable by all existing java+netcdf-c implementations, without upgrading netcdf-c.)

@czender is of course the expert in this code, which is his. @czender would you like to comment on the idempotence of the bitgroom algorithm? I believe that it is idempotent, as long as the same starting point is chosen each time.

DennisHeimbigner · 2021-06-26T23:06:44Z

> how can it be applied to the same data multiple times? (Unless the user re-writes the data?) In the case of HDF5 it would be applied before the data are chunked, and the chunking would not affect the results.

You have to take all possibilities into acct. So you have to assume some user will write the same data more than once. Possibly in two different programs. Otherwise you have some
kind of write once situation of which the user needs to be aware.

DennisHeimbigner · 2021-06-26T23:09:40Z

At this point, I am mostly convinced by Ed's arguments that we should build this into the library as a non-filter.
It would need to be added to nczarr as well.
How about netcdf-3?

edwardhartnett · 2021-06-27T06:56:35Z

I believe it should also be applied to classic formats. This will benefit users in 2 ways:
1 - files can be compressed as a post processing step before storage
2 - data can be stored with correct precision.

Also it will be easier to document and explain if it applies to all binary formats.

DennisHeimbigner · 2021-06-27T22:30:26Z

Some issues that need to be resolved:

What types? Float and double of course.
What about integer types (for simulated fixed point)?
From H5ZBitgroom, there are 5 parameters:

no. of significant digits to keep -- default is 3
typesize (in bytes)
missing value flag -- 0 => do not use missing value, !0 => use
missing value part 1 (what byte order?)
missing value part 2

We can get the type from the variable's type.
Can we get the missing value from a "missing_value" attribute?
Or is it the same as _FillValue?
Do we want to record the use of quantize + parameters in the file
using some special attribute like "_Quantized"
Idempotent? As mentioned we have to take into account multiple writes.
- Related question: what if the two writes use different parameters?
What name? NumCodecs uses Quantize, Zender uses Bitgroom.
I prefer quantize as more descriptive.

edwardhartnett · 2021-06-28T04:49:10Z

All good questions. Some answers I know now, some remain to be worked out.

Only float and double types can use grooming. For all other types trying to turn on grooming returns error EBADTYPE (or EINVAL?)

As with deflate or other var settings, the settings for grooming cannot be changed once enddef has been called. So the settings are final for the var before any writes are done, and can not be changed, so writes cannot have different parameters.

We do want to record parameters in a non-hidden attribute that is added whenever grooming settings are turned on.

What is in a name? I don't mind what we call it. But quantization has typically referred to bit-shaving (i.e. always using 0s). Bit grooming is when we alternate 0s and 1s.

Questions to consult with @czender on when he returns later this week:

idempotency
fill value

czender · 2021-07-02T20:09:41Z

Howdy all, and sorry for the delayed response due to my off-the-grid vacation. I have just finished reading the last 10 days of this thread which occurred while I was away. I'll try to answer and ask as many questions as I think are still relevant to the discussion.

First, let me clarify that BitGrooming is a pre-processor for compression, and is not itself a compressor. BitGroomed data remain in IEEE format on-disk, hence there is no need for special software for reading. BitGrooming (BG) is a form of quantization, as are BitShaving and BitSetting, and a few other algorithms (DigitRounding, ...) not mentioned here. BG simply eliminates (by alternately shaving and setting) the least significant bits of floating-point mantissas, until the resulting mantissa preserves no more than the user-requested Number of Significant Digits (NSD). An easy-to-implement alternative would be to preserve no more than the user-requested Number of Significant Bits (NSB). I think most scientists would prefer NSD, since that is the language of instrumental/numerical precision, though computer scientists might prefer NSB.

BG enhances the compression ratio (CR) of subsequent lossless compression algorithms (that for our purposes require netCDF4 format to invoke) by eliminating the randomness beyond the least significant digit. I hope this makes clear that netCDF3 format can contain BG'd data, though its CR is identically 1, so BG'd netCDF3 datasets do not change size. BG'd netCDF4 datasets that are subsequently compressed will be smaller (CR(BG) > CR(no-BG), where CR is input size/output size) than the same datasets were they compressed without first using the BG algorithm.

Idempotency:
My understanding is that BG is idempotent. The output of BG depends solely on the specified NSD. If previously BG'd data are re-BG'd they will not change unless NSD2 (the newer value of NSD) is smaller than NSD1 (the previous value), in which case more bits/digits will be quantized. BG is irreversible since precision can only be eliminated, not created, so attempting NSD2 > NSD1 should probably return an error to avoid confusion, and attempting NSD2 <= NSD1 should be OK, IMHO. The NCO BG implementation includes user-visible variable-metadata to indicate the NSD of BG'd variables. The CCR implementation does not. FWIW, both implementations treat requests to BG integer variables as no-ops (not errors).
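
The idempotency claim can be checked with a small sketch. It assumes IEEE 754 single precision and an alternation tied to the element index (even elements shaved, odd elements set); `groom`, `regroom_is_noop`, and `keep_bits` are hypothetical names, and the parameter counts mantissa bits rather than digits to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLT_MANT_BITS 23
#define MAX_ELEMS 64   /* scratch-buffer size used by regroom_is_noop() */

/* Groom in place: shave even-indexed elements, set odd-indexed ones.
 * Exact zeros are never modified. */
static void groom(float *data, size_t n, int keep_bits)
{
    assert(keep_bits >= 1 && keep_bits <= FLT_MANT_BITS);
    uint32_t low = (UINT32_C(1) << (FLT_MANT_BITS - keep_bits)) - 1;
    for (size_t i = 0; i < n; i++) {
        uint32_t u;
        memcpy(&u, &data[i], sizeof u);
        if (i % 2 == 0)
            u &= ~low;          /* shave: discarded bits become 0 */
        else if (u != 0)
            u |= low;           /* set: discarded bits become 1 */
        memcpy(&data[i], &u, sizeof u);
    }
}

/* Return 1 if grooming already-groomed data again with keep_bits leaves
 * every element bit-for-bit unchanged, 0 otherwise. */
static int regroom_is_noop(const float *groomed, size_t n, int keep_bits)
{
    float copy[MAX_ELEMS];
    assert(n <= MAX_ELEMS);
    memcpy(copy, groomed, n * sizeof *copy);
    groom(copy, n, keep_bits);
    return memcmp(copy, groomed, n * sizeof *copy) == 0;
}
```

In this sketch, re-grooming with the same or a larger `keep_bits` is a no-op, while a smaller value quantizes further. The result does depend on each element receiving the same operation on every pass; if the alternation restarted at a different offset on a later write, a previously shaved element could be set instead and would change.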

Fill Value:
The NCO and CCR implementations of BG sometimes refer to _FillValue as "missing value" in descriptions/documentation. They are synonymous and their usage depends on my own idiosyncratic preferences for distinguishing semantics from syntax.

Please ping me via this thread for any additional clarifications. I am pinging @jtolento, a UCI grad student who has been working to update the BG algorithm to be more efficient.

edwardhartnett · 2021-07-02T20:41:05Z

@czender I think we all understand that bit-grooming (and its cousins) does not, in itself, compress the data.

For example, the advantage of applying bit-grooming to a netCDF classic file would be that such a file would compress better. That is, we would write such a file, and then gzip the file. That file would be smaller than if we write the file without bit-grooming, and then compress it.

The question I have with fill value is this: what if the user specifies a fill value with more precision than the bit-groomed data? For example, if I have a fill value of 9.99999 and then specify bit-grooming with NSD=3? Then, the values that were 9.99999 will become 9.99 or 10.0, correct? So neither result will be equal to the fill value.

Does that mean that when bit grooming is applied, we must also apply it to the fill value? How do we handle the fact that alternate values of 9.99999 will be rounded different ways by the bit-groom? Do we have two different fill values? Or do we check for fill value as we bit groom and handle it specially (I think this will be the answer).

In terms of changing the NSD, that will not be allowed. Once enddef is called, the NSD is final. Attempts to change it will return NC_ELATEDEF - Attempt to define var properties, like deflate, after enddef.

czender · 2021-07-02T21:25:55Z

@edwardhartnett We are on the same page with respect to BG being a pre-filter.

Regarding fill values: The BG algorithm as implemented in both NCO and CCR does not touch any data equal to the _FillValue attribute, if any. Before the (irreversible) BG procedure is applied to any value, that value is compared to the _FillValue, if any. BG only quantizes valid data, not data equal to _FillValue. That is why the _FillValue argument is input to and not changed by the BG algorithm. Of course the comparison takes time, so it is only performed when has_mss_val is true. A wrinkle to note is that the current implementations only respect explicitly defined _FillValue attributes. Values equal to the default fill values (NC_FILL_DOUBLE, NC_FILL_FLOAT) will be BG'd unless the _FillValue attribute is explicitly defined to be one of those values. I think this behavior moots the rest of your FV questions. If not, let me know.
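
The fill-value handling described above can be sketched as follows. This is an illustration under assumptions (IEEE 754 single precision, index-based alternation, exact equality as the fill test), not the NCO or CCR source; `groom_skip_fill` is a hypothetical name, and `has_mss_val` / `mss_val` simply mirror the flag mentioned in this comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLT_MANT_BITS 23

/* Groom data in place, leaving any element equal to mss_val untouched.
 * The comparison is only made when has_mss_val is non-zero. Note that an
 * exact == test never matches a NaN fill value. */
static void groom_skip_fill(float *data, size_t n, int keep_bits,
                            int has_mss_val, float mss_val)
{
    assert(keep_bits >= 1 && keep_bits <= FLT_MANT_BITS);
    uint32_t low = (UINT32_C(1) << (FLT_MANT_BITS - keep_bits)) - 1;
    for (size_t i = 0; i < n; i++) {
        if (has_mss_val && data[i] == mss_val)
            continue;               /* fill values keep full precision */
        uint32_t u;
        memcpy(&u, &data[i], sizeof u);
        if (i % 2 == 0)
            u &= ~low;              /* shave on even indices */
        else if (u != 0)
            u |= low;               /* set on odd indices, zeros untouched */
        memcpy(&data[i], &u, sizeof u);
    }
}
```

With a fill value of 9.99999 passed in, elements holding exactly that value survive unchanged while valid neighbours are quantized; with `has_mss_val` set to 0 the same 9.99999 is quantized like any other datum, which is the default-fill-value wrinkle noted above.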

Regarding NSD: I have no problem with making it immutable in libnetcdf. Just pointing out the consequences of more and less strict behaviors.

edwardhartnett · 2021-07-03T16:09:05Z

OK, but we also need to apply the fill value rule to default fill values. Or figure something else out - perhaps require that a fill value be set before allowing bit-grooming to occur? Or just treat default fill values the same way - that is, check for them, and don't apply the BG algorithm for fill values.

DennisHeimbigner · 2021-07-03T19:01:28Z

If _FillValue has not been set when bitgrooming is defined, we could automatically
set the _FillValue to the default fill value. Assuming that fill has been set for the variable.

czender · 2021-07-03T22:22:43Z

It's easy enough to change the implementation to treat all floating-point variables as having a _FillValue, either explicitly or implicitly defined. If _FillValue is explicitly defined then there would be two comparisons prior to BG'ing a value, first to the explicit _FillValue attribute, and second to the default fill value for that type. If _FillValue is implicitly defined to be the default value for that type, then there is only one comparison. I have never been sure whether this is the intended way to use the default _FillValue, though you are the experts so now it's clearer to me.

edwardhartnett · 2021-08-30T03:59:44Z

WRT fill values, the CCR code and the code in my PR both ignore data that matches the fill value, when doing quantization. This works with the defined fill value, if the user has defined one, or the default fill value, if the user has not defined one.

On to another question: should quantize work with scalars? I believe the answer is yes...

edwardhartnett · 2021-10-02T12:03:59Z

Thanks @WardF ! I'm so happy to see this great new feature make its way into the hands of users! ;-)

#### Changes * [Bug Fix] Fix a race condition when testing missing filters. See [Github #2557](Unidata/netcdf-c#2557). * [Bug Fix] Fix some race conditions due to use of a common file in multiple shell scripts. See [Github #2552](Unidata/netcdf-c#2552). ### 4.9.1 - Release Candidate 1 - October 24, 2022 * [Enhancement][Documentation] Add Plugins Quick Start Guide. See [GitHub #2524](Unidata/netcdf-c#2524) for more information. * [Enhancement] Add new entries in `netcdf_meta.h`, `NC_HAS_BLOSC` and `NC_HAS_BZ2`. See [Github #2511](Unidata/netcdf-c#2511) and [Github #2512](Unidata/netcdf-c#2512) for more information. * [Enhancement] Add new options to `nc-config`: `--has-multifilters`, `--has-stdfilters`, `--has-quantize`, `--plugindir`. See [Github #2509](Unidata/netcdf-c#2509) for more information. * [Bug Fix] Fix some errors detected in [PR #2497](Unidata/netcdf-c#2497). See [Github #2503](Unidata/netcdf-c#2503). * [Bug Fix] Split the remote tests into two parts: one for the remotetest server and one for all other external servers. Also add a configure option to enable the latter set. See [Github #2491](Unidata/netcdf-c#2491). * [Bug Fix] Fix blosc plugin errors. See [Github #2461](Unidata/netcdf-c#2461). * [Bug Fix] Fix support for reading arrays of HDF5 fixed size strings. See [Github #2466](Unidata/netcdf-c#2466). * [Bug Fix] Fix some errors detected in [PR #2492](Unidata/netcdf-c#2492). See [Github #2497](Unidata/netcdf-c#2497). * [Enhancement] Add support for Zarr (fixed length) string type in nczarr. See [Github #2492](Unidata/netcdf-c#2492).
* [Bug Fix] Provide a default enum const when fill value does not match any enum constant for the value zero. See [Github #2462](Unidata/netcdf-c#2462). * [Bug Fix] Fix the json submodule symbol conflicts between libnetcdf and the plugin specific netcdf_json.h. See [Github #2448](Unidata/netcdf-c#2448). * [Bug Fix] Fix quantize with CLASSIC_MODEL files. See [Github #2445](Unidata/netcdf-c#2445). * [Enhancement] Add `--disable-quantize` option to `configure`. * [Bug Fix] Fix CMakeLists.txt to handle all acceptable boolean values for -DPLUGIN_INSTALL_DIR. See [Github #2430](Unidata/netcdf-c#2430). * [Bug Fix] Fix tst_vars3.c to use the proper szip flag. See [Github #2421](Unidata/netcdf-c#2421). * [Enhancement] Provide a simple API to allow user access to the internal .rc file table: supports get/set/overwrite of entries of the form "key=value". See [Github #2408](Unidata/netcdf-c#2408). * [Bug Fix] Use env variable USERPROFILE instead of HOME for windows and mingw. See [Github #2405](Unidata/netcdf-c#2405). * [Bug Fix] Fix the nc_def_var_fletcher32 code in hdf5 to properly test the value of the fletcher32 argument. See [Github #2403](Unidata/netcdf-c#2403). ## 4.9.0 - June 10, 2022 * [Enhancement] Add quantize functions nc_def_var_quantize() and nc_inq_var_quantize() to enable lossy compression. See [Github #1548](Unidata/netcdf-c#1548). * [Enhancement] Add zstandard compression functions nc_def_var_zstandard() and nc_inq_var_zstandard(). See [Github #2173](Unidata/netcdf-c#2173). * [Enhancement] Have netCDF-4 logging output one file per processor when used with parallel I/O. See [Github #1762](Unidata/netcdf-c#1762). * [Enhancement] Improve filter installation process to avoid use of an extra shell script. See [Github #2348](Unidata/netcdf-c#2348). * [Bug Fix] Get "make distcheck" to work. See [Github #2343](Unidata/netcdf-c#2343). * [Enhancement] Allow the read/write of JSON-valued Zarr attributes to allow for domain specific info such as used by GDAL/Zarr.
See [Github #2278](Unidata/netcdf-c#2278). * [Enhancement] Turn on the XArray convention for NCZarr files by default. WARNING, this means that the mode should explicitly specify "nczarr" or "zarr" even if "xarray" or "noxarray" is specified. See [Github #2257](Unidata/netcdf-c#2257). * [Enhancement] Update the documentation to match the current filter capabilities. See [Github #2249](Unidata/netcdf-c#2249). * [Enhancement] Support installation of pre-built standard filters into user-specified location. See [Github #2318](Unidata/netcdf-c#2318). * [Enhancement] Improve filter support. More specifically (1) add nc_inq_filter_avail to check if a filter is available, (2) add the notion of standard filters, (3) clean up szip support to fix interaction with NCZarr. See [Github #2245](Unidata/netcdf-c#2245). * [Enhancement] Switch to tinyxml2 as the default xml parser implementation. See [Github #2170](Unidata/netcdf-c#2170). * [Bug Fix] Require that the type of the variable in nc_def_var_filter is not variable length. See [Github #2231](Unidata/netcdf-c#2231). * [File Change] Apply HDF5 v1.8 format compatibility when writing to previous files, as well as when creating new files. The superblock version remains at 2 for newly created files. Full backward read/write compatibility for netCDF-4 is maintained in all cases. See [Github #2176](Unidata/netcdf-c#2176). * [Enhancement] Add ability to set dataset alignment for netcdf-4/HDF5 files. See [Github #2206](Unidata/netcdf-c#2206). * [Bug Fix] Improve UTF8 support on windows so that it can use utf8 natively. See [Github #2222](Unidata/netcdf-c#2222). * [Enhancement] Add complete bitgroom support to NCZarr. See [Github #2197](Unidata/netcdf-c#2197). * [Bug Fix] Clean up the handling of deeply nested VLEN types. Marks nc_free_vlen() and nc_free_string() as deprecated in favor of ncaux_reclaim_data().
See [Github #2179](Unidata/netcdf-c#2179). * [Bug Fix] Make sure that netcdf.h accurately defines the flags in the open/create mode flags. See [Github #2183](Unidata/netcdf-c#2183). * [Enhancement] Improve support for msys2+mingw platform. See [Github #2171](Unidata/netcdf-c#2171). * [Bug Fix] Clean up the various inter-test dependencies in ncdump for CMake. See [Github #2168](Unidata/netcdf-c#2168). * [Bug Fix] Fix use of non-aws appliances. See [Github #2152](Unidata/netcdf-c#2152). * [Enhancement] Added options to suppress the new behavior from [Github #2135](Unidata/netcdf-c#2135). The options for `cmake` and `configure` are, respectively, `-DENABLE_LIBXML2` and `--(enable/disable)-libxml2`. Both of these options default to 'on/enabled'. When disabled, the bundled `ezxml` XML interpreter is used regardless of whether `libxml2` is present on the system. * [Enhancement] Support optional use of libxml2, otherwise default to ezxml. See [Github #2135](Unidata/netcdf-c#2135) -- H/T to [Egbert Eich](https://github.com/e4t). * [Bug Fix] Fix several os related errors. See [Github #2138](Unidata/netcdf-c#2138). * [Enhancement] Support byte-range reading of netcdf-3 files stored in private buckets in S3. See [Github #2134](Unidata/netcdf-c#2134). * [Enhancement] Support Amazon S3 access for NCZarr. Also support use of the existing Amazon SDK credentials system. See [Github #2114](Unidata/netcdf-c#2114). * [Bug Fix] Fix string allocation error in H5FDhttp.c. See [Github #2127](Unidata/netcdf-c#2127). * [Bug Fix] Apply patches for ezxml and for selected oss-fuzz detected errors. See [Github #2125](Unidata/netcdf-c#2125). * [Bug Fix] Ensure that internal Fortran APIs are always defined. See [Github #2098](Unidata/netcdf-c#2098). * [Enhancement] Support filters for NCZarr. See [Github #2101](Unidata/netcdf-c#2101). * [Bug Fix] Make PR 2075 long file name be idempotent. See [Github #2094](Unidata/netcdf-c#2094). ## 4.8.1 - August 18, 2021 * [Bug Fix] Fix multiple bugs in libnczarr.
See [Github #2066](Unidata/netcdf-c#2066). * [Enhancement] Support windows network paths (e.g. \\svc\...). See [Github #2065](Unidata/netcdf-c#2065). * [Enhancement] Convert to a new representation of the NCZarr meta-data extensions: version 2. Read-only backward compatibility is provided. See [Github #2032](Unidata/netcdf-c#2032). * [Bug Fix] Fix dimension_separator bug in libnczarr. See [Github #2035](Unidata/netcdf-c#2035). * [Bug Fix] Fix bugs in libdap4. See [Github #2005](Unidata/netcdf-c#2005). * [Bug Fix] Store NCZarr fillvalue as a singleton instead of a 1-element array. See [Github #2017](Unidata/netcdf-c#2017). * [Bug Fixes] The netcdf-c library was incorrectly determining the scope of dimension; similar to the type scope problem. See [Github #2012](Unidata/netcdf-c#2012) for more information. * [Bug Fix] Re-enable DAP2 authorization testing. See [Github #2011](Unidata/netcdf-c#2011). * [Bug Fix] Fix bug with windows version of mkstemp that causes failure to create more than 26 temp files. See [Github #1998](Unidata/netcdf-c#1998). * [Bug Fix] Fix ncdump bug when printing VLENs with basetype char. See [Github #1986](Unidata/netcdf-c#1986). * [Bug Fixes] The netcdf-c library was incorrectly determining the scope of types referred to by nc_inq_type_equal. See [Github #1959](Unidata/netcdf-c#1959) for more information. * [Bug Fix] Fix bug in use of XGetopt when building under Mingw. See [Github #2009](Unidata/netcdf-c#2009). * [Enhancement] Improve the error reporting when attempting to use a filter for which no implementation can be found in HDF5_PLUGIN_PATH. See [Github #2000](Unidata/netcdf-c#2000) for more information. * [Bug Fix] Fix `make distcheck` issue in `nczarr_test/` directory. See [Github #2007](Unidata/netcdf-c#2007). * [Bug Fix] Fix bug in NCclosedir in dpathmgr.c. See [Github #2003](Unidata/netcdf-c#2003). * [Bug Fix] Fix bug in ncdump that assumes that there is a relationship between the total number of dimensions and the max dimension id. 
See [Github #2004](Unidata/netcdf-c#2004). * [Bug Fix] Fix bug in JSON processing of strings with embedded quotes. See [Github #1993](Unidata/netcdf-c#1993). * [Enhancement] Add support for the new "dimension_separator" enhancement to Zarr v2. See [Github #1990](Unidata/netcdf-c#1990) for more information. * [Bug Fix] Fix hack for handling failure of shell programs to properly handle escape characters. See [Github #1989](Unidata/netcdf-c#1989). * [Bug Fix] Allow some primitive type names to be used as identifiers depending on the file format. See [Github #1984](Unidata/netcdf-c#1984). * [Enhancement] Add support for reading/writing pure Zarr storage format that supports the XArray _ARRAY_DIMENSIONS attribute. See [Github #1952](Unidata/netcdf-c#1952) for more information. * [Update] Updated version of bzip2 used in filter testing/functionality, in support of [Github #1969](Unidata/netcdf-c#1969). * [Bug Fix] Corrected HDF5 version detection logic as described in [Github #1962](Unidata/netcdf-c#1962). ## 4.8.0 - March 30, 2021 * [Enhancement] Bump the NC_DISPATCH_VERSION from 2 to 3, and as a side effect, unify the definition of NC_DISPATCH_VERSION so it only needs to be defined in CMakeLists.txt and configure.ac. See [Github #1945](Unidata/netcdf-c#1945) for more information. * [Enhancement] Provide better cross platform path name management. This converts paths for various platforms (e.g. Windows, MSYS, etc.) so that they are in the proper format for the executing platform. See [Github #1958](Unidata/netcdf-c#1958) for more information. * [Bug Fixes] The nccopy program was treating -d0 as turning deflation on rather than interpreting it as "turn off deflation". See [Github #1944](Unidata/netcdf-c#1944) for more information. * [Enhancement] Add support for storing NCZarr data in zip files. See [Github #1942](Unidata/netcdf-c#1942) for more information. * [Bug Fixes] Make fillmismatch the default for DAP2 and DAP4; too many servers ignore this requirement. 
* [Bug Fixes] Fix some memory leaks in NCZarr, fix a bug with long strides in NCZarr. See [Github #1913](Unidata/netcdf-c#1913) for more information. * [Enhancement] Add some optimizations to NCZarr, do some cleanup of code cruft, add some NCZarr test cases, add a performance test to NCZarr. See [Github #1908](Unidata/netcdf-c#1908) for more information. * [Bug Fix] Implement a better chunk cache system for NCZarr. The cache now uses extendible hashing plus a linked list to provide a combination of expandability, fast access, and LRU behavior. See [Github #1887](Unidata/netcdf-c#1887) for more information. * [Enhancement] Provide .rc fields for S3 authentication: HTTP.S3.ACCESSID and HTTP.S3.SECRETKEY. * [Enhancement] Give the client control over what parts of a DAP2 URL are URL encoded (i.e. %xx). This is to support the different decoding rules that servers apply to incoming URLs. See [Github #1884](Unidata/netcdf-c#1884) for more information. * [Bug Fix] Fix incorrect time offsets from `ncdump -t`, in some cases when the time `units` attribute contains both a **non-zero** time-of-day, and a time zone suffix containing the letter "T", such as "UTC". See [Github #1866](Unidata/netcdf-c#1866) for more information. * [Bug Fix] Clean up the NCZarr S3 build options. See [Github #1869](Unidata/netcdf-c#1869) for more information. * [Bug Fix] Support aligned access for selected ARM processors. See [Github #1871](Unidata/netcdf-c#1871) for more information. * [Documentation] Migrated the documents in the NUG/ directory to the dedicated NUG repository found at https://github.com/Unidata/netcdf * [Bug Fix] Revert the internal filter code to simplify it. From the user's point of view, the only visible change should be that (1) the functions that convert text to filter specs have had their signature reverted and renamed and have been moved to netcdf_aux.h, and (2) some filter API functions now return NC_ENOFILTER when inquiry is made about some filter.
Internally, the dispatch table has been modified to get rid of the complex structures. * [Bug Fix] If the HDF5 byte-range Virtual File Driver is available (HDF5 1.10.6 or later) then use it because it has better performance than the one currently built into the netcdf library. * [Bug Fix] Fixed byte-range support with cURL > 7.69. See [Unidata/netcdf-c#1798]. * [Enhancement] Added new test for using compression with parallel I/O: nc_test4/tst_h_par_compress.c. See [Unidata/netcdf-c#1784]. * [Bug Fix] Don't return error for extra calls to nc_redef() for netCDF/HDF5 files, unless classic model is in use. See [Unidata/netcdf-c#1779]. * [Enhancement] Added new parallel I/O benchmark program to mimic NOAA UFS data writes, built when --enable-benchmarks is in configure. See [Unidata/netcdf-c#1777]. * [Bug Fix] Now allow szip to be used on variables with unlimited dimension [Unidata/netcdf-c#1774]. * [Enhancement] Add support for cloud storage using a variant of the Zarr storage format. Warning: this feature is highly experimental and is subject to rapid evolution [https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in]. * [Bug Fix] Fix nccopy to properly set default chunking parameters when not otherwise specified. This can significantly improve performance in selected cases. Note that if seeing slow performance with nccopy, then, as a work-around, specifically set the chunking parameters. [Unidata/netcdf-c#1763]. * [Bug Fix] Fix some protocol bugs/differences between the netcdf-c library and the OPeNDAP Hyrax server. Also clean up checksum handling [https://github.com/Unidata/netcdf-c/issues/1712]. * [Bug Fix] IMPORTANT: Ncgen was not properly handling large data sections. The problem manifests as incorrect ordering of data in the created file. Aside from examining the file with ncdump, the error can be detected by running ncgen with the -lc flag (to produce a C file).
Examine the file to see if any variable is written in pieces as opposed to a single call to nc_put_vara. If multiple calls to nc_put_vara are used to write a variable, then it is probable that the data order is incorrect. Such multiple writes can occur for large variables and especially when one of the dimensions is unlimited. * [Bug Fix] Add necessary __declspec declarations to allow compilation of netcdf library without causing errors or (_declspec related) warnings [Unidata/netcdf-c#1725]. * [Enhancement] When a filter is applied twice with different parameters, then the second set is used for writing the dataset [Unidata/netcdf-c#1713]. * [Bug Fix] Now larger cache settings are used for sequential HDF5 file creates/opens on parallel I/O capable builds; see [Github #1716](Unidata/netcdf-c#1716) for more information. * [Bug Fix] Add functions to libdispatch/dnotnc4.c to support dispatch table operations that should work for any dispatch table, even if they do not do anything; functions such as nc_inq_var_filter [Unidata/netcdf-c#1693]. * [Bug Fix] Fixed a scalar annotation error when scalar == 0; see [Github #1707](Unidata/netcdf-c#1707) for more information. * [Bug Fix] Use proper CURLOPT values for VERIFYHOST and VERIFYPEER; the semantics for VERIFYHOST in particular changed. Documented in NUG/DAP2.md. See [Unidata/netcdf-c#1684]. * [Bug Fix][cmake] Correct an issue with parallel filter test logic in CMake-based builds. * [Bug Fix] Now allow nc_inq_var_deflate()/nc_inq_var_szip() to be called for all formats, not just HDF5. Non-HDF5 files return NC_NOERR and report no compression in use. This reverts behavior that was changed in the 4.7.4 release. See [Unidata/netcdf-c#1691]. * [Bug Fix] Compiling on a big-endian machine exposes some missing forward declarations in dfilter.c. * [File Change] Change from HDF5 v1.6 format compatibility, back to v1.8 compatibility, for newly created files. The superblock changes from version 0 back to version 2.
An exception is when using libhdf5 deprecated versions 1.10.0 and 1.10.1, which can only create v1.6 compatible format. Full backward read/write compatibility for netCDF-4 is maintained in all cases. See [Github #951](Unidata/netcdf-c#951). ## 4.7.4 - March 27, 2020 * [Windows] Bumped packaged HDF5 to 1.10.6, HDF4 to 4.2.14, and libcurl to 7.60.0. * [Enhancement] Support has been added for HDF5-1.12.0. See [Unidata/netcdf-c#1528]. * [Bug Fix] Correct behavior for the command line utilities when directly accessing a directory using utf8 characters. See [Github #1669] (Unidata/netcdf-c#1669), [Github #1668] (Unidata/netcdf-c#1668) and [Github #1666] (Unidata/netcdf-c#1666) for more information. * [Bug Fix] Attempts to set filters or chunked storage on scalar vars will now return NC_EINVAL. Scalar vars cannot be chunked, and only chunked vars can have filters. Previously the library ignored these attempts and always stored scalars as contiguous storage. See [Unidata/netcdf-c#1644]. * [Enhancement] Support has been added for multiple filters per variable. See [Unidata/netcdf-c#1584]. * [Enhancement] Now nc_inq_var_szip returns 0 for parameter values if szip is not in use for var. See [Unidata/netcdf-c#1618]. * [Enhancement] Now allow parallel I/O with filters, for HDF5-1.10.3 and later. See [Unidata/netcdf-c#1473]. * [Enhancement] Increased default size of cache buffer to 16 MB, from 4 MB. Increased number of slots to 4133. See [Unidata/netcdf-c#1541]. * [Enhancement] Allow zlib compression to be used with parallel I/O writes, if HDF5 version is 1.10.3 or greater. See [Unidata/netcdf-c#1580]. * [Enhancement] Restore use of szip compression when writing data (including writing in parallel if HDF5 version is 1.10.3 or greater). See [Unidata/netcdf-c#1546]. * [Enhancement] Enable use of compact storage option for small vars in netCDF/HDF5 files. See [Unidata/netcdf-c#1570]. * [Enhancement] Updated benchmarking program bm_file.c to better handle very large files.
See [Unidata/netcdf-c#1555]. * [Enhancement] Added version number to dispatch table, and now check version with nc_def_user_format(). See [Unidata/netcdf-c#1599]. * [Bug Fix] Fixed user setting of MPI launcher for parallel I/O HDF5 test in h5_test. See [Unidata/netcdf-c#1626]. * [Bug Fix] Fixed problem of growing memory when netCDF-4 files were opened and closed. See [Unidata/netcdf-c#1575 and Unidata/netcdf-c#1571]. * [Enhancement] Increased size of maximum allowed name in HDF4 files to NC_MAX_NAME. See [Unidata/netcdf-c#1631]. ## 4.7.3 - November 20, 2019 * [Bug Fix] Fixed an issue where installs from tarballs will not properly compile in parallel environments. * [Bug Fix] Library was modified so that rewriting the same attribute happens without deleting the attribute, to avoid a limit on how many times this may be done in HDF5. This fix was thought to be in 4.6.2 but was not. See [Unidata/netcdf-c#350]. * [Enhancement] Add a dispatch version number to netcdf_meta.h and libnetcdf.settings, in case we decide to change dispatch table in future. See [Unidata/netcdf-c#1469]. * [Bug Fix] Now testing that endianness can only be set on atomic ints and floats. See [Unidata/netcdf-c#1479]. * [Bug Fix] Fix for subtle error involving var and unlimited dim of the same name, but unrelated, in netCDF-4. See [Unidata/netcdf-c#1496]. * [Enhancement] Update for attribute documentation. See [Unidata/netcdf-c#1512]. * [Bug Fix][Enhancement] Corrected assignment of anonymous (a.k.a. phony) dimensions in an HDF5 file. Now when a dataset uses multiple dimensions of the same size, netcdf assumes they are different dimensions. See [GitHub #1484](Unidata/netcdf-c#1484) for more information. ## 4.7.2 - October 22, 2019 * [Bug Fix][Enhancement] Various bug fixes and enhancements. * [Bug Fix][Enhancement] Corrected an issue where protected memory was being written to with some pointer sleight-of-hand.
This has been in the code for a while, but appears to be caught by the compiler on OSX, under circumstances yet to be completely nailed down. See [GitHub #1486] (Unidata/netcdf-c#1486) for more information. * [Enhancement] [Parallel IO] Added support for parallel functions in MSVC. See [Github #1492](Unidata/netcdf-c#1492) for more information. * [Enhancement] Added a function for changing the ncid of an open file. This function should only be used if you know what you are doing, and is meant to be used primarily with PIO integration. See [GitHub #1483] (Unidata/netcdf-c#1483) and [GitHub #1487] (Unidata/netcdf-c#1487) for more information. ## 4.7.1 - August 27, 2019 * [Enhancement] Added unit_test directory, which contains unit tests for the libdispatch and libsrc4 code (and any other directories that want to put unit tests there). Use --disable-unit-tests to run without unit tests (ex. for code coverage analysis). See [GitHub #1458] (Unidata/netcdf-c#1458). * [Bug Fix] Remove obsolete _CRAYMPP and LOCKNUMREC macros from code. Also brought documentation up to date in man page. These macros were used in ancient times, before modern parallel I/O systems were developed. Programmers interested in parallel I/O should see nc_open_par() and nc_create_par(). See [GitHub #1459](Unidata/netcdf-c#1459). * [Enhancement] Remove obsolete and deprecated functions nc_set_base_pe() and nc_inq_base_pe() from the dispatch table. (Both functions are still supported in the library, this is an internal change only.) See [GitHub #1468](Unidata/netcdf-c#1468). * [Bug Fix] Reverted nccopy behavior so that if no -c parameters are given, then any default chunking is left to the netcdf-c library to decide. See [GitHub #1436](Unidata/netcdf-c#1436). ## 4.7.0 - April 29, 2019 * [Enhancement] Updated behavior of `pkgconfig` and `nc-config` to allow the use of the `--static` flags, e.g. `nc-config --libs --static`, which will show information for linking against `libnetcdf` statically. 
See [Github #1360] (Unidata/netcdf-c#1360) and [Github #1257] (Unidata/netcdf-c#1257) for more information. * [Enhancement] Provide byte-range reading of remote datasets. This allows read-only access to, for example, Amazon S3 objects and also Thredds Server datasets via the HTTPService access method. See [GitHub #1251] (Unidata/netcdf-c#1251). * Update the license from the home-brewed NetCDF license to the standard 3-Clause BSD License. This change does not result in any new restrictions; it is merely the adoption of a standard, well-known and well-understood license in place of the historic NetCDF license written at Unidata. This is part of a broader push by Unidata to adopt modern, standardized licensing. ## 4.6.3 - February 28, 2019 * [Bug Fix] Correctly generated `netcdf.pc` generated either by `configure` or `cmake`. If linking against a static netcdf, you would need to pass the `--static` argument to `pkg-config` in order to list all of the downstream dependencies. See [Github #1324](Unidata/netcdf-c#1324) for more information. * Now always write hidden coordinates attribute, which allows faster file opens when present. See [Github #1262](Unidata/netcdf-c#1262) for more information. * Some fixes for rename, including fix for renumbering of varids after a rename (#1307), renaming var to dim without coordinate var. See [Github #1297] (Unidata/netcdf-c#1297). * Fix of NULL parameter causing segfaults in put_vars functions. See [Github #1265] (Unidata/netcdf-c#1265) for more information. * Fix of --enable-benchmark benchmark tests [Github #1211] (Unidata/netcdf-c#1211) * Update the license from the home-brewed NetCDF license to the standard 3-Clause BSD License. This change does not result in any new restrictions; it is merely the adoption of a standard, well-known and well-understood license in place of the historic NetCDF license written at Unidata. This is part of a broader push by Unidata to adopt modern, standardized licensing. 
* [BugFix] Corrected DAP-related issues on big-endian machines. See [Github #1321] (Unidata/netcdf-c#1321), [Github #1302] (Unidata/netcdf-c#1302) for more information. * [BugFix][Enhancement] Various and sundry bugfixes and performance enhancements, thanks to \@edhartnett, \@gsjarrdema, \@t-b, \@wkliao, and all of our other contributors. * [Enhancement] Extended `nccopy -F` syntax to support multiple variables with a single invocation. See [Github #1311](Unidata/netcdf-c#1311) for more information. * [BugFix] Corrected an issue where DAP2 was incorrectly converting signed bytes, resulting in an erroneous error message under some circumstances. See [GitHub #1317] (Unidata/netcdf-c#1317) for more information. See [Github #1319] (Unidata/netcdf-c#1319) for related information. * [BugFix][Enhancement] Modified `nccopy` so that `_NCProperties` is not copied over verbatim but is instead generated based on the version of `libnetcdf` used when copying the file. Additionally, `_NCProperties` are displayed if/when associated with a netcdf3 file, now. See [GitHub#803] (Unidata/netcdf-c#803) for more information.

edwardhartnett changed the title ~~Provide a way to do bit-smoothing before compression~~ Provide a way to do data quantizing before compression Dec 3, 2019

edwardhartnett changed the title ~~Provide a way to do data quantizing before compression~~ Provide a way to do bit grooming before compression Dec 3, 2019

edwardhartnett mentioned this issue Jun 26, 2021

test, demonstrate, and document using zlib-ng for faster (but backward compatible) compression/decompression #2022

Closed

This was referenced Aug 20, 2021

Adding nc_def_var_quantize()/nc_inq_var_quantize() #2081

Closed

Adding nc_def_var_quantize()/nc_inq_var_quantize() - second attempt #2088

Merged

edwardhartnett mentioned this issue Sep 4, 2021

Where should I document quantization feature? Unidata/netcdf#46

Open

WardF closed this as completed in #2088 Oct 1, 2021

edwardhartnett mentioned this issue Nov 26, 2021

adding quantize test #2154

Merged
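For readers following the references above, here is a minimal C sketch of the idea discussed in this thread: discard low-order mantissa bits of each float, alternating between shaving them to 0s and setting them to 1s so the rounding errors tend to cancel. This is not the netcdf-c or NCO implementation. `groom_floats` and its `keep_bits` parameter (counted in explicit IEEE 754 binary32 mantissa bits) are hypothetical names for illustration, and the sketch assumes `float` is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of netcdf-c: quantize n floats in place,
 * keeping keep_bits of the 23 explicit mantissa bits of each value. */
static void groom_floats(float *data, size_t n, int keep_bits)
{
    assert(keep_bits >= 0 && keep_bits <= 23);
    if (keep_bits == 23)
        return; /* nothing to discard */

    /* zero_mask clears the discarded bits; one_mask sets them. */
    const uint32_t zero_mask = 0xFFFFFFFFu << (23 - keep_bits);
    const uint32_t one_mask = ~zero_mask;

    for (size_t i = 0; i < n; i++) {
        uint32_t bits;

        /* Leave NaN and infinity untouched: changing their mantissa
         * bits could turn one into the other. */
        if (!isfinite(data[i]))
            continue;

        memcpy(&bits, &data[i], sizeof bits); /* well-defined type pun */
        if (i % 2 == 0)
            bits &= zero_mask;       /* even index: shave toward zero */
        else if (data[i] != 0.0f)
            bits |= one_mask;        /* odd index: set, but keep exact zeros */
        memcpy(&data[i], &bits, sizeof bits);
    }
}
```

Calling `groom_floats(values, count, 12)` before handing the buffer to a lossless compressor would leave long runs of identical trailing bits, which is what makes the subsequent compression more effective. Exact zeros are left alone in this sketch so that setting bits never turns a zero into a nonzero value.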
