test, demonstrate, and document using zlib-ng for faster (but backward compatible) compression/decompression #2022
Interesting! I wasn’t aware of this library/replacement for zlib. Occupied with family stuff but will dig into this.
This plus #1548 could be a real game-changer for large data producers, speeding write times by an order of magnitude, while increasing compression dramatically.
I've been using it for several weeks and it is definitely faster. I didn't have to change anything in HDF5 or NetCDF to use it.
Greg, how did you do that? Did you just rename the .so file?
BTW thanks Ed for finding all these filter improvements. Really appreciated.
@DennisHeimbigner The zlib-ng build process has a compatibility mode (`--zlib-compat` with its configure script, or `-DZLIB_COMPAT=ON` with CMake) that builds it as a drop-in replacement for zlib. So all that needs to be done is to point to the correct directory and everything works as it should (just faster).
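To make the above concrete, here is a build sketch using zlib-ng's documented compat option. The install prefix is a placeholder, and the netCDF configure flags shown are the usual way to point a build at a custom library location, not something specific to zlib-ng:

```shell
# Build zlib-ng in zlib-compatible mode so it can stand in for libz.
# $HOME/local is a placeholder install prefix.
git clone https://github.com/zlib-ng/zlib-ng.git
cd zlib-ng
cmake -B build -DZLIB_COMPAT=ON -DCMAKE_INSTALL_PREFIX="$HOME/local"
cmake --build build
cmake --install build

# Then point an HDF5/netCDF-c build at it in the usual way, e.g.:
#   CPPFLAGS="-I$HOME/local/include" LDFLAGS="-L$HOME/local/lib" ./configure ...
```

In compat mode the installed library exports the standard zlib API and soname, which is why no HDF5 or netCDF source changes are needed.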
@gsjaardema do you know how fast? Also @gsjaardema , @DennisHeimbigner , @WardF, anyone interested in co-authoring a paper about this to AGU this year? We need to get this news out to the peeps, and an AGU extended abstract and poster seem like a great way to do it. What I would love to do is implement the bit-grooming, and then demonstrate the value of both bit-grooming and zlib-ng in a poster/paper which can then be distributed to any who are interested...
(@gsjaardema send me your email and I will send draft abstract...)
These are some very quick test results. "old" = system zlib; Time is seconds to write a 100,000,000-element generated mesh. Note that with zlib-ng, compressed read speed is also faster.
@edwardhartnett For email, just add
Interesting; @gsjaardema, thanks for the information. We can update the documentation re: configuring the new library so that it acts as a drop-in replacement, and it's nice we won't have to make modifications to the build systems. @edwardhartnett I'd be interested in contributing, sure. I agree this is information we need to get out there.
@gsjaardema awesome results! The speedup of writing compressed data is no doubt because the compression is now happening so much faster, and then there's less data to be written, also speeding up write times. As a result, we get a faster time writing compressed than uncompressed data. NOAA will be DELIGHTED with these results! Other large data producers, like NASA and ESA, will be similarly happy.
@gsjaardema What kind of storage were you writing to? (e.g. spinning disk, SSD, NVMe, network, etc.)
@dopplershift It was spinning disk.
Thanks. That implies the numbers above, while awesome, are definitely a best-case scenario (with respect to compressed writing being faster, anyway).
OK, I think this issue can be closed. zlib-ng works well, and does indeed function as a drop-in replacement for zlib. We have tested this on several HPC systems, with the UFS and other software, as well as with dedicated testing within the netcdf-c nc_perf directory.

zlib-ng produces data that can also easily and transparently be read by zlib. In other words, it is fully backward compatible: data producers can use zlib-ng even if their users are still using zlib.

Here's a typical performance chart comparing zlib to other compression methods. Notice zlib-ng is about twice as fast as zlib.

I recommend that all data producers switch to zlib-ng for better performance, with zero code changes. This will be our recommendation to the NOAA UFS team, and I suspect they will approve of this performance upgrade.

For more detail on our recent compression studies, see our AGU extended abstract: https://www.researchgate.net/publication/357001251_Quantization_and_Next-Generation_Zlib_Compression_for_Fully_Backward-Compatible_Faster_and_More_Effective_Data_Compression_in_NetCDF_Files

This paper from Nature is also highly relevant and interesting: "Compressing atmospheric data into its real information content".
This is part of #1545.
There is a new zlib library, zlib-ng: https://github.com/zlib-ng/zlib-ng
This is a drop-in replacement for zlib (when correctly configured), but much faster. And it's reportedly fully backward compatible: we can use zlib-ng, and the resulting compressed data can be read by existing zlib releases. (Reading is also faster if zlib-ng is used on that side as well.)
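The backward compatibility claim rests on zlib-ng (in compat mode) emitting standard RFC 1950/1951 zlib/DEFLATE streams, so any conforming implementation can decode them. A small sketch of that format-level round trip, using Python's stdlib `zlib` module (which is backed by whichever zlib-compatible library the interpreter was built against):

```python
import zlib

# A DEFLATE stream produced by one conforming implementation (zlib or
# zlib-ng in compat mode) can be decoded by any other, because the wire
# format is standardized, not implementation-specific.
original = b"netCDF chunk data " * 1000

# level 1 = fastest; netCDF's nc_def_var_deflate takes the same 1-9 scale.
compressed = zlib.compress(original, 1)
restored = zlib.decompress(compressed)

assert restored == original
print(f"{len(original)} bytes -> {len(compressed)} bytes")
```

This is why a producer can switch to zlib-ng while consumers keep reading with stock zlib: only the speed of producing the stream changes, not the stream itself.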
This should therefore work transparently in netcdf-java.
The goal is to make whatever modifications are needed in the CMake and autotools build systems to support zlib-ng. It will also need to be tested.
I believe few if any changes will be necessary, but of course this needs to be thoroughly tested. Then it needs to be explained to netCDF users, and the results demonstrated.