Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction, which implements the deflate algorithm in hardware, with estimated compression and decompression performance orders of magnitude faster than the current zlib and a compression ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. It can be enabled using the following build commands:

    $ ./configure --dfltcc
    $ make

When built like this, zlib compresses in hardware at level 1 and in software at all other levels. Decompression always happens in hardware. In order to enable DFLTCC compression for levels 1-6 (i.e., to make it used by default) one could either configure with `--dfltcc-level-mask=0x7e` or `export DFLTCC_LEVEL_MASK=0x7e` at run time.

Two DFLTCC compression calls produce the same results only when they are both made on machines of the same generation, and when the respective buffers have the same offset relative to the start of the page. Therefore care should be taken when using hardware compression if reproducible results are desired. One such use case - reproducible software builds - is handled explicitly: when the `SOURCE_DATE_EPOCH` environment variable is set, hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

* `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
* `inflateMark()`
* `inflatePrime()`
* `inflateSyncPoint()`

When used, these functions will either switch to software or, in case this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way. All SystemZ-specific code is placed into a separate file, but unfortunately there is still a noticeable amount of changes in the main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output buffer and a window.
Since DFLTCC requires the parameter block to be doubleword-aligned, and it is reasonable to allocate it alongside the deflate and inflate states, the `ZALLOC_STATE()`, `ZFREE_STATE()` and `ZCOPY_STATE()` macros are introduced in order to encapsulate the allocation details. The same is true for the window, for which the `ZALLOC_WINDOW()` and `TRY_FREE_WINDOW()` macros are introduced.

Software and hardware window formats do not match. Therefore `deflateSetDictionary()`, `deflateGetDictionary()`, `inflateSetDictionary()` and `inflateGetDictionary()` need special handling, which is triggered using the new `DEFLATE_SET_DICTIONARY_HOOK()`, `DEFLATE_GET_DICTIONARY_HOOK()`, `INFLATE_SET_DICTIONARY_HOOK()` and `INFLATE_GET_DICTIONARY_HOOK()` macros.

`deflateResetKeep()` and `inflateResetKeep()` now update the DFLTCC parameter block, which is allocated alongside the zlib state, using the new `DEFLATE_RESET_KEEP_HOOK()` and `INFLATE_RESET_KEEP_HOOK()` macros.

The new `DEFLATE_PARAMS_HOOK()` macro switches between the hardware and the software deflate implementations when the `deflateParams()` arguments demand this.

The new `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and `INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported calls gracefully fail.

The algorithm implemented in hardware has a different compression ratio than the one implemented in software. In order for `deflateBound()` to return the correct results for the hardware implementation, the new `DEFLATE_BOUND_ADJUST_COMPLEN()` and `DEFLATE_NEED_CONSERVATIVE_BOUND()` macros are introduced.

Actual compression and decompression are handled by the new `DEFLATE_HOOK()` and `INFLATE_TYPEDO_HOOK()` macros. Since inflation with DFLTCC manages the window on its own, calling `updatewindow()` is suppressed using the new `INFLATE_NEED_UPDATEWINDOW()` macro.
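
The hook approach described above can be illustrated with a minimal sketch. It is not the code from this patch: `struct demo_stream`, `demo_hw_can_deflate()`, `DEMO_HAVE_HW_DEFLATE`, `DEMO_DEFLATE_HOOK()` and `demo_deflate_path()` are hypothetical names, and the sketch assumes that the level mask carries one bit per compression level, which is consistent with `0x7e` selecting levels 1-6.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-stream state; not zlib's real type. */
struct demo_stream {
    int level;            /* compression level, 0..9 */
    unsigned level_mask;  /* assumed: bit N set => level N may use hardware */
};

/* Set to 0 to model a build configured without hardware support. */
#define DEMO_HAVE_HW_DEFLATE 1

#if DEMO_HAVE_HW_DEFLATE
/* Returns 1 when the (hypothetical) accelerator should handle this stream. */
static int demo_hw_can_deflate(const struct demo_stream *strm)
{
    assert(strm->level >= 0 && strm->level <= 9);
    return (strm->level_mask & (1u << strm->level)) != 0;
}
/* Hook: nonzero means "handled in hardware, skip the software path". */
#define DEMO_DEFLATE_HOOK(strm) demo_hw_can_deflate(strm)
#else
/* Software-only build: the hook collapses to a constant. */
#define DEMO_DEFLATE_HOOK(strm) 0
#endif

/* Generic code calls the hook without knowing which build it is in. */
static const char *demo_deflate_path(const struct demo_stream *strm)
{
    if (DEMO_DEFLATE_HOOK(strm))
        return "hardware";
    return "software";
}
```

With a mask of `0x02` only level 1 takes the hardware path in this sketch, while `0x7e` extends it to levels 1-6. Keeping the decision behind a macro means the generic code needs no `#ifdef` at each call site.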

In addition to compression, DFLTCC computes the CRC-32 and Adler-32 checksums. Therefore, whenever it is used, software checksumming is suppressed using the new `DEFLATE_NEED_CHECKSUM()` and `INFLATE_NEED_CHECKSUM()` macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input data, thus in some cases it is necessary to do this manually. In order to achieve this, `send_bits()`, `bi_reverse()`, `bi_windup()` and `flush_pending()` are promoted from `local` to `ZLIB_INTERNAL`. Furthermore, since the block and the stream termination must be handled in software as well, `enum block_state` is moved to `deflate.h`.

Since the first call to `dfltcc_inflate()` already needs the window, and it might not be allocated yet, `inflate_ensure_window()` is factored out of `updatewindow()` and made `ZLIB_INTERNAL`.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>