Add support for IBM Z hardware-accelerated deflate
IBM Z mainframes, starting with the z15, provide the DFLTCC instruction,
which implements the deflate algorithm in hardware. Its estimated
compression and decompression performance is orders of magnitude faster
than the current zlib, with a compression ratio comparable to that of
level 1.

This patch adds DFLTCC support to zlib. It can be enabled using the
following build commands:

    $ ./configure --dfltcc
    $ make

When built like this, zlib compresses in hardware at level 1 and in
software at all other levels. Decompression always happens in hardware.
To enable DFLTCC compression for levels 1-6 (i.e., to make it apply to
the default level, since `Z_DEFAULT_COMPRESSION` corresponds to level 6),
either configure with `--dfltcc-level-mask=0x7e` or
`export DFLTCC_LEVEL_MASK=0x7e` at run time. Bit N of the mask enables
hardware compression at level N, so `0x7e` selects levels 1 through 6.
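The level mask reads as a bitmask indexed by compression level. The following is a minimal sketch, assuming that bit N enables hardware compression at level N and that the environment variable, when present and well-formed, overrides the built-in default. The helper names `parse_level_mask()` and `level_uses_hardware()` are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns the mask given in env_value (for example
 * the result of getenv("DFLTCC_LEVEL_MASK")), or built_in_default when
 * the variable is unset, empty, or not a valid number. Base 0 accepts
 * both "0x7e" and the decimal "126". */
static unsigned long parse_level_mask(const char *env_value,
                                      unsigned long built_in_default) {
    char *end;
    unsigned long mask;

    if (env_value == NULL || *env_value == '\0')
        return built_in_default;
    mask = strtoul(env_value, &end, 0);
    if (*end != '\0')
        return built_in_default;   /* trailing garbage: ignore override */
    return mask;
}

/* Hypothetical helper: bit N of the mask selects hardware for level N. */
static int level_uses_hardware(unsigned long mask, int level) {
    assert(level >= 0 && level <= 9);   /* zlib levels are 0..9 */
    return (int)((mask >> level) & 1UL);
}
```

For example, `level_uses_hardware(parse_level_mask(getenv("DFLTCC_LEVEL_MASK"), 0x2), 6)` answers whether level 6 would run in hardware. The default of `0x2` (level 1 only) is an assumption derived from the behavior described above.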

Two DFLTCC compression calls produce the same results only when both
are made on machines of the same generation and the respective buffers
have the same offset relative to the start of the page. Therefore, care
should be taken when using hardware compression where reproducible
results are desired. One such use case, reproducible software builds, is
handled explicitly: when the `SOURCE_DATE_EPOCH` environment variable is
set, hardware compression is disabled.
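The reproducible-build rule amounts to a single environment check made before hardware compression is chosen. A minimal sketch, assuming that any set value of `SOURCE_DATE_EPOCH` (even an empty one) counts. `hw_compression_allowed()` and its wrapper are hypothetical names. The check takes the variable's value as a parameter, so it can be exercised without modifying the process environment.

```c
#include <stdlib.h>

/* Hypothetical helper. source_date_epoch is the result of
 * getenv("SOURCE_DATE_EPOCH"), i.e. NULL when the variable is unset.
 * Any set value is taken as a request for reproducible output, so the
 * machine- and page-offset-dependent hardware compressor is avoided. */
static int hw_compression_allowed(const char *source_date_epoch) {
    return source_date_epoch == NULL;
}

/* Convenience wrapper that consults the real process environment. */
static int hw_compression_allowed_from_env(void) {
    return hw_compression_allowed(getenv("SOURCE_DATE_EPOCH"));
}
```

Decompression is deliberately not gated in this sketch, since only hardware compression is described as being disabled.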

DFLTCC does not support every zlib feature. In particular, the following
are unsupported:

    * `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
    * `inflateMark()`
    * `inflatePrime()`
    * `inflateSyncPoint()`

When these functions are used, they either switch to software or, if
that is not possible, fail gracefully.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed in a separate file, but
unfortunately a noticeable number of changes to the main zlib code
remain. Below is a summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it is reasonable to allocate it alongside the
deflate and inflate states, the `ZALLOC_STATE()`, `ZFREE_STATE()` and
`ZCOPY_STATE()` macros are introduced to encapsulate the allocation
details. The same applies to the window, for which the `ZALLOC_WINDOW()`
and `TRY_FREE_WINDOW()` macros are introduced.
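One way to co-allocate a state and a doubleword-aligned parameter block is to round the state size up to a multiple of 8 and place the block immediately after it, so that a single allocation and a single free cover both. The sketch below illustrates this idea under that assumption. `alloc_state_with_param_block()` and `param_block_of()` are hypothetical names, and plain `malloc()` stands in for zlib's `zalloc` callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DOUBLEWORD 8u  /* required parameter block alignment */

/* Round size up to the next multiple of a power-of-two alignment. */
static size_t align_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical allocator: one block holds the state followed by the
 * parameter block. malloc() returns memory aligned for any fundamental
 * type (at least 8 bytes on common 64-bit ABIs), so the rounded offset
 * keeps the parameter block doubleword-aligned. */
static void *alloc_state_with_param_block(size_t state_size,
                                          size_t param_size) {
    return malloc(align_up(state_size, DOUBLEWORD) + param_size);
}

/* Locate the parameter block inside a state allocated above. */
static void *param_block_of(void *state, size_t state_size) {
    assert(((uintptr_t)state % DOUBLEWORD) == 0);
    return (unsigned char *)state + align_up(state_size, DOUBLEWORD);
}
```

Because both parts share one allocation, freeing or copying the state (which `ZFREE_STATE()` and `ZCOPY_STATE()` encapsulate) must also account for the extra bytes.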

Software and hardware window formats do not match; therefore,
`deflateSetDictionary()`, `deflateGetDictionary()`,
`inflateSetDictionary()` and `inflateGetDictionary()` need special
handling, which is triggered by the new
`DEFLATE_SET_DICTIONARY_HOOK()`, `DEFLATE_GET_DICTIONARY_HOOK()`,
`INFLATE_SET_DICTIONARY_HOOK()` and `INFLATE_GET_DICTIONARY_HOOK()`
macros.
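To illustrate why a format conversion is needed, assume the hardware history is kept as a ring buffer described by an offset and a length. This is an assumption here, not a statement about the actual parameter block layout. Under that assumption, a `GetDictionary`-style hook must unroll the ring into the flat, oldest-to-newest byte order that zlib's API returns. A minimal sketch using the hypothetical `ring_to_linear()`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: copy `len` history bytes that start at `offset` in a
 * ring buffer of `ring_size` bytes into `out`, oldest byte first. If the
 * history wraps past the end of the ring, it is copied in two pieces.
 * Returns the number of bytes written. */
static size_t ring_to_linear(const unsigned char *ring, size_t ring_size,
                             size_t offset, size_t len, unsigned char *out) {
    size_t first;

    assert(offset < ring_size && len <= ring_size);
    first = ring_size - offset;           /* bytes before the wrap point */
    if (first >= len) {
        memcpy(out, ring + offset, len);  /* no wrap: one contiguous copy */
    } else {
        memcpy(out, ring + offset, first);        /* tail of the ring */
        memcpy(out + first, ring, len - first);   /* wrapped-around head */
    }
    return len;
}
```

A `SetDictionary`-style hook would perform the reverse copy, from a flat dictionary into the assumed ring layout.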

`deflateResetKeep()` and `inflateResetKeep()` now update the DFLTCC
parameter block, which is allocated alongside the zlib state, using
the new `DEFLATE_RESET_KEEP_HOOK()` and `INFLATE_RESET_KEEP_HOOK()`
macros.

The new `DEFLATE_PARAMS_HOOK()` macro switches between the hardware
and the software deflate implementations when the `deflateParams()`
arguments demand this.
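The decision `DEFLATE_PARAMS_HOOK()` faces can be reduced to a comparison: does the implementation chosen for the old parameters differ from the one chosen for the new ones? The following is a minimal sketch. It assumes that eligibility depends only on the level mask; the real check may consider more, such as the strategy. `hw_eligible()` and `needs_implementation_switch()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical predicate: bit N of level_mask enables hardware at
 * level N. */
static int hw_eligible(unsigned long level_mask, int level) {
    assert(level >= 0 && level <= 9);
    return (int)((level_mask >> level) & 1UL);
}

/* Hypothetical: nonzero when a deflateParams() call moving from
 * old_level to new_level crosses the hardware/software boundary, i.e.
 * when the stream must be handed from one deflate implementation to
 * the other. */
static int needs_implementation_switch(unsigned long level_mask,
                                       int old_level, int new_level) {
    return hw_eligible(level_mask, old_level) !=
           hw_eligible(level_mask, new_level);
}
```

This sketch does not cover what happens to already-buffered data at the switch point.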

The new `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
calls gracefully fail.
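These hooks follow a common C pattern: a statement-like macro that expands to nothing in a plain build and, in an accelerated build, may `return` early from the enclosing zlib function. Below is a minimal sketch of that pattern. `struct demo_state`, `DEMO_STREAM_ERROR`, `DEMO_INFLATE_PRIME_HOOK()` and `demo_inflate_prime()` are hypothetical, and a local error code is used rather than asserting which zlib code the patch returns.

```c
#include <assert.h>

#define DEMO_OK            0
#define DEMO_STREAM_ERROR (-2)  /* local stand-in, not taken from zlib.h */

/* Hypothetical, heavily simplified decoder state. */
struct demo_state {
    int hw_active;   /* nonzero while the hardware path owns the stream */
    int bit_count;   /* bits primed into the software bit buffer */
};

/* Statement-like hook: the do { } while (0) wrapper lets the macro be
 * used like a function call followed by ';'. While the hardware owns
 * the stream, the enclosing function returns an error instead of
 * touching state the hardware relies on. A non-accelerated build would
 * define the hook as an empty do { } while (0). */
#define DEMO_INFLATE_PRIME_HOOK(state)    \
    do {                                  \
        if ((state)->hw_active)           \
            return DEMO_STREAM_ERROR;     \
    } while (0)

static int demo_inflate_prime(struct demo_state *state, int bits) {
    DEMO_INFLATE_PRIME_HOOK(state);
    if (bits < 0 || bits > 16)       /* illustrative range check */
        return DEMO_STREAM_ERROR;
    state->bit_count += bits;
    return DEMO_OK;
}
```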

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. For `deflateBound()` to return
correct results for the hardware implementation, the new
`DEFLATE_BOUND_ADJUST_COMPLEN()` and `DEFLATE_NEED_CONSERVATIVE_BOUND()`
macros are introduced.

Actual compression and decompression are handled by the new
`DEFLATE_HOOK()` and `INFLATE_TYPEDO_HOOK()` macros. Since inflation
with DFLTCC manages the window on its own, the call to `updatewindow()`
is suppressed using the new `INFLATE_NEED_UPDATEWINDOW()` macro.

In addition to compressing and decompressing, DFLTCC computes the CRC-32
and Adler-32 checksums; therefore, whenever it is used, software
checksumming is suppressed using the new `DEFLATE_NEED_CHECKSUM()` and
`INFLATE_NEED_CHECKSUM()` macros.
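The net effect of this kind of gating is that the software checksum loop is skipped whenever the hardware has already produced the value. The following is a minimal sketch, assuming a boolean "hardware computed it" flag. `update_adler32()` and its parameters are hypothetical. The software path is the standard Adler-32 from RFC 1950, written byte-at-a-time for clarity rather than speed.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 (RFC 1950) */

/* Plain byte-at-a-time Adler-32 update; start with adler = 1.
 * zlib's adler32() computes the same value with a faster loop. */
static uint32_t adler32_sw(uint32_t adler, const unsigned char *buf,
                           size_t len) {
    uint32_t a = adler & 0xffffu, b = adler >> 16;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Hypothetical gate: keep the hardware-supplied value when the hardware
 * path ran, otherwise compute the checksum in software. */
static uint32_t update_adler32(uint32_t adler, const unsigned char *buf,
                               size_t len, int hw_did_checksum,
                               uint32_t hw_value) {
    if (hw_did_checksum)
        return hw_value;
    return adler32_sw(adler, buf, len);
}
```

For the input `"Wikipedia"`, the software path yields `0x11E60398`.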

DFLTCC refuses to write an end-of-block symbol if there is no input
data, so in some cases this has to be done manually. To achieve this,
`send_bits()`, `bi_reverse()`, `bi_windup()` and `flush_pending()` are
promoted from `local` to `ZLIB_INTERNAL`. Furthermore, since block and
stream termination must also be handled in software,
`enum block_state` is moved to `deflate.h`.
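Writing such symbols by hand requires a bit-reversal helper next to the bit writer. This is because deflate packs header fields least-significant bit first, whereas Huffman codes are defined most-significant bit first. The sketch below terminates a stream with an empty final fixed-Huffman block: BFINAL=1, BTYPE=01, then the 7-bit all-zero end-of-block code for symbol 256, per RFC 1951. It is a self-contained imitation, not zlib's `send_bits()`, `bi_reverse()` or `bi_windup()`. `struct bitwriter` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LSB-first bit writer with a small fixed output buffer. */
struct bitwriter {
    unsigned char out[16];
    size_t pos;      /* bytes written so far */
    unsigned buf;    /* pending bits, LSB first */
    int count;       /* number of pending bits in buf */
};

/* Append the low `len` bits of `value`, least significant bit first. */
static void put_bits(struct bitwriter *w, unsigned value, int len) {
    assert(len > 0 && len <= 16);
    w->buf |= (value & ((1u << len) - 1)) << w->count;
    w->count += len;
    while (w->count >= 8) {
        assert(w->pos < sizeof(w->out));
        w->out[w->pos++] = (unsigned char)(w->buf & 0xffu);
        w->buf >>= 8;
        w->count -= 8;
    }
}

/* Reverse the low `len` bits of `code` (Huffman codes are MSB-first). */
static unsigned reverse_bits(unsigned code, int len) {
    unsigned res = 0;
    while (len-- > 0) {
        res = (res << 1) | (code & 1u);
        code >>= 1;
    }
    return res;
}

/* Flush a partial byte, padding it with zero bits. */
static void windup(struct bitwriter *w) {
    if (w->count > 0) {
        assert(w->pos < sizeof(w->out));
        w->out[w->pos++] = (unsigned char)(w->buf & 0xffu);
    }
    w->buf = 0;
    w->count = 0;
}

/* Emit an empty final block using the fixed Huffman code. */
static void write_empty_final_block(struct bitwriter *w) {
    put_bits(w, 1, 1);                      /* BFINAL = 1 */
    put_bits(w, 1, 2);                      /* BTYPE = 01: fixed Huffman */
    put_bits(w, reverse_bits(0x00, 7), 7);  /* symbol 256: code 0000000 */
    windup(w);
}
```

The resulting two bytes, `03 00`, are the raw deflate encoding of an empty input produced with a fixed-Huffman final block.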

Since the first call to `dfltcc_inflate()` already needs the window,
which might not be allocated yet, `inflate_ensure_window()` is factored
out of `updatewindow()` and made `ZLIB_INTERNAL`.
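Factoring out the allocation lets any caller, not only `updatewindow()`, ensure the window exists before using it. Below is a minimal sketch of such an idempotent, lazily allocating helper. It assumes a simplified state with `window`, `wbits`, `wsize`, `whave` and `wnext` fields, loosely modelled on zlib's `inflate_state`, and uses `calloc()` in place of zlib's allocator. The `demo_` names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical, simplified inflate state. */
struct demo_inflate_state {
    unsigned char *window;  /* sliding window, allocated on first use */
    unsigned wbits;         /* log2 of the window size, e.g. 15 */
    unsigned wsize;         /* window size in bytes, 0 until set up */
    unsigned whave;         /* valid bytes currently in the window */
    unsigned wnext;         /* write position in the window */
};

/* Allocate and initialise the window if that has not happened yet.
 * Returns 0 on success and 1 on allocation failure, using a "nonzero
 * means error" convention. Calling it again is cheap and leaves an
 * existing window untouched. */
static int demo_ensure_window(struct demo_inflate_state *state) {
    if (state->window == NULL) {
        state->window = calloc((size_t)1 << state->wbits, 1);
        if (state->window == NULL)
            return 1;
    }
    if (state->wsize == 0) {
        state->wsize = 1U << state->wbits;
        state->wnext = 0;
        state->whave = 0;
    }
    return 0;
}
```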

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
iii-i committed Sep 25, 2023
1 parent 559c8ee commit 481ee63
Showing 17 changed files with 1,371 additions and 59 deletions.
8 changes: 8 additions & 0 deletions Makefile.in
@@ -140,6 +140,14 @@ match.lo: match.S
	mv _match.o match.lo
	rm -f _match.s

dfltcc.o: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
	$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)contrib/s390/dfltcc.c

dfltcc.lo: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
	-@mkdir objs 2>/dev/null || test -d objs
	$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/dfltcc.o $(SRCDIR)contrib/s390/dfltcc.c
	-@mv objs/dfltcc.o $@

crc32_test.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
	$(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/crc32_test.c

14 changes: 13 additions & 1 deletion compress.c
@@ -5,9 +5,15 @@

/* @(#) $Id$ */

#define ZLIB_INTERNAL
#include "zutil.h"
#include "zlib.h"

#ifdef DFLTCC
# include "contrib/s390/dfltcc.h"
#else
#define DEFLATE_BOUND_COMPLEN(source_len) 0
#endif

/* ===========================================================================
Compresses the source buffer into the destination buffer. The level
parameter has the same meaning as in deflateInit. sourceLen is the byte
@@ -70,6 +76,12 @@ int ZEXPORT compress(Bytef *dest, uLongf *destLen, const Bytef *source,
this function needs to be updated.
*/
uLong ZEXPORT compressBound(uLong sourceLen) {
    uLong complen = DEFLATE_BOUND_COMPLEN(sourceLen);

    if (complen > 0)
        /* Architecture-specific code provided an upper bound. */
        return complen + ZLIB_WRAPLEN;

    return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) +
           (sourceLen >> 25) + 13;
}
24 changes: 24 additions & 0 deletions configure
@@ -117,6 +117,7 @@ case "$1" in
echo ' configure [--const] [--zprefix] [--prefix=PREFIX] [--eprefix=EXPREFIX]' | tee -a configure.log
echo ' [--static] [--64] [--libdir=LIBDIR] [--sharedlibdir=LIBDIR]' | tee -a configure.log
echo ' [--includedir=INCLUDEDIR] [--archs="-arch i386 -arch x86_64"]' | tee -a configure.log
echo ' [--dfltcc] [--dfltcc-level-mask=MASK]' | tee -a configure.log
exit 0 ;;
-p*=* | --prefix=*) prefix=`echo $1 | sed 's/.*=//'`; shift ;;
-e*=* | --eprefix=*) exec_prefix=`echo $1 | sed 's/.*=//'`; shift ;;
@@ -143,6 +144,16 @@ case "$1" in
--sanitize) address=1; shift ;;
--address) address=1; shift ;;
--memory) memory=1; shift ;;
--dfltcc)
CFLAGS="$CFLAGS -DDFLTCC"
OBJC="$OBJC dfltcc.o"
PIC_OBJC="$PIC_OBJC dfltcc.lo"
shift
;;
--dfltcc-level-mask=*)
CFLAGS="$CFLAGS -DDFLTCC_LEVEL_MASK=`echo $1 | sed 's/.*=//'`"
shift
;;
*)
echo "unknown option: $1" | tee -a configure.log
echo "$0 --help for help" | tee -a configure.log
@@ -834,6 +845,19 @@ EOF
fi
fi

# Check whether sys/sdt.h is available
cat > $test.c << EOF
#include <sys/sdt.h>
int main() { return 0; }
EOF
if try $CC -c $CFLAGS $test.c; then
echo "Checking for sys/sdt.h ... Yes." | tee -a configure.log
CFLAGS="$CFLAGS -DHAVE_SYS_SDT_H"
SFLAGS="$SFLAGS -DHAVE_SYS_SDT_H"
else
echo "Checking for sys/sdt.h ... No." | tee -a configure.log
fi

# test to see if we can use a gnu indirection function to detect and load optimized code at runtime
echo >> configure.log
cat > $test.c <<EOF
4 changes: 4 additions & 0 deletions contrib/README.contrib
@@ -55,6 +55,10 @@ puff/ by Mark Adler <madler@alumni.caltech.edu>
Small, low memory usage inflate. Also serves to provide an
unambiguous description of the deflate format.

s390/ by Ilya Leoshkevich <iii@linux.ibm.com>
Hardware-accelerated deflate on IBM Z with DEFLATE CONVERSION CALL
instruction.

testzlib/ by Gilles Vollant <info@winimage.com>
Example of the use of zlib

17 changes: 17 additions & 0 deletions contrib/s390/README.txt
@@ -0,0 +1,17 @@
IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This directory adds DFLTCC support. In order to enable it, the following
build commands should be used:

$ ./configure --dfltcc
$ make

When built like this, zlib compresses in hardware on level 1 and in
software on all other levels. Decompression always happens in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
use it by default) one could either configure with
--dfltcc-level-mask=0x7e or set the environment variable
DFLTCC_LEVEL_MASK to 0x7e at run time.