
gh-103477: Read and write gzip header and trailer with zlib #103478

Open
wants to merge 1 commit into main from gh-103477

Conversation

iii-i
Contributor

@iii-i iii-i commented Apr 12, 2023

RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib optimization [1] that significantly improves deflate and inflate performance on this platform by using a specialized CPU instruction.

This instruction not only compresses the data, but also computes a checksum. At the moment Python's gzip support performs compression and checksum calculation separately, which creates unnecessary overhead on s390x.

The reason is that Python needs to write specific values into the gzip header; when this support was introduced in 1997, there was indeed no better way to do this.
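
For illustration, a minimal sketch of this separate-checksum pattern, simplified relative to what gzip.py actually does (flag, XFL and filename handling are omitted):

import struct
import time
import zlib

def gzip_compress_sketch(data, level=9):
    # Minimal 10-byte gzip header: magic, CM=8 (deflate), FLG=0,
    # MTIME, XFL=0, OS=255 (unknown).
    header = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, 0,
                         int(time.time()), 0, 0xFF)
    # Raw deflate stream (negative wbits): zlib maintains no checksum...
    co = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = co.compress(data) + co.flush()
    # ...so the CRC32 is computed in a separate pass over the data,
    # which is exactly the overhead described above.
    trailer = struct.pack("<LL", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + body + trailer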

Since v1.2.2.1 (2011) zlib provides inflateGetHeader() and deflateSetHeader() functions for that, so Python does not have to deal with the exact header and trailer format anymore.

Add new interfaces to zlibmodule.c that make use of these functions:

  • Add mtime argument to zlib.compress().
  • Add mtime and fname arguments to zlib.compressobj().
  • Add gz_header_mtime and gz_header_done properties to ZlibDecompressor.

In Python modules, replace raw streams with gzip streams, make use of these new interfaces, and remove all mentions of crc32.

[1] madler/zlib#410
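
For illustration, intended usage looks roughly like this (hypothetical: the mtime and fname parameters and the gz_header_* properties exist only in this PR, and the exact signatures are an assumption):

import time
import zlib

# Hypothetical: mtime/fname and gz_header_* are proposed by this PR
# and do not exist in released CPython.
blob = zlib.compress(b"data", wbits=31, mtime=int(time.time()))

co = zlib.compressobj(wbits=31, mtime=int(time.time()), fname=b"data.txt")
blob = co.compress(b"data") + co.flush()

do = zlib._ZlibDecompressor(wbits=31)
data = do.decompress(blob)
do.gz_header_done   # per this PR: 1 once the header is fully parsed
do.gz_header_mtime  # per this PR: MTIME recovered from the header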

@encukou
Member

encukou commented Apr 17, 2023

ping @Yhg1s @gpshead as zlib experts

@iii-i
Contributor Author

iii-i commented May 31, 2023

Rebased. Could someone have a look please?

The Tests / Hypothesis tests failure on Ubuntu seems to be unrelated to this change and is affecting other PRs as well.

@iii-i iii-i force-pushed the gh-103477 branch 2 times, most recently from 0ed5988 to 1f28fdd on June 29, 2023
@iii-i
Contributor Author

iii-i commented Jun 29, 2023

  • Rebase.
  • Handle gz.name in zlib_Compress_copy_impl().
  • Fix free()/PyMem_Free() mixup.
  • Drop the no longer used _read_exact().
  • Adapt the test_flush_flushes_compressor() test now that
    _read_gzip_header() is gone.

@iii-i
Contributor Author

iii-i commented Jul 18, 2023

Rebased.
@Yhg1s & @gpshead could you have a look please?

@rhpvorderman
Contributor

This will break:

import gzip

try:
    ...
except gzip.BadGzipFile:
    ...

I don't know if this is a big problem, though, given that a zlib.error can theoretically also occur in such scenarios anyway.

@iii-i
Contributor Author

iii-i commented Oct 9, 2023

Thanks for having a look! That's indeed what test_bad_gzip_file() has detected. I thought we could live with that, but now that you brought it up, I found a simple workaround: convert zlib.error exceptions that happen while gz_header_done != 1 into BadGzipFile.
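
Roughly like this (a sketch; gz_header_done is the property added by this PR):

import gzip
import zlib

def _decompress_chunk(decompressor, chunk):
    # A zlib.error raised before the gzip header has been fully parsed
    # (gz_header_done != 1, per this PR) means the header itself is
    # malformed, so re-raise it as BadGzipFile.
    try:
        return decompressor.decompress(chunk)
    except zlib.error as exc:
        if decompressor.gz_header_done != 1:
            raise gzip.BadGzipFile(str(exc)) from exc
        raise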

@iii-i
Contributor Author

iii-i commented Oct 9, 2023

The Windows failure is:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\cpython\\cpython\\build\\test_python_3232�'

It looks unrelated to me.

@rhpvorderman
Contributor

@iii-i I have been thinking about this. I am not a CPython core developer, but I did contribute quite heavily to the gzip module and I maintain python-isal.

This is a pretty massive code change, only so that zlib can write the header instead of Python. It does not add any extra functionality and results in the same behaviour. In effect the change only allows passing wbits=31 directly to compressobj and decompressobj rather than wbits=-15, which is only beneficial on the one platform that supports the optimization.

Isn't it possible to add a different wbits value for gzip reading and writing that does not read/write headers and trailers, and simply expose the z_stream's crc32 via an extra member (.crc) on the decompressobj and compressobj classes? Then all the Python header-writing code can remain the same, all the zlib functions can remain the same, and much less logic would need to be implemented. It would still achieve your goal of being faster on s390x, without the massive code overhaul.
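
Something along these lines (entirely hypothetical; neither the special wbits value nor the .crc attribute exists today):

import struct
import zlib

GZIP_BODY_ONLY = 100  # hypothetical marker: "gzip CRC, no header/trailer"

co = zlib.compressobj(9, zlib.DEFLATED, GZIP_BODY_ONLY)
body = co.compress(b"data") + co.flush()
# Python keeps writing the header itself, but reads the CRC32 that
# zlib maintained during deflate instead of running a separate
# zlib.crc32() pass over the input.
trailer = struct.pack("<LL", co.crc, len(b"data") & 0xFFFFFFFF)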

@iii-i
Contributor Author

iii-i commented Oct 25, 2023

Isn't it possible to add a different wbits value for gzip reading and writing that does not read/write headers and trailers, and simply expose the z_stream's crc32 via an extra member (.crc) on the decompressobj and compressobj classes?

Do you mean extending zlib itself? It should be quite simple code-wise, but zlib accepts very few PRs these days, and I fear the chances that this would go through are low.

Also, we could consider making Python depend on the DFLTCC zlib patch (a very bad idea, mentioned just for the sake of argument) and saying that zlib should update z_stream.adler even for raw streams when DFLTCC is in use. However, there is code out there that relies on this not happening: zlib-ng/zlib-ng#1390

@rhpvorderman
Contributor

Do you mean extending zlib itself? It should be quite simple code-wise, but zlib accepts very few PRs these days, and I fear the chances that this would go through are low.

No, I mean that it should be possible to tell the z_stream that a header has already been written and that the compression should be gzip. In that case the header can still be written by Python while the CRC and deflate are computed in one go. I believe this is already possible in current zlib. Unfortunately, there are no hooks in zlibmodule.c for this. I suppose it can be done by implementing an extra wbits value in zlibmodule.c (not in zlib itself) for this specific use case. That would require much less code than fully exposing the zlib gz_header API in the zlib module.

I think this can be accomplished using Z_BLOCK:

  The flush parameter of inflate() can be Z_NO_FLUSH, Z_SYNC_FLUSH, Z_FINISH,
Z_BLOCK, or Z_TREES.  Z_SYNC_FLUSH requests that inflate() flush as much
output as possible to the output buffer.  Z_BLOCK requests that inflate()
stop if and when it gets to the next deflate block boundary.  When decoding
the zlib or gzip format, this will cause inflate() to return immediately
after the header and before the first block. 

Feed the z_stream a fake but valid gzip header. Run inflate() with Z_BLOCK, and the z_stream ends up in a state where it considers the header already processed. Now next_in can be set to the actual data.
The same should be possible with deflate, although zlib.h is much less explicit about that.
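
At the Python level, the decompression side of this idea looks roughly as follows (a sketch; the actual change would live in zlibmodule.c and use Z_BLOCK as described above):

import zlib

# A minimal but valid 10-byte gzip header: magic, CM=8 (deflate),
# FLG=0, MTIME=0, XFL=0, OS=255 (unknown).
FAKE_HEADER = bytes([0x1F, 0x8B, 0x08, 0x00, 0, 0, 0, 0, 0x00, 0xFF])

do = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip format
# Consuming the fake header produces no output but advances the stream
# past header parsing; zlib then verifies the CRC32 of whatever deflate
# data and trailer are fed in afterwards.
assert do.decompress(FAKE_HEADER) == b""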

Given that the use case is speeding up one specific target platform, I think the PR should do just that, rather than also complicating the current Python zlib API as a side effect. My gut feeling is that if these features were truly wanted, they would have been requested years ago. So it is better to keep it simple and create a small code path that covers this use case. This is just my opinion, however, that of one person.

iii-i added a commit to iii-i/cpython that referenced this pull request Nov 17, 2023
RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib
optimization [1] that significantly improves deflate performance by
using a specialized CPU instruction.

This instruction not only compresses the data, but also computes a
checksum. At the moment Python's gzip support performs compression and
checksum calculation separately, which creates unnecessary overhead.
The reason is that Python needs to write specific values into the gzip
header, so it uses a raw stream instead of a gzip stream, and zlib
does not compute a checksum for raw streams.

The challenge with using gzip streams instead of raw streams is
dealing with the zlib-generated gzip header, which we would rather
generate manually. Implement the method proposed by @rhpvorderman: use
Z_BLOCK on the first deflate() call in order to stop before the first
deflate block is emitted. The data that is emitted up until this point
is zlib-generated gzip header, which should be discarded.

Expose this new functionality by adding a boolean gzip_trailer argument
to zlib.compress() and zlib.compressobj(). Make use of it in
gzip.compress() and GzipFile. The performance improvement varies
depending on the data being compressed, but it's in the ballpark of 40%.

An alternative approach is to use the deflateSetHeader() function,
introduced in zlib v1.2.2.1 (2011). This also works, but the change
was deemed too intrusive [2].

[1] madler/zlib#410
[2] python#103478
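
For illustration, usage of the proposed flag would look roughly like this (hypothetical: gzip_trailer exists only in the follow-up PR, and its exact semantics here are an assumption based on the description above):

import struct
import zlib

data = b"example payload"
# Python still writes its own header (MTIME=0 here for brevity)...
header = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, 0, 0, 0, 0xFF)
# ...while zlib computes the CRC32 during deflate and appends the gzip
# trailer; the zlib-generated header has already been discarded
# internally via the Z_BLOCK trick.
body_and_trailer = zlib.compress(data, gzip_trailer=True)
gz = header + body_and_trailer
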
@iii-i
Contributor Author

iii-i commented Nov 17, 2023

Thank you for the suggestion. I've implemented the compression part of it in #112199. If it's accepted, I will send the decompression part separately.

iii-i added a commit to iii-i/cpython that referenced this pull request Nov 17, 2023
RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib
optimization [1] that significantly improves deflate performance by
using a specialized CPU instruction.

This instruction not only compresses the data, but also computes a
checksum. At the moment Python's gzip support performs compression and
checksum calculation separately, which creates unnecessary overhead.
The reason is that Python needs to write specific values into the gzip
header, so it uses a raw stream instead of a gzip stream, and zlib
does not compute a checksum for raw streams.

The challenge with using gzip streams instead of raw streams is
dealing with the zlib-generated gzip header, which we would rather
generate manually. Implement the method proposed by @rhpvorderman: use
Z_BLOCK on the first deflate() call in order to stop before the first
deflate block is emitted. The data that is emitted up until this point
is zlib-generated gzip header, which should be discarded.

Expose this new functionality by adding a boolean gzip_trailer argument
to zlib.compress() and zlib.compressobj(). Make use of it in
gzip.compress(), GzipFile and TarFile. The performance improvement
varies depending on the data being compressed, but it's in the ballpark of
40%.

An alternative approach is to use the deflateSetHeader() function,
introduced in zlib v1.2.2.1 (2011). This also works, but the change
was deemed too intrusive [2].

[1] madler/zlib#410
[2] python#103478
@gpshead gpshead self-assigned this Nov 17, 2023
iii-i added a commit to iii-i/cpython that referenced this pull request Jan 29, 2024
iii-i added a commit to iii-i/cpython that referenced this pull request Feb 28, 2024
Labels
awaiting review stdlib Python modules in the Lib dir
6 participants