Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-95534: Improve gzip reading speed by 10% #97664

Merged
merged 33 commits into from
Oct 17, 2022
Merged

Conversation

rhpvorderman
Copy link
Contributor

@rhpvorderman rhpvorderman commented Sep 30, 2022

For motivation check the github issue. #95534
Performance figures before (the used gzip file is the tar source distribution from github):

./python -m pyperf timeit -s "import gzip; g=gzip.open('cpython-3.10.7.tar.gz', 'rb'); it=iter(lambda:g.read(128*1024), b'');" "for _ in it: pass"
.....................
Mean +- std dev: 301 ms +- 2 ms

after:

./python -m pyperf timeit -s "import gzip; g=gzip.open('cpython-3.10.7.tar.gz', 'rb'); it=iter(lambda:g.read(128*1024), b'');" "for _ in it: pass"
.....................
Mean +- std dev: 270 ms +- 1 ms

Performance tests where run with all optimizations enabled. I found that --enable-optimizations did not influence the result. So it can be verified without all the PGO stuff.

Change summary:

  • There is now a gzip.READ_BUFFER_SIZE constant that is 128KB. Other programs that read in 128KB chunks: pigz and cat. So this seems best practice among good programs. Also it is faster than 8 kb chunks.
  • a zlib._ZlibDecompressor was added. This is the _bz2.BZ2Decompressor ported to zlib. Since the zlib.Decompress object is better for in-memory decompression, the _ZlibDecompressor is hidden. It only makes sense in file decompression, and that is already implemented now in the gzip library. No need to bother the users with this.
  • The ZlibDecompressor uses the older Cpython arrange_output_buffer functions, as those are faster and more appropriate for the use case.
  • GzipFile.read has been optimized. There is no longer a unconsumed_tail member to write back to padded file. This is instead handled by the ZlibDecompressor itself, which has an internal buffer. _add_read_data has been inlined, as it was just two calls.

EDIT: While I am adding improvements anyway, I figured I could add another one-liner optimization now to the python -m gzip application. That read chunks in io.DEFAULT_BUFFER_SIZE previously, but has been updated now to use READ_BUFFER_SIZE chunks.
Results:
before:

Benchmark 1: cat cpython-3.10.7.tar.gz | ./python -m gzip -d > /dev/null
  Time (mean ± σ):     389.1 ms ±  12.0 ms    [User: 372.7 ms, System: 19.9 ms]
  Range (min … max):   370.9 ms … 410.2 ms    20 runs

After:

Benchmark 1: cat cpython-3.10.7.tar.gz | ./python -m gzip -d > /dev/null
  Time (mean ± σ):     320.5 ms ±  12.1 ms    [User: 306.4 ms, System: 17.6 ms]
  Range (min … max):   300.0 ms … 339.1 ms    20 runs

For comparison: pigz, the fastest zlib utilizing gzip decompressor, on a single thread. (igzip is faster, but utilizes ISA-L).

Benchmark 1: cat cpython-3.10.7.tar.gz | pigz -p 1 -d > /dev/null
  Time (mean ± σ):     293.8 ms ±   8.4 ms    [User: 288.5 ms, System: 17.0 ms]
  Range (min … max):   277.5 ms … 302.7 ms    20 runs

If we take the pure C pigz program as baseline, the amount of python overhead is reduced drastically from 30% to 10%.

@rhpvorderman
Copy link
Contributor Author

Once all the reviewing is done, I will squash the commits.

@Fidget-Spinner
Copy link
Member

Sorry I'm not a gzip expert so I can't review this. However I'd just like to say that you don't need to squash the commits, the core dev will squash them for you when merging!

Thanks for the thorough research and a solution to speeding up gzip!

@rhpvorderman
Copy link
Contributor Author

However I'd just like to say that you don't need to squash the commits, the core dev will squash them for you when merging!

Thank you. It is good to know that "fix lock stuff" and "reorder code" will not get added to the list of commit messages.

Sorry I'm not a gzip expert so I can't review this.

Can you help me with the adress sanitizer? It is not happy about the way I added a new heap type, even though I seem to have done exactly the same thing as for the other heap types in the module. I am at a loss here. It seems to crash in state->ZlibDecompressorType = (PyTypeObject *)PyType_FromModuleAndSpec( . In python-isal I simply use static types as it is much easier, but since both the zlib and bz2 module are all heap types now I figure there must be some important advantage.

@Fidget-Spinner Fidget-Spinner added the performance Performance or resource usage label Sep 30, 2022
Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of my comments are fairly minor, overall I think this code is well written.

Modules/zlibmodule.c Outdated Show resolved Hide resolved
Modules/zlibmodule.c Outdated Show resolved Hide resolved
Modules/zlibmodule.c Outdated Show resolved Hide resolved
Modules/zlibmodule.c Outdated Show resolved Hide resolved
Modules/zlibmodule.c Show resolved Hide resolved
Modules/zlibmodule.c Outdated Show resolved Hide resolved
Modules/zlibmodule.c Outdated Show resolved Hide resolved
Modules/zlibmodule.c Outdated Show resolved Hide resolved
Modules/zlibmodule.c Outdated Show resolved Hide resolved
"\n"
" wbits = 15\n"
" zdict\n"
" The predefined compression dictionary. This must be the same\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe what type this must be (usually bytes?) so that nobody is confused thinking it is a Python dict.

Consider using O! in your format below to have the API do a type check for you rather than failing later on when its use is attempted via the buffer API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with O! is that it cannot correctly accept bytes and bytearray simultaneously. Or is there a better solution for this?

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@rhpvorderman
Copy link
Contributor Author

rhpvorderman commented Oct 2, 2022

most of my comments are fairly minor, overall I think this code is well written.

As stated, most of the code is copied from the Cpython codebase is elsewhere, so I do not want to take credits for it. Thank you for the review. I have updated the code according to your comments.

EDIT: I did notice a significant oversight when revisiting: I have not added any testing code for _ZlibDecompressor. This can be very easily be arranged (Python-isal already has a full test suite). On the other hand _ZlibDecompressor's only use case is usage in the gzip module. So it can also be considered sufficiently tested using the current test suite.

EDIT: Added the test suite (adapted from test_bz2.py).

@rhpvorderman
Copy link
Contributor Author

As a heads up, I see this has the "awaiting changes" label. The changes have already been made 12 days ago.

@Fidget-Spinner
Copy link
Member

@rhpvorderman you can re-trigger the bot. See the comment here #97664 (comment).

@rhpvorderman
Copy link
Contributor Author

Whoop sorry, 🤦‍♂️ . Should have read that better. Usually I contribute to much smaller projects where such things are commonly not used.

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from gpshead October 14, 2022 11:17
@gpshead gpshead merged commit eae7dad into python:main Oct 17, 2022
@rhpvorderman rhpvorderman deleted the gh-95534 branch October 17, 2022 05:20
carljm added a commit to carljm/cpython that referenced this pull request Oct 17, 2022
* main: (31 commits)
  pythongh-95913: Move subinterpreter exper removal to 3.11 WhatsNew (pythonGH-98345)
  pythongh-95914: Add What's New item describing PEP 670 changes (python#98315)
  Remove unused arrange_output_buffer function from zlibmodule.c. (pythonGH-98358)
  pythongh-98174: Handle EPROTOTYPE under macOS in test_sendfile_fallback_close_peer_in_the_middle_of_receiving (python#98316)
  pythonGH-98327: Reduce scope of catch_warnings() in _make_subprocess_transport (python#98333)
  pythongh-93691: Compiler's code-gen passes location around instead of holding it on the global compiler state (pythonGH-98001)
  pythongh-97669: Create Tools/build/ directory (python#97963)
  pythongh-95534: Improve gzip reading speed by 10% (python#97664)
  pythongh-95913: Forward-port int/str security change to 3.11 What's New in main (python#98344)
  pythonGH-91415: Mention alphabetical sort ordering in the Sorting HOWTO (pythonGH-98336)
  pythongh-97930: Merge with importlib_resources 5.9 (pythonGH-97929)
  pythongh-85525: Remove extra row in doc (python#98337)
  pythongh-85299: Add note warning about entry point guard for asyncio example (python#93457)
  pythongh-97527: IDLE - fix buggy macosx patch (python#98313)
  pythongh-98307: Add docstring and documentation for SysLogHandler.createSocket (pythonGH-98319)
  pythongh-94808: Cover `PyFunction_GetCode`, `PyFunction_GetGlobals`, `PyFunction_GetModule` (python#98158)
  pythonGH-94597: Deprecate child watcher getters and setters (python#98215)
  pythongh-98254: Include stdlib module names in error messages for NameErrors (python#98255)
  Improve speed. Reduce auxiliary memory to 16.6% of the main array. (pythonGH-98294)
  [doc] Update logging cookbook with an example of custom handling of levels. (pythonGH-98290)
  ...
qkaiser added a commit to onekey-sec/unblob that referenced this pull request Oct 13, 2023
…thon.

The gzip._GzipReader we're inheriting from had some changes to stay up
to date with changes in zlib library and to perform some optimization.

Specifically:

- GzipFile.read has been optimized. There is no longer a unconsumed_tail
  member to write back to padded file. This is instead handled by the
  ZlibDecompressor itself, which has an internal buffer.
- _add_read_data has been inlined, as it was just two calls.

We've adapted our own code to reflect these changes.

More info: python/cpython#97664
qkaiser added a commit to onekey-sec/unblob that referenced this pull request Oct 13, 2023
…thon.

The gzip._GzipReader we're inheriting from had some changes to stay up
to date with changes in zlib library and to perform some optimization.

Specifically:

- GzipFile.read has been optimized. There is no longer a unconsumed_tail
  member to write back to padded file. This is instead handled by the
  ZlibDecompressor itself, which has an internal buffer.
- _add_read_data has been inlined, as it was just two calls.

We've adapted our own code to reflect these changes.

More info: python/cpython#97664
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage sprint
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants