Reduce allocations in flate decompressor and minor code improvements #869
according to my local benchmarks, it doesn't affect throughput, but leaves a constant 1 allocation/op in decompressor benchmarks
this conversion doesn't do anything, and removing it doesn't fail any of the tests
io.Copy behaves like io.CopyBuffer called with a nil buffer, in which case the buffer is allocated internally. So better to pass one manually
this operation does nothing, so removed it
previous variant wasn't actually adding much inaccuracy, but better to place it just before the benchmarked code anyway
instead of returning a func ptr, call the method directly. This doesn't bring a lot of performance gain either, but may be visible in microbenchmarks
Thanks!
Could you post times in a (comparable) benchmark? Just want to ensure there isn't a speed regression.
this improves readability of the code. Also added a debug print in case the next step enum value is unrecognized. However, everything will silently fail even without this print
@klauspost in |
Sure. Should be easier to pick up by a Fuzzer then.
Full benchmark output can be found here: https://gist.github.com/fakefloordiv/395a15a982cda3e43dc4f4833d3b2aac Here's a short conclusion: Time per operation (in ns/op):
Throughput (in MB/s):
Please note: the benchmarks were also updated (io.Copy replaced with io.CopyBuffer), which may affect the results too. However, even in that case they simply became a bit more accurate.
Yes. That is why I asked for "comparable benchmarks", meaning the same code. :) I don't think the "CopyBuffer" will make any difference, since both will use the |
You are right, there's no difference at all. Here are the results with io.Copy: https://gist.github.com/fakefloordiv/d32971935516f566006f28c6c000a330
as the flate reader implements the io.WriterTo interface, it makes no difference which of the two is used
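A minimal sketch of why the buffer argument ends up unused here: io.CopyBuffer (like io.Copy) checks whether the source implements io.WriterTo and, if so, delegates to WriteTo without touching the buffer. The type writerToSource and the helper copyWithBuffer are hypothetical stand-ins for illustration, not the flate reader itself.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// writerToSource is a hypothetical reader that also implements io.WriterTo.
type writerToSource struct {
	data []byte
}

// Read is the fallback path; io.CopyBuffer does not call it when WriteTo exists.
func (s *writerToSource) Read(p []byte) (int, error) {
	if len(s.data) == 0 {
		return 0, io.EOF
	}
	n := copy(p, s.data)
	s.data = s.data[n:]
	return n, nil
}

// WriteTo writes all remaining data straight to w.
func (s *writerToSource) WriteTo(w io.Writer) (int64, error) {
	n, err := w.Write(s.data)
	s.data = s.data[n:]
	return int64(n), err
}

// copyWithBuffer copies payload via io.CopyBuffer and reports whether the
// supplied buffer was written to during the copy.
func copyWithBuffer(payload []byte) (copied int64, bufUsed bool, err error) {
	buf := make([]byte, 8) // starts zeroed
	var dst bytes.Buffer
	copied, err = io.CopyBuffer(&dst, &writerToSource{data: payload}, buf)
	for _, b := range buf {
		if b != 0 {
			bufUsed = true
		}
	}
	return copied, bufUsed, err
}

func main() {
	copied, bufUsed, err := copyWithBuffer([]byte("hello"))
	fmt.Println(copied, bufUsed, err)
}
```

Because the WriteTo path is taken, passing a preallocated buffer and passing nil behave the same for such a source.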
LGTM
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [github.com/klauspost/compress](https://togithub.com/klauspost/compress) | indirect | patch | `v1.17.0` -> `v1.17.2` | --- ### Release Notes <details> <summary>klauspost/compress (github.com/klauspost/compress)</summary> ### [`v1.17.2`](https://togithub.com/klauspost/compress/releases/tag/v1.17.2) [Compare Source](https://togithub.com/klauspost/compress/compare/v1.17.1...v1.17.2) #### What's Changed - zstd: Fix corrupted output in "best" by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/876](https://togithub.com/klauspost/compress/pull/876) **Full Changelog**: klauspost/compress@v1.17.1...v1.17.2 ### [`v1.17.1`](https://togithub.com/klauspost/compress/releases/tag/v1.17.1) [Compare Source](https://togithub.com/klauspost/compress/compare/v1.17.0...v1.17.1) #### What's Changed - s2: Fix S2 "best" dictionary wrong encoding by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/871](https://togithub.com/klauspost/compress/pull/871) - flate: Reduce allocations in decompressor and minor code improvements by [@​fakefloordiv](https://togithub.com/fakefloordiv) in [https://github.com/klauspost/compress/pull/869](https://togithub.com/klauspost/compress/pull/869) - s2: Fix EstimateBlockSize on 6&7 length input by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/867](https://togithub.com/klauspost/compress/pull/867) - tests: Fuzzing Coverage Expansion by [@​viktoriia-lsg](https://togithub.com/viktoriia-lsg) in [https://github.com/klauspost/compress/pull/866](https://togithub.com/klauspost/compress/pull/866) - tests: Set FSE decompress fuzzer max limit by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/868](https://togithub.com/klauspost/compress/pull/868) - tests: Fuzzing Coverage Expansion 
([#​2](https://togithub.com/klauspost/compress/issues/2)) by [@​viktoriia-lsg](https://togithub.com/viktoriia-lsg) in [https://github.com/klauspost/compress/pull/870](https://togithub.com/klauspost/compress/pull/870) #### New Contributors - [@​viktoriia-lsg](https://togithub.com/viktoriia-lsg) made their first contribution in [https://github.com/klauspost/compress/pull/866](https://togithub.com/klauspost/compress/pull/866) - [@​fakefloordiv](https://togithub.com/fakefloordiv) made their first contribution in [https://github.com/klauspost/compress/pull/869](https://togithub.com/klauspost/compress/pull/869) **Full Changelog**: klauspost/compress@v1.17.0...v1.17.1 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi4xMDkuNCIsInVwZGF0ZWRJblZlciI6IjM2LjEwOS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9-->
decompressor.step is now an enum instead of a function pointer. This reduces allocations in all decompressor benchmarks to a constant 1 (previously the number differed for each case). However, it doesn't bring much gain in terms of throughput.
huffmanBlockDecoder(), generated by _gen/gen_inflate.go, also used to return a function pointer. It now calls the method directly instead. This doesn't bring much gain either, but may be visible in microbenchmarks.
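The general pattern can be sketched roughly as follows. This is an illustration only, not the library's code: the decoder type, the step constants and the method names are all hypothetical. The next step is stored as a small enum value and dispatched through a switch, with a debug print for unrecognized values, instead of storing a function value in the struct.

```go
package main

import "fmt"

// step identifies the next state of a hypothetical decoder.
type step uint8

const (
	stepNextBlock step = iota
	stepCopyData
	stepDone
)

// decoder is a hypothetical state machine; trace records which steps ran.
type decoder struct {
	step  step
	trace []string
}

func (d *decoder) nextBlock() {
	d.trace = append(d.trace, "nextBlock")
	d.step = stepCopyData
}

func (d *decoder) copyData() {
	d.trace = append(d.trace, "copyData")
	d.step = stepDone
}

// doStep dispatches on the enum and calls the method directly.
// It returns false once there is nothing left to do.
func (d *decoder) doStep() bool {
	switch d.step {
	case stepNextBlock:
		d.nextBlock()
	case stepCopyData:
		d.copyData()
	case stepDone:
		return false
	default:
		// Debug print for an unrecognized step value.
		fmt.Printf("unknown step %d\n", d.step)
		return false
	}
	return true
}

// run drives the state machine to completion and returns the executed steps.
func run() []string {
	d := &decoder{step: stepNextBlock}
	for d.doStep() {
	}
	return d.trace
}

func main() {
	fmt.Println(run())
}
```

Storing a plain integer avoids creating a function value on each state change, which is one plausible source of the per-operation allocations described above.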