-
Notifications
You must be signed in to change notification settings - Fork 352
flate: Faster load+store #1104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flate: Faster load+store #1104
Conversation
📝 WalkthroughWalkthroughInternal little-endian helper Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant HW as HuffmanBitWriter
participant Buf as Output Buffer ([]byte)
participant LE as internal/le.Store64
rect rgb(238,245,255)
note over HW: Writing compressed bits
HW->>LE: Store64(Buf, offset, bits)
note right of LE: Compute address at Buf[offset:]
LE-->>Buf: Write 8 bytes (little-endian)
end
note over HW,LE: Same flow under unsafe-enabled/disabled variants
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
internal/le/unsafe_disabled.go (1)
39-42: Docstring mismatch: now stores at b[i:], not bThe comment still says “Store64 will store v at b.” but the implementation writes at an offset (b[i:]). Update the comment to avoid confusion.
Apply:
-// Store64 will store v at b. +// Store64 will store v at b[i:]. func Store64[I Indexer](b []byte, i I, v uint64) {Optional: consider adding a brief precondition note (len(b) >= int(i)+8) to mirror how call sites rely on headroom.
internal/le/unsafe_enabled.go (1)
49-52: Store64 now writes at an explicit offset — good API for hot pathsThis matches the new call sites and avoids extra slicing/indexing. Given this uses unsafe.Add, it’s worth documenting the precondition that len(b) >= int(i)+8 and that i must be non-negative.
Add a brief precondition comment:
-// Store64 will store v at b[i:]. +// Store64 will store v at b[i:]. +// Precondition: len(b) >= int(i)+8 and i >= 0. func Store64[I Indexer](b []byte, i I, v uint64) {flate/huffman_bit_writer.go (1)
437-439: Approve — no 2‑arg le.Store64 usages found; tiny follow-ups
- Verified: all le.Store64 calls are three-argument (found at flate/huffman_bit_writer.go:438, 854, 881, 903, 928, 949, 1102, 1131).
- Add a short comment by the bytes field declaration documenting the intentional +8 headroom (overwrite-8 / consume-6 invariant).
- Optional: for consistency, change the call using
nat flate/huffman_bit_writer.go:438 tonbytes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
flate/deflate.go(3 hunks)flate/huffman_bit_writer.go(8 hunks)internal/le/unsafe_disabled.go(1 hunks)internal/le/unsafe_enabled.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
flate/deflate.go (2)
internal/le/unsafe_disabled.go (1)
Load32(20-22)internal/le/unsafe_enabled.go (1)
Load32(26-30)
internal/le/unsafe_disabled.go (2)
internal/le/unsafe_enabled.go (1)
Store64(50-52)internal/le/le.go (1)
Indexer(3-5)
flate/huffman_bit_writer.go (2)
internal/le/unsafe_disabled.go (1)
Store64(40-42)internal/le/unsafe_enabled.go (1)
Store64(50-52)
internal/le/unsafe_enabled.go (2)
internal/le/unsafe_disabled.go (1)
Store64(40-42)internal/le/le.go (1)
Indexer(3-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: Analyze (go)
- GitHub Check: fuzz-other (nounsafe)
- GitHub Check: build (1.25.x, ubuntu-latest)
- GitHub Check: build (1.24.x, windows-latest)
- GitHub Check: build (1.25.x, macos-latest)
- GitHub Check: fuzz-other ("noasm,nounsafe")
- GitHub Check: fuzz-zstd (nounsafe)
- GitHub Check: build (1.23.x, windows-latest)
- GitHub Check: fuzz-zstd ("noasm,nounsafe")
- GitHub Check: fuzz-s2 (nounsafe)
- GitHub Check: build (1.24.x, macos-latest)
- GitHub Check: build (1.25.x, windows-latest)
- GitHub Check: build (1.24.x, ubuntu-latest)
- GitHub Check: build (1.23.x, macos-latest)
- GitHub Check: fuzz-s2 ("noasm,nounsafe")
- GitHub Check: build-special
- GitHub Check: build (1.23.x, ubuntu-latest)
🔇 Additional comments (2)
flate/deflate.go (2)
14-15: Import shift to internal/le looks goodDropping encoding/binary here and centralizing loads via internal/le is appropriate and aligns with the PR intent.
366-366: Swap to le.Load32 retains semantics and improves inliningle.Load32(b, 0) is equivalent to binary.LittleEndian.Uint32(b) and should be faster with unsafe-enabled builds; caller precondition len(b) >= 4 still applies. Confirm there are no remaining binary.LittleEndian.Uint32 call sites in flate/ — the provided ripgrep check returned no matches (empty output); re-run or verify manually.
|
It also could speedup gzip compression, Right? So any plan to release a tag for this performance improvement? Thanks! |
|
@phuslu Yes, but much less so, since entropy coding is only a (minor) part of it. This is difference on "huffman" (last column is MiB/s) This is the fastest compression... I am reviewing code for golang/go#75532 - I will probably release once I've submitted that. For now the stdlib will not include "unsafe" code. |
|
Yes, that's true. We also are a modified k8s branch, it already integrated klauspost/compress for speedup gzip/zstd. |
This PR contains the following updates: | Package | Change | Age | Confidence | |---|---|---|---| | [github.com/klauspost/compress](https://github.com/klauspost/compress) | `v1.18.0` -> `v1.18.1` | [](https://docs.renovatebot.com/merge-confidence/) | [](https://docs.renovatebot.com/merge-confidence/) | --- ### Release Notes <details> <summary>klauspost/compress (github.com/klauspost/compress)</summary> ### [`v1.18.1`](https://github.com/klauspost/compress/releases/tag/v1.18.1) [Compare Source](klauspost/compress@v1.18.0...v1.18.1) #### What's Changed - zstd: Fix incorrect buffer size in dictionary encodes by [@​klauspost](https://github.com/klauspost) in [#​1059](klauspost/compress#1059) - s2: check for cap, not len of buffer in EncodeBetter/Best by [@​vdarulis](https://github.com/vdarulis) in [#​1080](klauspost/compress#1080) - zstd: Add simple zstd EncodeTo/DecodeTo functions by [@​klauspost](https://github.com/klauspost) in [#​1079](klauspost/compress#1079) - zlib: Avoiding extra allocation in zlib.reader.Reset by [@​travelpolicy](https://github.com/travelpolicy) in [#​1086](klauspost/compress#1086) - gzhttp: remove redundant err check in zstdReader by [@​ryanfowler](https://github.com/ryanfowler) in [#​1090](klauspost/compress#1090) - Run modernize. Deprecate Go 1.22 by [@​klauspost](https://github.com/klauspost) in [#​1095](klauspost/compress#1095) - flate: Simplify matchlen by [@​klauspost](https://github.com/klauspost) in [#​1101](klauspost/compress#1101) - flate: Add examples by [@​klauspost](https://github.com/klauspost) in [#​1102](klauspost/compress#1102) - flate: Use exact sizes for huffman tables by [@​klauspost](https://github.com/klauspost) in [#​1103](klauspost/compress#1103) - flate: Faster load+store by [@​klauspost](https://github.com/klauspost) in [#​1104](klauspost/compress#1104) - Add notice to S2 about MinLZ by [@​klauspost](https://github.com/klauspost) in [#​1065](klauspost/compress#1065) #### New Contributors - [@​wooffie](https://github.com/wooffie) made their first contribution in [#​1069](klauspost/compress#1069) - [@​vdarulis](https://github.com/vdarulis) made their first contribution in [#​1080](klauspost/compress#1080) - [@​travelpolicy](https://github.com/travelpolicy) made their first contribution in [#​1086](klauspost/compress#1086) - [@​ryanfowler](https://github.com/ryanfowler) made their first contribution in [#​1090](klauspost/compress#1090) **Full Changelog**: <klauspost/compress@v1.18.0...v1.18.1> </details> --- ### Configuration 📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC), Automerge - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4xNTIuOSIsInVwZGF0ZWRJblZlciI6IjQxLjE1Mi45IiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6WyJkZXBlbmRlbmN5LXVwZ3JhZGUiLCJ0ZXN0L25vdC1uZWVkZWQiXX0=--> Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/9786 Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org> Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org> Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
~10% faster huffman encoding.
Summary by CodeRabbit
Performance
Refactor
Chores
Compatibility