Skip to content

Conversation

@klauspost
Copy link
Owner

@klauspost klauspost commented Sep 21, 2025

~10% faster huffman encoding.

Summary by CodeRabbit

  • Performance

    • Optimized low-level read/write operations to reduce overhead and improve compression throughput.
    • Removed unnecessary dependencies for leaner, faster execution.
  • Refactor

    • Streamlined internal byte-handling logic for clearer, more efficient data writes.
  • Chores

    • Internal maintenance to align implementations across safe/unsafe code paths.
  • Compatibility

    • No user-facing behavior changes; existing workflows continue to work without modification.

@coderabbitai
Copy link

coderabbitai bot commented Sep 21, 2025

📝 Walkthrough

Walkthrough

Internal little-endian helper Store64 was made generic with an explicit offset parameter. Call sites in flate were updated to pass offsets. deflate.go switched 32-bit loads to le.Load32, removing encoding/binary usage. Unsafe-enabled/disabled implementations both updated to write at an indexed position.

Changes

Cohort / File(s) Summary
flate: load/store callsites
flate/deflate.go, flate/huffman_bit_writer.go
Replace encoding/binary 32-bit reads with le.Load32. Update all le.Store64 calls to new signature with explicit offset parameter. Minor comment wording tweak.
internal/le: generic Store64 (unsafe-disabled)
internal/le/unsafe_disabled.go
Change Store64(b []byte, v uint64) to Store64[I Indexer](b []byte, i I, v uint64); write using binary.LittleEndian.PutUint64(b[i:]).
internal/le: generic Store64 (unsafe-enabled)
internal/le/unsafe_enabled.go
Same signature change as above; write via unsafe.Add(unsafe.Pointer(unsafe.SliceData(b)), i). Remove stale commented conversions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant HW as HuffmanBitWriter
  participant Buf as Output Buffer ([]byte)
  participant LE as internal/le.Store64

  rect rgb(238,245,255)
  note over HW: Writing compressed bits
  HW->>LE: Store64(Buf, offset, bits)
  note right of LE: Compute address at Buf[offset:]
  LE-->>Buf: Write 8 bytes (little-endian)
  end

  note over HW,LE: Same flow under unsafe-enabled/disabled variants
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "flate: Faster load+store" is concise and accurately summarizes the primary change — performance-focused modifications to load and store paths in the flate package that enable faster Huffman encoding. It is scoped with the "flate:" prefix, avoids noise, and clearly communicates the intent to a teammate scanning history. The title aligns with the changes described in the diff (replacement of encoding/binary usage with optimized le.Load/Store and related signature updates).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch flate-direct-writes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@klauspost klauspost changed the title flate: Faster load+st0re flate: Faster load+store Sep 21, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
internal/le/unsafe_disabled.go (1)

39-42: Docstring mismatch: now stores at b[i:], not b

The comment still says “Store64 will store v at b.” but the implementation writes at an offset (b[i:]). Update the comment to avoid confusion.

Apply:

-// Store64 will store v at b.
+// Store64 will store v at b[i:].
 func Store64[I Indexer](b []byte, i I, v uint64) {

Optional: consider adding a brief precondition note (len(b) >= int(i)+8) to mirror how call sites rely on headroom.

internal/le/unsafe_enabled.go (1)

49-52: Store64 now writes at an explicit offset — good API for hot paths

This matches the new call sites and avoids extra slicing/indexing. Given this uses unsafe.Add, it’s worth documenting the precondition that len(b) >= int(i)+8 and that i must be non-negative.

Add a brief precondition comment:

-// Store64 will store v at b[i:].
+// Store64 will store v at b[i:].
+// Precondition: len(b) >= int(i)+8 and i >= 0.
 func Store64[I Indexer](b []byte, i I, v uint64) {
flate/huffman_bit_writer.go (1)

437-439: Approve — no 2‑arg le.Store64 usages found; tiny follow-ups

  • Verified: all le.Store64 calls are three-argument (found at flate/huffman_bit_writer.go:438, 854, 881, 903, 928, 949, 1102, 1131).
  • Add a short comment by the bytes field declaration documenting the intentional +8 headroom (overwrite-8 / consume-6 invariant).
  • Optional: for consistency, change the call using n at flate/huffman_bit_writer.go:438 to nbytes.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e2f5d5 and 5763b10.

📒 Files selected for processing (4)
  • flate/deflate.go (3 hunks)
  • flate/huffman_bit_writer.go (8 hunks)
  • internal/le/unsafe_disabled.go (1 hunks)
  • internal/le/unsafe_enabled.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
flate/deflate.go (2)
internal/le/unsafe_disabled.go (1)
  • Load32 (20-22)
internal/le/unsafe_enabled.go (1)
  • Load32 (26-30)
internal/le/unsafe_disabled.go (2)
internal/le/unsafe_enabled.go (1)
  • Store64 (50-52)
internal/le/le.go (1)
  • Indexer (3-5)
flate/huffman_bit_writer.go (2)
internal/le/unsafe_disabled.go (1)
  • Store64 (40-42)
internal/le/unsafe_enabled.go (1)
  • Store64 (50-52)
internal/le/unsafe_enabled.go (2)
internal/le/unsafe_disabled.go (1)
  • Store64 (40-42)
internal/le/le.go (1)
  • Indexer (3-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: Analyze (go)
  • GitHub Check: fuzz-other (nounsafe)
  • GitHub Check: build (1.25.x, ubuntu-latest)
  • GitHub Check: build (1.24.x, windows-latest)
  • GitHub Check: build (1.25.x, macos-latest)
  • GitHub Check: fuzz-other ("noasm,nounsafe")
  • GitHub Check: fuzz-zstd (nounsafe)
  • GitHub Check: build (1.23.x, windows-latest)
  • GitHub Check: fuzz-zstd ("noasm,nounsafe")
  • GitHub Check: fuzz-s2 (nounsafe)
  • GitHub Check: build (1.24.x, macos-latest)
  • GitHub Check: build (1.25.x, windows-latest)
  • GitHub Check: build (1.24.x, ubuntu-latest)
  • GitHub Check: build (1.23.x, macos-latest)
  • GitHub Check: fuzz-s2 ("noasm,nounsafe")
  • GitHub Check: build-special
  • GitHub Check: build (1.23.x, ubuntu-latest)
🔇 Additional comments (2)
flate/deflate.go (2)

14-15: Import shift to internal/le looks good

Dropping encoding/binary here and centralizing loads via internal/le is appropriate and aligns with the PR intent.


366-366: Swap to le.Load32 retains semantics and improves inlining

le.Load32(b, 0) is equivalent to binary.LittleEndian.Uint32(b) and should be faster with unsafe-enabled builds; caller precondition len(b) >= 4 still applies. Confirm there are no remaining binary.LittleEndian.Uint32 call sites in flate/ — the provided ripgrep check returned no matches (empty output); re-run or verify manually.

@klauspost klauspost merged commit 3c0d308 into master Sep 21, 2025
22 checks passed
@klauspost klauspost deleted the flate-direct-writes branch September 21, 2025 12:44
@phuslu
Copy link

phuslu commented Sep 23, 2025

It also could speedup gzip compression, Right? So any plan to release a tag for this performance improvement? Thanks!

@klauspost
Copy link
Owner Author

@phuslu Yes, but much less so, since entropy coding is only a (minor) part of it.

This is difference on "huffman" (last column is MiB/s)

enwik8	gzkp	-2	100000000	63193946	141	676.18
enwik8	gzkp	-2	100000000	63193946	124	768.79

This is the fastest compression...

enwik8	gzkp	1	100000000	41837916	456	208.88
enwik8	gzkp	1	100000000	41837916	449	212.16

I am reviewing code for golang/go#75532 - I will probably release once I've submitted that. For now the stdlib will not include "unsafe" code.

@phuslu
Copy link

phuslu commented Sep 23, 2025

Yes, that's true. We also are a modified k8s branch, it already integrated klauspost/compress for speedup gzip/zstd.

hantang pushed a commit to qundao/mirror-forgejo that referenced this pull request Oct 21, 2025
This PR contains the following updates:

| Package | Change | Age | Confidence |
|---|---|---|---|
| [github.com/klauspost/compress](https://github.com/klauspost/compress) | `v1.18.0` -> `v1.18.1` | [![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fklauspost%2fcompress/v1.18.1?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fklauspost%2fcompress/v1.18.0/v1.18.1?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>klauspost/compress (github.com/klauspost/compress)</summary>

### [`v1.18.1`](https://github.com/klauspost/compress/releases/tag/v1.18.1)

[Compare Source](klauspost/compress@v1.18.0...v1.18.1)

#### What's Changed

- zstd: Fix incorrect buffer size in dictionary encodes by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1059](klauspost/compress#1059)
- s2: check for cap, not len of buffer in EncodeBetter/Best by [@&#8203;vdarulis](https://github.com/vdarulis) in [#&#8203;1080](klauspost/compress#1080)
- zstd: Add simple zstd EncodeTo/DecodeTo functions by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1079](klauspost/compress#1079)
- zlib: Avoiding extra allocation in zlib.reader.Reset by [@&#8203;travelpolicy](https://github.com/travelpolicy) in [#&#8203;1086](klauspost/compress#1086)
- gzhttp: remove redundant err check in zstdReader by [@&#8203;ryanfowler](https://github.com/ryanfowler) in [#&#8203;1090](klauspost/compress#1090)
- Run modernize. Deprecate Go 1.22 by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1095](klauspost/compress#1095)
- flate: Simplify matchlen by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1101](klauspost/compress#1101)
- flate: Add examples by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1102](klauspost/compress#1102)
- flate: Use exact sizes for huffman tables by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1103](klauspost/compress#1103)
- flate: Faster load+store by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1104](klauspost/compress#1104)
- Add notice to S2 about MinLZ by [@&#8203;klauspost](https://github.com/klauspost) in [#&#8203;1065](klauspost/compress#1065)

#### New Contributors

- [@&#8203;wooffie](https://github.com/wooffie) made their first contribution in [#&#8203;1069](klauspost/compress#1069)
- [@&#8203;vdarulis](https://github.com/vdarulis) made their first contribution in [#&#8203;1080](klauspost/compress#1080)
- [@&#8203;travelpolicy](https://github.com/travelpolicy) made their first contribution in [#&#8203;1086](klauspost/compress#1086)
- [@&#8203;ryanfowler](https://github.com/ryanfowler) made their first contribution in [#&#8203;1090](klauspost/compress#1090)

**Full Changelog**: <klauspost/compress@v1.18.0...v1.18.1>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC), Automerge - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4xNTIuOSIsInVwZGF0ZWRJblZlciI6IjQxLjE1Mi45IiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6WyJkZXBlbmRlbmN5LXVwZ3JhZGUiLCJ0ZXN0L25vdC1uZWVkZWQiXX0=-->

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/9786
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants