[HUF] Improve Huffman encoding speed #2733

Merged: 2 commits merged on Aug 3, 2021


@terrelln (Contributor) commented Jul 27, 2021

Improve Huffman encoding speed by about 20% with gcc and 10% with clang, as measured by the `compress_literals` benchmark at level 1.

  • The compress benchmark measures total compression speed.
  • The compress_literals benchmark compresses the file at the given level, extracts the literals from the compressed frame, and then measures the speed of compressing the literals of each block.
  • The compression ratio is identical before and after.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speedup (huf-cspeed vs dev)   |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| clang    | compress          | level_1 | enwik7      | 2.43  | 280.38           | 282.75                  | 0.8%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 374.42           | 377.80                  | 0.9%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 829.08           | 916.09                  | 10.5%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 816.08           | 904.53                  | 10.8%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.26           | 552.75                  | 3.7%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.79           | 774.91                  | 8.4%                          |
| gcc      | compress          | level_1 | enwik7      | 2.43  | 257.76           | 269.63                  | 4.6%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 347.37           | 361.45                  | 4.1%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.37           | 913.10                  | 19.9%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 755.00           | 903.85                  | 19.7%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.57           | 551.76                  | 9.8%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 676.12           | 778.51                  | 15.1%                         |

  • x86-64 without BMI2 is also a win across the board (10-20%).
  • x86 (32-bit) is a 10-20% win depending on compiler.
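
The percentage column in the benchmark tables is a relative change in throughput, not a difference of MB/s values. A minimal sketch of that calculation follows, assuming it is computed as (huf-cspeed / dev - 1) × 100; this assumption reproduces the listed values (for example 761.37 → 913.10 MB/s gives about 19.9%). The function name is hypothetical.

```c
#include <assert.h>

/* Relative speedup in percent of new_mbps over dev_mbps.
 * Assumed formula: (new / dev - 1) * 100. */
static double speedup_pct(double dev_mbps, double new_mbps) {
    assert(dev_mbps > 0.0); /* a zero baseline has no meaningful ratio */
    return (new_mbps / dev_mbps - 1.0) * 100.0;
}
```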

I also added a new Huffman round-trip fuzzer in the second commit. It found two minor bugs in Huffman compression that cannot be triggered through zstd. I've run it for a few million iterations and it looks good so far.
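
The property a round-trip fuzzer checks is that decoding the output of a successful encode reproduces the input exactly. The sketch below is not the fuzzer from this PR: the codec interface, the helper names, and the run-length stand-in codec (used instead of Huffman only so the sketch runs on its own) are all hypothetical. It assumes the zstd-style convention that an encoder returning 0 means "not compressible", so only successful encodes are decoded and compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical codec interface: returns bytes written, or 0 for "not compressible"/failure. */
typedef size_t (*codec_fn)(uint8_t* dst, size_t cap, const uint8_t* src, size_t n);

/* Stand-in codec: byte-oriented run-length encoding (NOT Huffman), pairs of (run, byte). */
static size_t rle_encode(uint8_t* dst, size_t cap, const uint8_t* src, size_t n) {
    size_t o = 0, i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && src[i + run] == src[i] && run < 255) run++;
        if (o + 2 > cap) return 0; /* output would not fit: report "not compressible" */
        dst[o++] = (uint8_t)run;
        dst[o++] = src[i];
        i += run;
    }
    return o;
}

static size_t rle_decode(uint8_t* dst, size_t cap, const uint8_t* src, size_t n) {
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        if (o + src[i] > cap) return 0; /* corrupt or oversized input */
        memset(dst + o, src[i + 1], src[i]);
        o += src[i];
    }
    return o;
}

/* Round-trip property: returns 1 if the input survives encode + decode (or was skipped). */
static int round_trip_ok(codec_fn enc, codec_fn dec, const uint8_t* src, size_t n) {
    uint8_t comp[1024], back[1024];
    if (n > sizeof(back)) return 1;        /* outside this harness's buffer size */
    size_t c = enc(comp, sizeof(comp), src, n);
    if (c == 0) return 1;                  /* not compressible: nothing to verify */
    size_t d = dec(back, sizeof(back), comp, c);
    return d == n && memcmp(back, src, n) == 0;
}

static uint32_t xorshift32(uint32_t* s) {  /* seed must be nonzero */
    *s ^= *s << 13; *s ^= *s >> 17; *s ^= *s << 5;
    return *s;
}

/* Fuzz-style driver: random lengths and small alphabets to produce skewed inputs. */
static int fuzz_round_trip(uint32_t seed, int iterations) {
    uint8_t buf[512];
    for (int it = 0; it < iterations; it++) {
        size_t n = xorshift32(&seed) % sizeof(buf);
        uint32_t alphabet = 1 + xorshift32(&seed) % 8;
        for (size_t i = 0; i < n; i++) buf[i] = (uint8_t)(xorshift32(&seed) % alphabet);
        if (!round_trip_ok(rle_encode, rle_decode, buf, n)) return 0;
    }
    return 1;
}
```

A real harness would replace the stand-in codec with the Huffman compressor and decompressor and let a coverage-guided fuzzer supply the input bytes instead of the PRNG loop.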

Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speedup (huf-cspeed vs dev)   |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
@terrelln (Contributor Author)

Tests are passing now, except for the 6 that are killed by GitHub Actions.

```c
  }
  return !bad;
}

size_t HUF_compressBound(size_t size) { return HUF_COMPRESSBOUND(size); }

/** HUF_CStream_t:
 * Huffman uses its own BIT_CStream_t implementation.
 * There are three major differences from BIT_CStream_t:
```

@Cyan4973 (Contributor) commented Jul 30, 2021

👍
Great explanation
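
The `HUF_CStream_t` comment in the excerpt above says Huffman uses its own variant of `BIT_CStream_t`; the excerpt stops before listing the differences. For readers new to the underlying idea, here is a generic sketch of a bit writer that accumulates codes in a 64-bit register and flushes whole bytes. It is not zstd's `HUF_CStream_t` or `BIT_CStream_t` and does not reproduce the three differences; the `bit_writer` struct and all `bw_*`/`br_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer: bits are appended LSB-first into a 64-bit accumulator. */
typedef struct {
    uint64_t acc;    /* pending bits; bit 0 is the oldest */
    unsigned nbits;  /* number of valid bits in acc */
    uint8_t* out;
    size_t pos, cap;
} bit_writer;

static void bw_init(bit_writer* bw, uint8_t* out, size_t cap) {
    bw->acc = 0; bw->nbits = 0; bw->out = out; bw->pos = 0; bw->cap = cap;
}

/* Append the low nb bits of value (nb <= 32). The caller must flush often enough
 * that the accumulator never holds more than 64 bits. */
static void bw_add(bit_writer* bw, uint32_t value, unsigned nb) {
    assert(nb <= 32 && bw->nbits + nb <= 64);
    if (nb == 0) return; /* avoids a shift by 64 when the accumulator is full */
    uint64_t mask = (nb == 32) ? 0xFFFFFFFFu : ((1u << nb) - 1u);
    bw->acc |= ((uint64_t)value & mask) << bw->nbits;
    bw->nbits += nb;
}

/* Move all complete bytes to the output. Returns 0 if the output buffer is full. */
static int bw_flush(bit_writer* bw) {
    while (bw->nbits >= 8) {
        if (bw->pos >= bw->cap) return 0;
        bw->out[bw->pos++] = (uint8_t)bw->acc;
        bw->acc >>= 8;
        bw->nbits -= 8;
    }
    return 1;
}

/* Flush everything, zero-padding the last byte. Returns bytes written, 0 on overflow. */
static size_t bw_close(bit_writer* bw) {
    if (!bw_flush(bw)) return 0;
    if (bw->nbits > 0) {
        if (bw->pos >= bw->cap) return 0;
        bw->out[bw->pos++] = (uint8_t)bw->acc;
        bw->acc = 0; bw->nbits = 0;
    }
    return bw->pos;
}

/* Matching bit-by-bit reader, used only to check the writer. */
static uint32_t br_read(const uint8_t* in, size_t* bitpos, unsigned nb) {
    uint32_t v = 0;
    for (unsigned i = 0; i < nb; i++, (*bitpos)++)
        v |= (uint32_t)((in[*bitpos >> 3] >> (*bitpos & 7)) & 1u) << i;
    return v;
}
```

Keeping pending bits in a register lets the encoder append several short codes between flushes instead of doing a read-modify-write on the output buffer for every code.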

@terrelln force-pushed the huf-cspeed branch 3 times, most recently from e0ef260 to d8a0797, on August 3, 2021 15:09
* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
  - Incorrect weight comparison (weights are allowed to be equal to
    table log).
  - HUF_compress1X_usingCTable_internal() can return compressed
    size >= source size, so the assert that `cSize <= 65535` isn't
    correct, and it needs to be checked instead.
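
The two fixes in the commit message above can be illustrated with a small sketch. It assumes the relationship from the Zstandard format (RFC 8878): a symbol coded with `nbBits` bits has weight `tableLog + 1 - nbBits` (0 for an absent symbol), so a 1-bit code has weight exactly `tableLog`. It also assumes that each of the first three stream sizes of a 4-stream Huffman block is stored in a 2-byte little-endian jump-table field, which is presumably why `cSize` must not exceed 65535, and it follows the zstd convention that returning 0 means "not compressible". The function names are hypothetical, not zstd internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weight of a symbol coded with nbBits bits (0 = absent) in a table of depth tableLog. */
static unsigned weight_from_nbbits(unsigned nbBits, unsigned tableLog) {
    assert(nbBits <= tableLog);
    return nbBits == 0 ? 0 : tableLog + 1 - nbBits;
}

/* Inclusive bound: weights in [0, tableLog] are valid.
 * Using `>=` instead of `>` would wrongly reject a 1-bit code (weight == tableLog). */
static int weights_valid(const uint8_t* w, size_t n, unsigned tableLog) {
    for (size_t i = 0; i < n; i++)
        if (w[i] > tableLog) return 0;
    return 1;
}

/* Store a stream size in a 16-bit little-endian field. A size that does not fit
 * is checked at run time (not asserted) and reported as 0 = "not compressible". */
static size_t write_stream_size(uint8_t* dst, size_t cSize) {
    if (cSize == 0 || cSize > 65535) return 0;
    dst[0] = (uint8_t)(cSize & 0xFF);
    dst[1] = (uint8_t)(cSize >> 8);
    return 2;
}
```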
@terrelln merged commit 6ee70ba into facebook:dev on Aug 3, 2021