
[lazy] Skip over incompressible data #3552

Merged

Conversation

@terrelln terrelln (Contributor) commented Mar 14, 2023

For every 256 bytes the lazy match finders process without finding a match, they increase their step size by 1. So for bytes [0, 256) they search every position, for bytes [256, 512) every other position, and so on. However, they currently still insert every position into their hash tables. This differs from fast & dfast, which only insert the positions they search.
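
To make the arithmetic concrete, here is a minimal standalone sketch of that accelerating step size (illustrative only, not the actual code in `zstd_lazy.c`; `lazyStep` is a made-up name):

```c
/* Minimal sketch (not the actual zstd_lazy.c code) of the accelerating
 * step size: it grows by 1 for every 256 bytes processed without a match. */
#include <assert.h>
#include <stddef.h>

static size_t lazyStep(size_t bytesSinceLastMatch)
{
    return 1 + (bytesSinceLastMatch >> 8);  /* +1 per 256-byte bucket */
}

int main(void)
{
    assert(lazyStep(0)    == 1);  /* bytes [0, 256): search every position  */
    assert(lazyStep(256)  == 2);  /* bytes [256, 512): every other position */
    assert(lazyStep(2048) == 9);  /* after 2KB: only one in 9 positions (see below) */
    return 0;
}
```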

This PR changes that: after we've searched 2KB without finding any matches (at which point we're only searching one in 9 positions), we stop inserting every position and insert only the positions we search. The exact 2KB cutoff isn't terribly important; I've simply chosen a cutoff that is reasonably large, to minimize the impact on "normal" data.

This PR only adds skipping to greedy, lazy, and lazy2, but does not touch btlazy2.
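
For clarity, here is a hedged sketch of the insertion policy described above (the names `kInsertCutoff`, `insertPosition`, `searchPosition`, and `Ctx` are placeholders for illustration, not identifiers from this PR): below the cutoff every position is still indexed; past it, only searched positions are inserted.

```c
/* Hedged sketch of the insertion policy described above; all names here
 * (kInsertCutoff, insertPosition, searchPosition, Ctx) are placeholders,
 * not identifiers from this PR or from zstd_lazy.c. */
#include <stddef.h>

#define kInsertCutoff 2048u  /* the ~2KB threshold: the step has reached 9 by then */

typedef struct { size_t lastInserted; } Ctx;  /* stand-in for match-finder state */

static void insertPosition(Ctx* ctx, size_t pos) { ctx->lastInserted = pos; }
static int  searchPosition(Ctx* ctx, size_t pos) { (void)ctx; (void)pos; return 0; }

static void scan(Ctx* ctx, size_t begin, size_t end)
{
    size_t bytesSinceMatch = 0;
    size_t nextToInsert = begin;
    size_t ip = begin;
    while (ip < end) {
        size_t const step = 1 + (bytesSinceMatch >> 8);  /* accelerating step */
        if (bytesSinceMatch < kInsertCutoff) {
            /* Old behavior (kept below the cutoff): index every position,
             * including the ones the step skipped over. */
            for (; nextToInsert <= ip; nextToInsert++)
                insertPosition(ctx, nextToInsert);
        } else {
            /* New behavior past the cutoff: index only the searched position. */
            insertPosition(ctx, ip);
            nextToInsert = ip + 1;
        }
        if (searchPosition(ctx, ip))
            bytesSinceMatch = 0;          /* a match resets the acceleration */
        else
            bytesSinceMatch += step;
        ip += step;
    }
}
```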

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random  |     5 | clang-14.0.6 |    0.0% |   +704% |
| Random  |     5 | gcc-12.2.0   |    0.0% |   +670% |
| Random  |     7 | clang-14.0.6 |    0.0% |   +679% |
| Random  |     7 | gcc-12.2.0   |    0.0% |   +657% |
| Random  |    12 | clang-14.0.6 |    0.0% |  +1355% |
| Random  |    12 | gcc-12.2.0   |    0.0% |  +1331% |
| Silesia |     5 | clang-14.0.6 | +0.002% |  +0.35% |
| Silesia |     5 | gcc-12.2.0   | +0.002% |  +2.45% |
| Silesia |     7 | clang-14.0.6 | +0.001% |  -1.40% |
| Silesia |     7 | gcc-12.2.0   | +0.007% |  +0.13% |
| Silesia |    12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia |    12 | gcc-12.2.0   | +0.011% |  -6.68% |
| Enwik8  |     5 | clang-14.0.6 |    0.0% |  -1.02% |
| Enwik8  |     5 | gcc-12.2.0   |    0.0% |  +0.34% |
| Enwik8  |     7 | clang-14.0.6 |    0.0% |  -1.22% |
| Enwik8  |     7 | gcc-12.2.0   |    0.0% |  -0.72% |
| Enwik8  |    12 | clang-14.0.6 |    0.0% | +26.19% |
| Enwik8  |    12 | gcc-12.2.0   |    0.0% |  -5.70% |

The speed difference for clang at level 12 is real, but is probably caused by some sort of alignment or codegen issue. Before this PR, clang is significantly slower than gcc at that level; with this PR it gets up to parity.

I also measured the ratio difference for the HC match finder, and it looks basically the same as for the row-based match finder. The speedup on random data is similar, and performance elsewhere is about neutral, without the big level-12 difference for either clang or gcc.

Fixes #3539 (Row hash high levels underperform on incompressible data).

@terrelln terrelln force-pushed the 2023-03-09-fix-row-uncompressible-speed branch 3 times, most recently from acca036 to 6ddbf8d on March 14, 2023 23:36
@yoniko yoniko (Contributor) commented Mar 15, 2023

Overall looks good. I wonder if you have a benchmark for compression ratio when there is a bunch of random data followed by compressible data?
