-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New algorithms for the long distance matcher #2483
Conversation
The fuzzer CI found this bug.
Thanks @mpu ! |
I see there are some remaining minor warnings, notably a minor silent cast warning, but assuming it gets fixed, this PR looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, I would just like to move the large arrays out of the stack frame.
We really care about stack space for kernel environments. But, even outside the kernel zstd runs in stack-constrained environments like fibers, and threads which users have configured to have smaller stacks.
lib/compress/zstd_ldm.c
Outdated
BYTE const* const base = ldmState->window.base; | ||
BYTE const* const istart = ip; | ||
ldmRollingHashState_t hashState; | ||
size_t splits[LDM_LOOKAHEAD_SPLITS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be moved into the ldmState_t
? This is using 512B of stack space.
lib/compress/zstd_ldm.c
Outdated
size_t splits[LDM_LOOKAHEAD_SPLITS]; | ||
struct { | ||
BYTE const* split; | ||
U32 hash; | ||
U32 checksum; | ||
ldmEntry_t* bucket; | ||
} candidates[LDM_LOOKAHEAD_SPLITS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: Can both of these be moved into the LDM state as well? This is using 2KB of stack space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
FYI, for completeness I re-run the entire evaluation on the latest commit and got results pretty much identical to the ones in the PR description. |
Thanks for this excellent speed improvement @mpu ! |
This PR proposes to replace the hashing algorithm used in the long distance matcher (LDM). The replacement proposed is a combination of gear hash and xxhash that offers significant speedups at low compression levels with no measurable regressions on compression rates.
Overview
The original rolling hash algorithm was used for two purposes: first, find split points in the input, and second, compute a checksum over a small window of data. In the new code these two objectives are realized by two different faster algorithms: split points are determined using a gear hash algorithm, and checksums are computed with xxhash. This combination is motivated by the fact that gear hash is a fine content-defined chunking (CDC) algorithm but a very poor checksumming algorithm, and xxhash is a fast checksumming algorithm unsuitable for CDC.
Even greater speed might be achieved by moving to a threaded gear hash, but this requires using recent SIMD instructions (AVX2) and dynamic dispatch based on CPUID. There is currently no prior for such techniques in zstd and I did not have the time budget to pursue this engineering task.
Code
The changes I propose make use of a couple low-level performance tricks:
The tricks are listed roughly in order of impact.
To help the review I would like to point out that
ip
is nowminMatchLength
bytes ahead of where it was in the previous version of the code.The gear hash constants were generated by my computer's pseudo-random number generator
/dev/urandom
.Benchmarks
The baseline I used is the current dev branch (f5b3f64). Each configuration is run 5 times and the best timing is used. Deflate deltas are computed as the difference between the compression ratios in percents (more is better).
--long=27 -1
--long=27 -1
--long=27 -1
--long=27 -1
--long=27 -3
--long=27 -3
--long=27 -3
--long=27 -3
--long=27 -8
--long=27 -8
--long=27 -8
--long=27 -8
--long=30 -1
--long=30 -1
--long=30 -1
--long=30 -1
--long=30 -3
--long=30 -3
--long=30 -3
--long=30 -3
--long=30 -8
--long=30 -8
--long=30 -8
--long=30 -8