New algorithms for the long distance matcher #2483

mpu · 2021-02-04T08:34:22Z

This PR proposes to replace the hashing algorithm used in the long distance matcher (LDM). The proposed replacement is a combination of gear hash and xxhash that offers significant speedups at low compression levels with no measurable regression in compression ratio.

Overview

The original rolling hash algorithm was used for two purposes: first, find split points in the input, and second, compute a checksum over a small window of data. In the new code these two objectives are realized by two different faster algorithms: split points are determined using a gear hash algorithm, and checksums are computed with xxhash. This combination is motivated by the fact that gear hash is a fine content-defined chunking (CDC) algorithm but a very poor checksumming algorithm, and xxhash is a fast checksumming algorithm unsuitable for CDC.
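To illustrate the split-point half of this design, here is a minimal sketch of a gear hash scan. It is not the code from this PR: the names `gear_table`, `gear_init` and `gear_find_splits` are hypothetical, and the table is filled by a splitmix64 generator only so that the example is self-contained (the PR uses fixed random constants). The checksum half is left out; in the PR it is computed with xxhash over a small window at each split point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table for this sketch; the PR ships fixed random constants. */
static uint64_t gear_table[256];

/* Fill the table with splitmix64 output (an assumption of this sketch). */
static void gear_init(uint64_t seed)
{
    for (int i = 0; i < 256; i++) {
        uint64_t z = (seed += 0x9E3779B97F4A7C15ULL);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        gear_table[i] = z ^ (z >> 31);
    }
}

/* Scan data[0..size) and record up to max_splits offsets at which the
 * masked rolling hash is zero. Returns the number of offsets recorded.
 * Each byte shifts the hash left by one bit, so a byte stops influencing
 * the hash after 64 steps: split decisions depend only on recent content. */
static size_t gear_find_splits(const uint8_t *data, size_t size, uint64_t mask,
                               size_t *splits, size_t max_splits)
{
    uint64_t hash = 0;
    size_t n = 0;
    assert(data != NULL || size == 0);
    for (size_t i = 0; i < size && n < max_splits; i++) {
        hash = (hash << 1) + gear_table[data[i]];
        if ((hash & mask) == 0)
            splits[n++] = i + 1;   /* split point just after byte i */
    }
    return n;
}
```

Because only the last 64 bytes influence the hash, the same content produces the same split points wherever it appears in the input, which is the property content-defined chunking relies on. The same short memory is what makes the gear hash a poor checksum, hence the separate checksum algorithm.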

Even greater speed might be achieved by moving to a threaded gear hash, but this requires using recent SIMD instructions (AVX2) and dynamic dispatch based on CPUID. There is currently no precedent for such techniques in zstd and I did not have the time budget to pursue this engineering task.

Code

The changes I propose make use of a couple of low-level performance tricks:

- Test for zero bits in the rolling checksum instead of ones
- Mark split criterion branches as UNLIKELY
- Unroll the gear hash inner loop to reduce the loop overhead
- Process several split points at once and prefetch the corresponding hash table buckets

The tricks are listed roughly in order of impact.
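The first three tricks can be sketched together as follows. This is an illustration under stated assumptions, not the code from this PR: `gear_state`, `gear_feed` and `SKETCH_UNLIKELY` are hypothetical names, the gear table is supplied by the caller, and the unroll factor of 4 is arbitrary. The function fills a small batch of split offsets and returns the number of bytes consumed, so a caller can process the batch before resuming the scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint: GCC/Clang builtin, plain expression elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define SKETCH_UNLIKELY(x) (x)
#endif

typedef struct { uint64_t rolling; } gear_state;   /* hypothetical */

/* Feed data[0..size) to the gear hash and collect up to max_splits split
 * offsets (relative to data). Stops early when the batch is full.
 * Returns the number of bytes consumed; *num_splits receives the count. */
static size_t gear_feed(gear_state *st, const uint64_t tab[256],
                        const uint8_t *data, size_t size, uint64_t mask,
                        size_t *splits, size_t max_splits, size_t *num_splits)
{
    uint64_t hash = st->rolling;
    size_t i = 0, n = 0;
    assert(max_splits > 0);

/* One hash update plus the split test. The criterion tests for zero
 * bits, and the branch is hinted as rarely taken. */
#define GEAR_STEP() do {                                 \
        hash = (hash << 1) + tab[data[i]];               \
        i++;                                             \
        if (SKETCH_UNLIKELY((hash & mask) == 0)) {       \
            splits[n++] = i;                             \
            if (n == max_splits) goto done;              \
        }                                                \
    } while (0)

    /* Unrolled main loop: one bounds check per four bytes. */
    while (i + 4 <= size) {
        GEAR_STEP(); GEAR_STEP(); GEAR_STEP(); GEAR_STEP();
    }
    /* Tail: fewer than four bytes left. */
    while (i < size) {
        GEAR_STEP();
    }
#undef GEAR_STEP

done:
    st->rolling = hash;
    *num_splits = n;
    return i;
}
```

For the fourth trick, a caller would take the returned batch, compute the hash table bucket for every split in it and issue a prefetch for each (for example with `__builtin_prefetch` on GCC and Clang), and only then probe the buckets, so that the memory latencies overlap.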

To help the review I would like to point out that ip is now minMatchLength bytes ahead of where it was in the previous version of the code.

The gear hash constants were generated by my computer's pseudo-random number generator /dev/urandom.

Benchmarks

The baseline I used is the current dev branch (f5b3f64). Each configuration was run 5 times and the best timing was used. Deflate deltas are computed as the difference between the compression ratios, in percentage points (more is better).

| FILE | CONFIG | DEFLATE Δ | TIME Δ |
|---|---|---|---|
| hhvm-rt.tar | `--long=27 -1` | +00.02 | -33.76% |
| l1m.tar | `--long=27 -1` | +00.00 | -34.03% |
| l1y.tar | `--long=27 -1` | +00.00 | -34.12% |
| l5.tar | `--long=27 -1` | +00.00 | -34.38% |
| hhvm-rt.tar | `--long=27 -3` | +00.02 | -26.97% |
| l1m.tar | `--long=27 -3` | +00.01 | -25.54% |
| l1y.tar | `--long=27 -3` | +00.01 | -23.93% |
| l5.tar | `--long=27 -3` | +00.01 | -23.73% |
| hhvm-rt.tar | `--long=27 -8` | +00.01 | -08.00% |
| l1m.tar | `--long=27 -8` | +00.00 | -10.00% |
| l1y.tar | `--long=27 -8` | +00.00 | -09.78% |
| l5.tar | `--long=27 -8` | +00.00 | -08.48% |
| hhvm-rt.tar | `--long=30 -1` | +00.01 | -34.81% |
| l1m.tar | `--long=30 -1` | -00.01 | -41.10% |
| l1y.tar | `--long=30 -1` | -00.03 | -39.76% |
| l5.tar | `--long=30 -1` | +00.01 | -34.18% |
| hhvm-rt.tar | `--long=30 -3` | +00.01 | -26.68% |
| l1m.tar | `--long=30 -3` | -00.01 | -33.51% |
| l1y.tar | `--long=30 -3` | -00.02 | -32.22% |
| l5.tar | `--long=30 -3` | +00.00 | -22.48% |
| hhvm-rt.tar | `--long=30 -8` | +00.00 | -08.26% |
| l1m.tar | `--long=30 -8` | -00.01 | -16.05% |
| l1y.tar | `--long=30 -8` | -00.03 | -14.00% |
| l5.tar | `--long=30 -8` | +00.01 | -07.37% |
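For readers who want to reproduce the two delta columns, the arithmetic might look like the sketch below. The formulas are assumptions: the description only says that the best of 5 timings is used and that the deflate delta is a difference of compression ratios. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Best (smallest) of n timings, as in "the best timing is used". */
static double best_of(const double *timings, size_t n)
{
    assert(n > 0);
    double best = timings[0];
    for (size_t i = 1; i < n; i++)
        if (timings[i] < best)
            best = timings[i];
    return best;
}

/* Assumed formula: relative change of the candidate against the baseline,
 * in percent. Negative values mean the candidate is faster. */
static double time_delta_pct(double base_seconds, double cand_seconds)
{
    assert(base_seconds > 0.0);
    return (cand_seconds - base_seconds) / base_seconds * 100.0;
}

/* Assumed formula: both ratios are already expressed in percent, and the
 * delta is their plain difference (candidate minus baseline). */
static double deflate_delta(double base_ratio_pct, double cand_ratio_pct)
{
    return cand_ratio_pct - base_ratio_pct;
}
```

Under these assumptions, a baseline best time of 2.0 s against a candidate best time of 1.0 s gives a TIME Δ of -50%.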

Cyan4973 · 2021-02-04T20:20:09Z

Thanks @mpu!
These are impressive results!

Cyan4973 · 2021-02-04T23:26:51Z

I see there are some remaining minor warnings, notably a minor silent cast warning, but assuming it gets fixed, this PR looks good to me.

terrelln

Looks good to me, I would just like to move the large arrays out of the stack frame.

We really care about stack space for kernel environments. But, even outside the kernel zstd runs in stack-constrained environments like fibers, and threads which users have configured to have smaller stacks.

terrelln · 2021-02-08T21:33:09Z

lib/compress/zstd_ldm.c

+    BYTE const* const base = ldmState->window.base;
+    BYTE const* const istart = ip;
+    ldmRollingHashState_t hashState;
+    size_t splits[LDM_LOOKAHEAD_SPLITS];


Can this be moved into the ldmState_t? This is using 512B of stack space.

terrelln · 2021-02-08T21:35:10Z

lib/compress/zstd_ldm.c

+    size_t splits[LDM_LOOKAHEAD_SPLITS];
+    struct {
+        BYTE const* split;
+        U32 hash;
+        U32 checksum;
+        ldmEntry_t* bucket;
+    } candidates[LDM_LOOKAHEAD_SPLITS];


Same here: Can both of these be moved into the LDM state as well? This is using 2KB of stack space.
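A sketch of what this relocation could look like, for illustration only. All names are hypothetical stand-ins rather than the final zstd definitions, the existing state fields are omitted, and the batch size of 64 is an assumption chosen because it matches the 512B figure for `size_t` entries on a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOOKAHEAD_SPLITS 64   /* assumed: 64 * 8 bytes = 512B */

/* Hypothetical stand-in for ldmEntry_t. */
typedef struct { uint32_t offset; uint32_t checksum; } sketch_entry_t;

/* Named version of the anonymous struct from the diff above. */
typedef struct {
    const uint8_t *split;
    uint32_t hash;
    uint32_t checksum;
    sketch_entry_t *bucket;
} sketch_candidate_t;

/* The scratch arrays live in the long-lived state instead of in the
 * stack frame of the hot function (other state fields omitted here). */
typedef struct {
    size_t splits[SKETCH_LOOKAHEAD_SPLITS];
    sketch_candidate_t candidates[SKETCH_LOOKAHEAD_SPLITS];
} sketch_ldm_state_t;

/* Turn the split offsets stored in the state into candidates, using only
 * the state's own scratch space. Returns the number of candidates. */
static size_t record_candidates(sketch_ldm_state_t *state,
                                const uint8_t *start, size_t num_splits)
{
    assert(num_splits <= SKETCH_LOOKAHEAD_SPLITS);
    for (size_t n = 0; n < num_splits; n++) {
        sketch_candidate_t *c = &state->candidates[n];
        c->split = start + state->splits[n];
        c->hash = 0;        /* filled in later by the checksum step */
        c->checksum = 0;
        c->bucket = NULL;   /* filled in once the bucket is known */
    }
    return num_splits;
}
```

The function's stack frame now holds a few scalars and pointers only; the cost is that the state object grows by the same amount, paid once per state rather than once per call.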

terrelln

Looks good to me!

mpu · 2021-02-11T12:49:06Z

FYI, for completeness I re-ran the entire evaluation on the latest commit and got results pretty much identical to the ones in the PR description.

Cyan4973 · 2021-02-11T16:38:14Z

Thanks for this excellent speed improvement @mpu !

new core ldm algorithm

9f327c0

facebook-github-bot added the CLA Signed label Feb 4, 2021

deal safely with short inputs in ZSTD_ldm_generateSequences

874a590

The fuzzer CI found this bug.

Cyan4973 approved these changes Feb 8, 2021

fix some compiler warnings

e2ad174

mpu force-pushed the ldmgear branch from 1a2a887 to e2ad174 Compare February 8, 2021 19:19

terrelln requested changes Feb 8, 2021

relocate large arrays from the stack to ldmState_t

552efca

terrelln approved these changes Feb 10, 2021

Cyan4973 merged commit 8884cb8 into facebook:dev Feb 11, 2021

felixhandte mentioned this pull request Mar 2, 2021

Release ZStandard v1.4.9 #2515

Merged

LaurentBonnaud mentioned this pull request Mar 3, 2021

Speeding up deduplication borgbackup/borg#5721

Open
