Simple performance improvements for ldm #2464

Merged: 4 commits merged into facebook:dev from ldmfixes on Jan 22, 2021

Conversation

@mpu (Contributor) commented on Jan 11, 2021:

Description

This PR contains some simple performance improvements to the ldm matcher, together with some other cosmetic changes that I made "en passant" (variable renames, typo fixes, ...).

The changes to the ldm matcher only affect performance and the new code should produce precisely the same output as the baseline. Two changes helped improve the performance relatively uniformly:

  • Instead of computing the tag with a branch and a fair number of arithmetic operations in the hot loop of ZSTD_ldm_generateSequences, we now compute a tag mask once before entering the loop and use it to check whether the rolling hash satisfies the "tag condition". The tag mask is computed in such a way that the new tag condition is equivalent to the original one (see the sketch after this list).
  • When the tag condition is met but we did not find a match in the ldm hash table (the common case), we insert a new entry into the hash table using ZSTD_ldm_insertEntry instead of ZSTD_ldm_makeEntryAndInsertByTag, which performs some redundant checks.
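
For illustration, here is a minimal sketch of the tag-mask idea, under the assumption that a position qualifies when the hashRateLog low-order bits of the rolling hash are all set. The helper names are invented for the example; this is not the PR's actual code:

```c
/* Minimal sketch (illustrative names; not zstd's actual implementation).
 * Assumption: the "tag condition" holds when the hashRateLog low-order bits
 * of the rolling hash are all set. Precomputing the mask turns the
 * per-iteration tag computation into a single AND plus compare. */
#include <stdint.h>

typedef uint64_t U64;
typedef uint32_t U32;

/* Computed once, before entering the matcher's main loop. */
U64 makeTagMask(U32 hashRateLog)
{
    return ((U64)1 << hashRateLog) - 1;
}

/* Evaluated inside the hot loop: one AND and one compare. */
int tagConditionMet(U64 rollingHash, U64 tagMask)
{
    return (rollingHash & tagMask) == tagMask;
}
```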

Benchmarks

The following files were used to assess the impact of the change:

  • hhvm-rt.tar (5.9G) - an image of a production version of HHVM
  • l1m.tar (2.0G) - two snapshots of the linux kernel source code taken a month apart
  • l1y.tar (1.9G) - two snapshots of the linux kernel source code taken a year apart
  • l5.tar (544M) - the first 544M of an archive containing the linux kernel source code

The results are compiled in the table below. All times are computed as the best over a set of 5 runs.

| FILE | CONFIG | DEFLATE Δ | TIME Δ |
| --- | --- | --- | --- |
| data/hhvm-rt.tar | --long=27 -1 | = | -7.69% |
| data/l1m.tar | --long=27 -1 | = | -7.15% |
| data/l1y.tar | --long=27 -1 | = | -5.76% |
| data/l5.tar | --long=27 -1 | = | -1.63% |
| data/hhvm-rt.tar | --long=30 -1 | = | -5.23% |
| data/l1m.tar | --long=30 -1 | = | -5.97% |
| data/l1y.tar | --long=30 -1 | = | -7.31% |
| data/l5.tar | --long=30 -1 | = | -4.13% |
| data/hhvm-rt.tar | --long=27 -8 | = | -7.26% |
| data/l1m.tar | --long=27 -8 | = | -10.21% |
| data/l1y.tar | --long=27 -8 | = | -5.42% |
| data/l5.tar | --long=27 -8 | = | -6.97% |
| data/hhvm-rt.tar | --long=30 -8 | = | -4.97% |
| data/l1m.tar | --long=30 -8 | = | -6.05% |
| data/l1y.tar | --long=30 -8 | = | -9.55% |
| data/l5.tar | --long=30 -8 | = | -1.70% |
| data/hhvm-rt.tar | --long=27 -16 | = | +1.20% |
| data/l1m.tar | --long=27 -16 | = | -0.55% |
| data/l1y.tar | --long=27 -16 | = | +6.68% |
| data/l5.tar | --long=27 -16 | = | +1.80% |
| data/hhvm-rt.tar | --long=30 -16 | = | +2.35% |
| data/l1m.tar | --long=30 -16 | = | +2.68% |
| data/l1y.tar | --long=30 -16 | = | +1.26% |
| data/l5.tar | --long=30 -16 | = | +0.35% |
| data/hhvm-rt.tar | --long=27 -19 | = | -0.09% |
| data/l1m.tar | --long=27 -19 | = | +1.43% |
| data/l1y.tar | --long=27 -19 | = | -0.52% |
| data/l5.tar | --long=27 -19 | = | +2.94% |
| data/hhvm-rt.tar | --long=30 -19 | = | +2.88% |
| data/l1m.tar | --long=30 -19 | = | -0.51% |
| data/l1y.tar | --long=30 -19 | = | +1.14% |
| data/l5.tar | --long=30 -19 | = | -2.13% |

```c
unsigned const offset = *pOffset;

*(ZSTD_ldm_getBucket(ldmState, hash, ldmParams) + offset) = entry;
*pOffset = (offset + 1) & (((U32)1 << ldmParams.bucketSizeLog) - 1);
```

Contributor:

nit: I believe the right-hand side needs an explicit cast to appease the warning.

Author (@mpu):

Thanks for the hint. This should be fixed by my latest commit.

@senhuang42 (Contributor) commented:

Thanks for the improvements, this generally looks pretty good to me!

The CircleCI failure can probably be fixed with a rebase on top of the latest dev branch.

@terrelln (Contributor) commented:

I've measured a 2.5% - 4.5% speed improvement on several different files, which is in line with the measurements you've provided.

There are remaining calls to ZSTD_ldm_makeEntryAndInsertByTag() when we do find a match, and in ZSTD_ldm_fillLdmHashTable(). Do you think that it would speed up LDM to replace it with ZSTD_ldm_insertEntry() in ZSTD_ldm_fillLdmHashTable()?

Comment on lines 379 to 383:

```c
ldmEntry_t entry;

entry.offset = curr;
entry.checksum = checksum;
ZSTD_ldm_insertEntry(ldmState, hash, entry, *params);
```

Contributor:

We also insert when we find an entry, right? Could we use ZSTD_ldm_insertEntry() in that case too?

Author (@mpu):

Yes. I thought it would not bring much of a win, but I updated it for the sake of consistency.

@mpu (Contributor, Author) commented on Jan 20, 2021:

Thanks for your comments! I rebased on top of a recent dev branch and added one commit to address the feedback.
The performance characteristics should not have changed much. Nonetheless, I will run a final benchmark to make sure that is the case.

> Do you think that it would speed up LDM to replace it with ZSTD_ldm_insertEntry() in ZSTD_ldm_fillLdmHashTable()?

In fillLdmHashTable the calls to makeEntryAndInsertByTag are not doing any superfluous work and likely get inlined, so I do not think we would gain anything by removing them.

@terrelln (Contributor) left a review:

This looks great! I just have one minor nit about a renaming, and then it looks good to me.

@@ -299,10 +295,13 @@ static size_t ZSTD_ldm_generateSequences_internal(

```c
U64 rollingHash = 0;

while (ip <= ilimit) {
    U32 const currentOffset = (U32)(ip - base);
```

Contributor:

nit: this is not an offset, it is an "index". Either keep `curr` or use `currentIndex` (note: `current` cannot be used because it conflicts with a macro in the Linux kernel).

I know that it is called `offset` in the `ldmEntry_t` struct, but that is a bad name; it should be renamed to `index`. If you want to include that renaming in this diff or a follow-up, that would be great.

@mpu (Contributor, Author) commented on Jan 22, 2021:

I renamed the variable to `currentIndex`. I also quickly re-ran parts of the benchmarks on a relatively small but stable machine and got the following encouraging results.

| FILE | CONFIG | DEFLATE Δ | TIME Δ |
| --- | --- | --- | --- |
| data/l5.tar | --long=27 -1 | = | -7.23% |
| data/l5.tar | --long=27 -8 | = | -1.65% |
| data/l5.tar | --long=27 -16 | = | -0.80% |
| data/l5.tar | --long=27 -19 | = | +0.12% |
| data/l5.tar | --long=30 -1 | = | -9.20% |
| data/l5.tar | --long=30 -8 | = | +0.24% |
| data/l5.tar | --long=30 -16 | = | -0.78% |
| data/l5.tar | --long=30 -19 | = | +0.17% |

@terrelln (Contributor) left a review:

Awesome, thanks for the PR! Looking forward to the next one :)

@terrelln merged commit f5b3f64 into facebook:dev on Jan 22, 2021.

@Cyan4973 (Contributor) commented:

Thanks @mpu!

I tested your PR on a (fairly stable) desktop-class workstation and measured an ~8-9% improvement in LDM top speed (combined with multithreaded compression at low levels, <= 4), in line with the numbers announced in this PR.

@mpu deleted the ldmfixes branch on January 25, 2021 at 08:51.