Add prefetchCDictTables CCtxParam (+10-20% cold dict compression speed) #3177

embg · 2022-06-23T16:53:50Z

Summary

Adds a new CCtxParam (prefetchCDictTables)
Exposes it in the largeNbDicts benchmark
Adds it to fuzzers

Description of the optimization

In some situations, zstd uses CDict tables in-place rather than copying them into the working context (see the docs on ZSTD_dictAttachPref_e for details). In such situations, compression speed drops significantly when the CDict tables are "cold" (not resident in CPU cache).

This PR adds a CCtxParam (prefetchCDictTables) which instructs zstd to prefetch CDict tables when they are used in-place (specifically in level 1-4 dictMatchState matchfinders). For sufficiently small inputs, the cost of the prefetch will outweigh the benefit. For sufficiently large inputs, zstd will by default memcpy() CDict tables into the working context, so there is no need to prefetch. This parameter is targeted at a middle range of input sizes, where a prefetch is cheap enough to be useful but memcpy() is too expensive.
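To make the idea of prefetching a table concrete, here is a minimal sketch that issues one prefetch hint per cache line over a read-only table. It is an illustration only, not the code from this PR: the `prefetch_table` helper, the `PREFETCH_READ` macro, and the 64-byte line size are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 parts. */
#define CACHELINE_BYTES 64

/* Hypothetical wrapper: a read prefetch on GCC/Clang, a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#  define PREFETCH_READ(p) ((void)(p))
#endif

/* Hypothetical helper: issues one prefetch hint per cache line covering
 * `bytes` bytes starting at `table`, and returns the number of hints issued.
 * A prefetch is only a hint; it never changes program results. */
static size_t prefetch_table(const void* table, size_t bytes)
{
    const char* p = (const char*)table;
    size_t issued = 0;
    size_t off;
    assert(table != NULL || bytes == 0);
    for (off = 0; off < bytes; off += CACHELINE_BYTES) {
        PREFETCH_READ(p + off);
        issued++;
    }
    return issued;
}
```

The cost of such a loop grows with the table size and is paid up front, which is why it only pays off when the input is large enough to touch a meaningful share of the table afterwards.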

The exact range of input sizes where this makes sense is best determined by careful experimentation (see below for measurements on one particular machine / dataset which demonstrate 10-20% wins for a particular working set size and input size). Rather than enabling this param for all inputs, the code which calls ZSTD_compress2() should use a size cutoff (tuned via experimentation) to select the best prefetch strategy for each input.
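The caller-side size cutoff described above could be sketched roughly as follows. Everything named here is hypothetical: the `prefetch_choice` enum merely stands in for a three-state switch (auto / enable / disable), and the cutoffs are placeholders that would have to be tuned by experiment on the target machine and dataset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-state choice; a caller would translate it into the
 * value it passes for the prefetchCDictTables parameter. */
typedef enum {
    PREFETCH_AUTO = 0,    /* leave the decision to the library default */
    PREFETCH_ENABLE = 1,  /* ask for CDict tables to be prefetched */
    PREFETCH_DISABLE = 2  /* explicitly skip the prefetch */
} prefetch_choice;

/* Hypothetical, experimentally tuned cutoffs (in bytes). */
typedef struct {
    size_t min_size; /* below this, the prefetch is assumed to cost more than it saves */
    size_t max_size; /* at or above this, defer to the library default */
} prefetch_cutoffs;

/* Picks a prefetch strategy for one input based on its size alone. */
static prefetch_choice choose_prefetch(size_t src_size, prefetch_cutoffs c)
{
    assert(c.min_size <= c.max_size);
    if (src_size < c.min_size) return PREFETCH_DISABLE;
    if (src_size < c.max_size) return PREFETCH_ENABLE;
    return PREFETCH_AUTO;
}
```

For example, with placeholder cutoffs of 8 KiB and 16 KiB, `choose_prefetch(4096, cutoffs)` yields `PREFETCH_DISABLE` and `choose_prefetch(12288, cutoffs)` yields `PREFETCH_ENABLE`. The caller would then set the parameter on its compression context before calling ZSTD_compress2().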

Measurements

I measured the effect of this param on the HTML dataset. I benchmarked on an Intel(R) Xeon(R) D-2191A CPU @ 1.60GHz machine with core isolation enabled and turbo disabled.

We can see that the param is harmful at level 3, even in the cold CDict scenario, if the inputs are small enough (0-8K). For larger inputs (8-16K) at the same level, we see wins of up to 20%. This demonstrates the need to apply this param selectively.

felixhandte

LGTM. Just a couple nits.

lib/compress/zstd_double_fast.c

lib/compress/zstd_fast.c

lib/compress/zstd_compress.c

lib/compress/zstd_compress_internal.h

felixhandte

Assuming tests pass, ship it!

embg added 4 commits June 22, 2022 16:13

Add prefetchCDictTables CCtxParam

2a12811

Add docs

93b89fb

add prefetchCDictTables to largeNbDicts

6bd5ac6

Add tests

747e06f

facebook-github-bot added the CLA Signed label Jun 23, 2022

felixhandte added the optimization label Jun 23, 2022

felixhandte reviewed Jun 23, 2022

lib/compress/zstd_double_fast.c (outdated, resolved)

lib/compress/zstd_fast.c (outdated, resolved)

lib/compress/zstd_compress.c (outdated, resolved)

felixhandte reviewed Jun 23, 2022

lib/compress/zstd_compress_internal.h (resolved)

Nits

cb9e341

felixhandte approved these changes Jun 23, 2022

embg merged commit e9d6fc8 into facebook:dev Jun 24, 2022

embg deleted the dms_prefetch2 branch June 24, 2022 15:24

Cyan4973 mentioned this pull request Feb 9, 2023

release v1.5.4 #3487

Merged
