Refactor prefetching for the decoding loop #2547

Following #2545, I noticed that one field in `seq_t` is optional and only used in combination with prefetching. (This may have contributed to the static analyzer's failure to detect correct initialization.) I then wondered whether the code could be rewritten so that this optional part is handled directly by the prefetching code rather than passed as an option into `ZSTD_decodeSequence()`. The result is this refactoring exercise, in which the prefetching responsibility is better isolated into its own function and `ZSTD_decodeSequence()` is streamlined to contain strictly sequence decoding operations. Incidentally, thanks to better code locality, it reduces the need to pass information around, leading to a simplified interface and smaller state structures.

Changed strategy: now unconditionally prefetch the first 2 cache lines of the match, instead of the cache lines corresponding to the first and last bytes of the match. This better matches CPU expectations, since the hardware prefetcher should auto-prefetch the following cache lines once it detects the sequential nature of the read. This is globally positive, by +5%, though exact gains depend on the compiler (from -2% to +15%). The only negative counter-example is gcc-9.
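To make the change of strategy concrete, here is a minimal sketch contrasting the two approaches. It is not the code from this PR: the function names are hypothetical, the 64-byte cache line size is an assumption, and the macro is a simplified stand-in built on the GCC/Clang `__builtin_prefetch` intrinsic. The return value of the second function exists only to make the sketch easy to check.

```c
#include <stddef.h>

#define CACHELINE_SIZE 64  /* assumption: typical x86-64 / ARM64 line size */

#if defined(__GNUC__) || defined(__clang__)
   /* rw = 0 (read), locality = 3 (keep in all cache levels) */
#  define PREFETCH_L1(ptr) __builtin_prefetch((const void*)(ptr), 0, 3)
#else
#  define PREFETCH_L1(ptr) ((void)(ptr))  /* no-op on other compilers */
#endif

/* Previous strategy (hypothetical name): prefetch the cache lines that
 * hold the first and the last byte of the match. */
static void prefetch_first_and_last(const unsigned char* match, size_t matchLength)
{
    PREFETCH_L1(match);
    if (matchLength > 0) PREFETCH_L1(match + matchLength - 1);
}

/* New strategy (hypothetical name): always prefetch the first two cache
 * lines, whatever the match length, and leave any following lines to the
 * hardware prefetcher. Returns the end of the prefetched window. */
static const unsigned char* prefetch_first_two_lines(const unsigned char* match)
{
    PREFETCH_L1(match);
    PREFETCH_L1(match + CACHELINE_SIZE);
    return match + 2 * CACHELINE_SIZE;
}
```

Note that the second variant no longer needs the match length, which is consistent with the smaller interface described above: the prefetching code only needs to know where the match starts.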

…_prefetch_refactor

This seems to bring an additional ~+1.2% decompression speed on average across 10 compilers x 6 scenarios.

Commits on May 5, 2021

Merge branch 'dev' into d_prefetch_refactor

Cyan4973 committed May 5, 2021

8cde167

Commits on Mar 19, 2021

Commits on May 5, 2021

Commits on May 7, 2021