Zstandard v1.5.0
v1.5.0 is a major release featuring large performance improvements as well as API changes.
Performance
Improved Middle-Level Compression Speed
1.5.0 introduces a new default match finder for the compression strategies `greedy`, `lazy`, and `lazy2` (which map to levels 5-12 for inputs larger than 256K). The optimization brings a massive improvement in compression speed with slight perturbations in compression ratio (< 0.5%) and equal or decreased memory usage.
Benchmarked with gcc, on an i9-9900K:
level | silesia.tar speed delta | enwik7 speed delta |
---|---|---|
5 | +25% | +25% |
6 | +50% | +50% |
7 | +40% | +40% |
8 | +40% | +50% |
9 | +50% | +65% |
10 | +65% | +80% |
11 | +85% | +105% |
12 | +110% | +140% |
On heavily loaded machines with significant cache contention, we have internally measured even larger gains: 2-3x+ speed at levels 5-7. 🚀
The biggest gains are achieved on files typically larger than 128KB. On files smaller than 16KB, by default we fall back to the legacy match finder, which is the faster one at that size. This default policy can be overridden manually: the new match finder can be forcibly enabled with the advanced parameter `ZSTD_c_useRowMatchFinder`, or through the CLI option `--[no-]row-match-finder`.
Note: only CPUs that support `SSE2` realize the full extent of this improvement.
Improved High-Level Compression Ratio
Block splitting, which improves compression ratio, is now enabled by default for high compression levels (16+). The amount of benefit varies depending on the workload. Archives composed of heavily differing files will see more improvement than single files that don’t vary much entropically (like text files/enwik). At levels 16+, we observe no measurable regression to compression speed.
level 22 compression
file | ratio 1.4.9 | ratio 1.5.0 | ratio % delta |
---|---|---|---|
silesia.tar | 4.021 | 4.041 | +0.49% |
calgary.tar | 3.646 | 3.672 | +0.71% |
enwik7 | 3.579 | 3.579 | +0.0% |
The block splitter can be forcibly enabled on lower compression levels as well with the advanced parameter `ZSTD_c_splitBlocks`. When forcibly enabled at lower levels, speed regressions can become more notable. Additionally, since more compressed blocks may be produced, decompression speed on these blobs may also see small regressions.
Faster Decompression Speed
The decompression speed of data compressed with large window settings (such as `--long` or `--ultra`) has been significantly improved in this version. The gains vary depending on compiler brand and version, with `clang` generally benefiting the most.
The following benchmark was measured by compressing `enwik9` at level `--ultra -22` (with a 128 MB window size) on a Core i7-9700K.
Compiler version | D. Speed improvement |
---|---|
gcc-7 | +15% |
gcc-8 | +10% |
gcc-9 | +5% |
gcc-10 | +1% |
clang-6 | +21% |
clang-7 | +16% |
clang-8 | +16% |
clang-9 | +18% |
clang-10 | +16% |
clang-11 | +15% |
Average decompression speed for “normal” payload is slightly improved too, though the impact is less impressive. Once again, mileage varies depending on exact compiler version, payload, and even compression level. In general, a majority of scenarios see benefits ranging from +1 to +9%. There are also a few outliers here and there, from -4% to +13%. The average gain across all these scenarios stands at ~+4%.
Library Updates
Dynamic Library Supports Multithreading by Default
It was already possible to compile `libzstd` with multithreading support, but doing so required an explicit action. By default, the `make` build script would build `libzstd` as a single-thread-only library.
This changes in v1.5.0.
Now the dynamic library (typically `libzstd.so.1` on Linux) supports multi-threaded compression by default.
Note that this property is not extended to the static library (typically `libzstd.a` on Linux) because doing so would have impacted the build script of existing client applications (requiring them to add `-pthread` to their recipe), thus potentially breaking their build. In order to avoid this disruption, the static library remains single-threaded by default.
Luckily, this build disruption does not extend to the dynamic library, which can be built with multi-threading support while existing applications linking to `libzstd.so` and expecting only single-thread capabilities will be none the wiser, and remain completely unaffected.
The idea is that starting from v1.5.0, applications can expect the dynamic library to support multi-threading should they need it, which will progressively lead to increased adoption of this capability over time.
That being said, since the locally deployed dynamic library may or may not support multi-threaded compression, depending on local build configuration, it’s always better to check this capability at runtime. To do so, it’s enough to check the return value when changing parameter `ZSTD_c_nbWorkers`: if it results in an error, then multi-threading is not supported.
Q: What if I prefer to keep the libraries in single-thread mode only?
The target `make lib-nomt` will ensure this outcome.
Q: Actually, I want both static and dynamic library versions to support multi-threading!
The target `make lib-mt` will generate this outcome.
Promotions to Stable
Moving up to the higher digit 1.5 signals an opportunity to extend the stable portion of the `zstd` public API.
This update is relatively minor, featuring only a few non-controversial newcomers.
`ZSTD_defaultCLevel()` indicates which level is default (applied when selecting level `0`). It complements the existing `ZSTD_minCLevel()` and `ZSTD_maxCLevel()`.
Similarly, `ZSTD_getDictID_fromCDict()` is a straightforward equivalent to the already promoted `ZSTD_getDictID_fromDDict()`.
Deprecations
Zstd-1.4.0 stabilized a new advanced API which allows users to pass advanced parameters to zstd. We’re now deprecating all the old experimental APIs that are subsumed by the new advanced API. They will be considered for removal in the next Zstd major release zstd-1.6.0. Note that only experimental symbols are impacted. Stable functions, like `ZSTD_initCStream()`, remain fully supported.
The deprecated functions are listed below, together with the migration. All the suggested migrations are stable APIs, meaning that once you migrate, the API will be supported forever. See the documentation for the deprecated functions for more details on how to migrate.
- Functions that migrate to `ZSTD_compress2()` with parameter setters:
  - `ZSTD_compress_advanced()`: Use `ZSTD_CCtx_setParameter()`.
  - `ZSTD_compress_usingCDict_advanced()`: Use `ZSTD_CCtx_setParameter()` and `ZSTD_CCtx_refCDict()`.
- Functions that migrate to `ZSTD_compressStream()` or `ZSTD_compressStream2()` with parameter setters:
  - `ZSTD_initCStream_srcSize()`: Use `ZSTD_CCtx_setPledgedSrcSize()`.
  - `ZSTD_initCStream_usingDict()`: Use `ZSTD_CCtx_loadDictionary()`.
  - `ZSTD_initCStream_usingCDict()`: Use `ZSTD_CCtx_refCDict()`.
  - `ZSTD_initCStream_advanced()`: Use `ZSTD_CCtx_setParameter()`.
  - `ZSTD_initCStream_usingCDict_advanced()`: Use `ZSTD_CCtx_setParameter()` and `ZSTD_CCtx_refCDict()`.
  - `ZSTD_resetCStream()`: Use `ZSTD_CCtx_reset()` and `ZSTD_CCtx_setPledgedSrcSize()`.
- Functions that are deprecated without replacement. We don’t expect any users of these functions. Please open an issue if you use these and have questions about how to migrate.
  - `ZSTD_compressBegin_advanced()`
  - `ZSTD_compressBegin_usingCDict_advanced()`
Header File Locations
Zstd has slightly re-organized the library layout to move all public headers to the top level `lib/` directory. This is for consistency, so all public headers are in `lib/` and all private headers are in a sub-directory. If you build zstd from source, this may affect your build system.
- `lib/common/zstd_errors.h` has moved to `lib/zstd_errors.h`.
- `lib/dictBuilder/zdict.h` has moved to `lib/zdict.h`.
Single-File Library
We have moved the scripts in `contrib/single_file_libs` to `build/single_file_libs`. These scripts, originally contributed by @cwoffenden, produce a single compilation-unit amalgamation of the zstd library, which can be convenient for integrating Zstandard into other source trees. This move reflects a commitment on our part to support this tool and this pattern of using zstd going forward.
Windows Release Artifact Format
We are slightly changing the format of the Windows release `.zip` files, to match our other release artifacts. The `.zip` files now bundle everything in a single folder whose name matches the archive name. The contents of that folder exactly match what was previously included in the root of the archive.
Signed Releases
We have created a signing key for the Zstandard project. This release and all future releases will be signed by this key. See #2520 for discussion.
Changelog
- api: Various functions promoted from experimental to stable API: (#2579-#2581, @senhuang42)
  - `ZSTD_defaultCLevel()`
  - `ZSTD_getDictID_fromCDict()`
- api: Several experimental functions have been deprecated and will emit a compiler warning (#2582, @senhuang42)
  - `ZSTD_compress_advanced()`
  - `ZSTD_compress_usingCDict_advanced()`
  - `ZSTD_compressBegin_advanced()`
  - `ZSTD_compressBegin_usingCDict_advanced()`
  - `ZSTD_initCStream_srcSize()`
  - `ZSTD_initCStream_usingDict()`
  - `ZSTD_initCStream_usingCDict()`
  - `ZSTD_initCStream_advanced()`
  - `ZSTD_initCStream_usingCDict_advanced()`
  - `ZSTD_resetCStream()`
- api: `ZSTDMT_NBWORKERS_MAX` reduced to 64 for 32-bit environments (#2643, @Cyan4973)
- perf: Significant speed improvements for middle compression levels (#2494, @senhuang42 & @terrelln)
- perf: Block splitter to improve compression ratio, enabled by default for high compression levels (#2447, @senhuang42)
- perf: Decompression loop refactor, speed improvements on `clang` and for `--long` modes (#2614 #2630, @Cyan4973)
- perf: Reduced stack usage during compression and decompression entropy stage (#2522 #2524, @terrelln)
- bug: Make the number of physical CPU cores detection more robust (#2517, @PaulBone)
- bug: Improve setting permissions of created files (#2525, @felixhandte)
- bug: Fix large dictionary non-determinism (#2607, @terrelln)
- bug: Fix various dedicated dictionary search bugs (#2540 #2586, @senhuang42 @felixhandte)
- bug: Fix non-determinism test failures on Linux i686 (#2606, @terrelln)
- bug: Fix UBSAN error in decompression (#2625, @terrelln)
- bug: Fix superblock compression divide by zero bug (#2592, @senhuang42)
- bug: Ensure `ZSTD_estimateCCtxSize*()` monotonically increases with compression level (#2538, @senhuang42)
- doc: Improve `zdict.h` dictionary training API documentation (#2622, @terrelln)
- doc: Note that public `ZSTD_free*()` functions accept NULL pointers (#2521, @animalize)
- doc: Add style guide docs for open source contributors (#2626, @Cyan4973)
- tests: Better regression test coverage for different dictionary modes (#2559, @senhuang42)
- tests: Better test coverage of index reduction (#2603, @terrelln)
- tests: OSS-Fuzz coverage for seekable format (#2617, @senhuang42)
- tests: Test coverage for ZSTD threadpool API (#2604, @senhuang42)
- build: Dynamic library built multithreaded by default (#2584, @senhuang42)
- build: Move `zstd_errors.h` and `zdict.h` to `lib/` root (#2597, @terrelln)
- build: Single file library build script moved to `build/` directory (#2618, @felixhandte)
- build: Allow `ZSTDMT_JOBSIZE_MIN` to be configured at compile-time, reduce default to 512KB (#2611, @Cyan4973)
- build: Fixed Meson build (#2548, @SupervisedThinking & @kloczek)
- build: `ZBUFF_*()` is no longer built by default (#2583, @senhuang42)
- build: Fix excessive compiler warnings with clang-cl and CMake (#2600, @nickhutchinson)
- build: Detect presence of `md5` on Darwin (#2609, @felixhandte)
- build: Avoid SIGBUS on armv6 (#2633, @bmwiedmann)
- cli: `--progress` flag added to always display progress bar (#2595, @senhuang42)
- cli: Allow reading from block devices with `--force` (#2613, @felixhandte)
- cli: Fix CLI filesize display bug (#2550, @Cyan4973)
- cli: Fix Windows CLI `--filelist` end-of-line bug (#2620, @Cyan4973)
- contrib: Various fixes for Linux kernel patch (#2539, @terrelln)
- contrib: Seekable format - Decompression hanging edge case fix (#2516, @senhuang42)
- contrib: Seekable format - New seek table-only API (#2113 #2518, @mdittmer @Cyan4973)
- contrib: Seekable format - Fix seek table descriptor check when loading (#2534, @foxeng)
- contrib: Seekable format - Decompression fix for large offsets (#2594, @azat)
- misc: Automatically published release tarballs available on GitHub (#2535, @felixhandte)