Zstandard v1.5.1
Notice: it has been brought to our attention that the v1.5.1 library might be built with an executable stack on non-x64 architectures, which could be flagged as problematic by systems whose strict security settings disallow executable stacks. We are currently reviewing the issue. Be aware of it if you build libzstd for a non-x64 architecture.
Zstandard v1.5.1 is a maintenance release, bringing a good number of small refinements to the project. It also offers a welcome crop of performance improvements, as detailed below.
Performance Improvements
Speed improvements for fast compression (levels 1-4)
PRs #2749, #2774, and #2921 refactor single-segment compression for ZSTD_fast and ZSTD_dfast, which back compression levels 1 through 4 (as well as the negative compression levels). Speedups in the ~3-5% range are observed. In addition, the compression ratio of ZSTD_dfast (levels 3 and 4) is slightly improved.
Rebalanced middle compression levels
v1.5.0 introduced major speed improvements for mid-level compression (from 5 to 12), while preserving roughly similar compression ratio. As a consequence, the speed scale became tilted towards faster speed. Unfortunately, the difference between successive levels was no longer regular, and there is a large performance gap just after the impacted range, between levels 12 and 13.
v1.5.1 tries to rebalance parameters so that compression levels can be roughly associated with their former speed budget. Consequently, v1.5.1 mid compression levels feature speeds closer to those of v1.4.9 (though still noticeably faster) and receive in exchange an improved compression ratio, as shown in the graph below.
Note that, since middle levels only experience a rebalancing, no significant performance differences between v1.5.0 and v1.5.1 should be expected, save for some special cases: levels merely occupy different positions on the same curve. The situation is a bit different for fast levels (1-4), for which v1.5.1 delivers a small but consistent performance benefit on all platforms, as described in the previous paragraph.
Huffman Improvements
Our Huffman code was significantly revamped in this release. Both encoding and decoding speed were improved. Additionally, encoding speed for small inputs was improved even further. Speed is measured on the Silesia corpus by compressing with level 1, extracting the literals left over after compression, then compressing and decompressing the literals from each block. Measurements are done on an Intel i9-9900K @ 3.6 GHz.
Compiler | Scenario | v1.5.0 Speed | v1.5.1 Speed | Delta |
---|---|---|---|---|
gcc-11 | Literal compression - 128KB block | 748 MB/s | 927 MB/s | +23.9% |
clang-13 | Literal compression - 128KB block | 810 MB/s | 927 MB/s | +14.4% |
gcc-11 | Literal compression - 4KB block | 223 MB/s | 321 MB/s | +44.0% |
clang-13 | Literal compression - 4KB block | 224 MB/s | 310 MB/s | +38.2% |
gcc-11 | Literal decompression - 128KB block | 1164 MB/s | 1500 MB/s | +28.8% |
clang-13 | Literal decompression - 128KB block | 1006 MB/s | 1504 MB/s | +49.5% |
Overall impact on (de)compression speed depends on the compressibility of the data. Compression speed improves by 1-4%, and decompression speed improves by 5-15%.
PR #2722 implements the Huffman decoder in assembly for x86-64 with BMI2 enabled. We detect BMI2 support at runtime, so this speedup applies to all x86-64 builds running on CPUs that support BMI2. This improves Huffman decoding speed by about 40%, depending on the scenario. PR #2733 improves Huffman encoding speed by 10% for clang and 20% for gcc. PR #2732 drastically speeds up the HUF_sort() function, which speeds up Huffman tree building for compression. This is a significant speed boost for small inputs, measuring in at a 40% improvement for 4K inputs.
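As an illustration of the runtime detection mentioned above, here is a minimal C sketch; it is not zstd's actual implementation. It assumes a GCC- or clang-compatible compiler that provides <cpuid.h> with __get_cpuid_count. The helper names cpu_has_bmi2, select_decoder, selected_decoder_name, decode_generic and decode_bmi2 are hypothetical, and the two "decoders" are placeholders that only return a label.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: <cpuid.h> is only available on x86 with GCC/clang-style compilers. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#  include <cpuid.h>
#  define SKETCH_HAVE_CPUID 1
#else
#  define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical helper: returns 1 if the CPU reports BMI2 support, else 0. */
static int cpu_has_bmi2(void)
{
#if SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* CPUID leaf 7, sub-leaf 0: EBX bit 8 is the BMI2 feature flag. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 7 not supported by this CPU */
    return (int)((ebx >> 8) & 1u);
#else
    return 0;                   /* non-x86 target: always use the generic path */
#endif
}

/* Placeholder decoders: a real dispatcher would point at actual decode routines. */
typedef const char *(*decoder_fn)(void);
static const char *decode_generic(void) { return "generic C path"; }
static const char *decode_bmi2(void)    { return "BMI2 path"; }

/* Pick the implementation once, based on what the running CPU supports. */
static decoder_fn select_decoder(void)
{
    return cpu_has_bmi2() ? decode_bmi2 : decode_generic;
}

/* Convenience wrapper returning the label of the selected path. */
static const char *selected_decoder_name(void)
{
    decoder_fn fn = select_decoder();
    assert(fn != NULL);
    return fn();
}
```

Calling selected_decoder_name() reports which path such a dispatcher would take on the current machine. Because the check happens at runtime, a single binary can carry both paths, which is why the speedup is available without compiling specifically for BMI2.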
Binary Size and Build Speed
zstd binary size grew significantly in v1.5.0 due to the new code added for middle compression level speed optimizations. In this release we recover the binary size, and in the process also significantly speed up builds, especially with sanitizers enabled.
We measure the size of libzstd.a on x86-64, compiled with -O3. We regained 161 KB of binary size on gcc, and 293 KB of binary size on clang. Note that these binary sizes are listed for the whole library, optimized for speed over size. The decoder only, with size saving options enabled, and compiled with -Os or -Oz, can be much smaller.
Version | gcc-11 size | clang-13 size |
---|---|---|
v1.5.1 | 1177 KB | 1167 KB |
v1.5.0 | 1338 KB | 1460 KB |
v1.4.9 | 1137 KB | 1151 KB |
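The regained sizes quoted above follow directly from the table. The short C sketch below redoes the subtraction; the struct, array and function names are hypothetical, and the numbers are copied from the table rather than measured.

```c
#include <assert.h>

/* libzstd.a sizes in KB, copied from the table above. */
struct lib_size {
    const char *version;
    int gcc_kb;
    int clang_kb;
};

static const struct lib_size SIZES[] = {
    { "v1.5.1", 1177, 1167 },
    { "v1.5.0", 1338, 1460 },
    { "v1.4.9", 1137, 1151 },
};

/* KB saved when going from an older, larger build to a newer, smaller one. */
static int regained_kb(int before_kb, int after_kb)
{
    assert(before_kb >= after_kb);
    return before_kb - after_kb;
}
```

With these figures, regained_kb(SIZES[1].gcc_kb, SIZES[0].gcc_kb) gives 161 and regained_kb(SIZES[1].clang_kb, SIZES[0].clang_kb) gives 293. The same table shows v1.5.1 is still 40 KB (gcc) and 16 KB (clang) larger than v1.4.9.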
Change log
Featured user-visible changes
- perf: rebalanced compression levels, to better match intended speed/level curve, by @senhuang42 and @Cyan4973
- perf: faster huffman decoder, using x64 assembly, by @terrelln
- perf: slightly faster high speed modes (strategies fast & dfast), by @felixhandte
- perf: smaller binary size and faster compilation times, by @terrelln and @nolange
- perf: new row64 mode, used notably at highest lazy2 levels 11-12, by @senhuang42
- perf: faster mid-level compression speed in presence of highly repetitive patterns, by @senhuang42
- perf: minor compression ratio improvements for small data at high levels, by @Cyan4973
- perf: reduced stack usage (mostly useful for Linux Kernel), by @terrelln
- perf: faster compression speed on incompressible data, by @binhdvo
- perf: on-demand reduced ZSTD_DCtx state size, using build macro ZSTD_DECODER_INTERNAL_BUFFER, at a small cost of performance, by @binhdvo
- build: allows hiding static symbols in the dynamic library, using build macro, by @skitt
- build: support for m68k (Motorola 68000's), by @Cyan4973
- build: improved AIX support, by @Helflym
- build: improved meson unofficial build, by @eli-schwartz
- cli : fix : forward mtime to output file, by @felixhandte
- cli : custom memory limit when training dictionary (#2925), by @embg
- cli : report advanced parameters information when compressing in very verbose mode (-vv), by @Svetlitski-FB
- cli : advanced commands in the form --long-param= can accept negative value arguments, by @binhdvo
PR full list
- Add determinism fuzzers and fix rare determinism bugs by @terrelln in #2648
- ZSTD_VecMask_next: fix incorrect variable name in fallback code path by @dnelson-1901 in #2657
- improve tar compatibility by @Cyan4973 in #2660
- Enable SSE2 compression path to work on MSVC by @TrianglesPCT in #2653
- Fix CircleCI Config to Fully Remove publish-github-release Job by @felixhandte in #2649
- [CI] Fix zlib-wrapper test by @senhuang42 in #2668
- [CI] Add ARM tests back into CI by @senhuang42 in #2667
- [trace] Refine the ZSTD_HAVE_WEAK_SYMBOLS detection by @terrelln in #2674
- [CI][1/2] Re-do the github actions workflows, migrate various travis and appveyor tests. by @senhuang42 in #2675
- Make GH Actions CI tests run apt-get update before apt-get install by @senhuang42 in #2682
- Add arm64 fuzz test to travis by @senhuang42 in #2686
- Add ldm and block splitter auto-enable to old api by @senhuang42 in #2684
- Add documentation for --patch-from by @binhdvo in #2693
- Make regression test run on every PR by @senhuang42 in #2691
- Initialize "potentially uninitialized" pointers. by @wolfpld in #2654
- Flatten ZSTD_row_getMatchMask by @aqrit in #2681
- Update README for Travis CI Badge by @gauthamkrishna9991 in #2700
- Fuzzer test with no intrinsics on S390x (big endian) by @senhuang42 in #2678
- Fix --progress flag to properly control progress display and default … by @binhdvo in #2698
- [bug] Fix entropy repeat mode bug by @senhuang42 in #2697
- Format File Sizes Human-Readable in the cli by @felixhandte in #2702
- Add support for negative values in advanced flags by @binhdvo in #2705
- [fix] Add missing bounds checks during compression by @terrelln in #2709
- Add API for fetching skippable frame content by @binhdvo in #2708
- Add option to use logical cores for default threads by @binhdvo in #2710
- lib/Makefile: Fix small typo in ZSTD_FORCE_DECOMPRESS_* build macros by @luisdallos in #2714
- [RFC] Add internal API for converting ZSTD_Sequence into seqStore by @senhuang42 in #2715
- Optimize zstd decompression by another x% by @danlark1 in #2689
- Include what you use in zstd_ldm_geartab by @danlark1 in #2719
- [trace] remove zstd_trace.c reference from freestanding by @heitbaum in #2655
- Remove folder when done with test by @senhuang42 in #2720
- Proactively skip huffman compression based on sampling where non-comp… by @binhdvo in #2717
- Add support for MCST LCC compiler by @makise-homura in #2725
- [bug-fix] Fix a determinism bug with the DUBT by @terrelln in #2726
- Fix DDSS Load by @felixhandte in #2729
- Z_PREFIX zError function by @koalabearguo in #2707
- pzstd: fix linking for static builds by @jonringer in #2724
- [HUF] Improve Huffman encoding speed by @terrelln in #2733
- [HUF] Improve Huffman sorting algorithm by @senhuang42 in #2732
- Set mtime on Output Files by @felixhandte in #2742
- [RFC] Rebalance compression levels by @senhuang42 in #2692
- Improve branch misses on FSE symbol spreading by @senhuang42 in #2750
- make ZSTD_HASHLOG3_MAX private by @Cyan4973 in #2752
- meson fixups by @eli-schwartz in #2746
- [easy] Fix zstd bench error message by @senhuang42 in #2753
- Reduce test time on TravisCI by @Cyan4973 in #2757
- added qemu tests by @Cyan4973 in #2758
- Add 8 bytes to FSE_buildCTable wksp by @senhuang42 in #2761
- minor rebalancing of level 13 by @Cyan4973 in #2762
- Improve compile speed and binary size in opt by @senhuang42 in #2763
- [easy] Fix patch-from help msg typo by @senhuang42 in #2769
- Pipelined Implementation of ZSTD_fast (~+5% Speed) by @felixhandte in #2749
- meson: fix type error for integer option by @eli-schwartz in #2775
- Fix dictionary training huffman segfault and small speed improvement by @senhuang42 in #2773
- [rsyncable] Ensure ZSTD_compressBound() is respected by @terrelln in #2776
- Improve optimal parser performance on small data by @Cyan4973 in #2771
- [rsyncable] Fix test failures by @terrelln in #2777
- Revert opt outlining change by @senhuang42 in #2778
- [build] Add support for ASM files in Make + CMake by @terrelln in #2783
- add msvc2019 to build.generic.cmd by @animalize in #2787
- [fuzzer] Add Huffman decompression fuzzer by @terrelln in #2784
- Assembly implementation of 4X1 & 4X2 Huffman by @terrelln in #2722
- [huf] Fix compilation when DYNAMIC_BMI2=0 && BMI2 is supported by @terrelln in #2791
- Use new paramSwitch enum for row matchfinder and block splitter by @senhuang42 in #2788
- Fix NCountWriteBound by @senhuang42 in #2779
- [contrib][linux] Fix up SPDX license identifiers by @terrelln in #2794
- [contrib][linux] Reduce stack usage by 80 bytes by @terrelln in #2795
- Reduce stack usage of block splitter by @senhuang42 in #2780
- minor: constify MatchState* parameter when possible by @Cyan4973 in #2797
- [build] Fix oss-fuzz build with the dataflow sanitizer by @terrelln in #2799
- [lib] Make lib compatible with -Wfall-through excepting legacy by @terrelln in #2796
- [contrib][linux] Fix build after introducing ASM HUF implementation by @solbjorn in #2790
- Smaller code with disabled features by @nolange in #2805
- [huf] Fix OSS-Fuzz assert by @terrelln in #2808
- Skip most long matches in lazy hash table update by @senhuang42 in #2755
- add missing BUNDLE DESTINATION by @3nids in #2810
- [contrib][linux] Fix -Wundef inside Linux kernel tree by @solbjorn in #2802
- [contrib][linux-kernel] Add standard warnings and -Werror to CI by @terrelln in #2803
- Add AIX support in Makefile by @Helflym in #2747
- Limit train samples by @stanjo74 in #2809
- [multiple-ddicts] Fix NULL checks by @terrelln in #2817
- [ldm] Fix ZSTD_c_ldmHashRateLog bounds check by @terrelln in #2819
- [binary-tree] Fix underflow of nbCompares by @terrelln in #2820
- Enhance streaming_compression examples. by @marxin in #2813
- Pipelined Implementation of ZSTD_dfast by @felixhandte in #2774
- Fix a C89 error in msvc by @animalize in #2800
- [asm] Switch to C style comments by @terrelln in #2825
- Support thread pool section in HTML documentation. by @marxin in #2822
- Reduce size of dctx by reutilizing dst buffer by @binhdvo in #2751
- [lazy] Speed up compilation times by @terrelln in #2828
- separate compression level tables into their own file by @Cyan4973 in #2830
- minor : change build macro to ZSTD_DECODER_INTERNAL_BUFFER by @Cyan4973 in #2829
- Fix oss fuzz test error by @binhdvo in #2837
- Move mingw tests from appveyor to github actions by @binhdvo in #2838
- Improvements to verbose mode output by @Svetlitski-FB in #2839
- Use unused functions to appease Visual Studio by @senhuang42 in #2846
- Backport zstd patch from LKML by @terrelln in #2849
- Fix fullbench CI failure by @binhdvo in #2851
- Fix Determinism Bug: Avoid Reducing Indices to Reserved Values by @felixhandte in #2850
- ZSTD_copy16() uses ZSTD_memcpy() by @animalize in #2836
- Display command line parameters with concrete values in verbose mode by @Svetlitski-FB in #2847
- Reduce function size in fast & dfast by @terrelln in #2863
- [linux-kernel] Don't inline function in zstd_opt.c by @terrelln in #2864
- Remove executable flag from GNU_STACK segment by @ko-zu in #2857
- [linux-kernel] Don't add -O3 to CFLAGS by @terrelln in #2866
- Support Swift Package Manager by @cntrump in #2858
- Determinism: Avoid Mapping Window into Reserved Indices during Reduction by @felixhandte in #2869
- Clarify documentation for -c by @binhdvo in #2883
- Fix build for cygwin/bsd by @binhdvo in #2882
- Move visual studio tests from per-release to per-PR by @senhuang42 in #2845
- Fix SPM warning: umbrella header for module 'libzstd' does not include header 'xxx.h' by @cntrump in #2872
- Add detection when compiling with Clang and Ninja under Windows by @jannkoeker in #2877
- [contrib][pzstd] Fix build issue with gcc-5 by @terrelln in #2889
- [bmi2] Add lzcnt and bmi target attributes by @terrelln in #2888
- [test] Test that the exec-stack bit isn't set on libzstd.so by @terrelln in #2886
- Solve the bug of extra output newline character by @15596858998 in #2876
- [zdict] Remove ZDICT_CONTENTSIZE_MIN restriction for ZDICT_finalizeDictionary by @terrelln in #2887
- Explicitly hide static symbols by @skitt in #2501
- Makefile: sort all wildcard file list expansions by @kanavin in #2895
- merge #2501 by @Cyan4973 in #2894
- Makefile: fix build for mingw by @sapiippo in #2687
- [CircleCI] Fix short-tests-0 by @terrelln in #2892
- Zstandard compiles and run on m68k cpus by @Cyan4973 in #2896
- Improve zstd_opt build speed and size by @terrelln in #2898
- [CI] Add cmake windows build by @terrelln in #2900
- Disable Multithreading in CMake Builds for Android by @felixhandte in #2899
- Avoid Using Deprecated Functions in Deprecated Code by @felixhandte in #2897
- [asm] Share portability macros and restrict ASM further by @terrelln in #2893
- fixbug CLI's -D fails when the argument is not a regular file by @15596858998 in #2890
- Apply FORCE_MEMORY_ACCESS=1 to legacy by @Hello71 in #2907
- [lib] Fix libzstd.pc for lib-mt builds by @ericonr in #2659
- Imply -q when stderr is not a tty by @binhdvo in #2884
- Fix Up #2659; Build libzstd.pc Whenever Building the Lib on Unix by @felixhandte in #2912
- Remove possible NULL pointer addition by @terrelln in #2916
- updated xxHash to latest v0.8.1 by @Cyan4973 in #2914
- Reject Irregular Dictionary Files by @felixhandte in #2910
- x32 compatibility by @Cyan4973 in #2922
- typo: Small spelling mistake in example by @IAL32 in #2923
- add test case by @15596858998 in #2905
- Stagger Stepping in Negative Levels by @felixhandte in #2921
- Fix performance degradation with -m32 by @binhdvo in #2926
- Reduce tables to 8bit by @nolange in #2930
- simplify SSE implementation of row_lazy match finder by @Cyan4973 in #2929
- Allow user to specify memory limit for dictionary training by @embg in #2925
- fixed incorrect rowlog initialization by @Cyan4973 in #2931
- rebalance lazy compression levels by @Cyan4973 in #2934
New Contributors
- @dnelson-1901 made their first contribution in #2657
- @TrianglesPCT made their first contribution in #2653
- @binhdvo made their first contribution in #2693
- @wolfpld made their first contribution in #2654
- @aqrit made their first contribution in #2681
- @gauthamkrishna9991 made their first contribution in #2700
- @luisdallos made their first contribution in #2714
- @danlark1 made their first contribution in #2689
- @heitbaum made their first contribution in #2655
- @makise-homura made their first contribution in #2725
- @koalabearguo made their first contribution in #2707
- @jonringer made their first contribution in #2724
- @eli-schwartz made their first contribution in #2746
- @abxhr made their first contribution in #2798
- @solbjorn made their first contribution in #2790
- @nolange made their first contribution in #2805
- @3nids made their first contribution in #2810
- @Helflym made their first contribution in #2747
- @stanjo74 made their first contribution in #2809
- @Svetlitski-FB made their first contribution in #2839
- @cntrump made their first contribution in #2858
- @rex4539 made their first contribution in #2856
- @jannkoeker made their first contribution in #2877
- @yoniko made their first contribution in #2885
- @15596858998 made their first contribution in #2876
- @kanavin made their first contribution in #2895
- @sapiippo made their first contribution in #2687
- @supperPants made their first contribution in #2891
- @Hello71 made their first contribution in #2907
- @ericonr made their first contribution in #2659
- @IAL32 made their first contribution in #2923
- @embg made their first contribution in #2925
Full Changelog: v1.5.0...v1.5.1