Pipelined Implementation of ZSTD_dfast #2774

felixhandte · 2021-09-09T21:21:57Z

This PR takes the ideas from #2749 and applies them to the double-fast implementation.

Description

We start by pulling a single-segment copy out so that we can work on it separately from the DMS implementation.

This implementation makes two changes to how the input is parsed:

Instead of checking ip + 1 when we find a short match, we check ip + step. This is a pretty minimal change to the parsing behavior, since step is almost always 1.
We write back ip + 1 into the hash table even when we take a long match at ip (instead of only in the short match path). It costs us basically nothing to do this because we've already hashed it. This improves compression ratio.

Unlike the fast implementation, whose pipelining includes speculative work that we might throw away, this implementation doesn't do any additional work. It just moves some of it earlier. In particular, the crucial observation is that when we do not take a long match at the current position, we are guaranteed to inspect the long match candidate at the next position, either by taking a short match and then checking the next position's long match or by not taking the short match and moving on to the next position. So we can frontload some of that loading work.
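To make the two parsing changes and the "move work earlier" idea concrete, here is a minimal, self-contained toy sketch. It is not the code in this PR: every name in it (`toy_dfast_matched_bytes`, `long_tab`, `short_tab`, `hl1`, and so on) is hypothetical, `step` is fixed at 1, there are no repcodes, and it only counts matched bytes instead of emitting sequences. What it tries to show is that the long hash of `ip + 1` is computed once per iteration, is used on every path, and is written back to the long table even when a long match is taken at `ip`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HLOG 12  /* toy table size: 1 << 12 entries per table (assumption) */

static uint64_t read64(const uint8_t *p) { uint64_t v; memcpy(&v, p, 8); return v; }
static uint32_t read32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static uint32_t hash_long(const uint8_t *p)
{ return (uint32_t)((read64(p) * 0x9E3779B185EBCA87ULL) >> (64 - HLOG)); }
static uint32_t hash_short(const uint8_t *p)
{ return (uint32_t)((read32(p) * 2654435761U) >> (32 - HLOG)); }

/* Length of the common run starting at cand and ip (requires cand < ip). */
static size_t match_len(const uint8_t *src, size_t n, size_t cand, size_t ip)
{
    size_t len = 0;
    while (ip + len < n && src[cand + len] == src[ip + len]) len++;
    return len;
}

/* Toy parser: returns how many input bytes end up covered by matches. */
size_t toy_dfast_matched_bytes(const uint8_t *src, size_t n)
{
    uint32_t long_tab[1 << HLOG] = {0}, short_tab[1 << HLOG] = {0};
    size_t ip = 0, matched = 0;
    assert(n < UINT32_MAX);              /* positions are stored as 32 bits */
    if (n < 9) return 0;                 /* need 8 readable bytes at ip + 1 */
    const size_t ilimit = n - 9;
    uint32_t hl0 = hash_long(src);       /* long hash of the current position */

    while (ip <= ilimit) {
        const uint32_t hs0 = hash_short(src + ip);
        const uint32_t hl1 = hash_long(src + ip + 1);  /* done early, used on every path */
        const size_t cand_l = long_tab[hl0], cand_s = short_tab[hs0];
        size_t mlen = 0;
        long_tab[hl0] = (uint32_t)ip;
        short_tab[hs0] = (uint32_t)ip;

        if (cand_l < ip && read64(src + cand_l) == read64(src + ip)) {
            mlen = match_len(src, n, cand_l, ip);
            long_tab[hl1] = (uint32_t)(ip + 1);   /* write back ip + 1 on a long match too */
        } else if (cand_s < ip && read32(src + cand_s) == read32(src + ip)) {
            const size_t cand_l1 = long_tab[hl1]; /* reuse the hash computed above */
            long_tab[hl1] = (uint32_t)(ip + 1);
            if (cand_l1 <= ip && read64(src + cand_l1) == read64(src + ip + 1)) {
                ip++;                             /* prefer the long match at ip + 1 */
                mlen = match_len(src, n, cand_l1, ip);
            } else {
                mlen = match_len(src, n, cand_s, ip);
            }
        }

        if (mlen) {
            matched += mlen;
            ip += mlen;
            if (ip <= ilimit) hl0 = hash_long(src + ip);
        } else {
            ip++;
            hl0 = hl1;                            /* the early hash becomes the current one */
        }
    }
    return matched;
}
```

Calling `toy_dfast_matched_bytes` on a buffer of 64 identical bytes finds a single overlapping match starting at position 1, while a buffer with no repeated 4-byte window yields no matches at all. In the real implementation the hash tables, `step` growth, and repcode handling are considerably more involved.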

Benchmarks

Silesia Results Table

file        compiler lvl | speed: old, new (change) | ratio: old, new (change)
dickens     gcc-4.8    3 |   99.9  100.3 ( +0.400%) |  2.769  2.779 ( +0.361%)
dickens     gcc-5      3 |   99.1   99.2 ( +0.101%) |  2.769  2.779 ( +0.361%)
dickens     gcc-6      3 |  101.5   99.0 ( -2.463%) |  2.769  2.779 ( +0.361%)
dickens     gcc-7      3 |  101.8   99.7 ( -2.063%) |  2.769  2.779 ( +0.361%)
dickens     gcc-8      3 |   96.5   97.9 ( +1.451%) |  2.769  2.779 ( +0.361%)
dickens     gcc-10     3 |  100.4   99.5 ( -0.896%) |  2.769  2.779 ( +0.361%)
dickens     clang-6.0  3 |  103.7  103.0 ( -0.675%) |  2.769  2.779 ( +0.361%)
dickens     clang-7    3 |  100.4   98.0 ( -2.390%) |  2.769  2.779 ( +0.361%)
dickens     clang-8    3 |  100.7   99.1 ( -1.589%) |  2.769  2.779 ( +0.361%)
dickens     clang-9    3 |  102.3   94.5 ( -7.625%) |  2.769  2.779 ( +0.361%)
dickens     clang-11   3 |  100.8  100.2 ( -0.595%) |  2.769  2.779 ( +0.361%)
dickens     clang-12   3 |  100.7   99.2 ( -1.490%) |  2.769  2.779 ( +0.361%)
dickens     gcc-4.8    4 |  101.9  101.8 ( -0.098%) |  2.827  2.841 ( +0.495%)
dickens     gcc-5      4 |   98.5   95.8 ( -2.741%) |  2.827  2.841 ( +0.495%)
dickens     gcc-6      4 |   98.9   96.9 ( -2.022%) |  2.827  2.841 ( +0.495%)
dickens     gcc-7      4 |   98.8   97.6 ( -1.215%) |  2.827  2.841 ( +0.495%)
dickens     gcc-8      4 |   98.6  101.7 ( +3.144%) |  2.827  2.841 ( +0.495%)
dickens     gcc-10     4 |   95.1  100.7 ( +5.889%) |  2.827  2.841 ( +0.495%)
dickens     clang-6.0  4 |  102.1  100.1 ( -1.959%) |  2.827  2.841 ( +0.495%)
dickens     clang-7    4 |   97.9   97.9 ( +0.000%) |  2.827  2.841 ( +0.495%)
dickens     clang-8    4 |  100.8   98.6 ( -2.183%) |  2.827  2.841 ( +0.495%)
dickens     clang-9    4 |   96.8   97.3 ( +0.517%) |  2.827  2.841 ( +0.495%)
dickens     clang-11   4 |  100.0   97.1 ( -2.900%) |  2.827  2.841 ( +0.495%)
dickens     clang-12   4 |   98.9   98.1 ( -0.809%) |  2.827  2.841 ( +0.495%)
enwik8      gcc-4.8    3 |  109.3  110.5 ( +1.098%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-5      3 |  104.8  106.3 ( +1.431%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-6      3 |  107.1  106.2 ( -0.840%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-7      3 |  105.0  106.1 ( +1.048%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-8      3 |  105.7  108.4 ( +2.554%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-10     3 |  105.1  108.1 ( +2.854%) |  2.809  2.820 ( +0.392%)
enwik8      clang-6.0  3 |  111.7  111.1 ( -0.537%) |  2.809  2.820 ( +0.392%)
enwik8      clang-7    3 |  107.6  109.6 ( +1.859%) |  2.809  2.820 ( +0.392%)
enwik8      clang-8    3 |  109.2  110.2 ( +0.916%) |  2.809  2.820 ( +0.392%)
enwik8      clang-9    3 |  109.0  109.0 ( +0.000%) |  2.809  2.820 ( +0.392%)
enwik8      clang-11   3 |  112.5  111.8 ( -0.622%) |  2.809  2.820 ( +0.392%)
enwik8      clang-12   3 |  107.5  109.9 ( +2.233%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-4.8    4 |  103.1  106.5 ( +3.298%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-5      4 |  100.3   99.7 ( -0.598%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-6      4 |  101.9  104.4 ( +2.453%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-7      4 |  102.8  100.8 ( -1.946%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-8      4 |   99.7  100.9 ( +1.204%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-10     4 |   99.3  104.3 ( +5.035%) |  2.864  2.877 ( +0.454%)
enwik8      clang-6.0  4 |  106.5  105.7 ( -0.751%) |  2.864  2.877 ( +0.454%)
enwik8      clang-7    4 |  102.4  103.5 ( +1.074%) |  2.864  2.877 ( +0.454%)
enwik8      clang-8    4 |  104.2  104.6 ( +0.384%) |  2.864  2.877 ( +0.454%)
enwik8      clang-9    4 |  102.4  106.4 ( +3.906%) |  2.864  2.877 ( +0.454%)
enwik8      clang-11   4 |  106.8  103.6 ( -2.996%) |  2.864  2.877 ( +0.454%)
enwik8      clang-12   4 |  105.2  104.1 ( -1.046%) |  2.864  2.877 ( +0.454%)
enwik9      gcc-4.8    3 |  121.7  117.3 ( -3.615%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-5      3 |  113.0  122.1 ( +8.053%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-6      3 |  120.3  123.2 ( +2.411%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-7      3 |  123.3  125.4 ( +1.703%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-8      3 |  124.2  121.7 ( -2.013%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-10     3 |  123.2  125.1 ( +1.542%) |  3.191  3.203 ( +0.376%)
enwik9      clang-6.0  3 |  124.8  125.1 ( +0.240%) |  3.191  3.203 ( +0.376%)
enwik9      clang-7    3 |  119.1  123.5 ( +3.694%) |  3.191  3.203 ( +0.376%)
enwik9      clang-8    3 |  119.5  119.2 ( -0.251%) |  3.191  3.203 ( +0.376%)
enwik9      clang-9    3 |  120.0  120.2 ( +0.167%) |  3.191  3.203 ( +0.376%)
enwik9      clang-11   3 |  123.0  124.6 ( +1.301%) |  3.191  3.203 ( +0.376%)
enwik9      clang-12   3 |  120.8  123.8 ( +2.483%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-4.8    4 |  114.4  111.9 ( -2.185%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-5      4 |  110.1  115.8 ( +5.177%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-6      4 |  115.5  114.4 ( -0.952%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-7      4 |  117.8  117.3 ( -0.424%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-8      4 |  113.5  116.7 ( +2.819%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-10     4 |  111.1  119.1 ( +7.201%) |  3.253  3.267 ( +0.430%)
enwik9      clang-6.0  4 |  117.7  120.0 ( +1.954%) |  3.253  3.267 ( +0.430%)
enwik9      clang-7    4 |  114.7  115.7 ( +0.872%) |  3.253  3.267 ( +0.430%)
enwik9      clang-8    4 |  113.8  118.9 ( +4.482%) |  3.253  3.267 ( +0.430%)
enwik9      clang-9    4 |  116.2  117.2 ( +0.861%) |  3.253  3.267 ( +0.430%)
enwik9      clang-11   4 |  117.1  118.4 ( +1.110%) |  3.253  3.267 ( +0.430%)
enwik9      clang-12   4 |  110.6  114.9 ( +3.888%) |  3.253  3.267 ( +0.430%)
mozilla     gcc-4.8    3 |  148.5  152.1 ( +2.424%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-5      3 |  147.3  152.5 ( +3.530%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-6      3 |  145.2  151.6 ( +4.408%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-7      3 |  149.7  154.8 ( +3.407%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-8      3 |  150.3  152.4 ( +1.397%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-10     3 |  147.5  154.4 ( +4.678%) |  2.768  2.771 ( +0.108%)
mozilla     clang-6.0  3 |  156.4  150.0 ( -4.092%) |  2.768  2.771 ( +0.108%)
mozilla     clang-7    3 |  147.5  153.0 ( +3.729%) |  2.768  2.771 ( +0.108%)
mozilla     clang-8    3 |  145.8  153.8 ( +5.487%) |  2.768  2.771 ( +0.108%)
mozilla     clang-9    3 |  151.1  149.0 ( -1.390%) |  2.768  2.771 ( +0.108%)
mozilla     clang-11   3 |  146.5  152.5 ( +4.096%) |  2.768  2.771 ( +0.108%)
mozilla     clang-12   3 |  145.4  151.8 ( +4.402%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-4.8    4 |  136.0  139.3 ( +2.426%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-5      4 |  135.2  140.3 ( +3.772%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-6      4 |  129.5  139.3 ( +7.568%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-7      4 |  140.2  142.4 ( +1.569%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-8      4 |  135.2  140.9 ( +4.216%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-10     4 |  137.0  141.9 ( +3.577%) |  2.798  2.801 ( +0.107%)
mozilla     clang-6.0  4 |  140.9  137.1 ( -2.697%) |  2.798  2.801 ( +0.107%)
mozilla     clang-7    4 |  133.5  139.8 ( +4.719%) |  2.798  2.801 ( +0.107%)
mozilla     clang-8    4 |  136.9  139.5 ( +1.899%) |  2.798  2.801 ( +0.107%)
mozilla     clang-9    4 |  137.0  137.6 ( +0.438%) |  2.798  2.801 ( +0.107%)
mozilla     clang-11   4 |  138.7  139.5 ( +0.577%) |  2.798  2.801 ( +0.107%)
mozilla     clang-12   4 |  136.6  144.4 ( +5.710%) |  2.798  2.801 ( +0.107%)
mr          gcc-4.8    3 |  117.3  115.8 ( -1.279%) |  2.811  2.810 ( -0.036%)
mr          gcc-5      3 |  115.4  117.5 ( +1.820%) |  2.811  2.810 ( -0.036%)
mr          gcc-6      3 |  118.3  115.9 ( -2.029%) |  2.811  2.810 ( -0.036%)
mr          gcc-7      3 |  120.4  119.2 ( -0.997%) |  2.811  2.810 ( -0.036%)
mr          gcc-8      3 |  118.4  119.6 ( +1.014%) |  2.811  2.810 ( -0.036%)
mr          gcc-10     3 |  116.3  116.2 ( -0.086%) |  2.811  2.810 ( -0.036%)
mr          clang-6.0  3 |  119.6  115.1 ( -3.763%) |  2.811  2.810 ( -0.036%)
mr          clang-7    3 |  112.6  115.0 ( +2.131%) |  2.811  2.810 ( -0.036%)
mr          clang-8    3 |  114.0  117.1 ( +2.719%) |  2.811  2.810 ( -0.036%)
mr          clang-9    3 |  114.6  114.1 ( -0.436%) |  2.811  2.810 ( -0.036%)
mr          clang-11   3 |  109.2  114.2 ( +4.579%) |  2.811  2.810 ( -0.036%)
mr          clang-12   3 |  115.1  114.2 ( -0.782%) |  2.811  2.810 ( -0.036%)
mr          gcc-4.8    4 |  111.8  108.7 ( -2.773%) |  2.861  2.859 ( -0.070%)
mr          gcc-5      4 |  114.0  112.5 ( -1.316%) |  2.861  2.859 ( -0.070%)
mr          gcc-6      4 |  110.2  114.1 ( +3.539%) |  2.861  2.859 ( -0.070%)
mr          gcc-7      4 |  111.2  110.0 ( -1.079%) |  2.861  2.859 ( -0.070%)
mr          gcc-8      4 |  110.8  115.3 ( +4.061%) |  2.861  2.859 ( -0.070%)
mr          gcc-10     4 |  109.4  107.0 ( -2.194%) |  2.861  2.859 ( -0.070%)
mr          clang-6.0  4 |  115.7  109.3 ( -5.532%) |  2.861  2.859 ( -0.070%)
mr          clang-7    4 |  108.9  109.8 ( +0.826%) |  2.861  2.859 ( -0.070%)
mr          clang-8    4 |  110.1  108.4 ( -1.544%) |  2.861  2.859 ( -0.070%)
mr          clang-9    4 |  109.0  108.3 ( -0.642%) |  2.861  2.859 ( -0.070%)
mr          clang-11   4 |  115.2  107.4 ( -6.771%) |  2.861  2.859 ( -0.070%)
mr          clang-12   4 |  112.0  111.9 ( -0.089%) |  2.861  2.859 ( -0.070%)
nci         gcc-4.8    3 |  419.4  412.4 ( -1.669%) | 11.740 11.800 ( +0.511%)
nci         gcc-5      3 |  409.7  415.5 ( +1.416%) | 11.740 11.800 ( +0.511%)
nci         gcc-6      3 |  413.8  415.2 ( +0.338%) | 11.740 11.800 ( +0.511%)
nci         gcc-7      3 |  417.2  413.6 ( -0.863%) | 11.740 11.800 ( +0.511%)
nci         gcc-8      3 |  410.4  413.1 ( +0.658%) | 11.740 11.800 ( +0.511%)
nci         gcc-10     3 |  416.2  408.7 ( -1.802%) | 11.740 11.800 ( +0.511%)
nci         clang-6.0  3 |  424.2  399.5 ( -5.823%) | 11.740 11.800 ( +0.511%)
nci         clang-7    3 |  419.5  422.3 ( +0.667%) | 11.740 11.800 ( +0.511%)
nci         clang-8    3 |  433.3  413.4 ( -4.593%) | 11.740 11.800 ( +0.511%)
nci         clang-9    3 |  433.1  424.2 ( -2.055%) | 11.740 11.800 ( +0.511%)
nci         clang-11   3 |  438.4  412.1 ( -5.999%) | 11.740 11.800 ( +0.511%)
nci         clang-12   3 |  426.4  423.3 ( -0.727%) | 11.740 11.800 ( +0.511%)
nci         gcc-4.8    4 |  423.8  424.0 ( +0.047%) | 11.750 11.800 ( +0.426%)
nci         gcc-5      4 |  420.2  422.8 ( +0.619%) | 11.750 11.800 ( +0.426%)
nci         gcc-6      4 |  389.7  409.8 ( +5.158%) | 11.750 11.800 ( +0.426%)
nci         gcc-7      4 |  425.4  421.3 ( -0.964%) | 11.750 11.800 ( +0.426%)
nci         gcc-8      4 |  418.1  421.9 ( +0.909%) | 11.750 11.800 ( +0.426%)
nci         gcc-10     4 |  425.0  418.0 ( -1.647%) | 11.750 11.800 ( +0.426%)
nci         clang-6.0  4 |  435.3  394.6 ( -9.350%) | 11.750 11.800 ( +0.426%)
nci         clang-7    4 |  427.3  424.4 ( -0.679%) | 11.750 11.800 ( +0.426%)
nci         clang-8    4 |  444.6  417.7 ( -6.050%) | 11.750 11.800 ( +0.426%)
nci         clang-9    4 |  441.1  419.2 ( -4.965%) | 11.750 11.800 ( +0.426%)
nci         clang-11   4 |  437.2  419.2 ( -4.117%) | 11.750 11.800 ( +0.426%)
nci         clang-12   4 |  426.8  406.0 ( -4.873%) | 11.750 11.800 ( +0.426%)
ooffice     gcc-4.8    3 |   99.6  107.4 ( +7.831%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-5      3 |   98.6  108.6 (+10.142%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-6      3 |  101.1  103.9 ( +2.770%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-7      3 |  101.9  108.6 ( +6.575%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-8      3 |  100.3  107.8 ( +7.478%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-10     3 |  101.0  109.6 ( +8.515%) |  1.956  1.957 ( +0.051%)
ooffice     clang-6.0  3 |  108.1  107.8 ( -0.278%) |  1.956  1.957 ( +0.051%)
ooffice     clang-7    3 |   93.5  105.5 (+12.834%) |  1.956  1.957 ( +0.051%)
ooffice     clang-8    3 |   92.6  104.3 (+12.635%) |  1.956  1.957 ( +0.051%)
ooffice     clang-9    3 |   97.0  107.5 (+10.825%) |  1.956  1.957 ( +0.051%)
ooffice     clang-11   3 |   96.0  108.0 (+12.500%) |  1.956  1.957 ( +0.051%)
ooffice     clang-12   3 |   98.3  104.4 ( +6.205%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-4.8    4 |   94.2   98.5 ( +4.565%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-5      4 |   94.2   98.6 ( +4.671%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-6      4 |   93.1   96.2 ( +3.330%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-7      4 |   92.8   99.7 ( +7.435%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-8      4 |   90.9   98.5 ( +8.361%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-10     4 |   94.1   99.6 ( +5.845%) |  2.003  2.004 ( +0.050%)
ooffice     clang-6.0  4 |   98.0   99.4 ( +1.429%) |  2.003  2.004 ( +0.050%)
ooffice     clang-7    4 |   89.1   97.7 ( +9.652%) |  2.003  2.004 ( +0.050%)
ooffice     clang-8    4 |   90.8   94.9 ( +4.515%) |  2.003  2.004 ( +0.050%)
ooffice     clang-9    4 |   90.2   97.4 ( +7.982%) |  2.003  2.004 ( +0.050%)
ooffice     clang-11   4 |   91.6   99.3 ( +8.406%) |  2.003  2.004 ( +0.050%)
ooffice     clang-12   4 |   91.0  101.3 (+11.319%) |  2.003  2.004 ( +0.050%)
osdb        gcc-4.8    3 |  142.6  145.0 ( +1.683%) |  2.867  2.876 ( +0.314%)
osdb        gcc-5      3 |  134.2  148.3 (+10.507%) |  2.867  2.876 ( +0.314%)
osdb        gcc-6      3 |  140.9  145.1 ( +2.981%) |  2.867  2.876 ( +0.314%)
osdb        gcc-7      3 |  138.7  142.1 ( +2.451%) |  2.867  2.876 ( +0.314%)
osdb        gcc-8      3 |  136.4  143.4 ( +5.132%) |  2.867  2.876 ( +0.314%)
osdb        gcc-10     3 |  136.8  145.7 ( +6.506%) |  2.867  2.876 ( +0.314%)
osdb        clang-6.0  3 |  141.3  145.4 ( +2.902%) |  2.867  2.876 ( +0.314%)
osdb        clang-7    3 |  137.9  150.4 ( +9.065%) |  2.867  2.876 ( +0.314%)
osdb        clang-8    3 |  132.5  147.7 (+11.472%) |  2.867  2.876 ( +0.314%)
osdb        clang-9    3 |  135.6  139.3 ( +2.729%) |  2.867  2.876 ( +0.314%)
osdb        clang-11   3 |  134.9  151.0 (+11.935%) |  2.867  2.876 ( +0.314%)
osdb        clang-12   3 |  129.2  141.1 ( +9.211%) |  2.867  2.876 ( +0.314%)
osdb        gcc-4.8    4 |  127.3  132.4 ( +4.006%) |  2.885  2.895 ( +0.347%)
osdb        gcc-5      4 |  123.3  135.7 (+10.057%) |  2.885  2.895 ( +0.347%)
osdb        gcc-6      4 |  124.5  133.6 ( +7.309%) |  2.885  2.895 ( +0.347%)
osdb        gcc-7      4 |  125.1  133.7 ( +6.875%) |  2.885  2.895 ( +0.347%)
osdb        gcc-8      4 |  121.4  136.8 (+12.685%) |  2.885  2.895 ( +0.347%)
osdb        gcc-10     4 |  124.8  142.6 (+14.263%) |  2.885  2.895 ( +0.347%)
osdb        clang-6.0  4 |  132.6  135.4 ( +2.112%) |  2.885  2.895 ( +0.347%)
osdb        clang-7    4 |  129.4  134.9 ( +4.250%) |  2.885  2.895 ( +0.347%)
osdb        clang-8    4 |  130.9  135.2 ( +3.285%) |  2.885  2.895 ( +0.347%)
osdb        clang-9    4 |  120.0  132.5 (+10.417%) |  2.885  2.895 ( +0.347%)
osdb        clang-11   4 |  129.3  138.6 ( +7.193%) |  2.885  2.895 ( +0.347%)
osdb        clang-12   4 |  122.2  131.8 ( +7.856%) |  2.885  2.895 ( +0.347%)
reymont     gcc-4.8    3 |  127.7  117.1 ( -8.301%) |  3.392  3.413 ( +0.619%)
reymont     gcc-5      3 |  123.5  124.1 ( +0.486%) |  3.392  3.413 ( +0.619%)
reymont     gcc-6      3 |  130.6  131.0 ( +0.306%) |  3.392  3.413 ( +0.619%)
reymont     gcc-7      3 |  127.5  129.9 ( +1.882%) |  3.392  3.413 ( +0.619%)
reymont     gcc-8      3 |  127.1  122.1 ( -3.934%) |  3.392  3.413 ( +0.619%)
reymont     gcc-10     3 |  124.3  126.0 ( +1.368%) |  3.392  3.413 ( +0.619%)
reymont     clang-6.0  3 |  127.6  127.6 ( +0.000%) |  3.392  3.413 ( +0.619%)
reymont     clang-7    3 |  125.3  126.8 ( +1.197%) |  3.392  3.413 ( +0.619%)
reymont     clang-8    3 |  127.1  126.7 ( -0.315%) |  3.392  3.413 ( +0.619%)
reymont     clang-9    3 |  126.1  124.5 ( -1.269%) |  3.392  3.413 ( +0.619%)
reymont     clang-11   3 |  124.5  125.5 ( +0.803%) |  3.392  3.413 ( +0.619%)
reymont     clang-12   3 |  122.8  125.9 ( +2.524%) |  3.392  3.413 ( +0.619%)
reymont     gcc-4.8    4 |  127.7  119.0 ( -6.813%) |  3.429  3.453 ( +0.700%)
reymont     gcc-5      4 |  123.6  125.6 ( +1.618%) |  3.429  3.453 ( +0.700%)
reymont     gcc-6      4 |  128.9  135.8 ( +5.353%) |  3.429  3.453 ( +0.700%)
reymont     gcc-7      4 |  128.7  130.0 ( +1.010%) |  3.429  3.453 ( +0.700%)
reymont     gcc-8      4 |  133.3  119.9 (-10.053%) |  3.429  3.453 ( +0.700%)
reymont     gcc-10     4 |  124.7  124.4 ( -0.241%) |  3.429  3.453 ( +0.700%)
reymont     clang-6.0  4 |  130.1  129.6 ( -0.384%) |  3.429  3.453 ( +0.700%)
reymont     clang-7    4 |  128.6  126.2 ( -1.866%) |  3.429  3.453 ( +0.700%)
reymont     clang-8    4 |  129.0  127.8 ( -0.930%) |  3.429  3.453 ( +0.700%)
reymont     clang-9    4 |  129.6  122.3 ( -5.633%) |  3.429  3.453 ( +0.700%)
reymont     clang-11   4 |  127.9  127.1 ( -0.625%) |  3.429  3.453 ( +0.700%)
reymont     clang-12   4 |  125.7  126.8 ( +0.875%) |  3.429  3.453 ( +0.700%)
samba       gcc-4.8    3 |  201.9  206.8 ( +2.427%) |  4.320  4.342 ( +0.509%)
samba       gcc-5      3 |  201.4  211.6 ( +5.065%) |  4.320  4.342 ( +0.509%)
samba       gcc-6      3 |  205.6  208.4 ( +1.362%) |  4.320  4.342 ( +0.509%)
samba       gcc-7      3 |  205.3  205.6 ( +0.146%) |  4.320  4.342 ( +0.509%)
samba       gcc-8      3 |  205.7  210.4 ( +2.285%) |  4.320  4.342 ( +0.509%)
samba       gcc-10     3 |  204.8  202.8 ( -0.977%) |  4.320  4.342 ( +0.509%)
samba       clang-6.0  3 |  209.9  201.9 ( -3.811%) |  4.320  4.342 ( +0.509%)
samba       clang-7    3 |  201.3  207.8 ( +3.229%) |  4.320  4.342 ( +0.509%)
samba       clang-8    3 |  196.2  200.8 ( +2.345%) |  4.320  4.342 ( +0.509%)
samba       clang-9    3 |  200.8  204.5 ( +1.843%) |  4.320  4.342 ( +0.509%)
samba       clang-11   3 |  202.7  207.8 ( +2.516%) |  4.320  4.342 ( +0.509%)
samba       clang-12   3 |  202.0  200.6 ( -0.693%) |  4.320  4.342 ( +0.509%)
samba       gcc-4.8    4 |  194.5  200.7 ( +3.188%) |  4.349  4.373 ( +0.552%)
samba       gcc-5      4 |  190.0  206.1 ( +8.474%) |  4.349  4.373 ( +0.552%)
samba       gcc-6      4 |  198.1  193.5 ( -2.322%) |  4.349  4.373 ( +0.552%)
samba       gcc-7      4 |  199.3  187.9 ( -5.720%) |  4.349  4.373 ( +0.552%)
samba       gcc-8      4 |  190.6  192.5 ( +0.997%) |  4.349  4.373 ( +0.552%)
samba       gcc-10     4 |  194.9  193.9 ( -0.513%) |  4.349  4.373 ( +0.552%)
samba       clang-6.0  4 |  196.0  188.5 ( -3.827%) |  4.349  4.373 ( +0.552%)
samba       clang-7    4 |  188.4  196.8 ( +4.459%) |  4.349  4.373 ( +0.552%)
samba       clang-8    4 |  180.4  192.2 ( +6.541%) |  4.349  4.373 ( +0.552%)
samba       clang-9    4 |  195.8  194.1 ( -0.868%) |  4.349  4.373 ( +0.552%)
samba       clang-11   4 |  196.2  195.2 ( -0.510%) |  4.349  4.373 ( +0.552%)
samba       clang-12   4 |  192.1  195.2 ( +1.614%) |  4.349  4.373 ( +0.552%)
sao         gcc-4.8    3 |   75.3   84.7 (+12.483%) |  1.306  1.306 ( +0.000%)
sao         gcc-5      3 |   76.7   86.6 (+12.907%) |  1.306  1.306 ( +0.000%)
sao         gcc-6      3 |   76.4   81.4 ( +6.545%) |  1.306  1.306 ( +0.000%)
sao         gcc-7      3 |   73.7   85.8 (+16.418%) |  1.306  1.306 ( +0.000%)
sao         gcc-8      3 |   74.8   81.2 ( +8.556%) |  1.306  1.306 ( +0.000%)
sao         gcc-10     3 |   73.8   78.6 ( +6.504%) |  1.306  1.306 ( +0.000%)
sao         clang-6.0  3 |   81.4   84.4 ( +3.686%) |  1.306  1.306 ( +0.000%)
sao         clang-7    3 |   71.7   84.7 (+18.131%) |  1.306  1.306 ( +0.000%)
sao         clang-8    3 |   71.3   83.1 (+16.550%) |  1.306  1.306 ( +0.000%)
sao         clang-9    3 |   72.5   84.0 (+15.862%) |  1.306  1.306 ( +0.000%)
sao         clang-11   3 |   73.4   86.9 (+18.392%) |  1.306  1.306 ( +0.000%)
sao         clang-12   3 |   72.9   85.9 (+17.833%) |  1.306  1.306 ( +0.000%)
sao         gcc-4.8    4 |   69.7   77.9 (+11.765%) |  1.337  1.337 ( +0.000%)
sao         gcc-5      4 |   69.6   77.4 (+11.207%) |  1.337  1.337 ( +0.000%)
sao         gcc-6      4 |   70.5   74.8 ( +6.099%) |  1.337  1.337 ( +0.000%)
sao         gcc-7      4 |   68.7   75.1 ( +9.316%) |  1.337  1.337 ( +0.000%)
sao         gcc-8      4 |   69.0   74.3 ( +7.681%) |  1.337  1.337 ( +0.000%)
sao         gcc-10     4 |   66.8   71.8 ( +7.485%) |  1.337  1.337 ( +0.000%)
sao         clang-6.0  4 |   73.8   74.7 ( +1.220%) |  1.337  1.337 ( +0.000%)
sao         clang-7    4 |   65.4   74.8 (+14.373%) |  1.337  1.337 ( +0.000%)
sao         clang-8    4 |   64.8   73.0 (+12.654%) |  1.337  1.337 ( +0.000%)
sao         clang-9    4 |   62.0   77.4 (+24.839%) |  1.337  1.337 ( +0.000%)
sao         clang-11   4 |   67.7   77.3 (+14.180%) |  1.337  1.337 ( +0.000%)
sao         clang-12   4 |   68.2   76.0 (+11.437%) |  1.337  1.337 ( +0.000%)
webster     gcc-4.8    3 |  126.6  126.8 ( +0.158%) |  3.403  3.420 ( +0.500%)
webster     gcc-5      3 |  117.1  124.2 ( +6.063%) |  3.403  3.420 ( +0.500%)
webster     gcc-6      3 |  122.9  124.0 ( +0.895%) |  3.403  3.420 ( +0.500%)
webster     gcc-7      3 |  126.3  125.5 ( -0.633%) |  3.403  3.420 ( +0.500%)
webster     gcc-8      3 |  127.6  124.4 ( -2.508%) |  3.403  3.420 ( +0.500%)
webster     gcc-10     3 |  123.5  124.7 ( +0.972%) |  3.403  3.420 ( +0.500%)
webster     clang-6.0  3 |  128.3  122.5 ( -4.521%) |  3.403  3.420 ( +0.500%)
webster     clang-7    3 |  124.3  125.2 ( +0.724%) |  3.403  3.420 ( +0.500%)
webster     clang-8    3 |  121.0  120.2 ( -0.661%) |  3.403  3.420 ( +0.500%)
webster     clang-9    3 |  123.1  122.3 ( -0.650%) |  3.403  3.420 ( +0.500%)
webster     clang-11   3 |  120.7  118.8 ( -1.574%) |  3.403  3.420 ( +0.500%)
webster     clang-12   3 |  117.9  122.1 ( +3.562%) |  3.403  3.420 ( +0.500%)
webster     gcc-4.8    4 |  124.4  122.4 ( -1.608%) |  3.455  3.475 ( +0.579%)
webster     gcc-5      4 |  114.5  121.4 ( +6.026%) |  3.455  3.475 ( +0.579%)
webster     gcc-6      4 |  118.2  118.1 ( -0.085%) |  3.455  3.475 ( +0.579%)
webster     gcc-7      4 |  120.9  119.8 ( -0.910%) |  3.455  3.475 ( +0.579%)
webster     gcc-8      4 |  121.0  122.4 ( +1.157%) |  3.455  3.475 ( +0.579%)
webster     gcc-10     4 |  124.3  119.6 ( -3.781%) |  3.455  3.475 ( +0.579%)
webster     clang-6.0  4 |  124.6  120.1 ( -3.612%) |  3.455  3.475 ( +0.579%)
webster     clang-7    4 |  122.6  119.1 ( -2.855%) |  3.455  3.475 ( +0.579%)
webster     clang-8    4 |  120.2  118.0 ( -1.830%) |  3.455  3.475 ( +0.579%)
webster     clang-9    4 |  118.9  121.1 ( +1.850%) |  3.455  3.475 ( +0.579%)
webster     clang-11   4 |  116.2  120.7 ( +3.873%) |  3.455  3.475 ( +0.579%)
webster     clang-12   4 |  121.4  121.2 ( -0.165%) |  3.455  3.475 ( +0.579%)
xml         gcc-4.8    3 |  313.2  308.7 ( -1.437%) |  8.357  8.363 ( +0.072%)
xml         gcc-5      3 |  306.9  313.1 ( +2.020%) |  8.357  8.363 ( +0.072%)
xml         gcc-6      3 |  300.0  304.7 ( +1.567%) |  8.357  8.363 ( +0.072%)
xml         gcc-7      3 |  313.5  312.9 ( -0.191%) |  8.357  8.363 ( +0.072%)
xml         gcc-8      3 |  315.3  315.4 ( +0.032%) |  8.357  8.363 ( +0.072%)
xml         gcc-10     3 |  308.6  310.1 ( +0.486%) |  8.357  8.363 ( +0.072%)
xml         clang-6.0  3 |  318.1  311.3 ( -2.138%) |  8.357  8.363 ( +0.072%)
xml         clang-7    3 |  309.8  308.6 ( -0.387%) |  8.357  8.363 ( +0.072%)
xml         clang-8    3 |  311.6  310.6 ( -0.321%) |  8.357  8.363 ( +0.072%)
xml         clang-9    3 |  313.1  312.9 ( -0.064%) |  8.357  8.363 ( +0.072%)
xml         clang-11   3 |  321.2  318.7 ( -0.778%) |  8.357  8.363 ( +0.072%)
xml         clang-12   3 |  315.9  315.1 ( -0.253%) |  8.357  8.363 ( +0.072%)
xml         gcc-4.8    4 |  313.2  317.0 ( +1.213%) |  8.384  8.390 ( +0.072%)
xml         gcc-5      4 |  305.4  313.7 ( +2.718%) |  8.384  8.390 ( +0.072%)
xml         gcc-6      4 |  292.7  311.8 ( +6.525%) |  8.384  8.390 ( +0.072%)
xml         gcc-7      4 |  310.0  312.3 ( +0.742%) |  8.384  8.390 ( +0.072%)
xml         gcc-8      4 |  316.9  313.2 ( -1.168%) |  8.384  8.390 ( +0.072%)
xml         gcc-10     4 |  310.3  308.9 ( -0.451%) |  8.384  8.390 ( +0.072%)
xml         clang-6.0  4 |  319.9  315.7 ( -1.313%) |  8.384  8.390 ( +0.072%)
xml         clang-7    4 |  310.3  309.6 ( -0.226%) |  8.384  8.390 ( +0.072%)
xml         clang-8    4 |  316.3  292.8 ( -7.430%) |  8.384  8.390 ( +0.072%)
xml         clang-9    4 |  319.4  317.2 ( -0.689%) |  8.384  8.390 ( +0.072%)
xml         clang-11   4 |  331.8  317.2 ( -4.400%) |  8.384  8.390 ( +0.072%)
xml         clang-12   4 |  315.3  319.7 ( +1.395%) |  8.384  8.390 ( +0.072%)
x-ray       gcc-4.8    3 |   71.3   77.0 ( +7.994%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-5      3 |   69.8   77.5 (+11.032%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-6      3 |   70.6   77.1 ( +9.207%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-7      3 |   68.6   75.6 (+10.204%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-8      3 |   67.4   74.4 (+10.386%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-10     3 |   68.6   72.3 ( +5.394%) |  1.393  1.393 ( +0.000%)
x-ray       clang-6.0  3 |   75.7   76.3 ( +0.793%) |  1.393  1.393 ( +0.000%)
x-ray       clang-7    3 |   64.9   79.3 (+22.188%) |  1.393  1.393 ( +0.000%)
x-ray       clang-8    3 |   70.9   77.4 ( +9.168%) |  1.393  1.393 ( +0.000%)
x-ray       clang-9    3 |   66.1   74.8 (+13.162%) |  1.393  1.393 ( +0.000%)
x-ray       clang-11   3 |   68.2   75.3 (+10.411%) |  1.393  1.393 ( +0.000%)
x-ray       clang-12   3 |   66.3   76.8 (+15.837%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-4.8    4 |   65.8   68.4 ( +3.951%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-5      4 |   65.2   68.0 ( +4.294%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-6      4 |   64.9   67.0 ( +3.236%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-7      4 |   62.0   64.9 ( +4.677%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-8      4 |   62.4   66.3 ( +6.250%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-10     4 |   62.7   67.0 ( +6.858%) |  1.484  1.484 ( +0.000%)
x-ray       clang-6.0  4 |   67.9   65.3 ( -3.829%) |  1.484  1.484 ( +0.000%)
x-ray       clang-7    4 |   64.5   69.0 ( +6.977%) |  1.484  1.484 ( +0.000%)
x-ray       clang-8    4 |   66.1   70.8 ( +7.110%) |  1.484  1.484 ( +0.000%)
x-ray       clang-9    4 |   61.9   67.7 ( +9.370%) |  1.484  1.484 ( +0.000%)
x-ray       clang-11   4 |   64.7   67.4 ( +4.173%) |  1.484  1.484 ( +0.000%)
x-ray       clang-12   4 |   62.4   67.0 ( +7.372%) |  1.484  1.484 ( +0.000%)
silesia.tar gcc-4.8    3 |  146.2  149.0 ( +1.915%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-5      3 |  142.0  139.1 ( -2.042%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-6      3 |  146.6  150.0 ( +2.319%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-7      3 |  143.5  147.6 ( +2.857%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-8      3 |  144.8  145.5 ( +0.483%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-10     3 |  143.1  146.2 ( +2.166%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-6.0  3 |  147.7  147.3 ( -0.271%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-7    3 |  142.6  148.0 ( +3.787%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-8    3 |  141.3  150.4 ( +6.440%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-9    3 |  143.6  150.2 ( +4.596%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-11   3 |  143.7  149.5 ( +4.036%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-12   3 |  142.8  149.4 ( +4.622%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-4.8    4 |  135.9  139.8 ( +2.870%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-5      4 |  134.4  138.1 ( +2.753%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-6      4 |  134.2  139.7 ( +4.098%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-7      4 |  135.5  140.1 ( +3.395%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-8      4 |  137.0  140.0 ( +2.190%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-10     4 |  138.3  136.0 ( -1.663%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-6.0  4 |  141.4  138.2 ( -2.263%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-7    4 |  137.9  142.2 ( +3.118%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-8    4 |  134.8  140.7 ( +4.377%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-9    4 |  140.0  140.7 ( +0.500%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-11   4 |  135.9  138.3 ( +1.766%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-12   4 |  138.2  132.2 ( -4.342%) |  3.237  3.246 ( +0.278%)

Benchmarked on, as usual, an Intel Xeon E5-2680 v4 @ 2.40GHz.

On the whole we see improvements in ratio, and improvements in speed on less-compressible inputs. It seems like very compressible inputs are neutral on speed or maybe even slightly slower.
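The parenthesised percentages in the table appear to be the relative change of the second value against the first. That is an inference from the numbers, not something stated in the PR, but it reproduces the rows I checked. A small sketch, with `pct_change` as a hypothetical helper name:

```c
#include <assert.h>

/* Relative change of `after` versus `before`, in percent.
 * Assumption: this is how the table's "( +x.xxx%)" columns were derived. */
double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For the first row (dickens, gcc-4.8, level 3), `pct_change(99.9, 100.3)` gives roughly 0.400 for speed and `pct_change(2.769, 2.779)` gives roughly 0.361 for ratio, matching the table.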

Status

This PR is believed to be speed-positive, ratio-positive, and correct.

To-Do:

lib/compress/zstd_double_fast.c

Cyan4973 · 2021-09-15T16:59:00Z

lib/compress/zstd_double_fast.c

+        if (offset_1 > maxRep) offsetSaved = offset_1, offset_1 = 0;
+    }
+
+_start:


Could it be written as a loop, rather than a goto ?

Note : this may impact some variables' scope.

Yeah, I thought about it. To a rough approximation, the original implementation looks like this:

size_t dfast() {
    while (ip < ilimit) {
        if (search()) goto _match;
        continue;
    _match:
        store();
    }
    return;
}

This PR changes it to something like this:

size_t dfast() {
_start:
    if (ip >= ilimit) goto _cleanup;
    init();
    do {
        if (search()) goto _match;
    } while (ip < ilimit);
_cleanup:
    return;
_match:
    store();
    goto _start;
}

Admittedly, this abuses gotos a bunch. It is quite close to the assembly that gets generated though, which helps me think about the flow.

I believe it could be rewritten to instead look like this:

size_t dfast() {
    if (ip < ilimit) {
        init();
        do {
            if (search()) goto _match;
            continue;
        _match:
            store();
            init();
        } while (ip < ilimit);
    }
    return;
}

This solves the abuse of goto, but has two init() blocks, which I don't like. I'm also somewhat attracted to moving the match code outside of the tight loop because it's then clearer what the hot loop is.

Alternatively, it could maybe be structured like this, which only has one init() block, but has two loops nested.

I think this would improve the variable scoping concerns.

size_t dfast() {
    while (ip < ilimit) {
        init();
        do {
            if (search()) goto _match;
        } while (ip < ilimit);
        break;
    _match:
        store();
    }
    return;
}

I have a version of this last approach here. Performance looks good on gcc-10, but it's slower on clang-12. :/
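As a sanity check on the control flow only, the goto-based structure and the nested-loop structure can be run side by side on a toy state. Everything here is hypothetical scaffolding (`toy_t`, `toy_init`, `toy_search`, `toy_store`, and the rule that a "match" occurs at every multiple of 4); it says nothing about the performance difference mentioned above, only that the two shapes visit the same steps in the same order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy state: a cursor plus counters recording what ran. */
typedef struct { size_t ip, ilimit, inits, stores; } toy_t;

static void toy_init(toy_t *s)   { s->inits++; }
/* Toy search: advance one position; report a match at multiples of 4. */
static int  toy_search(toy_t *s) { s->ip++; return s->ip % 4 == 0; }
/* Toy store: record the match and skip 2 positions past it. */
static void toy_store(toy_t *s)  { s->stores++; s->ip += 2; }

/* Shape used in this PR: labels and gotos, match code outside the hot loop. */
void dfast_goto(toy_t *s)
{
_start:
    if (s->ip >= s->ilimit) goto _cleanup;
    toy_init(s);
    do {
        if (toy_search(s)) goto _match;
    } while (s->ip < s->ilimit);
_cleanup:
    return;
_match:
    toy_store(s);
    goto _start;
}

/* Alternative shape: an outer loop replaces _start, with one init block. */
void dfast_nested(toy_t *s)
{
    while (s->ip < s->ilimit) {
        toy_init(s);
        do {
            if (toy_search(s)) goto _match;
        } while (s->ip < s->ilimit);
        break;              /* input exhausted without a match */
_match:
        toy_store(s);
    }
}
```

Starting both from `ip = 0` with `ilimit = 20` leaves the two states identical (`ip == 22`, five inits, five stores), which is what one would expect if the rewrite only changes how the same flow is spelled.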

Without modifying the general structure of the current code,
could the _start: / goto _start pair (and just this one) be converted into a loop?

It seems this wouldn't impact the code structure, and hence should be essentially equivalent for the compiler,

yet it would reduce the number of gotos
and allow a (slight) reduction in the scope of several variables
that don't need to retain their values between loop iterations.

Cyan4973 · 2021-09-15T18:49:59Z

The algorithm itself looks fine.
I'll benchmark it later to confirm the improvements.

In terms of coding style, there is a heavy reliance on goto statements.
I'm not "firmly opposed" to goto; they have their uses, but that doesn't mean I like to see many of them around.
Here they seem to have consequences for variable lifetimes,
which are extended to the entire function,
making it more difficult to track each variable's usage and role.

I wonder if that's always necessary.
Whenever the logic can also be expressed easily as a loop with a well-defined scope,
I believe that's preferable for maintenance.

An improvement could be to convert "some" of these gotos into loops,
wherever the conversion feels easy enough.

Since we're now hashing the position ahead even if we find a long match and don't search that next position, we can write it back into the hashtable even in long matches. This seems to cost us no speed, and improves compression ratio slightly!
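The write-back idea can be illustrated with a toy scanner. This is a sketch under loud assumptions, not the zstd implementation: `toy_hash` simply uses the first two bytes as the table index, there is a single table rather than separate long and short tables, and `ToyStats`, `toy_scan`, and the `write_back` flag are hypothetical names introduced here. The look-ahead hash `h1` is computed on every iteration either way, so storing `ip + 1` on a match adds a table write but no extra hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_TABLE_SIZE (1u << 16)
#define TOY_EMPTY      UINT32_MAX
#define TOY_MINMATCH   4

typedef struct { unsigned matches; unsigned hashes; } ToyStats;

/* Toy stand-in for a real hash: index by the first two bytes, count each call. */
static uint32_t toy_hash(const uint8_t* p, ToyStats* st) {
    st->hashes++;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* Scans src, hashing one position ahead of the one being checked.
 * table must have TOY_TABLE_SIZE entries; it is reset on entry. */
static ToyStats toy_scan(const uint8_t* src, size_t size, uint32_t* table, int write_back) {
    ToyStats st = {0, 0};
    size_t ip = 0, last, i;
    uint32_t h0;
    for (i = 0; i < TOY_TABLE_SIZE; i++) table[i] = TOY_EMPTY;
    if (size < TOY_MINMATCH + 1) return st;
    last = size - TOY_MINMATCH;          /* last position with 4 readable bytes */
    h0 = toy_hash(src + ip, &st);
    while (ip < last) {
        const uint32_t h1 = toy_hash(src + ip + 1, &st);  /* look-ahead hash */
        const uint32_t cand = table[h0];
        table[h0] = (uint32_t)ip;
        if (cand != TOY_EMPTY && memcmp(src + cand, src + ip, TOY_MINMATCH) == 0) {
            size_t mlen = TOY_MINMATCH;
            while (ip + mlen < size && src[cand + mlen] == src[ip + mlen]) mlen++;
            st.matches++;
            /* h1 is already computed, so recording ip + 1 needs no new hash. */
            if (write_back) table[h1] = (uint32_t)(ip + 1);
            ip += mlen;
            if (ip < last) h0 = toy_hash(src + ip, &st);
        } else {
            ip++;
            h0 = h1;   /* carry the look-ahead hash into the next iteration */
        }
    }
    return st;
}
```

On the hand-picked input `"abcdXabcdefYbcdef"`, the scan with `write_back` set finds two matches and the scan without it finds one, with the same number of hash calls in both runs. The input was chosen to show the effect; other inputs can take different paths, and more table updates do not guarantee a better ratio.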

Cyan4973 · 2021-10-07T16:11:53Z

Benchmark feedback:
I can confirm that this PR improves both the speed and the compression ratio of the dfast strategy.
Surprisingly (to me), the compression ratio improves more than I expected, while the compression speed improves less than I anticipated. Maybe some of the speed gain is consumed by additional search or parsing work, which in turn produces the ratio gain? (It's not clear; I need to look at the code in more detail.)

Anyway, both impacts are fairly small, and both are positive. So, from a measurement perspective, it looks like a pure improvement.

Cyan4973 · 2021-10-13T17:51:16Z

It's a pity that you could not use the new loop to reduce the scope of some local variables, but since you mentioned that doing so negatively impacts performance, I guess we'll leave it as is.

Another small comment: I noticed in results.csv that, for "smaller" data (~100 KB), there are several samples where the compression ratio ends up being worse. The impact seems to remain small, so it's not a deal breaker, but this is a good reminder that the "extra work" of updating hash tables more often doesn't necessarily translate into a better compression ratio. The consequences are fuzzier than that.

facebook-github-bot added the CLA Signed label Sep 9, 2021

felixhandte added the optimization label Sep 14, 2021

senhuang42 reviewed Sep 15, 2021

lib/compress/zstd_double_fast.c (outdated)

senhuang42 reviewed Sep 15, 2021

lib/compress/zstd_double_fast.c

Cyan4973 reviewed Sep 15, 2021

lib/compress/zstd_double_fast.c (outdated)

Cyan4973 reviewed Sep 15, 2021

felixhandte changed the title ~~[WIP] Pipelined Implementation of ZSTD_dfast~~ Pipelined Implementation of ZSTD_dfast Sep 28, 2021

felixhandte added 16 commits October 5, 2021 14:54

Extract Single-Segment Variant of ZSTD_dfast

258c062

Track Step Rather than Recalculating (+0.5% Speed)

1bdf041

Extract Working Variables

072ffaa

Pull Match Found Stuff Out of the Loop

a1ac720

Hash Long One Position Ahead (+2.5% Speed)

db4e1b5

Aside from maybe a latency win in the loop, this means that when we find a short match, we've already done the hash we need to check the next long match.

Use Look-Ahead Hash for Next Long Check after Short Match (+0.5% Speed)

39f2491

This costs a little ratio, unfortunately.

Advance Long Index Lookup (+0.5% Speed)

6ae44c0

This lookup can be advanced to before the short match check because either way we will use it (in the next loop iter or in `_search_next_long`).

Search One Last Position

2cdfad5

Nit: Unnest Blocks that Don't Declare Anything

47fd762

Nit: Rename Function

fcab484

Fall Back in _extDict to New _noDict Rather than Old Merged Impl

051b473

Simplify DMS Implementation by Removing noDict Support

62536ef

Update results.csv

c2c3283

Fix Flaky Test

168d0a3

This test depended on `_extDict` and `_noDict` compressing identically, which is not a guarantee we make, AFAIK.

Style: Add Comments to Variables and Move a Couple into the Loop

79ca830

felixhandte force-pushed the zstd-dfast-pipelined-single branch from 9c6e56f to 79ca830 Compare October 5, 2021 20:18

Convert Outer Control Structure to Loop

0bfc935

Cyan4973 approved these changes Oct 13, 2021

felixhandte merged commit 23c1a2d into facebook:dev Oct 13, 2021
