-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lib] Add ZSTD_c_deterministicRefPrefix #2616
Conversation
efea18d
to
1a30e62
Compare
We have an internal use case currently, |
@@ -559,6 +559,11 @@ ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter param) | |||
bounds.upperBound = (int)ZSTD_urm_enableRowMatchFinder; | |||
return bounds; | |||
|
|||
case ZSTD_c_deterministicRefPrefix: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened question :
flag name : ZSTD_c_deterministicRefPrefix
the name describes pretty well the objective, not the how.
In general, this is a good rule to select a name.
However, in this case, I was wondering :
in which case would anyone select "no" to this property ?
"Do you want a deterministic output ?" "No, I'm fine with unpredictable randomness"
I guess, one only answers "no" if there is something in exchange.
In this case, the only answer I could think of would probably be more speed
(and I don't even know how much speed we are talking about, it could be negligible).
Even then, making the effort to benefit from this speed only makes sense if the user consistently organizes layout so that the prefix always stands just before the destination buffer. In which case ... the output is deterministic.
So it feels strange to have to answer "no" to ZSTD_c_deterministicRefPrefix
in order to benefit from better speed by concatenating output buffer right after dictionary prefix.
So, I was wondering : presuming someone would be interested in better speed, by ensuring that dictionary prefix and output buffer are contiguous, what would be the "right" parameter name that would feel "correct"?
I could imagine a 3-stages parameter : always separated, always contiguous (in which case it must fail if buffers are not contiguous), automatic (hence non deterministic).
Thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be fine to add this, it would be simple to do. But, it does add more complexity to the API, so there should be some gain, like speed.
I guess that means I should measure the speed difference between ext-dict & prefix dictionaries... I will do that, and if there is a significant gain, I can extend the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall measuring something like a 5% speed gain on a subset of the httparchive
corpus with a 16KB dict level 1, for contiguous vs. non-contiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefix-mode is ~2.5% faster at level 1, 1.3% faster at level 3, and 2.5% faster at level 5.
I think that loss is not worth the complexity, especially because the user would probably have to copy their buffers to ensure they're contiguous. I will update the comments to explain the tradeoff a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with the outcome.
I'm surprised though that prefix mode (which I understand as "contiguous buffers") ends up being slower at level 5. The expectation was that it would be at least as fast, if only by a tiny margin, but not slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised though that prefix mode (which I understand as "contiguous buffers") ends up being slower at level 5. The expectation was that it would be at least as fast, if only by a tiny margin, but not slower.
That was a typo, it is 2.5% faster.
1a30e62
to
172b4b6
Compare
This flag forces zstd to always load the prefix in ext-dict mode, even if it happens to be contiguous, to force determinism. It also applies to dictionaries that are re-processed. A determinism test case is also added, which fails without `ZSTD_c_deterministicRefPrefix` and passes with it set. Question: Should this be the default behavior? It isn't in this PR.
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.
A determinism test case is also added, which fails without
ZSTD_c_deterministicRefPrefix
and passes with it set.Question: Should this be the default behavior? It isn't in this PR.