Questions about training dictionary functions #2566
Comments
These notes come from
What do you mean by that?
doc/zstd_compression_format.md has these words:
"If a dictionary is provided by an external source, it should be loaded with great care, its content considered untrusted."
Does it mean vulnerable?
It depends on what is associated with the word "vulnerable". If it means that there are known exploitable vulnerabilities, then it's too strong. Essentially, externally provided dictionary content, which could have been tampered with by a 3rd party, is as dangerous as any compressed payload sent to the decoder. Good thing is,
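As a rough illustration of what handling a dictionary as untrusted input can look like, here is a minimal sketch in C that inspects the first 8 bytes of a buffer before it is handed to the library. It relies on the dictionary layout described in doc/zstd_compression_format.md (a 4-byte little-endian magic number 0xEC30A437 followed by a 4-byte little-endian dictionary ID). The names `inspect_dictionary`, `dict_kind` and `read_le32` are hypothetical helpers made up for this example and are not part of the zstd API. Treating buffers shorter than 8 bytes as their own category is a policy choice of this sketch. Such a pre-check is only a cheap sanity test, not a substitute for the validation the decoder performs itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Magic number that opens a structured zstd dictionary (stored little-endian). */
#define DICT_MAGIC 0xEC30A437u

/* Hypothetical classification used only by this sketch. */
typedef enum {
    DICT_TOO_SHORT,    /* fewer than 8 bytes: cannot hold magic + dictionary ID */
    DICT_RAW_CONTENT,  /* no magic number: would be treated as raw content */
    DICT_STRUCTURED    /* magic number present, dictionary ID readable */
} dict_kind;

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify an externally supplied buffer; on DICT_STRUCTURED, store its ID. */
static dict_kind inspect_dictionary(const void *buf, size_t size, uint32_t *dict_id)
{
    const unsigned char *p = (const unsigned char *)buf;

    if (buf == NULL || size < 8)
        return DICT_TOO_SHORT;
    if (read_le32(p) != DICT_MAGIC)
        return DICT_RAW_CONTENT;
    if (dict_id != NULL)
        *dict_id = read_le32(p + 4);
    return DICT_STRUCTURED;
}
```

A caller could, for instance, reject anything that is not `DICT_STRUCTURED` or whose ID does not match the one it expects, and only then pass the buffer on to the decoder.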
The FAQ covers the questions asked in Issue facebook#2566. It first covers why you would want to use a dictionary, then what a dictionary is, and finally it tells you how to train a dictionary, and clarifies some of the parameters. There is definitely more that could be said about some of the advanced trainers, but this should be a good start.
Addressed in #2622
Currently, the stable API has two functions for training a dictionary [1]:
ZDICT_trainFromBuffer()
ZDICT_finalizeDictionary()
The documentation of ZDICT_finalizeDictionary() says that specifying the compression level helps compression [2]. But ZDICT_trainFromBuffer() uses ZSTD_CLEVEL_DEFAULT as its compression level [3]. Does the trained dictionary perform sub-optimally at other compression levels?
If this is an API defect, maybe ZDICT_finalizeDictionary() could be allowed to accept NULL as the basis dictionary. In that case it would only train the dictionary and not finalize it. This would not break the API, and it is backward compatible. ZDICT_trainFromBuffer() could then be marked as obsolete.
Moreover, it would be nice to describe ZDICT_finalizeDictionary() from the user's perspective.
[1] https://github.com/facebook/zstd/blob/v1.4.9/lib/dictBuilder/zdict.h#L40-L109
[2] https://github.com/facebook/zstd/blob/v1.4.9/lib/dictBuilder/zdict.h#L80-L83
[3] https://github.com/facebook/zstd/blob/v1.4.9/lib/dictBuilder/zdict.c#L1116