Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Support references to multiple DDicts #2446

Merged
merged 11 commits into from
Jan 8, 2021

Conversation

senhuang42
Copy link
Contributor

@senhuang42 senhuang42 commented Dec 29, 2020

This PR adds a new experimental param ZSTD_d_refMultipleDDicts which controls the behavior of ZSTD_refDDict().

If enabled, then multiple calls to ZSTD_refDDict() will persist the multiple referenced ZSTD_DDict* in memory rather than overwriting the last reference. Under the hood, we use a hash set w/ linear probing to keep track of the ZSTD_DDict*. Example usage is in the included unit test in fuzzer.c, but is basically as simple as:

ZSTD_DCtx* dctx = ZSTD_createDCtx();
ZSTD_DCtx_setParameter(dctx, ZSTD_d_refMultipleDDicts, ZSTD_rmd_refMultipleDDicts);
ZSTD_DCtx_refDDict(dctx, ddict1);
ZSTD_DCtx_refDDict(dctx, ddict2);
ZSTD_DCtx_refDDict(dctx, ddict3);
...
ZSTD_decompress...

The goal is no changes to the existing API, leaving existing behavior/usages of ZSTD_refDDict() unchanged.
Only when we call setParameter() with the new experimental param can behavior change.

There are still issues with this PR, but I wanted to put this up to get some initial comments to see if this was roughly the direction we wanted to head in.

Open questions/followups:

  • Hash table allocation entry point (is refDDict() best?)
  • Performance (hash table param tuning/hash function)
  • API Extensions? (un-reference a dictionary)
  • Add streaming unit test
  • Still need to test more extensively
  • Some code cleanup

@senhuang42 senhuang42 changed the title Multiple ddicts v3 [RFC] Support references to multiple DDicts Dec 29, 2020
* Returns an index between [0, hashSet->ddictPtrTableSize]
*/
static size_t ZSTD_DDictHashSet_getIndex(const ZSTD_DDictHashSet* hashSet, U32 dictID) {
U32 hash = XXH64(&dictID, sizeof(U32), 0);
Copy link
Contributor Author

@senhuang42 senhuang42 Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I can't quite figure out is why on certain builds/CI tests, zstd_decompress.c can't see XXH64, but can see XXH64_digest(), etc.? As far as I can tell, xxhash.h is within the graph of dependencies, and locally, it seems fine. Is there an additional macro I need to invoke?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to the linux kernel test ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I do see a line about Replacing XXH64 prefix with xxh64 in the logs, and I assume it might be related.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just need to change this line:

(re.compile(r"([^\w]|^)(?P<orig>XXH64)_"), self._xxh64_prefix)

Something like: re.compile(r"([^\w]|^)(?P<orig>XXH64)[\(_]")

return 0;
}
if (idx == hashSet->ddictPtrTableSize - 1) {
idx = 0;
Copy link
Contributor

@Cyan4973 Cyan4973 Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor (potential suggestion) : could also use a mask if ddictPtrTableSize is a clean power of 2.

@senhuang42 senhuang42 force-pushed the multiple_ddicts_v3 branch 2 times, most recently from 3544d64 to 5f5d100 Compare January 7, 2021 22:03
@senhuang42 senhuang42 merged commit 69085db into facebook:dev Jan 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants