Fix data corruption in niche use case #1659

Merged 4 commits on Jun 24, 2019
Conversation

terrelln (Contributor) commented Jun 22, 2019

  • Extract the overflow correction into a helper function.
  • Load the dictionary in chunks of at most `ZSTD_CHUNKSIZE_MAX` = 512 MB
    and apply overflow correction between chunks.
  • Add a test case that made overflow asserts fail, and now passes.
    I couldn't figure out how to make a fast test that causes data corruption.

Data corruption could happen when all these conditions are true:

  • You are using multithreading mode
  • Your overlap size is >= 512 MB (implies window size >= 512 MB)
  • You are using a strategy >= ZSTD_btlazy
  • You are compressing more than 4 GB

The problem is that we don't do overflow correction when loading a large
dictionary. The fix loads at most 512 MB at a time and does overflow
correction before each chunk as needed.

Fixes #1653.

Successfully merging this pull request may close these issues.

Decoding error on spruce genome