zstd: Data corruption on SpeedBestCompression level #875
Comments
As restic 0.15.2 is not affected, the regression must have occurred between v1.16.0 and v1.16.7.
Thanks for the detailed report. I will set up the reproducer and look for a fix tomorrow. If at all possible, I will see if I can make a recovery possible.
I get the reproducer just by plugging the input into the fuzz test.
I got the issue isolated, but I still need to find the root cause, even though I plan to add more safety code around that area. Unfortunately there doesn't seem to be a recovery, since it is an "offset 0" self-reference that has been added.
Pretty crazy setup needed for this. I will send a PR with fix and details.
Regression from #784 and follow-up #793. Fixes #875.

A 0-offset backreference was possible when "improve" succeeded twice in a row in the "skipBeginning" part, finding only 2 (previously unmatched) length-4 matches, but with the start offset decreasing by 2 in both cases. This would result in output where the end offset equaled the next 's', thereby producing a self-reference.

Add a general check in "improve" and just reject these. This will also guard against similar issues in the future.

This also hints at some potentially suboptimal hash indexing, but I will take that improvement separately.

Fuzz test set updated.
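To illustrate why an offset of 0 is fatal, here is a minimal sketch of a toy LZ77-style sequence decoder with the kind of guard described above. It is not the zstd format or the library's code: `seq`, `validMatch`, `decode` and `errBadOffset` are hypothetical names made up for this example. A match copies bytes from `offset` positions back in the output, so an offset of 0 would point at bytes that have not been written yet.

```go
package main

import (
	"errors"
	"fmt"
)

// seq is a simplified sequence: append the literals, then copy matchLen
// bytes starting offset bytes back in the output produced so far.
// Hypothetical type for illustration only; not the zstd wire format.
type seq struct {
	literals []byte
	offset   int
	matchLen int
}

var errBadOffset = errors.New("invalid match offset")

// validMatch reports whether a back-reference is usable when outLen bytes
// have already been produced. Offset 0 is rejected because it would refer
// to the position that is about to be written (a self-reference).
func validMatch(outLen, offset, matchLen int) bool {
	return matchLen == 0 || (offset >= 1 && offset <= outLen)
}

// decode expands the sequences, rejecting any invalid back-reference.
func decode(seqs []seq) ([]byte, error) {
	var out []byte
	for _, s := range seqs {
		out = append(out, s.literals...)
		if !validMatch(len(out), s.offset, s.matchLen) {
			return nil, errBadOffset
		}
		start := len(out) - s.offset
		// Copy byte by byte so that overlapping matches
		// (offset < matchLen) repeat the pattern correctly.
		for i := 0; i < s.matchLen; i++ {
			out = append(out, out[start+i])
		}
	}
	return out, nil
}

func main() {
	good := []seq{{literals: []byte("abcd"), offset: 4, matchLen: 8}}
	out, err := decode(good)
	fmt.Printf("%q %v\n", out, err)

	// Offset 0 has no defined source bytes, so it is rejected.
	bad := []seq{{literals: []byte("abcd"), offset: 0, matchLen: 4}}
	_, err = decode(bad)
	fmt.Println(err)
}
```

In this toy model the check sits in the decoder; the fix described above rejects such matches on the encoder side, before they are ever written to the output.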
Fix in #876 - I will leave fuzz tests running for a few hours and merge+release later today.
Thanks a lot for fixing this so quickly!
* zstd: Fix corrupted output in "best"
Merged and released as
Fixes 2 possible data corruption issues. See:
- klauspost/compress#875
- klauspost/compress#922

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Following a bug report at restic/restic#4523, I've traced the data corruption back to the zstd library.
The following two minimized data chunks come back corrupted after being compressed at the SpeedBestCompression level and then decompressed. There is no data corruption at the default compression level. I've tested versions v1.16.7 (via restic using Go 1.20.7) and v1.17.1 (using Go 1.21.3).
correct-3d0e366bad4a5e9b443f407b114756a0f5a8153dbc242e0b2601c707136815eb.bin-minimized.txt
correct-dc82d97be7683ecd41097ab02d7b15de81e8bbcd1c476c50b254b1f458090929.bin-minimized.txt
To produce the minimized examples, I've used the following code snippet, which also serves as a reproducer:
Run it using

    go mod init test
    go mod tidy
    go run main.go correct-3d0e366bad4a5e9b443f407b114756a0f5a8153dbc242e0b2601c707136815eb.bin
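For readers who want the general shape of such a check, the sketch below is a generic compress/decompress round-trip comparison. It is not the reporter's snippet: `codec`, `roundTrip`, `flateCompress` and `flateDecompress` are hypothetical names, and the standard library's `compress/flate` stands in for zstd so that the example runs without extra modules.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// codec is a one-shot transform over a byte slice (hypothetical helper type).
type codec func([]byte) ([]byte, error)

// roundTrip compresses data, decompresses the result, and returns an error
// if the output differs from the input.
func roundTrip(data []byte, compress, decompress codec) error {
	packed, err := compress(data)
	if err != nil {
		return fmt.Errorf("compress: %w", err)
	}
	unpacked, err := decompress(packed)
	if err != nil {
		return fmt.Errorf("decompress: %w", err)
	}
	if !bytes.Equal(data, unpacked) {
		return fmt.Errorf("round trip mismatch: %d bytes in, %d bytes out",
			len(data), len(unpacked))
	}
	return nil
}

// flateCompress is a stand-in compressor using the standard library,
// at its highest compression level.
func flateCompress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestCompression)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// flateDecompress is the matching stand-in decompressor.
func flateDecompress(data []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(data))
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	data := bytes.Repeat([]byte("restic chunk "), 1000)
	if err := roundTrip(data, flateCompress, flateDecompress); err != nil {
		fmt.Println("corruption detected:", err)
		return
	}
	fmt.Println("round trip ok")
}
```

To exercise the case from this issue, the two callbacks would instead wrap `EncodeAll` on an encoder created with `zstd.WithEncoderLevel(zstd.SpeedBestCompression)` and `DecodeAll` on a decoder from `github.com/klauspost/compress/zstd`, fed with one of the attached files.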