compression/decompression error #173

Closed
ghood opened this issue Apr 19, 2016 · 9 comments

@ghood

ghood commented Apr 19, 2016

When compressing and then decompressing a test file, I don't get the original file back:
$ ./zstd -V
*** zstd command line interface 64-bits 0.6.0, by Yann Collet ***
$ ./zstd test -o test.zst
Compressed 10688440 bytes into 4209313 bytes ==> 39.38%
$ ./zstd -d test.zst -o test.decoded
Successfully decoded 10688440 bytes
$ cmp test test.decoded
test test.decoded differ: byte 2588022, line 3678
$ ls -l test*
-rw-r--r-- 1 ghood staff 10688440 Apr 19 14:12 test
-rw-r--r-- 1 ghood staff 10688440 Apr 19 14:25 test.decoded
-rw-r--r-- 1 ghood staff 4209313 Apr 19 14:24 test.zst
$ md5sum test*
21c75c134b09e734bd4d5b78e749b1fd test
6e5dcc872da801fd07103fba6a7ee097 test.decoded
83dad083093ae38d3838de71666ff024 test.zst
$ uname -a
Linux axon 3.13.0-43-generic #72 SMP Mon Jan 12 10:50:11 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

The zipped test file is attached.
test.zip

Anyone know what is going wrong?
Thanks.
--Greg
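
For anyone who wants to script the same check, here is a minimal sketch of a round-trip test that reports the first differing byte the way `cmp` does (1-based offset). The function names `first_difference` and `round_trip_mismatch` are made up for illustration, and the usage note uses Python's standard `zlib` module purely as a stand-in codec; it does not call zstd.

```python
from typing import Callable, Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the 1-based offset of the first differing byte, or None if equal.

    If one input is a strict prefix of the other, the offset just past the
    shorter input is returned.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1
    if len(a) != len(b):
        return min(len(a), len(b)) + 1
    return None


def round_trip_mismatch(
    data: bytes,
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
) -> Optional[int]:
    """Compress then decompress `data`; return the first mismatching offset, or None."""
    restored = decompress(compress(data))
    return first_difference(data, restored)
```

With a correct codec, `round_trip_mismatch(data, zlib.compress, zlib.decompress)` should return `None`. A decoder with a bug like the one reported here would return a byte offset even though the decoded size matches the original.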

@luben
Contributor

luben commented Apr 19, 2016

Looks like the bug was introduced with v0.4.6; it works as expected with v0.4.5.

@Cyan4973
Contributor

I confirm both reports: the test file fails the round-trip test at the -1 setting for all versions starting with v0.4.6.
Now onto bug chasing ...

@Cyan4973
Contributor

Cyan4973 commented May 2, 2016

Preliminary assessment :

This looks like an error in the huffman decoder.

The incorrect byte is right in the middle of a long literal section, which means the problem is unrelated to match finding and decoding. It just happens to show up here at the -1 setting after the v0.4.6 changes, but it could happen with any other mode (albeit with a fairly low probability).

huff0 has an internal fuzzer to exercise and test as many strange situations as possible, but somehow it did not find this one. I believe a first task would be to create a test which reproduces the problem. This would make it possible both to verify the fix and to continuously check in future versions that it remains safe.
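
As a rough illustration of the kind of reproduction test described above, here is a minimal sketch of a round-trip fuzzer that generates inputs with varying alphabet sizes and skewed symbol distributions, which is the sort of data an entropy coder works on. This is not huff0's internal fuzzer: the names `skewed_bytes` and `fuzz_round_trip` and all parameter ranges are assumptions, and the codec is passed in as a pair of callables so that any compressor can be plugged in.

```python
import random
from typing import Callable, Optional


def skewed_bytes(rng: random.Random, size: int, alphabet_size: int, skew: float) -> bytes:
    """Generate `size` bytes over `alphabet_size` symbols with geometric-style weights.

    A `skew` close to 1.0 gives a near-uniform distribution; smaller values
    concentrate the probability mass on the first few symbols.
    """
    symbols = list(range(alphabet_size))
    weights = [skew ** i for i in range(alphabet_size)]
    return bytes(rng.choices(symbols, weights=weights, k=size))


def fuzz_round_trip(
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
    iterations: int = 200,
    seed: int = 0,
) -> Optional[bytes]:
    """Return the first generated input that fails to round-trip, or None."""
    rng = random.Random(seed)  # fixed seed so a failure can be replayed
    for _ in range(iterations):
        size = rng.randint(0, 1 << 14)
        alphabet_size = rng.randint(1, 256)
        skew = rng.uniform(0.5, 1.0)
        data = skewed_bytes(rng, size, alphabet_size, skew)
        if decompress(compress(data)) != data:
            return data
    return None
```

Because the seed is fixed, a failing input can be regenerated deterministically and kept as a regression case. Whether random inputs of this shape would actually reach the boundary condition behind this issue is unknown.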

@Cyan4973
Contributor

Cyan4973 commented May 3, 2016

The guilty code is the quad-symbol huffman decoder.
A quick fix is to disable this code (typically by commenting out this line).

Now, of course, a better fix is to find the bug and fix it.
And as said, an even better outcome is to find a way to trigger the bug without relying on a third-party data blob. That is not easy: it really looks like a boundary condition that is not trivial to create by hand.

@luben
Contributor

luben commented May 3, 2016

The good news is it's not data corruption. Thanks for the workaround. BTW, I don't notice any speed difference.

@Cyan4973
Contributor

Cyan4973 commented May 3, 2016

The quad-symbol decoder is only enabled when certain conditions are met (large huffman block, good enough compression ratio). Therefore in many circumstances, it's not even used. Even when it is, the difference in total decompression speed is typically small, because it's only a fraction of the work. Only pathological cases can see noticeable speed differences.
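
To make the selection logic above concrete, here is a purely illustrative sketch of a decoder chooser gated on block size and compression ratio. The thresholds, the `select_decoder` function and the `allow_quad` switch are all hypothetical; they are not the values or the code used by huff0. The switch only mirrors the idea of the workaround, which disables automatic selection of the quad-symbol decoder.

```python
# Hypothetical thresholds for illustration only; not taken from huff0.
MIN_QUAD_BLOCK_SIZE = 64 * 1024   # "large huffman block"
MAX_QUAD_RATIO = 0.25             # "good enough compression ratio"


def select_decoder(compressed_size: int, regenerated_size: int, allow_quad: bool = True) -> str:
    """Pick a decoder name from the block sizes.

    Returns "quad" only when the block is large and compresses well,
    otherwise "default". Passing allow_quad=False disables the quad path.
    """
    if regenerated_size <= 0:
        raise ValueError("regenerated_size must be positive")
    ratio = compressed_size / regenerated_size
    if allow_quad and regenerated_size >= MIN_QUAD_BLOCK_SIZE and ratio <= MAX_QUAD_RATIO:
        return "quad"
    return "default"
```

Under these assumed thresholds, small or poorly compressible blocks never take the quad path, which is consistent with the observation that disabling it rarely changes overall decompression speed.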

@ghood
Author

ghood commented May 3, 2016

Thanks for the workaround! After commenting out that line, I have so far put about 1TB through the decompressor without errors.

Cyan4973 added a commit that referenced this issue May 5, 2016
work-around : disabled automatic selection of huff0 quad-decoder (see #173)
@Cyan4973
Contributor

Cyan4973 commented May 5, 2016

The latest update of the "dev" branch should fix this issue, both for the current decoder and for the legacy decoders.

@Cyan4973
Contributor

Fixed in v0.6.1
