Reading gzip files generates a CRC check failed error (version >= 0.7.0) #60
Comments
@fjossandon I was a bit surprised at first. From 0.6.1 to 0.7.0 I removed a lot of custom code that had been needed to make igzip.py work. I had resolved some incompatibilities in isal_zlib, so that isal_zlib and zlib now support the same calls to Decompressobj in the same way. This allowed a massive reduction of code: the read methods of GzipFile in igzip are now basically the same as those in CPython's gzip.py. Thanks to the reproducing files you provided, I was able to track down the error. It was caused by an assumption that gzip.py makes about the buffer it uses. I will upload a bugfix release today.
I also found that the error is triggered by every multi-member gzip file. I created a very small reproducer (92K) and will add it to my test suite. Thanks for providing the test files!
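For readers who want to try this themselves, here is a minimal sketch of how a multi-member gzip file can be built and read back. It is not the 92K reproducer mentioned above; the helper names `write_multi_member` and `read_all` and the file name are made up for illustration. It uses only the standard library `gzip` module. To exercise python-isal instead, one could pass `igzip.open` (from `isal`) as the `opener` argument, assuming the package is installed.

```python
import gzip
import os
import tempfile


def write_multi_member(path, chunks):
    """Write each chunk as its own gzip member, back to back in one file."""
    with open(path, "wb") as fh:
        for chunk in chunks:
            fh.write(gzip.compress(chunk))


def read_all(path, opener=gzip.open):
    """Read the whole file through a gzip-style opener (hypothetical helper)."""
    with opener(path, "rb") as fh:
        return fh.read()


# Two members; a correct reader must return their concatenated contents.
chunks = [b"first member\n" * 5000, b"second member\n" * 5000]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multi_member.gz")
    write_multi_member(path, chunks)
    result = read_all(path)

print(result == b"".join(chunks))
```

A reader that handles member boundaries correctly returns the concatenation of all members; a CRC error raised here would point at the reader rather than at the file.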
Excellent work!!! I just tested the 0.8.1 and the error is gone.
Then I will take down the shared files, glad that they helped you with this. Thanks for the fix!
Lots of thanks again for writing such an extensive bug report including reproducing files. You really made it a lot easier for me to fix it. Your help is very much appreciated!
Hello @rhpvorderman,
Yesterday, a program that other bioinformaticians and I were using (cutadapt) crashed unexpectedly when trying to open some gzipped files, which was the first time something like this had happened: marcelm/cutadapt#520
However, running zcat and "gzip -t" on the files does not return any error, and they decompress fine with "gzip -d". Running the same cutadapt command with the same cutadapt version in different environments (Python 3.6 and 3.8 were tested too) crashed in some environments and not in others. After a long search and many tests with a colleague, we figured out that the key difference between crashing and not crashing was the installed version of the isal dependency (the latest version is pulled in when a docker image is created). Versions 0.8.0 and 0.7.0 generate the CRC error, but 0.6.1 and 0.5.0 do not, so it seems the bug was introduced in 0.7.0. Keeping the intermediate dependencies the same but reverting isal to 0.6.1 allows it to work:
In my case, I was processing a folder in which all gzipped files came from the same source and were created at the same time, but only a portion of them consistently crashed while the others did not. To give you a test case, I uploaded the file pair that I was using with the cutadapt example above, so you can reproduce it on your own. I couldn't find smaller files that reproduced this error.
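Given the maintainer's finding that multi-member gzip files trigger the error, one way to see why only some files in a folder crash is to count the gzip members in each file. The following is a rough sketch using the standard library `zlib` module; `count_gzip_members` is a hypothetical helper, not part of isal or cutadapt. It takes the file contents as bytes, so very large files would need a streaming variant.

```python
import gzip
import zlib


def count_gzip_members(data: bytes) -> int:
    """Count the concatenated gzip members in an in-memory gzip stream."""
    members = 0
    while data:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(wbits=31)
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated gzip member")
        members += 1
        # Bytes after the end of this member belong to the next one.
        data = decomp.unused_data
    return members


single = gzip.compress(b"one member")
multi = gzip.compress(b"part 1") + gzip.compress(b"part 2")
print(count_gzip_members(single), count_gzip_members(multi))
```

Files that report more than one member would be the ones expected to hit the CRC error with the affected isal versions.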
https://drive.google.com/drive/folders/1eTmLbd9WINctLb48pzn57_Ohp1amwZah?usp=sharing
Best regards,