Description
Hello @rhpvorderman,
Yesterday, cutadapt crashed unexpectedly for me and for other bioinformaticians while opening some gzipped files, which none of us had seen before: marcelm/cutadapt#520
fossandon@ubuntu:~/Documents/download$ cutadapt -a 'AACTTTYARCAAYGGATCTC;max_error_rate=0.1;min_overlap=20' -A 'TGATCCYTCCGCAGGT;max_error_rate=0.5;min_overlap=16' --pair-adapters --pair-filter any --cores 2 --output 94477_R1.fastq --paired-output 94477_R2.fastq 94477_S175_L001_R1_001.fastq.gz 94477_S175_L001_R2_001.fastq.gz
This is cutadapt 3.3 with Python 3.6.9
Command line parameters: -a AACTTTYARCAAYGGATCTC;max_error_rate=0.1;min_overlap=20 -A TGATCCYTCCGCAGGT;max_error_rate=0.5;min_overlap=16 --pair-adapters --pair-filter any --cores 2 --output 94477_R1.fastq --paired-output 94477_R2.fastq 94477_S175_L001_R1_001.fastq.gz 94477_S175_L001_R2_001.fastq.gz
Processing reads on 2 cores in paired-end mode ...
[ 8<---------] 00:00:03 88,831 reads @ 26.0 µs/read; 2.31 M reads/minute
ERROR: Traceback (most recent call last):
File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 556, in run
dnaio.read_paired_chunks(f, f2, self.buffer_size)):
File "/home/fossandon/.local/lib/python3.6/site-packages/dnaio/chunks.py", line 118, in read_paired_chunks
bufend1 = f.readinto(memoryview(buf1)[start1:]) + start1 # type: ignore
File "/usr/lib/python3.6/gzip.py", line 276, in read
return self._buffer.read(size)
File "/usr/lib/python3.6/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/usr/lib/python3.6/gzip.py", line 454, in read
self._read_eof()
File "/usr/lib/python3.6/gzip.py", line 501, in _read_eof
hex(self._crc)))
OSError: CRC check failed 0x88b1f != 0x6fe5d9e4
ERROR: Traceback (most recent call last):
File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 626, in run
raise e
OSError: CRC check failed 0x88b1f != 0x6fe5d9e4
Traceback (most recent call last):
File "/home/fossandon/.local/bin/cutadapt", line 8, in <module>
sys.exit(main_cli())
File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/__main__.py", line 848, in main_cli
main(sys.argv[1:])
File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/__main__.py", line 913, in main
stats = r.run()
File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 825, in run
raise e
OSError: CRC check failed 0x88b1f != 0x6fe5d9e4
However, zcat and "gzip -t" report no errors on these files, and "gzip -d" decompresses them fine. Running the same cutadapt command with the same cutadapt version in different environments (Python 3.6 and 3.8 were both tested) crashed in some environments and not in others. After a long search and many tests with a colleague, we figured out that the key difference between crashing and not crashing was the installed version of the isal dependency (building a Docker image pulls in the latest version). Versions 0.8.0 and 0.7.0 produce the CRC error, while 0.6.1 and 0.5.0 do not, so the bug seems to have been introduced in 0.7.0. Keeping the other dependencies the same and reverting isal to 0.6.1 allows it to work:
299 3047 0.0 8 2963 57 11 6 5 2 3
300 8 0.0 8 0 0 0 0 0 3 4 0 1
301 15028 0.0 8 0 14646 270 64 24 15 8 0 1
WARNING:
One or more of your adapter sequences may be incomplete.
Please see the detailed output above.
fossandon@ubuntu:~/Documents/temp$ pip3 list | egrep "cutadapt|dnaio|isal|xopen"
cutadapt 3.3
dnaio 0.5.0 /home/fossandon/.local/lib/python3.6/site-packages
isal 0.6.1
xopen 1.1.0
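Since the outcome depends on which package versions are installed, it may help to record them from inside the same interpreter that runs cutadapt. The following is only a minimal sketch: the helper name `installed_versions` is hypothetical, and it assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library (on Python 3.6 the `pip3 list` call above is the simpler route).

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The four packages involved in this report.
    found = installed_versions(["cutadapt", "dnaio", "isal", "xopen"])
    for name, version in found.items():
        print(f"{name:10s} {version or 'not installed'}")
```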
In my case, I was processing a folder where all the gzipped files came from the same source and were created at the same time, but only some of them crashed consistently while the others did not. To give you a test case, I uploaded the file pair that I used in the cutadapt example above so you can reproduce the error yourself. I couldn't find smaller files that reproduce it.
https://drive.google.com/drive/folders/1eTmLbd9WINctLb48pzn57_Ohp1amwZah?usp=sharing
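To narrow this down without cutadapt, something like the sketch below could read each file to the end with the standard-library `gzip` module and, when it can be imported, with `isal.igzip`, then report which reader fails. It is not taken from cutadapt or dnaio. The helper names `try_read` and `compare_readers` are hypothetical, and it assumes that `isal.igzip.open` accepts the same arguments as `gzip.open`. It uses `readinto` because that is the call shown in the traceback above.

```python
import gzip
import sys

try:
    from isal import igzip  # optional: only available when the isal package is installed
except ImportError:
    igzip = None


def try_read(path, opener, chunk_size=4 * 1024 * 1024):
    """Read `path` to the end with `opener`; return (bytes_read, error_or_None)."""
    total = 0
    buf = bytearray(chunk_size)
    try:
        with opener(path, "rb") as handle:
            while True:
                n = handle.readinto(buf)
                if not n:  # 0 (or None) signals end of stream
                    break
                total += n
    except Exception as exc:  # diagnostic tool: report any failure instead of crashing
        return total, f"{type(exc).__name__}: {exc}"
    return total, None


def compare_readers(path):
    """Run try_read with every available reader and collect the results."""
    results = {"gzip": try_read(path, gzip.open)}
    if igzip is not None:
        results["isal.igzip"] = try_read(path, igzip.open)
    return results


if __name__ == "__main__":
    # Usage: python check_gz.py file1.fastq.gz file2.fastq.gz ...
    for path in sys.argv[1:]:
        for reader, (n, err) in compare_readers(path).items():
            print(f"{path}\t{reader}\t{n} bytes\t{err or 'OK'}")
```

A file that reads cleanly with `gzip` but fails with `isal.igzip` would point at the isal side rather than at the file itself.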
Best regards,