fastqc can't detect '@' in fastq.gz file #123
What is the name of your fastq file? Does it actually end with fastq.gz? Is it definitely gzip compressed? Can you post the command you're using and the full output of the program, please? FastQC definitely supports reading directly from gzipped files, so there's something else going on here. |
Hi Simon,
Thanks for the quick response. The name of my fastq file is IDNP-MW-30_2-2695942_S14_R1.fastq.gz. I've double-checked the head of each fastq.gz file using zcat and verified that it does include '@', so I don't think any of my files are corrupted. Not sure what's going on. Below is the error log. Note that I am using my university's cluster and am running commands through snakemake.
fastqc results/trimmed/IDNP-MW-30_2-2695942_S14_R1.fastq.gz results/trimmed/IDNP-MW-30_2-2695942_S14_R2.fastq.gz --outdir results/fastqc
Activating environment modules: fastqc/0.12.0
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) CCconfig 4) imkl/2020.1.217 7) libfabric/1.10.1
2) gentoo/2020 5) intel/2020.1.217 8) openmpi/4.0.3
3) gcccore/.9.3.0 6) ucx/1.8.0 9) StdEnv/2020
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Failed to process results/trimmed/IDNP-MW-30_2-2695942_S14_R1.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@' at line 1
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:162)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:92)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:163)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:125)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Failed to process results/trimmed/IDNP-MW-30_2-2695942_S14_R2.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@' at line 1
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:162)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:92)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:163)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:125)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
…_______________
Cameron So, M.Sc
Ph.D Student, Hargreaves & Schoen Labs
Department of Biology, McGill University
Twitter<https://twitter.com/cameron_so> Website<http://www.cameronso.ca/>
|
That's very strange. On the face of it, it looks like FastQC is just finding something else at the start of the file, but it could be mis-detecting the file type. It's not something super simple like a blank line at the top of the file, is it? Could you post the output of:
zcat your_file.fastq.gz | nl | head
so we can see what's happening at the top of the file. Can you also try running:
zcat your_file.fastq.gz | fastqc stdin:your_file
..and see if fastqc processes that OK. |
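For anyone who would rather script the "what is at the top of the file" check, here is a minimal Python sketch. It is not part of FastQC; the function name `describe_first_line` is made up for illustration, and it assumes the input is a gzip-compressed, ASCII FASTQ file given as a path or an open binary file object.

```python
import gzip
import io


def describe_first_line(source):
    """Classify the first decompressed line of a gzipped FASTQ file.

    `source` may be a path or an open binary file object (hypothetical helper).
    """
    with gzip.open(source, "rt", encoding="ascii", errors="replace") as handle:
        first = handle.readline()
    if first == "":
        # readline() returns an empty string only at end of file
        return "empty file"
    first = first.rstrip("\n")
    if first == "":
        return "blank first line"
    if not first.startswith("@"):
        return f"unexpected first character {first[0]!r}"
    return "ok"
```

Calling `describe_first_line("your_file.fastq.gz")` on a well-formed file should return `"ok"`; any other return value points at the kind of problem asked about above (a blank line or stray content before the first record).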
To add to the thread:
zcat IDNP-MW-30_2-2695942_S14_L001_R1_001.fastq.gz | nl | head
1 @A01433:320:HL5JCDSX5:1:1101:2338:1031 1:N:0:TCCAACTGAA+NGTCCGTAGG
2 CAGAAATTTGAATGATGCGTCGCCGGCACAAAGGCCGTGCGATCCGACGAGTTATCATGAATCATCAAAGCGACAGGCAGAGCCTGCGTCGACCTTTTATCTAATAAATGCATCCCTTCCAGAAGTCGGGGTTTGTTGCACGTATTAGCTC
3 +
4 FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:FFFFFFF
5 @A01433:320:HL5JCDSX5:1:1101:3568:1031 1:N:0:TCCAACTGAA+NGTCCGTAGG
6 ACAGAGGATTTCACCACTGCTCTCTAACACTAGAGAGACTTCTCTCTGCTTTCTTAACCAAGAAAGACTTCTCTCTTTTTCTAAAGAGTAACTGGTAAGCAAGATAACTTCTCAGGCTCAAGAAAGATCCTTCGTATGATAATCGATTACG
7 +
8 FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF,FFFFFFF,FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFF:FFFFF
9 @A01433:320:HL5JCDSX5:1:1101:11360:1031 1:N:0:TCCAACTGAA+NGTCCGTAGG
10 CAATCCTAATCCATATCCCAATTCCAATCCCAGTCCAAATCCCAATCCCAAATCCAAATCCCAATCCCAATCCCAATCCAAATACCAATTACAAAACCCAATCCCAATCCAAATCCAAAACCACATACAAAATCCCAATACAAAATCACAA
zcat IDNP-MW-30_2-2695942_S14_L001_R1_001.fastq.gz | fastqc stdin:IDNP-MW-30_2-2695942_S14_L001_R1_001
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Started analysis of stdin:IDNP-MW-30_2-2695942_S14_L001_R1_001
Analysis complete for stdin:IDNP-MW-30_2-2695942_S14_L001_R1_001
|
Hi, I am actually facing the same issue on fastqc v0.11.9 right now.
I tried running your commands:
zcat ERR4674036_1.fastq.gz | nl | head
1 @ERR4674036.1 V1:1:HYLGGDSXX:4:1101:10004:10457/1
2 ATAAAAATTGAAAATGCAAAACCAAAACAAAATAATTTAGAAAAATTACTTCTTGACAATGCTCCAATACGAATAATTCTAAATATTATTATTGTATAAAGAATTAAAAGACCAACAGATCCTATAAAACCAAATTCCTCTGAAAATAAAG
3 +
4 FFFFFFFFFFF:FFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
5 @ERR4674036.2 V1:1:HYLGGDSXX:4:1101:10004:10864/1
6 CATAATTTTTTTGGTAGAGAAATCTAATATTTTTCCATTACAATTTAAAGTCATACTTTTAATCTTCTTTTCCAAATTTATGCTAAAATCTTATATGGAAGATACAAAGTCACAATATCATCAAATAATAAGTAAATCAAGAAAAATATTT
7 +
8 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF,FFFFF:FFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:F
9 @ERR4674036.3 V1:1:HYLGGDSXX:4:1101:10004:14278/1
10 TAGCCGAGATGCATGCAGCCTTGCTACAAATGCATCAAGGGCTTCTCGGTTTGTTACATCAGACACGCGGTGAATTTGAACGTGATAATCCATATTCAATTTTTTAACTGGCACAAAACCGACGGGATAACCCAAAACCGTTTTCACTGCG
followed by:
zcat ERR4674036_1.fastq.gz | fastqc stdin:ERR4674036_1
Started analysis of stdin:ERR4674036_1
Failed to process file stdin:ERR4674036_1
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
at java.base/java.lang.Thread.run(Thread.java:829)
|
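In the ERR4674036_1 trace above, the exception is raised from `FastQFile.next` rather than from the constructor, and it carries no "at line 1" suffix, so the offending line may sit somewhere after the first record. A rough Python sketch for locating it follows. It is not FastQC's own parser: the name `first_bad_header` is hypothetical, and it assumes strict four-line records with no wrapped sequence lines.

```python
import gzip
import io


def first_bad_header(source):
    """Return the 1-based line number of the first ID line lacking '@', or None.

    Assumes every record is exactly four lines (ID, sequence, '+', quality).
    `source` may be a path or an open binary file object.
    """
    with gzip.open(source, "rt", encoding="ascii", errors="replace") as handle:
        for index, line in enumerate(handle):
            # Lines 1, 5, 9, ... (index 0, 4, 8, ...) should be ID lines
            if index % 4 == 0 and not line.startswith("@"):
                return index + 1
    return None
```

If this returns a line number, `zcat your_file.fastq.gz | sed -n 'N,+3p'` (with N replaced by that number, GNU sed) prints the suspect record for inspection.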
I have the same problem: fastqc v0.12.0 doesn't work on zipped files, but it seems to work on unzipped files. |
I installed v0.11.9 and everything works as it used to :) |
When we've taken in data files where this has happened, it's nearly always been because the file was corrupt. When the file was decompressed, it generated some rubbish output before dying, and because the program saw the rubbish first, you got an error about the @ being missing rather than the later error from the decompression. If you want to test this then you can try running something like:
..and you'll very likely see an error from the decompression step. |
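The exact command suggested in the comment above is not preserved in this copy of the thread. As a stand-in, here is a hedged Python sketch of the same idea: decompress the whole file, throw the output away, and report whether decompression itself fails. The function name `check_gzip` is invented for this example.

```python
import gzip
import io
import zlib


def check_gzip(source, chunk_size=1 << 20):
    """Decompress the entire stream, discarding the data; return (ok, message).

    `source` may be a path or an open binary file object (hypothetical helper).
    """
    total = 0
    try:
        with gzip.open(source, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers gzip.BadGzipFile (bad header or CRC),
        # EOFError covers truncated files, zlib.error covers corrupt data.
        return False, f"failed after {total} decompressed bytes: {exc}"
    return True, f"ok, {total} decompressed bytes"
```

A file that only shows a clean first few records under `zcat ... | head` can still fail this check, because `head` stops reading long before a damaged region further into the file is reached.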
So nothing changed in the decompression code between 0.11.9 and 0.12.0. The only thing which may have changed is the way that the program decides whether a file is compressed or not, so if your file is named in a nonstandard way (i.e. it says .gz on the end but it's not gzipped, or vice versa) then it could be that, but otherwise there should be no difference. |
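To check for the name/content mismatch described above, one option is to compare the file extension against the two gzip magic bytes (`0x1f 0x8b`) at the start of the file. The Python sketch below only illustrates that comparison. How FastQC itself decides whether a file is compressed is not stated in this thread, and the helper names are hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_gzipped(head: bytes) -> bool:
    """True if the leading bytes carry the gzip magic number."""
    return head[:2] == GZIP_MAGIC


def name_matches_content(name: str, head: bytes) -> bool:
    """True if a '.gz' suffix and gzip magic bytes are both present or both absent."""
    return name.endswith(".gz") == looks_gzipped(head)


def read_head(path: str, size: int = 2) -> bytes:
    """Read the first `size` raw bytes of a file."""
    with open(path, "rb") as handle:
        return handle.read(size)
```

For a real file, `name_matches_content(path, read_head(path))` returning `False` would indicate a file that says .gz but is not gzipped, or the reverse.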
As a standing offer to anyone who is posting on these threads, if you are able to share your problematic file with us then we're very happy to take a look and tell you exactly what's going wrong. We can set up a temporary FTP site for you to push your data to if you can't make it available to us directly. If there are any corner cases we're missing then we'd be very pleased to find examples to reproduce the problem so we can fix it. |
I've tried unzipping my files using zcat and did not run into any issues. I'm happy to share a file to identify any other errors. |
Hi,
With version 0.12.0, I cannot run fastqc on my fastq.gz files using my university's cluster. When I unzip the files, then it is possible. The error message that arises for fastq.gz files is the inability to detect the '@' on line 1.