fastqc can't detect '@' in fastq.gz file #123

Open
socameron opened this issue Sep 12, 2023 · 11 comments

Comments

@socameron

Hi,

With version 0.12.0, I cannot run fastqc on my fastq.gz files on my university's cluster, although it works once I unzip the files. The error reported for the fastq.gz files is that the '@' on line 1 cannot be detected.

@s-andrews
Owner

What is the name of your fastq file? Does it actually end with fastq.gz? Is it definitely gzip compressed?

Can you post the command you're using and the full output of the program please.

FastQC definitely supports reading directly from gzipped files so there's something else going on here.
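
One way to answer the "is it definitely gzip compressed?" question is to look at the first two bytes of the file, which for a gzip stream are always 0x1f 0x8b. The Python sketch below is only an illustration and says nothing about how FastQC itself detects file types; the function names `looks_gzipped` and `name_matches_content` are made up for this example.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def name_matches_content(path):
    """Return True if a '.gz' suffix and the gzip magic number agree."""
    return path.endswith(".gz") == looks_gzipped(path)


if __name__ == "__main__":
    # Hypothetical demo: one file that really is gzipped, one only named .gz.
    record = b"@read1\nACGT\n+\nFFFF\n"
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "real.fastq.gz")
        fake = os.path.join(tmp, "fake.fastq.gz")
        with gzip.open(real, "wb") as out:
            out.write(record)
        with open(fake, "wb") as out:
            out.write(record)
        for path in (real, fake):
            print(os.path.basename(path), "name matches content:", name_matches_content(path))
```

A file that ends in .gz but fails this check is plain text (or something else) with a misleading name.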

@socameron
Author

socameron commented Sep 13, 2023 via email

@s-andrews
Owner

That's very strange. On the face of it, it looks like it's just finding something else at the start, but it could be mis-detecting the file type. It's not something super simple, like there being a blank line at the top of the file, is it?

Could you post the output of

zcat IDNP-MW-30_2-2695942_S14_R1.fastq.gz | nl | head

So we can see what's happening at the top of the file.

Can you also try running:

zcat IDNP-MW-30_2-2695942_S14_R1.fastq.gz | fastqc stdin:IDNP-MW-30_2-2695942_S14_R1

..and see if fastqc processes that OK.
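
For anyone without zcat to hand, the same check can be sketched in Python. This is an illustration only, assuming plain four-line FASTQ records (so ID lines are lines 1, 5, 9, ...); the helper names `head_numbered` and `first_bad_id_line` are hypothetical and not part of FastQC.

```python
import gzip
import itertools
import os
import tempfile


def head_numbered(path, count=10):
    """Return the first `count` decompressed lines as (line_number, text) pairs."""
    with gzip.open(path, "rt", errors="replace") as handle:
        lines = itertools.islice(handle, count)
        return [(number, line.rstrip("\n")) for number, line in enumerate(lines, start=1)]


def first_bad_id_line(numbered_lines):
    """Return the number of the first ID line not starting with '@', or None.

    Assumes four-line records, so only lines 1, 5, 9, ... are checked.
    """
    for number, text in numbered_lines:
        if number % 4 == 1 and not text.startswith("@"):
            return number
    return None


if __name__ == "__main__":
    # Hypothetical demo file with a stray blank line before the first record.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blank_first_line.fastq.gz")
        with gzip.open(path, "wt") as out:
            out.write("\n@read1\nACGT\n+\nFFFF\n")
        numbered = head_numbered(path)
        for number, text in numbered:
            print(f"{number:6d}  {text}")
        print("first bad ID line:", first_bad_id_line(numbered))
```

If the file is not really gzip data, `gzip.open` raises an error on the first read instead of returning lines, which is itself a useful clue.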

@mmaeke

mmaeke commented Sep 13, 2023

Hi, I am actually facing the same issue on fastqc v0.11.9 right now.

I tried running your commands:

zcat ERR4674036_1.fastq.gz  | nl | head

     1  @ERR4674036.1 V1:1:HYLGGDSXX:4:1101:10004:10457/1
     2  ATAAAAATTGAAAATGCAAAACCAAAACAAAATAATTTAGAAAAATTACTTCTTGACAATGCTCCAATACGAATAATTCTAAATATTATTATTGTATAAAGAATTAAAAGACCAACAGATCCTATAAAACCAAATTCCTCTGAAAATAAAG
     3  +
     4  FFFFFFFFFFF:FFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
     5  @ERR4674036.2 V1:1:HYLGGDSXX:4:1101:10004:10864/1
     6  CATAATTTTTTTGGTAGAGAAATCTAATATTTTTCCATTACAATTTAAAGTCATACTTTTAATCTTCTTTTCCAAATTTATGCTAAAATCTTATATGGAAGATACAAAGTCACAATATCATCAAATAATAAGTAAATCAAGAAAAATATTT
     7  +
     8  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF,FFFFF:FFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:F
     9  @ERR4674036.3 V1:1:HYLGGDSXX:4:1101:10004:14278/1
    10  TAGCCGAGATGCATGCAGCCTTGCTACAAATGCATCAAGGGCTTCTCGGTTTGTTACATCAGACACGCGGTGAATTTGAACGTGATAATCCATATTCAATTTTTTAACTGGCACAAAACCGACGGGATAACCCAAAACCGTTTTCACTGCG 

followed by:

zcat ERR4674036_1.fastq.gz  | fastqc stdin:ERR4674036_1

Started analysis of stdin:ERR4674036_1
Failed to process file stdin:ERR4674036_1
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
        at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
        at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
        at java.base/java.lang.Thread.run(Thread.java:829)
 

@socameron
Author

socameron commented Sep 13, 2023 via email

@anitasalamon

I have the same problem: fastqc v0.12.0 doesn't work on zipped files, but it seems to work on unzipped files.

@anitasalamon

I installed v0.11.9 and everything works as it used to :)

@s-andrews
Owner

When we've taken in data files where this has happened, it's nearly always been because the file was corrupt. When the file was unzipped it generated some rubbish output before dying, and if the program saw the rubbish first then you got an error about the @ being missing rather than the later error from the decompression.

If you want to test this then you can try running something like:

zcat yourfile.fq.gz > /dev/null

..and you'll very likely see an error reporting that the file could not be decompressed.
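
The same whole-file test can be sketched in Python: decompress everything, throw the output away, and report the first error. This is an illustration rather than anything FastQC runs; `gzip_error` is a hypothetical helper name, and the demo file is built and truncated on the fly.

```python
import gzip
import os
import tempfile
import zlib


def gzip_error(path, chunk_size=1 << 20):
    """Decompress the whole file, discarding output; return an error message or None."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as error:
        # OSError covers gzip.BadGzipFile (bad header or CRC) and unreadable files;
        # EOFError is raised when the compressed stream ends early.
        return f"{type(error).__name__}: {error}"
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.fastq.gz")
        with gzip.open(good, "wb") as out:
            out.write(b"@read1\nACGT\n+\nFFFF\n" * 1000)
        # Simulate a corrupt download by keeping only the first half of the bytes.
        truncated = os.path.join(tmp, "truncated.fastq.gz")
        with open(good, "rb") as src, open(truncated, "wb") as dst:
            data = src.read()
            dst.write(data[: len(data) // 2])
        print("good:", gzip_error(good))
        print("truncated:", gzip_error(truncated))
```

Note that a truncated file can still yield valid-looking data before the error appears, so a clean first few lines do not prove the whole file is intact.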

@s-andrews
Owner

> I installed v0.11.9 and everything works as it used to :)

So nothing changed in the decompression code between 0.11.9 and 0.12.0. The only thing which may have changed is the way that the program decides whether a file is compressed or not, so if your file is named in a nonstandard way (i.e. it says .gz on the end but it's not gzipped, or vice versa) then it could be that, but otherwise there should be no difference.

@s-andrews
Owner

As a standing offer to anyone who is posting on these threads, if you are able to share your problematic file with us then we're very happy to take a look and tell you exactly what's going wrong. We can set up a temporary FTP site for you to push your data to if you can't make it available to us directly. If there are any corner cases we're missing then we'd be very pleased to find examples to reproduce the problem so we can fix it.

@socameron
Author

> As a standing offer to anyone who is posting on these threads, if you are able to share your problematic file with us then we're very happy to take a look and tell you exactly what's going wrong. We can set up a temporary FTP site for you to push your data to if you can't make it available to us directly. If there are any corner cases we're missing then we'd be very pleased to find examples to reproduce the problem so we can fix it.

I've tried unzipping my files using zcat and did not run into any issues. I'm happy to share a file to identify any other errors.
