hisat2 treats fastq input files as fastq.gz files, resulting in zero mapping rate #1373
Comments
Can you make sure that your datatype is not set to fastq.gz or fastqsanger.gz ? |
Thanks for your reply @mvdbeek. The datatype is fastq. I don't even know how to convert it to fastq.gz or fastqsanger.gz in Galaxy. I also tried uploading a fastq.gz file to run the job, but Galaxy automatically converted the uploaded file to fastq. |
@mvdbeek I have the same issue as @MingChen0919 |
I'm not quite sure what you mean by that. It should be sufficient to make sure that the datatype is set to fastq or fastqsanger (without the .gz) if your files are not compressed. @MoHeydarian made an excellent video on how to do that here. If that still does not work it may be helpful to contact your local galaxy administrator or to provide some small sample data that we can test. |
Not sure if this is related but I have a similar one as well here: bgruening/galaxytools#598 Is this a Galaxy bug? |
@MingChen0919 Which Galaxy server are you using? And what version of Galaxy, if it's not a public instance? |
@mvdbeek what I mean is that: the original command line generated by galaxy was
I could run the alignment job and got correct answer by running the modified command line directly from the terminal like below.
I am pretty sure my reads files are in fastq format, and I also tried setting the datatype to make sure it is fastq or fastqsanger. I also tried the FASTQ Groomer tool. @bgruening It didn't give me any error message, but the overall mapping rate was zero, which I know is impossible. By checking the job command line, I found that my fastq reads files were linked to .fastq.gz files. @nsoranzo I am using a Galaxy image on Jetstream. The Galaxy version is 16.07. |
The fact that both this bug and bgruening/galaxytools#598 were experienced on Galaxy pre-17.01 (where compressed FASTQ formats were introduced) seems to indicate that the tools may not work correctly under Galaxy versions which lack these datatypes. @raphenya What Galaxy version are you using? |
Yeah, I wonder if an unknown datatype will map to Urgh, so if we change this back to checking for the extension we should be OK, or alternatively we ship the compressed datatypes with the tools that need them. |
Actually it seems to be equivalent to
which seems to be obviously a Galaxy bug. This code has been around since the first registered commit in 2006! galaxyproject/galaxy@f788a34#diff-f6e9dd2399db7b16a8a299cc0292520dR36 |
Ping @jmchilton, we need your advice here! |
@nsoranzo We are running galaxy version 16.04 |
I had hoped that setting |
We need more aggressive warnings for incompatible profile versions for sure - there is an open issue for that. I'd say we should also patch is_of_type on older Galaxy versions, but I'm not sure that would help anyone at this point - this tool is marked as incompatible with the versions that are exhibiting this bug. If someone wants to ping me post-GCC I could give fixing is_of_type a shot. |
@jmchilton I'm working on a Galaxy PR for that, I'll add as reviewer when I open it. |
@nsoranzo Super fantastic - thanks so much! |
…nown Without this fix, the Cheetah expression `$dataset.is_of_type('unknown_ext')` in a tool command would be equivalent to `$dataset.is_of_type('txt')`, meaning that if the dataset datatype is a subclass of Text, the expression would evaluate to True without any warning. xref. galaxyproject/tools-iuc#1373 Also add missing `xml` datatype to `test/functional/tools/sample_datatypes_conf.xml` which is needed by 3 test tools.
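For readers trying to follow why an unregistered extension such as `fastq.gz` ends up matching a plain FASTQ dataset, here is a rough Python sketch of the behaviour described in the commit message above. It is not Galaxy's actual code: the `Registry` class, the `get_buggy` / `get_strict` methods and the two `is_of_type_*` functions are hypothetical names invented for illustration, and the class hierarchy is reduced to the minimum needed to show the fallback to Text.

```python
# Hypothetical, simplified model of a datatype registry. None of these
# names are taken from Galaxy itself; they only illustrate the fallback.


class Data:
    """Root of the toy datatype hierarchy."""


class Text(Data):
    """Plain-text datatypes ('txt')."""


class Fastq(Text):
    """Uncompressed FASTQ, a subclass of Text in this sketch."""


class Registry:
    def __init__(self, by_ext):
        self.by_ext = dict(by_ext)

    def get_buggy(self, ext):
        # Models the old behaviour: unknown extensions silently become Text.
        return self.by_ext.get(ext, Text)

    def get_strict(self, ext):
        # Models the fixed behaviour: unknown extensions yield None.
        return self.by_ext.get(ext)


def is_of_type_buggy(registry, dataset_ext, *exts):
    datatype = registry.get_buggy(dataset_ext)
    return any(issubclass(datatype, registry.get_buggy(e)) for e in exts)


def is_of_type_fixed(registry, dataset_ext, *exts):
    datatype = registry.get_strict(dataset_ext)
    if datatype is None:
        return False
    for ext in exts:
        target = registry.get_strict(ext)
        if target is None:
            # Unknown extension: skip it (a real fix might also log a warning).
            continue
        if issubclass(datatype, target):
            return True
    return False


# A registry that predates the compressed FASTQ datatypes.
old_registry = Registry({"txt": Text, "fastq": Fastq, "fastqsanger": Fastq})

buggy_result = is_of_type_buggy(old_registry, "fastq", "fastq.gz", "fastqsanger.gz")
fixed_result = is_of_type_fixed(old_registry, "fastq", "fastq.gz", "fastqsanger.gz")
print(buggy_result, fixed_result)
```

Under these assumptions the buggy check reports that an uncompressed `fastq` dataset "is" a `fastq.gz`, which would explain a tool wrapper choosing the `.gz` branch, while the strict check does not.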
The fix actually still needs to be backported, reopening. |
The fix has been backported to Galaxy releases 16.07 and later, closing. |
See galaxyproject/tools-iuc#1373, which was fixed and back-ported to Galaxy 16.07 (galaxyproject/galaxy#4224, galaxyproject/galaxy#4230). This would still break with other non-compressed FASTA subclasses, but this is intended as a stop-gap until the last few elderly Galaxy servers in use are updated.
Here is my job command line from running galaxy hisat2:
My reads files are fastq files, not fastq.gz files, but hisat2 treats them as fastq.gz files. If I run the job directly from the command line, I get this error message:
The job ran successfully after I removed the .gz extensions from the linked file names.
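For anyone wanting to confirm on their own data that a file named `.fastq.gz` is really uncompressed, a small check like the following may help. It is only a sketch: the `is_gzipped` and `demo` functions and the file names are made up for illustration, and the only fact relied on is that gzip files start with the two bytes `0x1f 0x8b`.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def demo():
    """Build one mislabelled plain file and one real gzip file, then test both."""
    record = "@read1\nACGT\n+\nIIII\n"
    with tempfile.TemporaryDirectory() as tmp:
        # Plain FASTQ content behind a misleading .gz name, as in this issue.
        mislabelled = os.path.join(tmp, "input_1.fastq.gz")
        with open(mislabelled, "w") as handle:
            handle.write(record)

        # Genuinely compressed content, for comparison.
        compressed = os.path.join(tmp, "input_2.fastq.gz")
        with gzip.open(compressed, "wt") as handle:
            handle.write(record)

        return is_gzipped(mislabelled), is_gzipped(compressed)


print(demo())
```

The check looks at the file content rather than the name, so it is unaffected by whatever extension the job's symlinks were given.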