PacBio CCS interleave error #56

Closed
Robvh-git opened this issue Oct 24, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Robvh-git

Hello,

I'm trying to extract the ITS region from PacBio CCS reads using itsxpress by running the following command:
itsxpress --fastq ITS__1.fq.gz --single_end --log logfile.txt --outfile ITS_select_ITS__1.fastq --region ALL --threads 23

This gives me the following error message:

2024-10-24 13:53:35,899: INFO     Verifying the input sequences.
2024-10-24 13:53:35,978: ERROR    'File may be interleaved. ITSxpress will run with errors. Check BBmap reformat.sh to split interleaved files before using ITSxpress.
2024-10-24 13:53:35,978: ERROR    There appears to be an issue reading the input file ITS__1.fq.gz.
2024-10-24 13:53:35,978: ERROR    ITSxpress terminated with errors. See the log file for details.
2024-10-24 13:53:35,978: ERROR   

(the log file doesn't give additional info)

As this is PacBio CCS data, it is not interleaved, and I'm unsure why itsxpress thinks it is. If, just to test, I use reformat.sh to 'de-interleave' this data, the de-interleaved files give the same error, so it must be something about the format.
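
One way to check what the headers actually look like is to print the first few record headers from the gzipped FASTQ. This is a minimal sketch using only the Python standard library; the function name `peek_fastq_headers` is made up for illustration, and it assumes a well-formed FASTQ with exactly four lines per record.

```python
import gzip
from itertools import islice

def peek_fastq_headers(path, n=5):
    """Return the header lines of the first n records of a gzipped FASTQ.

    Assumes strict 4-line records (no wrapped sequence or quality lines).
    """
    headers = []
    with gzip.open(path, "rt") as handle:
        # Lines 0, 4, 8, ... are the header lines of successive records.
        for line in islice(handle, 0, 4 * n, 4):
            headers.append(line.rstrip("\n"))
    return headers
```

Calling `peek_fastq_headers("ITS__1.fq.gz")` would return the first five header lines, which can then be compared against the naming patterns a validator expects.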

In QIIME2, I get the same error.

Do you have any idea how to fix this issue? I see people using ITSxpress on PacBio data, so it should be possible.

I've attached the file.

ITS__1.fq.gz

@arivers
Member

arivers commented Nov 13, 2024

Sorry, I just saw this request. I was able to reproduce the error. I'll look into what might be causing it.

arivers added the bug label on Nov 13, 2024
@arivers
Member

arivers commented Nov 13, 2024

This issue is caused by the way the function _check_fastqs uses sequence headers to determine whether a file is interleaved. I disabled that check and trimming worked. I will fix the validation logic and push an update. There are so many different ways to process PacBio data; this is a PacBio header format I have not seen before.
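
To illustrate how a header-based interleave check can misfire, here is a minimal sketch. It is not the actual `_check_fastqs` code: the functions `base_name` and `looks_interleaved`, the suffix-stripping rule, and the example read names are all assumptions made up for illustration. The heuristic treats a file as interleaved when the first two records share a name once everything from the first `/` or whitespace is dropped. That works for `/1` and `/2` style pairs, but it also fires on single-end reads whose names share a common prefix.

```python
import re

def base_name(header):
    """Strip the leading '@' and everything from the first '/' or whitespace on."""
    return re.split(r"[/\s]", header.lstrip("@"), maxsplit=1)[0]

def looks_interleaved(headers):
    """Hypothetical heuristic: the first two records share a base name."""
    if len(headers) < 2:
        return False
    return base_name(headers[0]) == base_name(headers[1])

# Paired reads written one after the other: correctly flagged.
paired = ["@read1/1", "@read1/2"]
# Made-up single-end long-read names sharing a run prefix: false positive.
single = ["@run42/101/ccs", "@run42/102/ccs"]
# Distinct single-end names: not flagged.
distinct = ["@readA", "@readB"]

print(looks_interleaved(paired))    # True
print(looks_interleaved(single))    # True, although the reads are not paired
print(looks_interleaved(distinct))  # False
```

A check of this kind depends entirely on the naming convention of the upstream tool, which is why an unfamiliar header format can trip it even when the data are single-end.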

@arivers
Member

arivers commented Nov 14, 2024

I fixed this issue in v2.1.3, which I just released.
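
To confirm that an installed copy includes the fix, the installed version can be compared against 2.1.3. This is a minimal sketch, assuming the package is installed under the distribution name `itsxpress` and uses plain dotted numeric version strings; `parse_version`, `has_fix`, and `installed_has_fix` are names made up here.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (2, 1, 3)  # release named in this thread as containing the fix

def parse_version(text):
    """Turn '2.1.3' into (2, 1, 3); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in text.split("."))

def has_fix(installed):
    """True if the given version string is at least 2.1.3."""
    return parse_version(installed) >= FIXED_IN

def installed_has_fix(dist_name="itsxpress"):
    """Return True/False, or None if the distribution is not installed."""
    try:
        return has_fix(version(dist_name))
    except PackageNotFoundError:
        return None
```

If `installed_has_fix()` returns `False`, upgrading the package should pick up the corrected validation logic.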

arivers closed this as completed on Nov 14, 2024