
truncated gzip (not bgzf) leads to infinite loop when parsing fastq files #1579

Closed
goranvinterhalter opened this issue Mar 3, 2023 · 3 comments · Fixed by #1582

@goranvinterhalter

Hi All,

Parsing a truncated fastq.gz file (gzip, not bgzf) leads to an infinite loop.

klib has had a fix for this (in this PR) since 2017.

Could it be that this was overlooked in htslib, or is there another reason why the fix is not in kseq.h?
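To illustrate the class of bug being discussed, here is a minimal, self-contained sketch of a buffered reader whose refill step treats a negative return from a gzread()-style callback as terminal. It is not the actual kseq.h or klib code, and the names `stream_t`, `stream_getc`, `truncated_src`, `truncated_read` and `drain` are hypothetical. zlib's `gzread()` returns -1 on error, so the sketch assumes the same convention for the callback; the -3 error code is chosen to mirror the value klib's `kseq_read` reports for a corrupt stream.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical read callback with the same shape as gzread(): returns the
 * number of bytes read, 0 at end of file, or -1 on a read/inflate error. */
typedef int (*read_fn)(void *src, unsigned char *buf, int len);

/* Hypothetical minimal buffered stream, loosely modelled on a kstream. */
typedef struct {
    void *src;
    read_fn read;
    unsigned char buf[16];
    int begin, end;
    int is_eof;   /* set on clean EOF or on error */
    int is_err;   /* set only when read() reported an error */
} stream_t;

/* Returns the next byte, -1 at a clean end of file, or -3 on a read error.
 * An error is recorded and returned on every later call, so the stream is
 * never refilled again and a caller looping until a negative value stops. */
static int stream_getc(stream_t *s)
{
    if (s->is_err) return -3;
    if (s->is_eof && s->begin >= s->end) return -1;
    if (s->begin >= s->end) {
        s->begin = 0;
        s->end = s->read(s->src, s->buf, (int)sizeof s->buf);
        if (s->end < 0) { s->end = 0; s->is_eof = s->is_err = 1; return -3; }
        if (s->end == 0) { s->is_eof = 1; return -1; }
    }
    return s->buf[s->begin++];
}

/* Hypothetical in-memory source behaving like a truncated .gz file: it
 * delivers `good` bytes of `data`, then every later read fails with -1. */
typedef struct {
    const char *data;
    int good;   /* bytes delivered before the simulated error */
    int pos;
} truncated_src;

static int truncated_read(void *p, unsigned char *buf, int len)
{
    truncated_src *t = (truncated_src *)p;
    int n = t->good - t->pos;
    assert(len > 0);
    if (n <= 0) return -1;           /* simulated corrupt/truncated data */
    if (n > len) n = len;
    memcpy(buf, t->data + t->pos, (size_t)n);
    t->pos += n;
    return n;
}

/* Reads until stream_getc() returns a negative code; returns the number of
 * bytes consumed and stores the terminating code (-1 or -3) in *status. */
static int drain(stream_t *s, int *status)
{
    int c, n = 0;
    while ((c = stream_getc(s)) >= 0) n++;
    *status = c;
    return n;
}
```

For example, with `truncated_src t = { .data = "@r1\nACGT\n+\nIIII\n", .good = 10 };` and `stream_t s = { .src = &t, .read = truncated_read };`, `drain(&s, &status)` consumes 10 bytes and stops with `status == -3`. If the refill step instead ignored the -1 and retried, the stream would never make progress, which is one plausible way a parser can end up looping forever.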

@jkbonfield
Contributor

Thanks for this bug report.

You are correct that we overlooked this revision. I'm now watching Heng's klib so that we can review any other bug fixes in case there are more, and I'll also check whether there are other fixes we should have incorporated.

@jkbonfield
Contributor

Curiously, the test data in attractivechaos/klib#78 doesn't trigger problems for samtools view or test/test_view; both correctly detect the broken CRC. I'll still review the changes, but could you please explain which command you're using to hit this bug?

Or are you using htslib/kseq.h directly from your own tool?

@goranvinterhalter
Author

goranvinterhalter commented Mar 7, 2023

I'm using it directly from my own tool.
I can confirm that the klib version works: for a corrupt ".gz" file, kseq_read returns -3.
Note that it has to be a regular gzip-compressed ('gz') fastq file, not a bgzf-compressed one.
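From the caller's side, a -3 return only helps if the read loop distinguishes a clean end of file from an error. The sketch below assumes the return-value convention documented in klib's kseq.h (-1 end of file, -2 truncated quality string, -3 error reading the stream); htslib's copy may differ depending on version, so check the header you build against. `kseq_status` and `finish_kseq_loop` are hypothetical helpers, not part of klib or htslib.

```c
#include <stdio.h>

/* Describes a kseq_read() return value, assuming klib's documented codes:
 *   >= 0  length of the sequence just read
 *     -1  end of file
 *     -2  truncated quality string
 *     -3  error reading the underlying stream (e.g. a corrupt .gz file) */
static const char *kseq_status(int ret)
{
    if (ret >= 0) return "ok";
    switch (ret) {
    case -1: return "end of file";
    case -2: return "truncated quality string";
    case -3: return "error reading stream";
    default: return "unknown error";
    }
}

/* Hypothetical helper for the end of a read loop such as
 *     while ((ret = kseq_read(seq)) >= 0) { ... }
 * A clean EOF (-1) is success; any other value means the input was not
 * fully parsed and should not be silently accepted. Returns an exit code. */
static int finish_kseq_loop(int ret, const char *path)
{
    if (ret == -1) return 0;
    fprintf(stderr, "%s: %s (code %d)\n", path, kseq_status(ret), ret);
    return 1;
}
```

Treating every value below -1 as a failure, rather than just stopping the loop, keeps a truncated input from being reported as a successfully processed file.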

Labels: None yet
Projects: None yet

2 participants