Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix always falling back to non-bgzip reader && allow .bgz extension #109

Merged
merged 1 commit into from
Jun 7, 2019

Conversation

zamaudio
Copy link
Contributor

@zamaudio zamaudio commented Jun 6, 2019

Previously, files with .gz extension were assumed to be bgzip files.
Now, .gz is assumed gzip and .bgz is assumed bgzip.

I think it makes more sense this way, and hope the author agrees.

EDIT: This is not an adequate assumption, but I discovered that bgzip reader was never being used and is fixed below.

@brentp
Copy link
Owner

brentp commented Jun 6, 2019

I think all that's required to get the desired effect is to change the single line:

	if len(config.Annotation) < runtime.GOMAXPROCS(0) && strings.HasSuffix(queryFile, ".gz") {	

to

	if len(config.Annotation) < runtime.GOMAXPROCS(0) && (strings.HasSuffix(queryFile, ".gz") 
      || strings.HasSuffix(queryFile, ".bgz")) {	

this will make it so the query file can be .bgz, and bix has been updated to support ".bgz" extension.

@zamaudio
Copy link
Contributor Author

zamaudio commented Jun 7, 2019

Hi @brentp, the line you posted above has slightly different behaviour than my PR, it will try to open both .gz and .bgz extensions as block gzipped input. But my PR allows regular gzip to be opened as well. With the current code, if you try to open a regular gzipped vcf file that has not been bgzipped bgzip file with a .gz extension, sometimes vcfanno crashes.
Ideally vcfanno would detect which kind of compression it is, but my go skills are not quite up to doing that.
Do you know of a way to detect if a file with .gz extension is block gzipped or regular gzipped that is less naive than reading the file extension? I think that would solve the issue more robustly and not change existing behaviour of your program.

@brentp
Copy link
Owner

brentp commented Jun 7, 2019

the current code allows regular gzip to be opened. (that's why it sets qrdr = nil when there's an error with bgzf reader). can you write a test that fails with master and passes with your code so I can understand what you are adding?

I still think that all that's needed is the line I posted above. If I misunderstand, then a test will help to clarify.

}
qrdr = nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without an else clause here, when bgzip reader succeeds OR fails, qrdr is unconditionally being set to nil, thus always falls back to non-bgzip reader (xopen)

@zamaudio zamaudio changed the title Allow gz && bgz as input Fix always falling back to non-bgzip reader && allow .bgz extension Jun 7, 2019
@zamaudio
Copy link
Contributor Author

zamaudio commented Jun 7, 2019

@brentp I have identified that the bgzip reader is never being used in the current code. I have provided a brand new fix above that is basically your suggested change plus a couple of else clauses.

@brentp brentp merged commit deeeee3 into brentp:master Jun 7, 2019
@brentp
Copy link
Owner

brentp commented Jun 7, 2019

great! thank you for noticing and fixing this. I'll make a new release soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants