Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix decompress_peek_gz to cope with files starting on empty gzip blocks. #1643

Merged
merged 1 commit into from
Jul 13, 2023

Conversation

jkbonfield
Copy link
Contributor

Demonstration:

$ echo -e "@seq\nAAAAA\n+\nABCDE" | gzip > _normal.gz                           
$ (gzip < /dev/null;cat _normal.gz) > _empty.gz                                 
                                                                                
$ ./htsfile _normal.gz                                                          
_normal.gz:     FASTQ gzip-compressed sequence data                             
                                                                                
$ ./htsfile _empty.gz                                                           
_empty.gz:      empty gzip-compressed data                                      
$ ./test/test_view _empty.gz                                                    
Unsupported or unknown category of data in input file                           

Zlib's inflate ends with Z_STREAM_END and then repeated calls to get more won't give any difference, so we need the explicit reset to move forwards with the next block.

@jkbonfield jkbonfield force-pushed the empty_gz branch 2 times, most recently from 143bc56 to d0e46e7 Compare July 10, 2023 09:13
@jkbonfield
Copy link
Contributor Author

Now with added paranoia for loop detection. Just incase...

@whitwham whitwham merged commit 2e672f3 into samtools:develop Jul 13, 2023

destsize = zs.total_out;
// zs.total_out can sometimes be wrong as inflateReset resets it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If total_out may have been reset by inflateReset, is it still appropriate to use it in the loop above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Good point. No (although in practice avail_in seems to catch it).

I'll make a follow up fix with the similar next_out - dest logic instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears I was being overly paranoid (although I left in the loop detection bit, even though I can't actually trigger zlib to consume no input without producing an error).

avail_out would have been the correct variable, as unlike total_out this isn't reset. However it turns out neither are necessary. Calling inflate with one of those as zero returns an error (Z_BUF_ERROR) so it promptly bails out the loop anyway, and even if it just returned Z_STREAM_END again it'd be caught by the other checks.

So it's now much simpler. See #1646 for 2nd try at fixing this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants