Fix indexing bug by flushing BCF bgzf stream after header write #1742

Merged
merged 1 commit into from
Feb 19, 2024

daviesrob
Member

bcf_idx_init() calls bgzf_tell() to get the starting index offset. This was OK when single-threaded but broke with multiple threads, because bgzf_tell() lies about the file offset unless bgzf_flush() was called first. The SAM.gz, BAM and VCF.gz writers all did this, but the BCF writer didn't, leading to an incorrect first index entry when combining multiple threads with indexing on the fly. Fix by adding the missing bgzf_flush() after writing the header.

As a side benefit, the BCF variant records will now start in a fresh BGZF block, instead of being mixed in with part of the BCF header.

test/index.bcf.csi has to be replaced due to the extra flush adding one more block to the (uncompressed) index.bcf file that gets generated by the test harness.
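
To illustrate why the flush matters, here is a toy model of the problem. It is not htslib code: `toy_writer`, `toy_write()`, `toy_flush()`, `toy_tell()`, the 100-byte block size and the fixed 40-byte compressed size are all hypothetical, and the deferred update of `block_address` is an assumed simplification of what happens when worker threads do the compression. The only detail taken from the BGZF format is the virtual offset layout (compressed block address in the upper 48 bits, offset within the uncompressed block in the lower 16 bits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy values, chosen only to keep the arithmetic readable. */
#define TOY_BLOCK_SIZE      100u  /* uncompressed bytes per block */
#define TOY_COMPRESSED_SIZE 40u   /* pretend every block compresses to this */

/* BGZF virtual offset: block address in the upper 48 bits,
 * offset within the uncompressed block in the lower 16 bits. */
static uint64_t virtual_offset(uint64_t block_address, uint16_t block_offset)
{
    assert(block_address < (1ULL << 48));
    return (block_address << 16) | block_offset;
}

/* Hypothetical writer state; NOT htslib's BGZF struct. */
typedef struct {
    uint64_t block_address; /* file offset of current block, as the caller sees it */
    uint16_t block_offset;  /* bytes buffered in the current block */
    uint64_t pending;       /* compressed bytes queued to workers, not yet counted */
} toy_writer;

/* Buffer n bytes. Full blocks are queued to (imaginary) worker threads,
 * so block_address is NOT advanced here. */
static void toy_write(toy_writer *w, size_t n)
{
    while (n > 0) {
        size_t room = TOY_BLOCK_SIZE - w->block_offset;
        size_t take = n < room ? n : room;
        w->block_offset = (uint16_t)(w->block_offset + take);
        n -= take;
        if (w->block_offset == TOY_BLOCK_SIZE) {
            w->pending += TOY_COMPRESSED_SIZE;
            w->block_offset = 0;
        }
    }
}

/* Queue any partial block, wait for the workers, then account for
 * everything they wrote. Only now is block_address trustworthy. */
static void toy_flush(toy_writer *w)
{
    if (w->block_offset > 0) {
        w->pending += TOY_COMPRESSED_SIZE;
        w->block_offset = 0;
    }
    w->block_address += w->pending;
    w->pending = 0;
}

/* Report the current position from the caller-visible state only. */
static uint64_t toy_tell(const toy_writer *w)
{
    return virtual_offset(w->block_address, w->block_offset);
}
```

In this model, writing a 250-byte header queues two full blocks and leaves 50 bytes buffered. Calling `toy_tell()` at that point returns a virtual offset built from block address 0, even though 80 compressed bytes are already queued ahead of it. After `toy_flush()`, `toy_tell()` returns block address 120 with an in-block offset of 0, which also shows the side benefit: the first record starts in a fresh block.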

Fixes #1740

@whitwham whitwham merged commit 34031e9 into samtools:develop Feb 19, 2024
9 checks passed
@daviesrob daviesrob deleted the bcf_hdr_flush branch February 20, 2024 16:29
Successfully merging this pull request may close these issues.

bcftools-1.19: Invalid index is produced by --write-index and --threads on my bcf.gz files