[Feature Request] Cannot sort using custom contig order #2324

Open
rickymagner opened this issue Nov 26, 2024 · 2 comments

Comments

@rickymagner

Hi, I have some VCFs where the sequence dictionary in the header is out of the canonical order because of a tool's decisions. I'd like to be able to sort my VCF to follow the "usual" order, in other words sort according to a custom order of the contigs, e.g. the order from a reference fai.

Here are some possible ways some new features in bcftools could allow for this.

  • Update bcftools reheader -f ref.fai to also force the contig order in the new header to match the order in ref.fai. I'd imagine this is the simplest to implement, and it can then be followed with a bcftools sort to get the records to match this ordering, but it is not strictly backwards compatible since the behavior of an existing flag would change.
  • Update bcftools sort to accept a -f ref.fai input that does both of the things described above: update the header so its sequence dictionary matches the order in ref.fai, and sort all the records according to this order.

Unless I'm missing something, there is currently no way to (easily) achieve this with bcftools.
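
To make the requested behavior concrete, here is a rough sketch in plain Python of what "reorder the header contigs to match the fai, then sort the records by that order" could look like. This is only an illustration and not a bcftools feature: `reorder_vcf` is a hypothetical helper name, it assumes an uncompressed text VCF small enough to hold in memory, and it orders records by contig rank and POS only (ties keep their input order). Contigs missing from the fai are kept after the listed ones, in their original header order.

```python
import re
import sys


def reorder_vcf(vcf_lines, fai_lines):
    """Return VCF lines with ##contig lines and records ordered as in the .fai.

    Hypothetical helper for illustration; not part of bcftools.
    """
    # The contig name is the first tab-separated column of each .fai line.
    fai_order = [ln.split("\t")[0].strip() for ln in fai_lines if ln.strip()]

    meta, contigs, chrom_line, records = [], {}, None, []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##contig="):
            match = re.search(r"ID=([^,>]+)", line)
            contigs[match.group(1)] = line
        elif line.startswith("##"):
            meta.append(line)          # other header lines keep their order
        elif line.startswith("#"):
            chrom_line = line          # the #CHROM column header
        else:
            records.append(line)

    # Contigs listed in the .fai come first; any others follow in header order.
    ordered = [c for c in fai_order if c in contigs]
    ordered += [c for c in contigs if c not in set(fai_order)]
    rank = {name: i for i, name in enumerate(ordered)}

    def key(record):
        chrom, pos = record.split("\t", 2)[:2]
        # Records on contigs absent from the header sort last.
        return rank.get(chrom, len(rank)), int(pos)

    out = meta + [contigs[c] for c in ordered]
    if chrom_line is not None:
        out.append(chrom_line)
    return out + sorted(records, key=key)


if __name__ == "__main__":
    # Usage (paths are placeholders): python reorder_vcf.py in.vcf ref.fai > out.vcf
    with open(sys.argv[1]) as vcf, open(sys.argv[2]) as fai:
        print("\n".join(reorder_vcf(vcf, fai)))
```

A built-in option would still be preferable, since this sketch does not handle compressed VCFs, BCFs, or files too large for memory.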

@pd3 pd3 added the enhancement label Dec 3, 2024
@pd3
Member

pd3 commented Dec 3, 2024

I am not opposed to adding this feature, but it is unlikely to happen by my doing. What is the motivation for this request? The VCF specification does not mandate any specific order of the contigs, so programs should not be relying on it.

@rickymagner
Author

The motivation is that some tools write records unsorted, and then you can only sort according to the sequence dictionary in the header using bcftools sort. This means if you want to do anything where you iterate over a family of files (e.g. your VCF, a BED file, a BAM), you'd be unable to traverse them "together" since they would be sorted according to different conventions. It would be great to be able to coerce the ordering in your VCF to match the "normal" convention that all your other files follow.


2 participants