Tabix doesn't handle queries using sequence names containing ':' or '-' #1017
Comments
FYI: for VCF, the spec (https://samtools.github.io/hts-specs/VCFv4.2.pdf) says that a colon shouldn't appear in CHROM.
Handling a colon in a vcf.gz would be wrong, wouldn't it?
|
@lindenb In the hg38 reference we find contigs such as this one:
|
This is due to an inconsistency between the SAM and VCF specs regarding what
constitutes a legal contig name.
It would be good to solve this at the spec level, but in the meantime we
need a solution. :-/
On Mon, Oct 23, 2017 at 11:14 PM, Pierre Lindenbaum wrote:
> @cmnbroad
> FYI: for VCF, the spec says that a colon shouldn't appear in the CHROM.
> in https://samtools.github.io/hts-specs/VCFv4.2.pdf
> "The colon symbol (:) must be absent from all chromosome names"
> handling a colon in a vcf.gz would be wrong isn't it ?
|
Would one possible workaround be to add support for escaping the problematic characters in the chromosome name when tabix parses the region specifier? For example, the problematic characters could be quoted by prepending a backslash or, alternatively, percent-encoded.
|
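To make the percent-encoding idea concrete, here is a minimal sketch of how a caller might escape a sequence name and how a parser might undo the escaping after splitting the region. The class `EscapedRegionNames` and its methods are hypothetical, are not part of htsjdk or tabix, and handle single-byte (ASCII) escapes only.

```java
/** Hypothetical helper, not part of htsjdk: percent-escapes ':' and '-' in sequence names. */
final class EscapedRegionNames {

    /** Escape '%', ':' and '-' so the name is safe inside "name:start-end". */
    static String encodeName(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '%' || c == ':' || c == '-') {
                out.append(String.format("%%%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverse encodeName: turn each %XX back into the character it stands for. */
    static String decodeName(String escaped) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c != '%') {
                out.append(c);
                continue;
            }
            if (i + 2 >= escaped.length()) {
                throw new IllegalArgumentException("Truncated escape in: " + escaped);
            }
            int hi = Character.digit(escaped.charAt(i + 1), 16);
            int lo = Character.digit(escaped.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Bad escape in: " + escaped);
            }
            out.append((char) (hi * 16 + lo));
            i += 2; // skip the two hex digits just consumed
        }
        return out.toString();
    }

    /** Split at the first ':' (safe because the name is escaped), then decode the name. */
    static String contigOf(String region) {
        int colon = region.indexOf(':');
        String name = colon < 0 ? region : region.substring(0, colon);
        return decodeName(name);
    }
}
```

With this scheme, a made-up sequence name such as `seq-1:alt` would be written as `seq%2D1%3Aalt:100-200` in a query. The cost is that every caller has to know about the escaping, which is why a parser-side fix may be preferable.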
Since 2017, both SAM and VCF have gained clarity around what constitutes a legal contig name (and VCF's prohibition of colons has been removed). The SAM spec also has some suggestions about parsing region specifiers in the face of extra colons: see Appendix A of SAMv1.pdf.
|
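One way to parse region specifiers when names may contain extra colons is to consult the set of sequence names that the index already knows. The sketch below is an assumption about how that could look, not a transcription of the SAM spec's appendix and not htsjdk's actual `parseReg`: it first tries the whole string as a name, then splits at the last colon. `RegionSpec` and its fields are hypothetical.

```java
import java.util.Set;

/** Hypothetical region holder and parser; not part of htsjdk. */
final class RegionSpec {
    final String contig;
    final int start; // 1-based, or -1 when not given
    final int end;   // 1-based inclusive, or -1 when not given

    RegionSpec(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    /** Parse "name", "name:start" or "name:start-end" against the known sequence names. */
    static RegionSpec parse(String region, Set<String> knownNames) {
        // 1. The whole string may itself be a sequence name, colons and all.
        if (knownNames.contains(region)) {
            return new RegionSpec(region, -1, -1);
        }
        // 2. Otherwise split at the LAST colon, so colons inside the name survive.
        int colon = region.lastIndexOf(':');
        if (colon > 0) {
            String name = region.substring(0, colon);
            String range = region.substring(colon + 1);
            if (knownNames.contains(name)) {
                try {
                    int dash = range.indexOf('-');
                    if (dash < 0) {
                        return new RegionSpec(name, parsePos(range), -1);
                    }
                    int start = parsePos(range.substring(0, dash));
                    String endText = range.substring(dash + 1);
                    int end = endText.isEmpty() ? -1 : parsePos(endText);
                    return new RegionSpec(name, start, end);
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException("Bad coordinates in region: " + region, e);
                }
            }
        }
        throw new IllegalArgumentException("Unknown sequence in region: " + region);
    }

    /** Accept thousands separators such as "1,000". */
    private static int parsePos(String text) {
        return Integer.parseInt(text.replace(",", ""));
    }
}
```

Because the text after the last colon never contains a colon, any '-' inside the sequence name stays on the name side of the split. A name that is itself a known sequence and also looks like another name plus coordinates remains ambiguous; this sketch resolves that case in favour of the whole-string match.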
The TabixReader parseReg method assumes that ':' and '-' don't appear in sequence names. This came up in the review for #911.