Tabix doesn't handle queries using sequence names containing ':' or '-' #1017

Open
cmnbroad opened this issue Oct 23, 2017 · 5 comments

@cmnbroad
Collaborator

TabixReader's parseReg method assumes that ':' and '-' never appear in sequence names, because it uses them as the delimiters of a region string of the form name:beg-end. This came up in the review for #911.

@lindenb
Contributor

lindenb commented Oct 23, 2017

@cmnbroad

FYI: for VCF, the spec says that a colon must not appear in the CHROM field.

From https://samtools.github.io/hts-specs/VCFv4.2.pdf:

The colon symbol (:) must be absent from all chromosome name

So handling a colon in a vcf.gz would be wrong, wouldn't it?

@droazen
Contributor

droazen commented Oct 23, 2017

@lindenb In the hg38 reference we find contigs such as this one:

@SQ	SN:HLA-A*01:01:01:01	LN:3503	M5:01cd0df602495b044b2c214d69a60aa2	AS:38	UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta	SP:Homo sapiens
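To make the failure concrete, here is a minimal Java sketch of a region parser that, like the behaviour described in this issue, treats the first ':' as the boundary between the name and the interval. The class and method names are hypothetical and this is not the htsjdk parseReg source; it only shows what happens to a contig like the hg38 one above.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only (NOT the htsjdk TabixReader.parseReg source):
 * a parser that splits "name:beg-end" at the first ':' and the next '-'.
 */
final class NaiveRegionDemo {

    /** Returns {name, beg, end}; empty strings mean "not given". */
    static String[] naiveSplit(String region) {
        int colon = region.indexOf(':'); // first colon is assumed to end the name
        if (colon < 0) {
            return new String[] {region, "", ""};
        }
        int hyphen = region.indexOf('-', colon + 1); // first hyphen after that colon
        String name = region.substring(0, colon);
        String beg = hyphen < 0 ? region.substring(colon + 1) : region.substring(colon + 1, hyphen);
        String end = hyphen < 0 ? "" : region.substring(hyphen + 1);
        return new String[] {name, beg, end};
    }

    public static void main(String[] args) {
        // The contig name is cut off at its own first colon.
        System.out.println(Arrays.toString(naiveSplit("HLA-A*01:01:01:01:100-200")));
        // prints [HLA-A*01, 01:01:01:100, 200]
    }
}
```

The name comes back as HLA-A*01 and the leftover part of the name leaks into the start coordinate, which then fails to parse as a number.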

@yfarjoun
Contributor

yfarjoun commented Oct 24, 2017 via email

@cmnbroad cmnbroad self-assigned this Oct 24, 2017
@jkmatila

Would one possible workaround be for tabix to support escaping the problematic characters in the chromosome name when parsing the region specifier?

For example, quoting the problematic characters by prepending a backslash, like HLA\-A*01\:01\:01\:01.

Or alternatively, using the percent encoding, like HLA-A%2A01%3A01%3A01%3A01.
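A minimal Java sketch of the backslash-escaping idea follows, assuming a syntax where '\' makes the next character literal and the first unescaped ':' ends the contig name. Neither tabix nor htsjdk is known to accept this syntax; the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of the proposed backslash-escaping workaround.
 * Assumed syntax: name[:beg[-end]], where '\' escapes the next character of the name.
 */
final class EscapedRegionParser {

    /** Parsed region; start/end are 1-based inclusive, end == Integer.MAX_VALUE means "to the end". */
    static final class Region {
        final String contig;
        final int start;
        final int end;

        Region(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return contig + " [" + start + ", " + end + "]";
        }
    }

    static Region parse(String spec) {
        StringBuilder name = new StringBuilder();
        int i = 0;
        while (i < spec.length()) {
            char c = spec.charAt(i);
            if (c == '\\') {
                if (i + 1 >= spec.length()) {
                    throw new IllegalArgumentException("dangling backslash in: " + spec);
                }
                name.append(spec.charAt(i + 1)); // keep the escaped character literally
                i += 2;
            } else if (c == ':') {
                break; // the first unescaped colon ends the contig name
            } else {
                name.append(c);
                i++;
            }
        }
        if (i >= spec.length()) {
            return new Region(name.toString(), 1, Integer.MAX_VALUE); // whole contig
        }
        String interval = spec.substring(i + 1);
        int hyphen = interval.indexOf('-');
        int start = Integer.parseInt(hyphen < 0 ? interval : interval.substring(0, hyphen));
        int end = hyphen < 0 ? Integer.MAX_VALUE : Integer.parseInt(interval.substring(hyphen + 1));
        return new Region(name.toString(), start, end);
    }

    public static void main(String[] args) {
        // Java source needs "\\" to put a single backslash into the string.
        System.out.println(parse("HLA\\-A*01\\:01\\:01\\:01:100-200"));
        // prints HLA-A*01:01:01:01 [100, 200]
    }
}
```

Because this sketch only looks for '-' after the first unescaped ':', escaping the '-' inside the name is harmless but not strictly required; only the colons must be escaped.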

@jmarshall
Copy link
Member

Since 2017, both the SAM and VCF specifications have clarified what constitutes a legal contig name (and VCF's prohibition of colons has been removed). The SAM spec also offers suggestions for parsing region specifiers when contig names contain extra colons: see Appendix A of SAMv1.pdf.
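A different approach avoids escaping altogether by consulting the contig names that actually exist (for example, those listed in the tabix index). The Java sketch below is loosely in the spirit of that advice: it tries the whole string as a known name, otherwise splits at the last colon and accepts the prefix only if it is a known contig, and rejects a string that fits both readings as ambiguous. The class names are hypothetical, and the exact rules are an assumption rather than a transcription of Appendix A.

```java
import java.util.Set;

/**
 * Hypothetical sketch: resolve "name[:beg[-end]]" against the set of known contig names,
 * so names containing ':' or '-' need no escaping.
 */
final class KnownNameRegionParser {

    /** Parsed region; start/end are 1-based inclusive, end == Long.MAX_VALUE means "to the end". */
    static final class Region {
        final String contig;
        final long start;
        final long end;

        Region(String contig, long start, long end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return contig + " [" + start + ", " + end + "]";
        }
    }

    static Region parse(String spec, Set<String> contigs) {
        // Reading 1: the whole string is a contig name (whole-contig query).
        Region whole = contigs.contains(spec) ? new Region(spec, 1, Long.MAX_VALUE) : null;

        // Reading 2: a known contig name, then ':' and an interval, split at the LAST colon.
        Region split = null;
        int colon = spec.lastIndexOf(':');
        if (colon > 0 && contigs.contains(spec.substring(0, colon))) {
            split = parseInterval(spec.substring(0, colon), spec.substring(colon + 1));
        }

        if (whole != null && split != null) {
            throw new IllegalArgumentException("ambiguous region: " + spec);
        }
        if (whole != null) {
            return whole;
        }
        if (split != null) {
            return split;
        }
        throw new IllegalArgumentException("no known contig matches region: " + spec);
    }

    /** Accepts "beg" or "beg-end" (at most 18 digits each, so parseLong cannot overflow). */
    private static Region parseInterval(String contig, String interval) {
        if (!interval.matches("\\d{1,18}(-\\d{1,18})?")) {
            return null;
        }
        int hyphen = interval.indexOf('-');
        if (hyphen < 0) {
            return new Region(contig, Long.parseLong(interval), Long.MAX_VALUE);
        }
        return new Region(contig,
                Long.parseLong(interval.substring(0, hyphen)),
                Long.parseLong(interval.substring(hyphen + 1)));
    }

    public static void main(String[] args) {
        Set<String> contigs = Set.of("chr1", "HLA-A*01:01:01:01");
        System.out.println(parse("HLA-A*01:01:01:01:100-200", contigs));
        // prints HLA-A*01:01:01:01 [100, 200]
    }
}
```

The design choice is to decide by lookup rather than by character class: a '-' or ':' inside a name is never treated as a delimiter, because the prefix must match an existing contig. The cost is that the parser needs the name list, which a tabix reader already has.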


6 participants