htsget should require that returned SAM/VCF headers include @SQ/##contig
#578
Comments
As noted in #311 (comment) which motivated their addition, |
@jmarshall The "class-header" option works great for reads, thanks for that, but how does it work for variants? AFAIK the only required line in a VCF header is the format directive, and I don't recall seeing a VCF with referenceNames listed in the header.
The relevant VCF header is I think there has been talk of making You are correct that it would behoove htsget to specify that the VCF headers returned by a IMHO that would be a more practical approach than inventing another endpoint and requiring servers to divine this information from the underlying data in some other way than looking at these headers. ( |
@jmarshall Thanks, that would work for me (##contig). I just need the names, so even the short form would work. I will build in workarounds if they are missing, which obviously I have already done for alignments.
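
For illustration, here is a minimal sketch of pulling just the names out of header text returned by a header request. The function name `reference_names` is hypothetical and not part of htsget or any library; it assumes the header has already been fetched and decoded to plain text (i.e. SAM or uncompressed VCF header lines), and it only looks at `@SQ` `SN:` fields and `##contig` `ID=` keys.

```python
import re


def reference_names(header_text: str) -> list[str]:
    """Collect reference names from SAM @SQ lines or VCF ##contig lines.

    Hypothetical helper for illustration only; returns an empty list
    when the header carries neither kind of line.
    """
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # SAM header fields are tab-separated TAG:VALUE pairs; SN is the name.
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
                    break
        elif line.startswith("##contig=<"):
            # VCF: ##contig=<ID=name,...>; the short form is ##contig=<ID=name>.
            match = re.search(r"[<,]ID=([^,>]+)", line)
            if match:
                names.append(match.group(1))
    return names
```

An empty result is the case the workarounds mentioned above would need to cover.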
@jmarshall BTW servers must already be "divining" the reference names in a dataset, since they are supposed to return an error if a request is made for a reference name that is not present. This was at the root of my requests from the beginning: if you are going to return an error, shouldn't there be a way to determine what is legal in advance? In any event I have long implemented workarounds, even without the "header" option, so you don't need to keep this open for me. Thanks for your responses.
File-based servers will typically be using a bai/csi/tbi index to return results for requested referenceNames and coordinates; doing such an individual lookup by name is a typical operation on such an index. Enumerating all the available referenceNames is not a typical operation on these indexes (and is not possible for some indexes), so would require additional implementation. This is what I mean by additional divination being required. Anyway, for the 1 vs chr1 problem, the real solution is for VCF to gain an equivalent of SAM's
As you know, there has been a way to determine that for a little over two years.
@jmarshall It's been more than 2 years since I last worked on this until recently, so I'm getting back up to speed. Yes, there is a way for BAM files, but not for VCF, the optional ##contig header notwithstanding. Currently I'm handling it this way: if there's a mismatch on a VCF file that lacks the ##contig header, too bad, it's just going to be an error.
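
As a rough sketch of the kind of client-side handling described here (not IGV's actual code), a requested name can be checked against the names found in the header, trying the `chr`-prefixed or unprefixed variant before giving up. The function `resolve_reference_name` is hypothetical, and it deliberately ignores other aliasing such as `MT` versus `chrM`.

```python
from typing import Iterable, Optional


def resolve_reference_name(requested: str, available: Iterable[str]) -> Optional[str]:
    """Map a requested name onto one present in the header, if possible.

    Hypothetical helper: tries an exact match, then toggles a leading
    "chr" prefix. Returns None when neither form is present.
    """
    known = set(available)
    if requested in known:
        return requested
    if requested.startswith("chr"):
        alternative = requested[3:]
    else:
        alternative = "chr" + requested
    return alternative if alternative in known else None
```

When the header has no `##contig` lines, `available` is empty and every lookup returns `None`, which corresponds to the "it's just going to be an error" case.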
Please be explicit. What do you consider is lacking for VCF? (Other than the minor “my VCF file does not have |
@jmarshall Nothing, that's it, the optional ##contig header. I'm sure you're right that they are unusual; unusual files are way overrepresented in my world because they generally end up as IGV help tickets. In this case I'll wait for such a ticket and just assume ##contig will always be there.
Totally confused about what issue I'm responding to via email, sorry for the confusion. All is good for now; if htsget is extended to other formats this might need to be revisited.
Header requests work whether or not the files contain
Just as some random context: in non-htsget use with tabix'ed VCF files, we used the names inside the VCF tabix index for discovery of reference sequence names. Indexes would not be available in htsget of course, but it would be nice to have some expectation (if not a guarantee) that the refnames would be in a header request.
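
For reference, a header-only request is an ordinary htsget request with `class=header` added to the query string. The sketch below only builds the URL; the helper name `header_request_url` and the example host are assumptions for illustration, and fetching the ticket and the data blocks it points to is left out.

```python
from urllib.parse import quote, urlencode


def header_request_url(base_url: str, kind: str, dataset_id: str) -> str:
    """Build an htsget URL asking only for the header of a dataset.

    Hypothetical helper; `kind` is the htsget endpoint, "reads" or "variants".
    """
    if kind not in ("reads", "variants"):
        raise ValueError(f"unknown htsget endpoint: {kind!r}")
    query = urlencode({"class": "header"})
    return f"{base_url.rstrip('/')}/{kind}/{quote(dataset_id, safe='')}?{query}"


# Example with a made-up server and dataset id:
url = header_request_url("https://htsget.example.org", "variants", "sample1")
```

Whether the returned VCF header actually lists the reference names is exactly the expectation under discussion in this issue.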
It would be possible to invent another form of request (e.g. The reasons for adding text like that are
Item 3 represents the expectation you mention; YMMV how best to express that in spec text.
I'm imagining two forms of 400 error that this could be applied to:
I can't find any indication that there is an easy way to access just the reference_names from an AlignmentHeader, but it would presumably be easy to implement in any htsget server.
See igvteam/igv.js#1187 for context.
TL;DR:
/cc @mlin