htsget variants middleware #983

Closed
brainstorm opened this issue Jun 17, 2021 · 3 comments

@brainstorm
Contributor

As discussed in samtools/htsjdk#1555, @jrobinso will provide a wrapping class similar to HtsgetBAMReader to finish up the htsget PR, but for variants (VCFs and BCFs), using the Tribble classes present in IGV.

This is meant to be a compromise solution while htsjdk gets such an htsget+variants abstraction upstream (at the time of writing, htsjdk only provides HtsGet+BAMReader).

brainstorm added a commit to umccr/igv that referenced this issue Jun 17, 2021
@jrobinso
Contributor

jrobinso commented Jul 13, 2021

Note to self, example data: https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8?format=VCF&referenceName=8&start=128732400&end=128770475

See igvteam/igv.js#1187 for more.
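
For reference, a minimal Python sketch of how the example URL above breaks down into endpoint, dataset ID, and query parameters. The function names (`split_htsget_url`, `build_htsget_url`) are hypothetical and are not part of IGV or htsjdk; the sketch also assumes the dataset ID is the last path segment, as it is in this example.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# Example URL taken from the note above.
EXAMPLE = (
    "https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8"
    "?format=VCF&referenceName=8&start=128732400&end=128770475"
)


def split_htsget_url(url):
    """Split a full htsget URL into endpoint base, dataset ID and query params.

    Assumption: the dataset ID is the final path segment.
    """
    parts = urlsplit(url)
    endpoint, _, dataset_id = parts.path.rpartition("/")
    # parse_qs returns lists; keep the first value of each parameter.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "base": f"{parts.scheme}://{parts.netloc}{endpoint}",
        "id": dataset_id,
        "query": query,
    }


def build_htsget_url(base, dataset_id, **params):
    """Rebuild a request URL from its pieces (hypothetical helper)."""
    return f"{base}/{dataset_id}?{urlencode(params)}"


if __name__ == "__main__":
    print(split_htsget_url(EXAMPLE))
```

Rebuilding the URL from the split pieces should round-trip to the original string, which is a quick way to sanity-check a query before pasting it anywhere.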

@brainstorm Is BCF supported on any available server? I'm not sure how easy that will be to do, so I'm going to focus on VCF.

@brainstorm
Contributor Author

Yes, BCF is supported on our upcoming Rust htsget implementation: https://github.com/umccr/htsget-rs

Totally reasonable and fine to focus on VCF first though, since it'll see the most users I reckon. Thanks Jim!

@jrobinso
Contributor

@brainstorm I've added minimal support in the "htsget" branch. You should be able to load VCF variants by entering the full URL, including the dataset ID, in "load from URL". For example. #983

This is very minimal and data: URIs are not supported; consider it a prototype. If you want to add support for data URIs or other missing features, go for it.

To determine whether the server is an htsget variant source, a "header" query is made, the response is tested for the htsget container, and the format is checked (VCF only at this time). This could be generalized for the BAM reader when it is ready. Since this check runs only after everything else fails, overhead for non-htsget users is minimal. See TrackLoader, line ~209. Obviously this could be more elegant.
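
The detection steps described above can be roughly sketched as follows. This is an illustrative Python sketch, not the Java code in TrackLoader: the function names are hypothetical, the "header" query is assumed to be the htsget `class=header` parameter, and the "htsget container" is assumed to be the top-level `htsget` object of the JSON ticket. Treating a missing `format` field as "not VCF" is a simplifying assumption.

```python
import json
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def header_query_url(url):
    """Add class=header to a URL, keeping any query string already present."""
    parts = urlsplit(url)
    # Drop an existing "class" parameter so it is not duplicated.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "class"]
    params.append(("class", "header"))
    return urlunsplit(parts._replace(query=urlencode(params)))


def htsget_variant_format(response_text):
    """Return "VCF" if the body looks like an htsget ticket for VCF, else None."""
    try:
        body = json.loads(response_text)
    except ValueError:
        # Not JSON at all, so not an htsget ticket.
        return None
    if not isinstance(body, dict):
        return None
    ticket = body.get("htsget")  # the htsget container
    if not isinstance(ticket, dict):
        return None
    fmt = str(ticket.get("format", "")).upper()
    return fmt if fmt == "VCF" else None


if __name__ == "__main__":
    print(header_query_url("https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8"))
    print(htsget_variant_format('{"htsget": {"format": "VCF", "urls": []}}'))
```

Fetching the URL is left out on purpose; the point is only the order of the checks: build the header query, look for the container, then check the format.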

brainstorm added a commit to umccr/igv that referenced this issue Jul 16, 2021
…=header' URL addition in HtsgetReader.getReader() returns errors igvteam#983 (comment), needs more work, I'm surprised it worked for @jrobinso at all unless the merge introduced other artifacts :-S
@jrobinso jrobinso self-assigned this Jul 20, 2021
@jrobinso jrobinso added this to the 2.11.0 milestone Jul 20, 2021
jrobinso added a commit that referenced this issue Jul 23, 2021
* Change rule to trigger whole genome view -- consider only "long" chromosomes, that is chromosomes that contribute to the whole genome view.

* update hosted genomes

* Add feature source for htsget VCF variant service.

* Support of htsget variant sources.  See issue #983

* Handle htsget endpoints with query strings.

* update htsjdk

* Support BAM (and CRAM?) format from htsjdk servers.

* Set "type" (format) on ResourceLocator for htsget services -- needed to trigger panel creation for alignment tracks

* Support UCSC blat web service.  Replacement for screen-scraping website blat.   Fixes #913

* * Add support for command line blat
* Refactor Runtime process support
* Remove deprecated code

* Final tweaks
@jrobinso jrobinso modified the milestones: 2.11.0, 2.10.3 Jul 23, 2021