Adding sequence records, and cleaning up reference records. #162
Hi Frank,
CH |
@calbach Certainly! Right now, the API assumes that the only contig sequences which will be accessed using the GA4GH APIs are reference sequences. In reality, reference sequences are a subset of the possible contigs someone might want to access. For example, I could want to store contigs that represent:
Essentially, this provides a more generic approach for accessing contigs. As for the fragment:sequence question, the assumption is that fragments would be non-overlapping contiguous subsequences of a contig sequence. This would be the preferred way of implementing pagination for contig sequences. |
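To make the fragment idea concrete, here is a minimal sketch, assuming a fragment carries only its parent sequence ID, a 0-based start offset, and its bases. The `SequenceFragment` class and both helper functions are hypothetical illustrations, not the models defined in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SequenceFragment:
    """Hypothetical stand-in for a fragment record (field names are assumptions)."""
    sequence_id: str
    start: int  # 0-based offset of this fragment within the contig
    bases: str


def split_into_fragments(sequence_id: str, bases: str,
                         fragment_size: int) -> List[SequenceFragment]:
    """Cut a contig into non-overlapping, contiguous fragments."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        SequenceFragment(sequence_id, start, bases[start:start + fragment_size])
        for start in range(0, len(bases), fragment_size)
    ]


def reassemble(fragments: Iterable[SequenceFragment]) -> str:
    """Rebuild the contig, rejecting gaps or overlaps between fragments."""
    expected_start = 0
    parts = []
    for fragment in sorted(fragments, key=lambda f: f.start):
        if fragment.start != expected_start:
            raise ValueError("fragments overlap or leave a gap")
        parts.append(fragment.bases)
        expected_start += len(fragment.bases)
    return "".join(parts)
```

Each fragment then plays the role of one "page" of the contig, and a client can fetch them in any order and still rebuild the sequence.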
+1 I was wondering when we would pick up on this discussion from #133 (diff) - glad to see we're allowing for these possibilities :) |
Hello @fnothaft and @jeromekelleher, I can appreciate the concept of a more generic Sequence. I see that building it from Fragments addresses practical issues in very long I would be interested to discuss on Friday. As a side effect, you have made a major change in that you now have Reference point to its referenceSet, rather than ReferenceSet contain a list of the references On the positive side, I like the change to an array of derivedFrom and sourceDivergence. Richard On 14 Oct 2014, at 00:08, Frank Austin Nothaft notifications@github.com wrote: |
After discussing this with @cassiedoll and @calbach today, I think we can simplify this a great deal and address @richarddurbin's concern. The key issue that we were trying to deal with here is to give a consistent way of paging over large numbers of objects. This is complicated in this case, as we are really just trying to page over a large string and not over a set of well defined objects. A more effective way to do this might be to simply stream these large strings over HTTP using a GET method (with optional bounds), and avoid dealing with this complication at the GA4GH protocol level. This gets rid of the need for However, there are some changes here that we still think are useful. @franknothaft, I'll email you offline with the details, and hopefully we can update the PR. |
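As a rough illustration of the streaming alternative, the sketch below yields a bounded slice of a sequence in chunks, the way a server might feed a streamed GET response body. The function name, the `start`/`end` bounds, and the chunk size are all assumptions for illustration; nothing here is part of the GA4GH protocol.

```python
from typing import Iterator, Optional


def stream_bases(bases: str, start: int = 0, end: Optional[int] = None,
                 chunk_size: int = 4) -> Iterator[str]:
    """Yield bases[start:end] in chunks of at most chunk_size characters.

    `start` and `end` model the optional bounds of the request (hypothetical
    parameter names); leaving `end` unset streams to the end of the sequence.
    Because this is a generator, the checks run when iteration begins.
    """
    if end is None:
        end = len(bases)
    if not 0 <= start <= end <= len(bases):
        raise ValueError("bounds must satisfy 0 <= start <= end <= len(bases)")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(start, end, chunk_size):
        # Clamp the final chunk so it never runs past the requested end.
        yield bases[offset:min(offset + chunk_size, end)]
```

The client never sees page tokens in this scheme: it reads the response until it ends, which is the simplification being proposed.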
@jeromekelleher Sounds good to me; this:
Sounds like a good idea here. |
I wouldn't throw out the baby with the bathwater yet. Actually if the source type (i.e. GAReadSet, GAReference, GAVariantSet) is also specified in GASequence then we should be able to reconstruct what GAReference{Set} it is derived from. We should be able to add start/end as necessary, but by going up the chain to the reference it would be easy to regenerate the full sequence in a pipeline module. Let's keep working on this, since we're on a good track and almost there. |
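One possible reading of "going up the chain", sketched under the assumption that every record stores its own type and the ID of the record it was derived from. The `Record` class, its field names, and `find_root_reference` are hypothetical and do not come from the schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical record with a type tag and a link to its source."""
    id: str
    source_type: str  # e.g. "GAReference", "GAReadSet", "GAVariantSet"
    derived_from: Optional[str] = None  # ID of the parent record, if any


def find_root_reference(record_id: str, records: Dict[str, Record]) -> Record:
    """Follow derived_from links until a GAReference record is reached."""
    seen = set()
    current = records[record_id]
    while current.source_type != "GAReference":
        if current.id in seen:
            raise ValueError("cycle in derivation chain")
        seen.add(current.id)
        if current.derived_from is None:
            raise LookupError("chain ends without reaching a GAReference")
        current = records[current.derived_from]
    return current
```

With the root reference in hand, a pipeline module could regenerate the full sequence by applying each derivation step on the way back down.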
@jeromekelleher @fnothaft - any update on this PR? Is there going to be a new one or an updated version? |
I'm happy to close the PR at this point - it's probably easier to make a new one when we hit the issues involved in the reference server. What do you think @fnothaft? |
Is there interest in continuing with this, or closing it? |
@fnothaft, of course there is interest! I think working with RNA-seq data would be quite important. What is the concern? |
I think we should think about what we want to do with this in light of #238. Right now in the API we have a concept of a "sequence", which is a string with an associated ID and end joins onto other sequences, and we have a way to page through the bases of a sequence. We also still have the The simplest thing would be to close this up and run with the sequence system we have, but I still think what we have is kinda ugly, and so would be open to changing it around. |
I vote to close this - it would be simpler to bring in the aspects we want to keep as a new PR than to update this one. |
I agree with @jeromekelleher. There are a couple of issues that were discussed in the conversation, and due to the age of the issue, we should refresh ourselves on the motivation for the PR. If no one objects, I will close this issue on Friday Feb 27th @ 8AM UTC. |
I am in agreement about closing this. |
update variant annotation test data with HGVSg, names and metadata
@jeromekelleher and I plotted this one out this PM; we think this is a more consistent and general approach to serving references and sequences. Specifically, we:
- Added the `GASequence` and `GASequenceFragment` models. These models define a generic contig sequence, which could be used to describe a reference assembly, a de novo assembly, variant calls spliced back into a reference assembly, etc.
- Cleaned up `GAReference` and `GAReferenceSet` to make the relations of the models slightly cleaner/clearer.
- Removed the `/references/{id}/bases` GET request. Instead, we have added `/sequences/{id}` and `/sequencefragments/{id}` GET requests, which are more general.