basepairs. In the default forward orientation a sequence is specified by the DNA
letters of its top strand, e.g. in this case GGTGGNG, where N indicates an
example where the top strand is GGTGGAG.
Rest of example below still uses GGTGGNG. Did you want to change A back to an N?
Yeah, it should probably be N. I will fix that.
Sorry, when I was editing I half edited N to A, then stopped. I think it would be good to remove N from
the examples, although to say somewhere that N is a legitimate character in a sequence. There are a few
Ns, but they are not typical.
Richard
On 17 Mar 2015, at 04:40, Maciek Smuga-Otto notifications@github.com wrote:
In src/main/resources/avro/common.avdl #250 (comment):
-```
-5' ------- ------------------- 3'| <- top strand nick at position (2,+)
G G T G G N G
-3' ---------------- ---------- 5'C C A C C N C
- | <- bottom strand nick at position (3,-)
- +0- +1- +2- +3- +4- +5- +6- <- coordinates
-```
-A sequence is a piece of double-stranded DNA composed of a series of DNA
-basepairs. In the default forward orientation a sequence is specified by the DNA
-letters of its top strand, e.g. in this case GGTGGNG, where N indicates an
+example where the top strand is GGTGGAG.
+
Example below still uses GGTGGNG. Do you want to change N->A throughout the rest of the example?
Reply to this email directly or view it on GitHub https://github.com/ga4gh/schemas/pull/250/files#r26549104.
39ae636 to a74921a
a terminal side. The value of `startJoin` or `endJoin`, if set, is the side to
which the corresponding side of this `Sequence` is connected.
A `Segment` may have zero length (for example, when it is being used to
specify a `Path` consisting onyl of a `Join`.
typo: onyl
OK, I am fixing this.
Thanks for the updates @adamnovak. The model itself seems sensible enough to me, but I'm really struggling to understand how the API would work. How would a client do something useful with this? In particular, I don't understand what the client is supposed to do with
+1 As an aside, I agree with @jeromekelleher:
But, I think we're in a bit of a chicken/egg situation (e.g., we can't reason about how to perform common query X, unless we've got a data model). I think this is a data model that we can work with, and we'll figure out what the query patterns are by using the data model.
+1 Critical to "how do we do useful things" question is adding in variant On Wed, Mar 18, 2015 at 9:54 AM, Frank Austin Nothaft <
I'm basically +1 on the data model here, but I've still got issues with the API methods. Perhaps we could break these into separate PRs, so we're not holding up implementation of the data model?
@jeromekelleher Interesting suggestion on splitting the PR. I know it would help me a lot to have the basic graph model stabilized, even as the discussion on API methods and Alleles carries on. @adamnovak what do you think?
I think @jeromekelleher's suggestion is a great suggestion.
*/

record ReferenceSet {
/** The reference set ID. Unique in the repository. */
If `ReferenceSet`s are now meant to be composable, should this be explicitly added somewhere in the definition of the object here?
It is in the definition (see `includedReferenceSets`), but it should be in the doc comment.
@adamnovak oops... was accidentally searching on master, not your PR version of the file. Sorry!
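For readers following the composability question, here is a minimal Python sketch of how a client might expand a composed `ReferenceSet` into everything it is based on. It assumes that `includedReferenceSets` is a list of reference set IDs and that inclusion is transitive; neither is confirmed in this thread. The `resolve_included` helper and the example IDs are hypothetical.

```python
from typing import Dict, List, Set


def resolve_included(set_id: str, included: Dict[str, List[str]]) -> Set[str]:
    """Return set_id plus every reference set it includes, directly or indirectly.

    `included` maps a reference set ID to the IDs listed in its
    includedReferenceSets field (assumed here to hold plain IDs).
    """
    seen: Set[str] = set()
    stack = [set_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already expanded; also guards against cycles
        seen.add(current)
        stack.extend(included.get(current, []))
    return seen


# Hypothetical repository contents: "graph" builds on "patched" and "alts",
# both of which build on "base".
included = {
    "base": [],
    "patched": ["base"],
    "alts": ["base"],
    "graph": ["patched", "alts"],
}
resolved = resolve_included("graph", included)
```

Tracking visited IDs keeps the walk finite even if two sets were to include each other by mistake.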
I do see how it would be good to hash out things like Would people be OK with that?
I am OK with that.
OK, in response to @diekhans's comments about needing a good way to specify a Position without a strand, I now have a @diekhans, is that good for your use case? I didn't find anything in the metadata API to switch over to the new type.
Re. splitting the PR, I think it's totally fine if we're temporarily missing some methods from our git repo.
OK, here's a version with no changes to the methods files.
Thanks @adamnovak, +1. We should squash the commits - I guess one Adam commit, one Richard commit, one Adam commit is the most appropriate?
OK I've squashed everything down (even though I have heard it is rude to rewrite other people's commits). What do you think?
Looks good Adam - we could probably tidy up the commit comments a bit though. Stuff like "Rolling methods back" doesn't really tell future-us very much about what this commit actually does.
I'll merge this in the next 24 hours (unless Jerome feels strongly about tidying up the commit comments).
There has been much confusion caused by the lack of a `Sequence` type, and instead attempting to represent everything with `Segment`s. This is an attempt to add in a `Sequence` and see what things look like. `VariantSet`s now contain extra `Sequence`s instead of extra `Segment`s. `Segment`s don't have joins any more, since they are now only path components, and thus don't need them. `Path`s have optional joins at their ends, if they need to include adjacencies before and after the bases they cover.
…ct, David H and Heng. The graph model now has Sequences and Joins, which join two arbitrary oriented positions (sometimes known as sides) in sequences. We also remove the NO_STRAND option in the Strand enum - this should be done by setting the variable to null. Revised graph representation with Sequence and Join without start/endJoin. Also added isPrimary to Reference, and a couple more changes. Add includedReferenceSets to permit ReferenceSets to be based on others.
OK, I've cleaned up the concatenated commit messages slightly.
Describing sticky end handling
Clarifying Segments and join semantics
Making the new `ReferenceSet` composing system more fleshed out and described.
Clarifying the semantics in `SearchSequencesRequest` and `SearchJoinsRequest`.
Removing Position, calling it Side instead
Adding a new, un-oriented Position
I'm going to make the `Sequence`/`Join` method changes later.
Looks good, thanks @adamnovak. Merging.
Side graphs: `Sequence`s and `Join`s
Here's a draft of what our graph model would look like if, instead of the "insert graphs" we are using now (where sequences come with associated end joins), we used a "side graph". In a side graph, we split out `Sequence`s (which are just strings with IDs) and `Join`s (which bind two sides of two bases in two sequences together). Both kinds of objects, in this world, can occur in `ReferenceSet`s and `VariantSet`s.
I've made a bunch of API changes to see how that would actually work: creating a `Sequence` object, to start with. @richarddurbin made `Join`s, and then I changed the methods for searching to give us nice clean `searchSequences()` and `searchJoins()` methods to get them.
@richarddurbin: is this what you were going for?
@lh3: have I captured side graphs correctly?
All: do we actually like side graphs better than insert graphs?
I don't expect this to be merged for a while, but I think we need a concrete proposal to help anchor the discussion in #243 and its ensuing massive e-mail thread.
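To make the side graph idea concrete, here is a minimal Python sketch of the three objects described above: a `Sequence` as a string with an ID, a `Side` as one side of one base, and a `Join` binding two sides together. This is an illustration only, not the Avro schema in this PR. The field names (`sequence_id`, `position`, `strand`), the `join_is_valid` helper, and the example sequences are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Strand(Enum):
    # Two values only; per the commit message, "no strand" is expressed as null.
    POS_STRAND = "+"
    NEG_STRAND = "-"


@dataclass(frozen=True)
class Sequence:
    id: str     # identifier of the sequence
    bases: str  # the DNA letters of the top strand


@dataclass(frozen=True)
class Side:
    sequence_id: str  # which sequence the base lives in (hypothetical field name)
    position: int     # 0-based index of the base
    strand: Strand    # which of the base's two sides is meant


@dataclass(frozen=True)
class Join:
    side1: Side
    side2: Side


def join_is_valid(join: Join, sequences: Dict[str, Sequence]) -> bool:
    """Check that both sides of a join refer to a base in a known sequence."""
    for side in (join.side1, join.side2):
        seq = sequences.get(side.sequence_id)
        if seq is None or not 0 <= side.position < len(seq.bases):
            return False
    return True


# Hypothetical data: a reference sequence plus an extra sequence attached to it.
sequences = {
    "ref": Sequence("ref", "GGTGGAG"),
    "ins": Sequence("ins", "TTT"),
}
join = Join(Side("ref", 2, Strand.NEG_STRAND), Side("ins", 0, Strand.POS_STRAND))
dangling = Join(Side("ref", 7, Strand.POS_STRAND), Side("ins", 0, Strand.POS_STRAND))
```

Here `join` passes the check, while `dangling` fails because position 7 is past the end of the seven-base `ref` sequence. Keeping `Join`s separate from `Sequence`s is what lets both kinds of object be stored and searched independently.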