Should GA4GH.cigar be something other than a string? #8
Comments
This makes sense. Here's a snippet of schema from an email I sent to the read task team mailing list...
Btw, this format would require only 1 + 2n bytes of space, where n is the number of events. That is just as compact as a string-based CIGAR and faster to parse.
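To make the 1 + 2n figure concrete, here is a minimal Python sketch of one byte layout that would reach that size: one count byte, then two bytes per event (an op code and a length). The op list, the one-byte length cap and the names `OPS`, `encode_events` and `decode_events` are assumptions for illustration, not the schema from the email.

```python
# Hypothetical op codes; the names and their order are assumptions.
OPS = [
    "alignment_match", "insertion", "deletion", "skip", "soft_clip",
    "hard_clip", "padding", "sequence_match", "sequence_mismatch",
]


def encode_events(events):
    """Pack (op, length) pairs as 1 count byte + 2 bytes per event."""
    if len(events) > 255:
        raise ValueError("too many events for a one-byte count")
    out = bytearray([len(events)])
    for op, length in events:
        if not 0 <= length <= 255:
            # A real format would need a variable-length integer here.
            raise ValueError("length does not fit in one byte")
        out.append(OPS.index(op))
        out.append(length)
    return bytes(out)


def decode_events(data):
    """Inverse of encode_events; no regex or digit parsing needed."""
    n = data[0]
    if len(data) != 1 + 2 * n:
        raise ValueError("unexpected payload size")
    return [(OPS[data[i]], data[i + 1]) for i in range(1, 1 + 2 * n, 2)]
```

With two events the payload is 1 + 2*2 = 5 bytes, against 6 characters for the equivalent string `10D30M`. Lengths above 255 would need more than one byte each, so the 1 + 2n figure only holds for short events under this layout.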
That looks good for a v1. I hope that at a later point we could consolidate some of the types, though (perhaps get rid of alignment_match and force sequence match/mismatch to be used, or vice versa). Also, "passing" should probably be "padding", btw.
The sequence (mis)match semantics are rarely used because the MD tag provides that information (and more) in a more compact space. The MD tag also allows for calculation of edit distance (which the CIGAR string does not). From the SAM spec...
Right, so we could get rid of those two types :)
Sorry, I misread your comment to mean we drop ALIGNMENT_MATCH and use SEQUENCE_(MIS)MATCH exclusively. I see the vice versa now. :) I guess this raises the issue -- we haven't specified an MD tag analog.
Sounds like a new read field to me - we should probably make a new bug.
+1 Created #9
On yesterday's Reads task team call, Frank Nothaft volunteered to submit a pull request and drive this to resolution. (Thanks Frank!)
Closing (addressed in #33)
(Issue split out from #3)
We might consider an alternative representation of the cigar that requires less regex usage while still being compact.
(maybe a more formal structure, like [{type: deletion, count: 10}, {type: match, count: 30}]? or anything else that might help with parsing and be easy for api providers to support)
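As a rough illustration of the structured form suggested above, here is a minimal Python sketch that converts a SAM-style CIGAR string into a list of `{type, count}` records. The function name `parse_cigar` and the type names in `CIGAR_OPS` are assumptions made for this example, not an agreed schema.

```python
import re

# SAM CIGAR operator characters mapped to hypothetical type names.
CIGAR_OPS = {
    "M": "alignment_match",
    "I": "insertion",
    "D": "deletion",
    "N": "skip",
    "S": "soft_clip",
    "H": "hard_clip",
    "P": "padding",
    "=": "sequence_match",
    "X": "sequence_mismatch",
}

_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Turn a CIGAR string such as '10D30M' into [{type, count}, ...]."""
    events = []
    pos = 0
    for match in _TOKEN.finditer(cigar):
        if match.start() != pos:
            # Something between tokens did not look like <count><op>.
            raise ValueError(f"invalid CIGAR near offset {pos}: {cigar!r}")
        events.append(
            {"type": CIGAR_OPS[match.group(2)], "count": int(match.group(1))}
        )
        pos = match.end()
    if pos != len(cigar):
        raise ValueError(f"invalid CIGAR near offset {pos}: {cigar!r}")
    return events
```

An API provider would do this conversion once on the server side, so clients receive the list directly and never need the regex.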