-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alignment and reference sequence from CIGAR and MD #110
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think the API could be simplified, see below.
…all other iterators & position representations. Updated record interface accordingly.
I would like to know if there is a function to get alignment pair now. In other similar htslib bindings, the reference sequence with the query sequence can be fetch in the same time. |
Is there any particular blocker for this work? I'd be happy to help get it over the line if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long silence and thanks a lot! Looks good to me in principle (see below)
quick_error! { | ||
#[derive(Debug,Clone)] | ||
pub enum MDAlignError { | ||
NoMD { | ||
description("no MD aux field") | ||
} | ||
BadMD { | ||
description("bad MD value") | ||
} | ||
MDvsCIGAR { | ||
description("MD inconsistent with CIGAR") | ||
} | ||
BadSeqLen { | ||
description("Sequence/quality length inconsistent with MD/CIGAR") | ||
} | ||
EmptyAlign { | ||
description("Alignment has no positions") | ||
} | ||
ParseInt(err: ::std::num::ParseIntError) { | ||
from() | ||
} | ||
Utf8(err: ::std::str::Utf8Error) { | ||
from() | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've moved to thiseeror, hence, this should be adapted and moved into errors.rs.
This is a module that uses CIGAR and MD fields to construct alignments and reconstruct reference sequences from BAM records. I found myself wanting to do this repeatedly, and it isn't actually straightforward. I think these will be generally useful, based on the number of people requesting this feature in various languages.
This code features an MD field parser, a minimal alignment position type that includes only the information directly present in the CIGAR + MD fields, and more "complete" alignment types that use the read sequence to provide complete read and reference sequences directly in the alignment. These are all structured around iterators that generate individual positions, and can easily be collected into a vector if needed. I also added a function to create a bio-type Alignment.