Alignment and reference sequence from CIGAR and MD #110

Open
wants to merge 17 commits into master


ingolia commented Sep 2, 2018

This is a module that uses the CIGAR and MD fields to construct alignments and reconstruct reference sequences from BAM records. I found myself wanting to do this repeatedly, and it isn't straightforward. I think these will be generally useful, judging by the number of people requesting this feature in various languages.
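The underlying idea can be sketched independently of rust-htslib. The MD tag spells out the reference base at every mismatch and every deleted position, so all other reference bases can be copied from the read while walking the CIGAR. This is a minimal sketch, not the code in this PR: `CigarOp`, `expand_md` and `reconstruct_reference` are hypothetical names, MD bases are assumed to be upper case, and `N`, `H` and `P` operations are left out.

```rust
/// Simplified CIGAR operations (hypothetical stand-in for a real BAM CIGAR type).
#[derive(Clone, Copy, Debug)]
enum CigarOp {
    Match(u32),    // M / = / X: consumes read and reference
    Ins(u32),      // I: consumes read only
    Del(u32),      // D: consumes reference only
    SoftClip(u32), // S: consumes read only, outside the alignment
}

/// Expand an MD string into one entry per reference position it covers:
/// `None` where the reference equals the read, `Some(base)` where MD spells
/// out the reference base (a mismatch, or a base deleted from the read).
fn expand_md(md: &str) -> Result<Vec<Option<u8>>, String> {
    let mut out: Vec<Option<u8>> = Vec::new();
    let mut run = 0usize;
    for c in md.bytes() {
        match c {
            b'0'..=b'9' => run = run * 10 + (c - b'0') as usize,
            b'^' | b'A'..=b'Z' => {
                out.resize(out.len() + run, None); // flush the pending match run
                run = 0;
                if c != b'^' {
                    out.push(Some(c));
                }
            }
            _ => return Err(format!("bad MD character {:?}", c as char)),
        }
    }
    out.resize(out.len() + run, None);
    Ok(out)
}

/// Rebuild the reference bases spanned by an alignment from read + CIGAR + MD.
fn reconstruct_reference(read: &[u8], cigar: &[CigarOp], md: &str) -> Result<Vec<u8>, String> {
    let md_ref = expand_md(md)?;
    let mut reference = Vec::with_capacity(md_ref.len());
    let (mut rpos, mut qpos) = (0usize, 0usize);
    for op in cigar {
        match *op {
            CigarOp::Match(n) => {
                for _ in 0..n {
                    let q = *read.get(qpos).ok_or("read shorter than CIGAR")?;
                    let base = match md_ref.get(rpos) {
                        Some(Some(b)) => *b, // mismatch: MD supplies the reference base
                        Some(None) => q,     // match: copy the read base
                        None => return Err("MD shorter than CIGAR".into()),
                    };
                    reference.push(base);
                    rpos += 1;
                    qpos += 1;
                }
            }
            CigarOp::Del(n) => {
                // Deleted bases are only recoverable from the `^` runs of MD.
                for _ in 0..n {
                    match md_ref.get(rpos) {
                        Some(Some(b)) => reference.push(*b),
                        _ => return Err("MD lacks a deleted base".into()),
                    }
                    rpos += 1;
                }
            }
            CigarOp::Ins(n) | CigarOp::SoftClip(n) => qpos += n as usize,
        }
    }
    if rpos != md_ref.len() {
        return Err("MD longer than CIGAR".into());
    }
    Ok(reference)
}
```

For example, a read `TTACGTACGT` with CIGAR `2S4M2D4M` and MD `3A0^GG4` yields the reference `ACGAGGACGT`: the `A` at the fourth aligned position comes from the mismatch token and `GG` from the deletion token.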

This code features an MD field parser; a minimal alignment position type that holds only the information directly present in the CIGAR and MD fields; and more "complete" alignment types that use the read sequence to provide full read and reference sequences directly in the alignment. All of these are built around iterators that yield individual positions, which can easily be collected into a vector if needed. I also added a function that creates a bio-types `Alignment`.
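A lazily evaluated MD parser in the iterator style described above might look like the following. This is a sketch under stated assumptions, not the PR's parser: `MdToken` and `MdTokens` are hypothetical names, and the input is expected to follow the SAM specification's MD grammar `[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*`, although the parser does not enforce the strict number/base alternation.

```rust
/// One token of an MD tag value (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MdToken {
    Match(u32),        // run of reference bases identical to the read
    Mismatch(u8),      // single reference base that differs from the read
    Deletion(Vec<u8>), // reference bases deleted from the read (`^ACG`)
}

/// Parses an MD string one token at a time.
pub struct MdTokens<'a> {
    md: &'a [u8],
    pos: usize,
}

impl<'a> MdTokens<'a> {
    pub fn new(md: &'a str) -> Self {
        MdTokens { md: md.as_bytes(), pos: 0 }
    }
}

impl<'a> Iterator for MdTokens<'a> {
    type Item = Result<MdToken, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let start = self.pos;
        let c = *self.md.get(start)?; // end of input ends the iteration
        if c.is_ascii_digit() {
            // Consume the whole decimal run.
            while self.pos < self.md.len() && self.md[self.pos].is_ascii_digit() {
                self.pos += 1;
            }
            let text = std::str::from_utf8(&self.md[start..self.pos]).unwrap();
            Some(text.parse::<u32>().map(MdToken::Match).map_err(|e| e.to_string()))
        } else if c == b'^' {
            // Consume the deleted bases following the caret.
            self.pos += 1;
            while self.pos < self.md.len() && self.md[self.pos].is_ascii_alphabetic() {
                self.pos += 1;
            }
            if self.pos == start + 1 {
                return Some(Err("empty deletion after '^'".to_string()));
            }
            Some(Ok(MdToken::Deletion(self.md[start + 1..self.pos].to_vec())))
        } else if c.is_ascii_alphabetic() {
            self.pos += 1;
            Some(Ok(MdToken::Mismatch(c)))
        } else {
            self.pos = self.md.len(); // stop after reporting the error
            Some(Err(format!("unexpected MD character {:?}", c as char)))
        }
    }
}
```

Collecting into `Result<Vec<_>, _>` stops at the first malformed token; for `"10A5^AC6"` it produces `Match(10)`, `Mismatch(b'A')`, `Match(5)`, `Deletion(b"AC")`, `Match(6)`.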

johanneskoester (Contributor) left a comment

Thanks! I think the API could be simplified, see below.

src/bam/md_align.rs (resolved)
y9c commented Aug 9, 2021

Hi @johanneskoester

I would like to know whether there is now a function to get the alignment pair. In other htslib bindings, the reference sequence can be fetched together with the query sequence.

For example, https://github.com/blachlylab/dhtslib/blob/11be3debdce9feda903b59ddab6fb737dfd9d3fa/source/dhtslib/sam/record.d#L527-L531
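The paired view requested here (gapped reference and query rows, similar to what the linked dhtslib code returns) falls out of the same CIGAR + MD walk. A hedged sketch with hypothetical names (`aligned_pair`, `md_reference_bases`); the CIGAR is passed as `(op, length)` pairs to keep it self-contained, and spliced (`N`) or padded (`P`) alignments are rejected rather than handled:

```rust
/// Expand an MD string into one entry per reference position it covers:
/// `None` = reference base equals the read base, `Some(b)` = MD gives the
/// reference base `b` (a mismatch under M/=/X, or a deleted base under D).
fn md_reference_bases(md: &str) -> Result<Vec<Option<u8>>, String> {
    let mut out: Vec<Option<u8>> = Vec::new();
    let mut run = 0usize;
    for c in md.bytes() {
        match c {
            b'0'..=b'9' => run = run * 10 + (c - b'0') as usize,
            b'^' | b'A'..=b'Z' => {
                out.resize(out.len() + run, None); // flush the pending match run
                run = 0;
                if c != b'^' {
                    out.push(Some(c));
                }
            }
            _ => return Err(format!("bad MD character {:?}", c as char)),
        }
    }
    out.resize(out.len() + run, None);
    Ok(out)
}

/// Build gapped (reference, query) rows for the aligned part of a read.
/// `cigar` is a hypothetical `(op, length)` list such as `[('M', 4), ('D', 2)]`.
fn aligned_pair(read: &[u8], cigar: &[(char, u32)], md: &str) -> Result<(String, String), String> {
    let md_ref = md_reference_bases(md)?;
    let (mut ref_row, mut query_row) = (String::new(), String::new());
    let (mut rpos, mut qpos) = (0usize, 0usize);
    for &(op, len) in cigar {
        let len = len as usize;
        match op {
            'M' | '=' | 'X' => {
                for _ in 0..len {
                    let q = *read.get(qpos).ok_or("read shorter than CIGAR")?;
                    let r = match md_ref.get(rpos) {
                        Some(Some(b)) => *b, // mismatch: MD supplies the reference base
                        Some(None) => q,     // match: reference equals the read base
                        None => return Err("MD shorter than CIGAR".into()),
                    };
                    ref_row.push(r as char);
                    query_row.push(q as char);
                    rpos += 1;
                    qpos += 1;
                }
            }
            'I' => {
                // Inserted read bases face gaps in the reference row.
                let ins = read.get(qpos..qpos + len).ok_or("read shorter than CIGAR")?;
                ref_row.extend(std::iter::repeat('-').take(len));
                query_row.extend(ins.iter().map(|&b| b as char));
                qpos += len;
            }
            'D' => {
                // Deleted reference bases come from the `^` runs of MD.
                for _ in 0..len {
                    match md_ref.get(rpos) {
                        Some(Some(b)) => ref_row.push(*b as char),
                        _ => return Err("MD lacks a deleted base".into()),
                    }
                    query_row.push('-');
                    rpos += 1;
                }
            }
            'S' => qpos += len, // soft clips are in the read but not aligned
            'H' => {}           // hard clips are absent from the read entirely
            other => return Err(format!("unsupported CIGAR op {:?}", other)),
        }
    }
    if rpos != md_ref.len() {
        return Err("MD longer than CIGAR".into());
    }
    Ok((ref_row, query_row))
}
```

With read `TTACGTACGT`, CIGAR `2S4M2D4M` and MD `3A0^GG4`, the rows come out as `ACGAGGACGT` (reference) and `ACGT--ACGT` (query).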

ArtRand commented Dec 10, 2022

Is there any particular blocker for this work? I'd be happy to help get it over the line if necessary.

johanneskoester (Contributor) left a comment

Sorry for the long silence and thanks a lot! Looks good to me in principle (see below)

Comment on lines +947 to +972
```rust
quick_error! {
    #[derive(Debug, Clone)]
    pub enum MDAlignError {
        NoMD {
            description("no MD aux field")
        }
        BadMD {
            description("bad MD value")
        }
        MDvsCIGAR {
            description("MD inconsistent with CIGAR")
        }
        BadSeqLen {
            description("Sequence/quality length inconsistent with MD/CIGAR")
        }
        EmptyAlign {
            description("Alignment has no positions")
        }
        ParseInt(err: ::std::num::ParseIntError) {
            from()
        }
        Utf8(err: ::std::str::Utf8Error) {
            from()
        }
    }
}
```

We've moved to thiserror, so this should be adapted and moved into errors.rs.
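thiserror's `#[derive(Error)]` generates `Display`, `std::error::Error` and `From` impls from `#[error("...")]` and `#[from]` attributes. To show what the migrated `MDAlignError` would need to provide without pulling in the crate, here is a hand-written, std-only equivalent (not thiserror output, and not the code that was eventually merged). It keeps the variant names and messages from the `quick_error!` block above; `parse_md_run` is a hypothetical helper added only to demonstrate `?` conversion.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

/// Error type for MD/CIGAR alignment reconstruction, written with std only.
/// With thiserror, each `Display` arm would become an `#[error("...")]`
/// attribute on the variant and each `From` impl a `#[from]` field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MDAlignError {
    NoMD,
    BadMD,
    MDvsCIGAR,
    BadSeqLen,
    EmptyAlign,
    ParseInt(ParseIntError),
    Utf8(Utf8Error),
}

impl fmt::Display for MDAlignError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MDAlignError::NoMD => write!(f, "no MD aux field"),
            MDAlignError::BadMD => write!(f, "bad MD value"),
            MDAlignError::MDvsCIGAR => write!(f, "MD inconsistent with CIGAR"),
            MDAlignError::BadSeqLen => {
                write!(f, "Sequence/quality length inconsistent with MD/CIGAR")
            }
            MDAlignError::EmptyAlign => write!(f, "Alignment has no positions"),
            MDAlignError::ParseInt(e) => write!(f, "invalid number in MD field: {}", e),
            MDAlignError::Utf8(e) => write!(f, "MD field is not valid UTF-8: {}", e),
        }
    }
}

impl std::error::Error for MDAlignError {
    // Wrapped errors are exposed as the source, as `#[from]` would do.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            MDAlignError::ParseInt(e) => Some(e),
            MDAlignError::Utf8(e) => Some(e),
            _ => None,
        }
    }
}

impl From<ParseIntError> for MDAlignError {
    fn from(e: ParseIntError) -> Self {
        MDAlignError::ParseInt(e)
    }
}

impl From<Utf8Error> for MDAlignError {
    fn from(e: Utf8Error) -> Self {
        MDAlignError::Utf8(e)
    }
}

/// Hypothetical helper: `?` converts the `ParseIntError` via the `From` impl.
fn parse_md_run(s: &str) -> Result<u32, MDAlignError> {
    Ok(s.parse::<u32>()?)
}
```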


4 participants