Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for mdtag issues with insertions #748

Merged
merged 1 commit into from
Jul 30, 2015

Conversation

arahuja
Copy link
Contributor

@arahuja arahuja commented Jul 29, 2015

Sorry all, looks the code in #727 contains a few errors when it came to parsing mismatches mixed with insertions. I added a quite a few tests to cover this case and other complex mdtags.

If you think there are interesting cases not covered in the new tests now let me know!

Thanks!

cigarIdx += 1
cigarOperatorIndex = 0
}
case _ => {
cigarIdx += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the cigarOperatorIndex = 0 not necessary here?

mind commenting these last two cases a little bit? I assume the main interesting cigar operator being passed over here is I/insertion, which MDTags don't care about... any other cases being handled here worth noting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cigarOperatorIndex = 0 is not necessary, it shouldn't be updated, but just added it for clarity.

The last case is insertion or clipped bases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it doesn't matter whether you set cigarOperatorIndex = 0 here or not? That's also confusing to me, I don't fully have my head around this.

Is the second-to-last case one that you specifically added for Ns? Is that the main focus there?

Could you add comments clarifying which operators you expect to hit each of these last two? At this point in the match I'm struggling to keep in my head what's left.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the second to last case is for N

assert(tag.toString === "2A4A2")
}

test("Get correct matches for mdtag with intron and deletion between mismatches") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, copy mcpastington? same test as above, different name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I should was trying to think of more cases, but didn't add the test actually

@ryan-williams
Copy link
Member

added some of my nits in hammerlab#2 including the check that the number of deleted bases is consistent between cigar and MDTag.

thoughts on some test cases verifying that IAEs are thrown when the various assumptions are broken?

@arahuja
Copy link
Contributor Author

arahuja commented Jul 30, 2015

@ryan-williams I fixed the dupe test and added a test case for the exception you added (mdtag and cigar disagreeing on deleted bases)

I also made the match statement explicit for the operators they were catching

re: cigarOperatorIndex, the only operator never traversed in one shot is M since you can look for matches for mismatches, therefore for every other operator cigarOperatorIndex is always 0

@ryan-williams
Copy link
Member

cool, thanks @arahuja. LGTM once tests pass. Last complete build failed due to needing ./scripts/format-source btw

@arahuja
Copy link
Contributor Author

arahuja commented Jul 30, 2015

Ah ok, good to know, I didn't realize this wasn't auto-formatting, I'll run that, squash etc.

@fnothaft
Copy link
Member

LGTM as well.

ryan-williams added a commit that referenced this pull request Jul 30, 2015
Fix for mdtag issues with insertions
@ryan-williams ryan-williams merged commit 312aea7 into bigdatagenomics:master Jul 30, 2015
@ryan-williams ryan-williams deleted the mdtag-more-fix branch July 30, 2015 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants