Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong filtering in Alignment metrics #943

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Ksenomorfa
Copy link

Description

Comment in AlignmentSummaryMetricsCollector method collectReadData() says: " // NB: for read count metrics, do not include supplementary records, but for base count metrics, do include supplementary records". But supplementary records are pre-filtered at super level in method acceptRecord() in the same class, because only non-secondary and non-supplementary records are accepted issue.

Implementation

According to the issue:

  • Added check for supplementary flag before increment histogram for metrics.BAD_CYCLES in method collectQualityData();
  • Changed docs of BAD_CYCLES metric in AlignmentSummaryMetrics;
  • Created testSupplementaryReadsAndBases() and tested SAM-file containing supplementary alignment.

Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

@yfarjoun yfarjoun requested a review from nh13 September 29, 2017 11:39
@coveralls
Copy link

coveralls commented Sep 29, 2017

Coverage Status

Coverage increased (+0.006%) to 76.775% when pulling 2b99ffb on EpamLifeSciencesTeam:epam-ls_WrongFilteringInAlignmentMetrics into d4b04a9 on broadinstitute:master.

@yfarjoun
Copy link
Contributor

@nh13, I think that you are responsible for these NB's....will have time to take a look at these changes?

Copy link
Collaborator

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Some discussion about getSupplementaryAlignmentFlag is still needed.

@@ -270,7 +268,8 @@ private void collectQualityData(final SAMRecord record, final ReferenceSequence
// NB: for read count metrics, do not include supplementary records, but for base count metrics, do include supplementary records.

// If the read isn't an aligned PF read then look at the read for no-calls
if (record.getReadUnmappedFlag() || record.getReadFailsVendorQualityCheckFlag() || !doRefMetrics) {
if (!record.getSupplementaryAlignmentFlag() &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check added getSupplementaryAlignmentFlag? I thought we wanted to have supplementary alignments add to these metrics (ex. PF_HQ_MEDIAN_MISMATCHES)? @yfarjoun want to weigh in here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the idea is that a supplementary read should not be considered for the first part (namely updating badCycles histogram on unmapped reads) but, it should be considered for the "else" part....I agree with that sentiment, but I also agree with you that it wasn't entirely well thought out, in particular: a supplementary alignment will cause the code to enter the second part even all the other conditions (mapped, passesFilter, refMetrics) are false.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed if-else condition, sorry for this omission.

@@ -314,7 +313,8 @@ else if (!record.getReadFailsVendorQualityCheckFlag()) {
if (mismatch) hqMismatchCount++;
}

if (mismatch || SequenceUtil.isNoCall(readBases[readBaseIndex])) {
if (!record.getSupplementaryAlignmentFlag()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Author

@Ksenomorfa Ksenomorfa Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we can leave this "if" condition for supplementary reads (according to yfarjoun comment)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is right.. @nh13 what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no. sorry. this is more complex. Given that we are only looping over the alignment blocks, we need to look at the supplemental reads in order to see the bases....I guess, it's as @nh13 said in the code comment: for metrics that have to do with bases (so the badCycleHistogram is one), we want to see the supplemental reads, but for metrics that have to do with reads, we do not. I think that this means that supplemental reads should be filtered from collectReadData (if supplemental, return) but not from collectQualityData.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying this! Okay, so I want to make sure that I understand right.. :) I can remove all filtering for supplementary from collectQualityData() , and leave check for secondary in acceptRecord() - this will be necessary logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's right. @nh13 ?

@@ -79,9 +79,7 @@ public AlignmentSummaryMetricsCollector(final Set<MetricAccumulationLevel> accum

@Override
public void acceptRecord(final SAMRecord rec, final ReferenceSequence ref) {
if (!rec.isSecondaryOrSupplementary()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to include secondary reads as well? I think not. we should probably filter secondary reads here. Even if they are filtered upstream somehow (I don't think that they are) this program should be clear that secondary alignements are filtered out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added check for secondary read here.

@@ -270,7 +268,8 @@ private void collectQualityData(final SAMRecord record, final ReferenceSequence
// NB: for read count metrics, do not include supplementary records, but for base count metrics, do include supplementary records.

// If the read isn't an aligned PF read then look at the read for no-calls
if (record.getReadUnmappedFlag() || record.getReadFailsVendorQualityCheckFlag() || !doRefMetrics) {
if (!record.getSupplementaryAlignmentFlag() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the idea is that a supplementary read should not be considered for the first part (namely updating badCycles histogram on unmapped reads) but, it should be considered for the "else" part....I agree with that sentiment, but I also agree with you that it wasn't entirely well thought out, in particular: a supplementary alignment will cause the code to enter the second part even all the other conditions (mapped, passesFilter, refMetrics) are false.

* Fix if-else condition of supplementary reads
@SQ SN:chr6 LN:101
@SQ SN:chr8 LN:202
@RG ID:0 SM:Hi,Momma! LB:whatever PU:me PL:ILLUMINA
SL-XAV:1:1:0:700#0/2 137 chr6 1 255 101M * 0 0 NAATTGTTCTNAGTTTCTCGGTTTATGTGCTCTTCCAGGTGGGTAACACAATAATGGCCTTCCAGATCGTAAGAGCGACGTGTGTTGCACNAGTGTCGATC &0::887::::6/646::838388811/679:87640&./2+/-4/28:3,536/4''&&.78/(/554/./02*)*',-(57()&.6(6:(0601'/(,* RG:Z:0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good if this example were more realistic, so that the primary read were only aligned for, say 60 bases and the supplementary read would be aligned for the remaining 41 bases. also, the bases and the qualities should be the same for the read and its supplemental alignment....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PolinaBevad are you going to add a more realistic test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants