Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a bug where overlapping reads in subsequent regions can have invalid base qualities #6943

Merged
merged 3 commits into from
Nov 24, 2020

Conversation

jamesemery
Copy link
Collaborator

@jamesemery jamesemery commented Nov 6, 2020

This fixes the test that was accidentally disabled (when we added a flag for overlapping read adjustment we forgot to set it correctly for the test) and reinstated something like the code from #4926. It should now necessarily be the case that the finalize() method calls read.copy() at least once for every read (though many multiples of once are still quite possible).

Fixes #6882

…ets coppied without doing it again for every read
@jamesemery jamesemery requested a review from droazen November 6, 2020 18:36
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest a slightly different approach here @jamesemery

@@ -114,6 +118,9 @@ public static void finalizeRegion(final AssemblyRegion region,
.filter(read -> !read.isEmpty() && read.getCigar().getReadLength() > 0)
.map(read -> ReadClipper.hardClipToRegion(read, region.getPaddedSpan().getStart(), region.getPaddedSpan().getEnd() ))
.filter(read -> read.getStart() <= read.getEnd() && read.getLength() > 0 && read.overlaps(region.getPaddedSpan()))
// The transient attribute is preserved across copy operations and all of the previous alterations make copies, this simple ensures
// that any reads that have not been copied along the way are copied here for safety.
.map(read -> (read.getTransientAttribute("Original") != read? read : read.copy()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of relying on transient attributes (which, in my opinion, is brittle and a recipe for future problems), compare the memory address of the read pre/post clipping, and make a copy only if the address hasn't changed, as in the original patch in #4926. Rewrite this method to use a loop instead of streaming, if necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rewritten it as a loop at the expense of angering the Aesthetic gods. Can you re-review?

@jamesemery
Copy link
Collaborator Author

@droazen can this bbe merged?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Back to you with a few questions on this one

if (read.getStart() <= read.getEnd() && read.getLength() > 0 && read.overlaps(region.getPaddedSpan())) {
// NOTE: here we make a defensive copy of the read if it has not been modified by the above operations
// which might only make copies in the case that the read is actually clipped
readsToUse.add(read == originalRead? read.copy() : read);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copy() method performs a shallow copy -- it ends up calling SAMRecord.clone(), which copies all fields in the SAMRecord as if by assignment (the exception is the attributes array, which is explicitly copied). So the byte[] arrays for the bases and quals point to the same memory location in the copy as in the original. Is this a problem? Do we later modify the bases/quals in-place somewhere, or do we always copy the bases/quals arrays due to the defensive copies in SAMRecordToGATKReadAdapter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do modify them later, specifically when we call cleanOverlappingReadPairs() we modify the base qualities in place for reads that overlap and if any of those reads have not been clipped bby the clipping code here this could result in the next assembly region having invalid/wrong base qualities for the same read. Given how that method is structured it is non-trivial to refactor it to make the copy only in the event it needs to be modified and it seemed easier to just put a check in to make sure that every read is deeply copied at least once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery I disagree with your assessment that cleanOverlappingReadPairs() changes the bases/quals in-place -- it calls getBases() and getBaseQualities() on the read, which perform defensive copies. The question is: is there any code that calls getBasesNoCopy() and/or getBaseQualitiesNoCopy() and truly modifies the bases/quals arrays in place?

Copy link
Contributor

@droazen droazen Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the PileupReadErrorCorrector calls getBaseQualitiesNoCopy() and getBasesNoCopy() and then modifies the bases/quals in-place. This could have the effect of modifying the base/qual arrays in the original reads that will be used in subsequent assembly regions. I think as part of this PR we should patch PileupReadErrorCorrector to call the getters that make copies, and then call the setter methods to update the bases/quals.

We should also check for additional problematic usages of the *NoCopy() methods in the HC/M2 codepaths.

@@ -64,7 +64,7 @@ public void testfinalizeRegion() {
activeRegion.addAll(reads);
SampleList sampleList = SampleList.singletonSampleList("tumor");
Byte minbq = 9;
AssemblyBasedCallerUtils.finalizeRegion(activeRegion, false, false, minbq, header, sampleList, false, false);
AssemblyBasedCallerUtils.finalizeRegion(activeRegion, false, false, minbq, header, sampleList, true, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that the modified version of this test fails without the copy() call you added above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already tested it. It does fail without the above fix. the problem is that when we added correctOverlappingBaseQualities as an argument to finalizeRegion() it was mistakenly set to false for this method, which means this test was doing nothing and thus we missed that the last round of refactoring broke the assertion in this test by allowing un-coppied base qualities to be adjusted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment then mentioning that that boolean argument is critical for the test to be meaningful?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droazen I added comments better explaining how the test works. Is this okay to merge?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@droazen droazen merged commit 3be2f38 into master Nov 24, 2020
@droazen droazen deleted the je_fixOverlappingReadsModifyingAcrossRegions branch November 24, 2020 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

finalizeRegion modifies read base qualities in place
2 participants