-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clone read base qualities rather than reference them #4926
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4926 +/- ##
================================================
- Coverage 80.435% 59.149% -21.285%
+ Complexity 17792 12568 -5224
================================================
Files 1090 1095 +5
Lines 64154 64606 +452
Branches 10342 10395 +53
================================================
- Hits 51602 38214 -13388
- Misses 8503 22145 +13642
- Partials 4049 4247 +198
|
@@ -45,9 +45,9 @@ public static void adjustQualsOfOverlappingPairedFragments(final GATKRead clippe | |||
final int numOverlappingBases = Math.min(clippedFirstRead.getLength() - firstReadStop, clippedSecondRead.getLength()); | |||
|
|||
final byte[] firstReadBases = clippedFirstRead.getBases(); | |||
final byte[] firstReadQuals = clippedFirstRead.getBaseQualities(); | |||
final byte[] firstReadQuals = clippedFirstRead.getBaseQualities().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GATKRead.getBases()
and GATKRead.getBaseQualities()
are safe methods and return a copy of the array. Thus, it is not really modified in place. I think that this patch is not needed for that reason.
@DonFreed, I agree with @magicDGS's assessment about it. This feels like a fix that was applied to Gatk3 but doesn't translate to 4? Of course, there could be implementations of GATKRead that don't obey the given contract about copying, but it's worth fixing those since we were more careful to think about copy/no copy when we wrote the new interface. Of note: if you haven't seen it, I'm going to close this. Feel free to reopen if you disagree / can provide a unit test that demonstrates the issue still exists. |
@lbergelson and @magicDGS, thank you for correcting me here. You are both right, this PR is based on an outdated understanding of the behavior of |
@DonFreed Thank you for your continued interest and additions to GATK! |
@lbergelson Could we reopen this? We were able to track down the source of the modified BQs across active regions. At the end of the function
In some cases, the clipped read is actually the original read, modifying the original read's BQ. I've committed a simple fix to my branch that ensures that the clipped read is never the original read. |
@lbergelson Can you re-review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added my comments
@@ -86,7 +86,7 @@ public static void finalizeRegion(final AssemblyRegion region, | |||
if ( ! clippedRead.isEmpty() && clippedRead.getCigar().getReadLength() > 0 ) { | |||
clippedRead = ReadClipper.hardClipToRegion( clippedRead, region.getExtendedSpan().getStart(), region.getExtendedSpan().getEnd() ); | |||
if ( region.readOverlapsRegion(clippedRead) && clippedRead.getLength() > 0 ) { | |||
readsToUse.add(clippedRead); | |||
readsToUse.add((clippedRead == myRead) ? clippedRead.deepCopy() : clippedRead); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can get away with a less-expensive shallow copy (ie., copy()
) rather than a costlier deepCopy()
here?
Also, can you add a unit test that fails before the fix and passes after it?
copy() BQ instead of deepCopy().
@droazen, thanks for the comments. I've added a unit test to demonstrate the issue. The test shows that a shallow copy is fine, so that has been changed as well. |
Re-assigning to @jamesemery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two minor comments on the test you wrote and we can push this in.
} | ||
|
||
@Test | ||
public void testfinalizeRegion() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks good, though in this case since there is only one test in the AssemblyBasedCallerUtilsUnitTests I wouldn't use the @BeforeClass
to initialize files. Instead just put them into static fields or build the header etc... into this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this is a good test
|
||
SampleList sampleList = SampleList.singletonSampleList("tumor"); | ||
Byte minbq = 9; | ||
AssemblyBasedCallerUtils.finalizeRegion(activeRegion, false, false, minbq, header, sampleList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a comment explaining what this test is doing here, (like explaining why it was failing due to unsafe operations on original reads and read qualities in the engine) so people know what is going on down the line.
Thanks @jamesemery! I've updated the PR to address your comments. |
Thank you @DonFreed for your contribution! |
…valid base qualities (#6943) This fixes the test that was accidentally disabled (when we added a flag for overlapping read adjustment we forgot to set it correctly for the test) and reinstated something like the code from #4926. It should now necessarily be the case that the finalize() method calls read.copy() at least once for every read (though many multiples of once are still quite possible). Fixes #6882
When adjusting the base quality of overlapping read pairs, the modifications are made in place. If the modified reads are later used in another active region, the results from the later active region will be changed by the earlier modification.
This pull request clones the read's base quality, so that in place modifications do not affect later active regions.