Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clone read base qualities rather than reference them #4926

Merged
merged 4 commits into from
Sep 5, 2018

Conversation

DonFreed
Copy link
Contributor

When adjusting the base quality of overlapping read pairs, the modifications are made in place. If the modified reads are later used in another active region, the results from the later active region will be changed by the earlier modification.

This pull request clones the read's base quality, so that in place modifications do not affect later active regions.

@codecov-io
Copy link

codecov-io commented Jun 20, 2018

Codecov Report

Merging #4926 into master will decrease coverage by 21.285%.
The diff coverage is 100%.

@@               Coverage Diff                @@
##              master     #4926        +/-   ##
================================================
- Coverage     80.435%   59.149%   -21.285%     
+ Complexity     17792     12568      -5224     
================================================
  Files           1090      1095         +5     
  Lines          64154     64606       +452     
  Branches       10342     10395        +53     
================================================
- Hits           51602     38214     -13388     
- Misses          8503     22145     +13642     
- Partials        4049      4247       +198
Impacted Files Coverage Δ Complexity Δ
...kers/haplotypecaller/AssemblyBasedCallerUtils.java 63.366% <100%> (-6.931%) 22 <0> (-3)
...broadinstitute/hellbender/utils/svd/SimpleSVD.java 0% <0%> (-100%) 0% <0%> (-5%)
...ogramgroups/ShortVariantDiscoveryProgramGroup.java 0% <0%> (-100%) 0% <0%> (-3%)
...adinstitute/hellbender/utils/R/RScriptLibrary.java 0% <0%> (-100%) 0% <0%> (-6%)
...ls/walkers/mutect/M2FiltersArgumentCollection.java 0% <0%> (-100%) 0% <0%> (-1%)
...odecs/gencode/GencodeGtfSelenocysteineFeature.java 0% <0%> (-100%) 0% <0%> (-4%)
...number/gcnv/GermlineCNVSegmentVariantComposer.java 0% <0%> (-100%) 0% <0%> (-6%)
...ctions/OptionalFeatureInputArgumentCollection.java 0% <0%> (-100%) 0% <0%> (-1%)
...ecaller/graphs/DeadEndKBestSubHaplotypeFinder.java 0% <0%> (-100%) 0% <0%> (-10%)
...otypecaller/RandomLikelihoodCalculationEngine.java 0% <0%> (-100%) 0% <0%> (-6%)
... and 614 more

@@ -45,9 +45,9 @@ public static void adjustQualsOfOverlappingPairedFragments(final GATKRead clippe
final int numOverlappingBases = Math.min(clippedFirstRead.getLength() - firstReadStop, clippedSecondRead.getLength());

final byte[] firstReadBases = clippedFirstRead.getBases();
final byte[] firstReadQuals = clippedFirstRead.getBaseQualities();
final byte[] firstReadQuals = clippedFirstRead.getBaseQualities().clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GATKRead.getBases() and GATKRead.getBaseQualities() are safe methods and return a copy of the array. Thus, it is not really modified in place. I think that this patch is not needed for that reason.

@lbergelson
Copy link
Member

@DonFreed, I agree with @magicDGS's assessment about it. This feels like a fix that was applied to Gatk3 but doesn't translate to 4? Of course, there could be implementations of GATKRead that don't obey the given contract about copying, but it's worth fixing those since we were more careful to think about copy/no copy when we wrote the new interface.

Of note: if you haven't seen it, GATKRead provides a set of unsafe getBaseQualitiesNoCopy() methods for times when the copy is a performance bottleneck and you can guarantee safe use of the underlying array.

I'm going to close this. Feel free to reopen if you disagree / can provide a unit test that demonstrates the issue still exists.

@lbergelson lbergelson closed this Jun 21, 2018
@DonFreed
Copy link
Contributor Author

DonFreed commented Jun 22, 2018

@lbergelson and @magicDGS, thank you for correcting me here. You are both right, this PR is based on an outdated understanding of the behavior of GATKRead.getBaseQualities(). Thanks for closing it!

@DonFreed DonFreed deleted the df_cloneReadBQs branch June 22, 2018 06:39
@lbergelson
Copy link
Member

@DonFreed Thank you for your continued interest and additions to GATK!

@DonFreed DonFreed restored the df_cloneReadBQs branch July 3, 2018 17:50
@DonFreed
Copy link
Contributor Author

DonFreed commented Jul 3, 2018

@lbergelson Could we reopen this? We were able to track down the source of the modified BQs across active regions.

At the end of the function adjustQualsOfOverlappingPairedFragments(final GATKRead clippedFirstRead, final GATKRead clippedSecondRead), the base qualities are set in the clipped reads:

clippedFirstRead.setBaseQualities(firstReadQuals);
clippedSecondRead.setBaseQualities(secondReadQuals);

In some cases, the clipped read is actually the original read, modifying the original read's BQ. I've committed a simple fix to my branch that ensures that the clipped read is never the original read.

@lbergelson lbergelson reopened this Jul 3, 2018
@droazen
Copy link
Contributor

droazen commented Jul 6, 2018

@lbergelson Can you re-review?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added my comments

@@ -86,7 +86,7 @@ public static void finalizeRegion(final AssemblyRegion region,
if ( ! clippedRead.isEmpty() && clippedRead.getCigar().getReadLength() > 0 ) {
clippedRead = ReadClipper.hardClipToRegion( clippedRead, region.getExtendedSpan().getStart(), region.getExtendedSpan().getEnd() );
if ( region.readOverlapsRegion(clippedRead) && clippedRead.getLength() > 0 ) {
readsToUse.add(clippedRead);
readsToUse.add((clippedRead == myRead) ? clippedRead.deepCopy() : clippedRead);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can get away with a less-expensive shallow copy (ie., copy()) rather than a costlier deepCopy() here?

Also, can you add a unit test that fails before the fix and passes after it?

@droazen droazen assigned droazen and unassigned lbergelson Jul 6, 2018
copy() BQ instead of deepCopy().
@DonFreed
Copy link
Contributor Author

@droazen, thanks for the comments. I've added a unit test to demonstrate the issue. The test shows that a shallow copy is fine, so that has been changed as well.

@droazen droazen assigned lbergelson and unassigned droazen Jul 12, 2018
@droazen droazen assigned jamesemery and unassigned lbergelson Aug 27, 2018
@droazen droazen requested a review from jamesemery August 27, 2018 19:13
@droazen
Copy link
Contributor

droazen commented Aug 27, 2018

Re-assigning to @jamesemery

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor comments on the test you wrote and we can push this in.

}

@Test
public void testfinalizeRegion() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test looks good, though in this case since there is only one test in the AssemblyBasedCallerUtilsUnitTests I wouldn't use the @BeforeClass to initialize files. Instead just put them into static fields or build the header etc... into this test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this is a good test


SampleList sampleList = SampleList.singletonSampleList("tumor");
Byte minbq = 9;
AssemblyBasedCallerUtils.finalizeRegion(activeRegion, false, false, minbq, header, sampleList);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a comment explaining what this test is doing here, (like explaining why it was failing due to unsafe operations on original reads and read qualities in the engine) so people know what is going on down the line.

@DonFreed
Copy link
Contributor Author

DonFreed commented Sep 4, 2018

Thanks @jamesemery! I've updated the PR to address your comments.

@jamesemery
Copy link
Collaborator

Thank you @DonFreed for your contribution!

@jamesemery jamesemery merged commit 916d0ab into broadinstitute:master Sep 5, 2018
@DonFreed DonFreed deleted the df_cloneReadBQs branch September 5, 2018 18:39
droazen pushed a commit that referenced this pull request Nov 24, 2020
…valid base qualities (#6943)

This fixes the test that was accidentally disabled (when we added a flag for overlapping read adjustment we forgot to set it correctly for the test) and reinstated something like the code from #4926. It should now necessarily be the case that the finalize() method calls read.copy() at least once for every read (though many multiples of once are still quite possible).

Fixes #6882
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants