Fixed a bug where overlapping reads in subsequent regions can have invalid base qualities #6943
@@ -53,6 +53,8 @@ public void testfinalizeRegion() {

  SAMLineParser parser = new SAMLineParser(header);
  List<GATKRead> reads = new LinkedList<GATKRead>();
+ // NOTE: These reads are mates that overlap one-another without agreement which means they should have modified base qualities after calling finalize()
+ // Read2 has a clean cigar, and thus will not be copied by the clipping code before being fed to the overlapping bases code. This test asserts that its still copied.
  SAMRecord orgRead0 = parser.parseLine("HWI-ST807:461:C2P0JACXX:4:2204:18080:5857\t83\t1\t42596803\t39\t1S95M5S\t=\t42596891\t-7\tGAATCATCATCAAATGGAATCTAATGGAATCATTGAACAGAATTGAATGGAATCGTCATCGAATGAATTGAATGCAATCATCGAATGGTCTCGAATAGAAT\tDAAAEDCFCCGEEDDBEDDDGCCDEDECDDFDCEECCFEECDCEDBCDBDBCC>DCECC>DBCDDBCBDDBCDDEBCCECC>DBCDBDBGC?FCCBDB>>?\tRG:Z:tumor");
  SAMRecord orgRead1 = parser.parseLine("HWI-ST807:461:C2P0JACXX:4:2204:18080:5857\t163\t1\t42596891\t39\t101M\t=\t42596803\t7\tCTCGAATGGAATCATTTTCTACTGGAAAGGAATGGAATCATCGCATAGAATCGAATGGAATTAACATGGAATGGAATCGAATGTAATCATCATCAAATGGA\t>@>:ABCDECCCEDCBBBDDBDDEBCCBEBBCBEBCBCDDCD>DECBGCDCF>CCCFCDDCBABDEDFCDCDFFDDDG?DDEGDDFDHFEGDDGECB@BAA\tRG:Z:tumor");

@@ -64,9 +66,10 @@ public void testfinalizeRegion() {
  activeRegion.addAll(reads);
  SampleList sampleList = SampleList.singletonSampleList("tumor");
  Byte minbq = 9;
- AssemblyBasedCallerUtils.finalizeRegion(activeRegion, false, false, minbq, header, sampleList, false, false);
+ // NOTE: this test MUST be run with correctOverlappingBaseQualities enabled otherwise this test can succeed even with unsafe code
+ AssemblyBasedCallerUtils.finalizeRegion(activeRegion, false, false, minbq, header, sampleList, true, false);
Can you confirm that the modified version of this test fails without the [...]

I have already tested it. It does fail without the above fix. The problem is that when we added [...]

Can you add a comment then mentioning that that boolean argument is critical for the test to be meaningful?

@droazen I added comments better explaining how the test works. Is this okay to merge?

- // make sure reads are not changed due to finalizeRegion()
+ // make sure that the original reads are not changed due to finalizeRegion()
  Assert.assertTrue(reads.get(0).convertToSAMRecord(header).equals(orgRead0));
  Assert.assertTrue(reads.get(1).convertToSAMRecord(header).equals(orgRead1));
  }

The `copy()` method performs a shallow copy -- it ends up calling `SAMRecord.clone()`, which copies all fields in the SAMRecord as if by assignment (the exception is the attributes array, which is explicitly copied). So the `byte[]` arrays for the bases and quals point to the same memory location in the copy as in the original. Is this a problem? Do we later modify the bases/quals in-place somewhere, or do we always copy the bases/quals arrays due to the defensive copies in `SAMRecordToGATKReadAdapter`?

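To make the aliasing concern in the comment above concrete, here is a minimal sketch of how a shallow clone shares a `byte[]` with its source. `ToyRecord` and `ShallowCopyDemo` are hypothetical stand-ins written for illustration; they are not htsjdk or GATK classes, and the sketch only assumes the general behavior of `Object.clone()`.

```java
import java.util.Arrays;

// Hypothetical stand-in for a read record; NOT the htsjdk SAMRecord class.
final class ToyRecord implements Cloneable {
    byte[] quals;

    ToyRecord(byte[] quals) {
        this.quals = quals;
    }

    // Shallow copy: Object.clone() copies the array reference, not the array contents.
    @Override
    public ToyRecord clone() {
        try {
            return (ToyRecord) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }

    // Deep copy: the qualities array is duplicated as well.
    ToyRecord deepCopy() {
        return new ToyRecord(Arrays.copyOf(quals, quals.length));
    }
}

class ShallowCopyDemo {
    public static void main(String[] args) {
        ToyRecord original = new ToyRecord(new byte[] {30, 30, 30});

        ToyRecord shallow = original.clone();
        shallow.quals[0] = 0;                   // in-place write through the shared array
        System.out.println(original.quals[0]);  // prints 0: the original sees the change

        ToyRecord deep = original.deepCopy();
        deep.quals[1] = 0;                      // write goes to a private array
        System.out.println(original.quals[1]);  // prints 30: the original is unaffected
    }
}
```

Whether this sharing matters in practice depends on whether any downstream code writes to the array in place, which is the question the rest of the thread works through.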
We do modify them later. Specifically, when we call `cleanOverlappingReadPairs()` we modify the base qualities in place for reads that overlap, and if any of those reads have not been clipped by the clipping code here, this could result in the next assembly region having invalid/wrong base qualities for the same read. Given how that method is structured, it is non-trivial to refactor it to make the copy only when a read needs to be modified, and it seemed easier to just put a check in to make sure that every read is deeply copied at least once.

@jamesemery I disagree with your assessment that `cleanOverlappingReadPairs()` changes the bases/quals in-place -- it calls `getBases()` and `getBaseQualities()` on the read, which perform defensive copies. The question is: is there any code that calls `getBasesNoCopy()` and/or `getBaseQualitiesNoCopy()` and truly modifies the bases/quals arrays in place?

I see that the `PileupReadErrorCorrector` calls `getBaseQualitiesNoCopy()` and `getBasesNoCopy()` and then modifies the bases/quals in-place. This could have the effect of modifying the base/qual arrays in the original reads that will be used in subsequent assembly regions. I think as part of this PR we should patch `PileupReadErrorCorrector` to call the getters that make copies, and then call the setter methods to update the bases/quals.

We should also check for additional problematic usages of the `*NoCopy()` methods in the HC/M2 codepaths.
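
The patch pattern suggested above (copying getter, then setter) can be sketched as follows. `ToyRead` and `QualityCapDemo` are hypothetical: they borrow only the accessor naming used in this thread and are not the GATK `GATKRead` interface or the actual `PileupReadErrorCorrector` code, and capping qualities is an arbitrary example of a modification.

```java
import java.util.Arrays;

// Hypothetical read type mirroring only the accessor names discussed in this thread.
final class ToyRead {
    private byte[] quals;

    ToyRead(byte[] quals) {
        this.quals = quals;
    }

    byte[] getBaseQualities() {        // defensive copy
        return Arrays.copyOf(quals, quals.length);
    }

    byte[] getBaseQualitiesNoCopy() {  // exposes the internal array
        return quals;
    }

    void setBaseQualities(byte[] newQuals) {  // swaps the reference for this read only
        this.quals = newQuals;
    }

    // Shallow copy: the new read initially shares the same array as this one.
    ToyRead copy() {
        return new ToyRead(quals);
    }
}

class QualityCapDemo {
    // Unsafe with shallow copies: writes through the shared array.
    static void capInPlace(ToyRead read, byte cap) {
        byte[] q = read.getBaseQualitiesNoCopy();
        for (int i = 0; i < q.length; i++) {
            q[i] = (byte) Math.min(q[i], cap);
        }
    }

    // Safer: edit a private copy, then install it with the setter.
    static void capWithCopy(ToyRead read, byte cap) {
        byte[] q = read.getBaseQualities();
        for (int i = 0; i < q.length; i++) {
            q[i] = (byte) Math.min(q[i], cap);
        }
        read.setBaseQualities(q);
    }

    public static void main(String[] args) {
        ToyRead original = new ToyRead(new byte[] {40, 40});

        ToyRead safe = original.copy();
        capWithCopy(safe, (byte) 10);
        System.out.println(original.getBaseQualities()[0]); // prints 40: original untouched

        ToyRead unsafe = original.copy();
        capInPlace(unsafe, (byte) 10);
        System.out.println(original.getBaseQualities()[0]); // prints 10: original was modified
    }
}
```

The copy-then-set variant costs one extra array allocation per modified read, which is the trade-off against the in-place version.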