-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated MarkDuplicates to use Picard metrics code #4779
Conversation
* Metrics that are calculated during the process of marking duplicates | ||
* within a stream of SAMRecords. | ||
*/ | ||
@SuppressWarnings("serial") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of suppressing the serial warning we should add the serialVersionUID. We've had problems in the past due to not including it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public void updateMetrics(SAMRecord rec) { | ||
update(rec.getReadUnmappedFlag(), rec.isSecondaryOrSupplementary(), ReadUtils.readHasMappedMate(rec), rec.getDuplicateReadFlag() ); | ||
} | ||
public void updateMetrics(GATKRead read) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a linebreak
@@ -6,7 +6,7 @@ | |||
|
|||
import java.io.*; | |||
|
|||
/** Coded for ReadEnds that just outputs the primitive fields and reads them back. */ | |||
/** Codec for ReadEnds that just outputs the primitive fields and reads them back. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -235,28 +237,40 @@ public void testTwoMappedPairsAndMappedSecondaryFragment() { | |||
tester.addMappedPair(1, 1, 100, false, false, ELIGIBLE_BASE_QUALITY); | |||
tester.addMappedPair(1, 1, 100, true, true, DEFAULT_BASE_QUALITY); // duplicate!!! | |||
tester.addMappedFragment(1, 200, false, DEFAULT_BASE_QUALITY, true); // mapped non-primary fragment | |||
if (this instanceof MarkDuplicatesSparkIntegrationTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like these instance of checks . Maybe instead of the instanceof, we can add a method to AbstractMarkDuplicatesTester.doNotMarkUnmappedMates()
that will be overriden by the the spark tester to provide the commandline argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added an overridden method on AbstractMarkDuplicatesTester
to handle it
tester.addMappedFragment(1, 1, false, ELIGIBLE_BASE_QUALITY); | ||
tester.addMatePair(1, 200, 0, false, true, false, false, "54M22S", null, false, false, true, true, false, DEFAULT_BASE_QUALITY); | ||
if (this instanceof MarkDuplicatesSparkIntegrationTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery A few nitpicks with my own code...
@lbergelson Responded to your comments |
@@ -42,6 +43,11 @@ protected CommandLineProgram getCommandLineProgramInstance() { | |||
return new MarkDuplicatesSpark(); | |||
} | |||
|
|||
@Override | |||
protected String[] getExtraArguments() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this, lets just set this always in the tester
Codecov Report
@@ Coverage Diff @@
## master #4779 +/- ##
==============================================
+ Coverage 80.13% 80.225% +0.095%
- Complexity 17438 17742 +304
==============================================
Files 1082 1085 +3
Lines 63150 63834 +684
Branches 10187 10353 +166
==============================================
+ Hits 50602 51211 +609
- Misses 8564 8606 +42
- Partials 3984 4017 +33
|
moving AbstractMarkDuplicateTester to test sources resolves #3762 updating tests to include new tests and changes tests not passing, need to update old gatk test files changed from hellbender...DuplicateMetrics -> hellbender...GATKDuplicateMetrics which extends picard...DuplicateMetrics porting bug fixes to LibraryIdGenerator from picard updating several metrics files with the newest version of picard the lbiraries in the metrics files are then sorted in alphabetical order changes, saving state add addtional test fixes
@lbergelson Are these changes good for you? I would like to get these changes in soonish. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@fleharty Did you have any review comments? |
I do, haven't finished yet.
…On Mon, May 21, 2018, 4:51 PM Louis Bergelson ***@***.***> wrote:
@fleharty <https://github.com/fleharty> Did you have any review comments?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4779 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AKN9DQy1ANY3yfmcfR5Ydvkgj3-SrQ60ks5t0yjLgaJpZM4UB_c2>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery Mostly nitpicks here. I approved since this is just really putting in finals and modifying whitespace.
@@ -1,5 +1,6 @@ | |||
package org.broadinstitute.hellbender.tools.spark.transforms.markduplicates; | |||
|
|||
import com.google.common.collect.Iterators; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import appears to not be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize it isn't your code, but can you remove the other two rouge imports....
line 21
import org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource;
and
import picard.sam.markduplicates.util.OpticalDuplicateFinder;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return Utils.stream(readsIter).peek(read -> { | ||
return Utils.stream(readsIter) | ||
.peek(read -> read.setIsDuplicate(false)) | ||
.peek(read -> { | ||
// Handle reads that have been marked as non-duplicates (which also get tagged with optical duplicate summary statistics) | ||
if( namesOfNonDuplicateReadsAndOpticalCounts.containsKey(read.getName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing around if appears incorrect
@@ -122,8 +126,10 @@ | |||
read.setIsDuplicate(false); | |||
// Everything else is a duplicate | |||
} else{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after else
@@ -323,6 +323,14 @@ public static boolean readHasMappedMate( final GATKRead read ) { | |||
return read.isPaired() && ! read.mateIsUnmapped(); | |||
} | |||
|
|||
/** | |||
* @param read read to check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds confusing, could you rephrase?
* @param read read to check | ||
* @return true if the read is paired and has a mapped mate, otherwise false | ||
*/ | ||
public static boolean readHasMappedMate( final SAMRecord read ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do spaces go around the argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats a good question, just looking in this file alone it seems to be wholly inconsistent one way or another. I'm not going to bother changing it because it would imply that there was some principled reason for me to unify ALL the formatting in this file at least which I don't necessarily want to do in this PR.
private void update(boolean readUnmappedFlag, boolean secondaryOrSupplementary, boolean mappedMate, boolean isDuplicate) { | ||
if (readUnmappedFlag) { | ||
++this.UNMAPPED_READS; | ||
} else if(secondaryOrSupplementary) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after if
|
||
public Histogram<Short> getOpticalDuplicatesByLibraryIdMap() { return this.opticalDuplicatesByLibraryId; } | ||
|
||
public static String getReadGroupLibraryName(SAMReadGroupRecord readGroup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final SAMReadGroupRecord?
@@ -23,17 +24,29 @@ | |||
private boolean deleteOnExit = true; | |||
protected final List<String> args = new ArrayList<>(); | |||
|
|||
public SamFileTester(final int readLength, final boolean deleteOnExit, final int defaultChromosomeLength, final DuplicateScoringStrategy.ScoringStrategy duplicateScoringStrategy) { | |||
public SamFileTester(final int readLength, final boolean deleteOnExit, final int defaultChromosomeLength, final DuplicateScoringStrategy.ScoringStrategy duplicateScoringStrategy, final SAMFileHeader.SortOrder sortOrder, boolean recordsNeedSorting) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final boolean?
public abstract class AbstractMarkDuplicatesTester extends SamFileTester { | ||
|
||
final private File metricsFile; | ||
final DuplicationMetrics expectedMetrics; | ||
|
||
public AbstractMarkDuplicatesTester(final ScoringStrategy duplicateScoringStrategy) { | ||
super(50, true, SAMRecordSetBuilder.DEFAULT_CHROMOSOME_LENGTH, duplicateScoringStrategy); | ||
public AbstractMarkDuplicatesTester(final ScoringStrategy duplicateScoringStrategy, SAMFileHeader.SortOrder sortOrder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing finals?
this(duplicateScoringStrategy, sortOrder, true); | ||
} | ||
|
||
public AbstractMarkDuplicatesTester(final ScoringStrategy duplicateScoringStrategy, SAMFileHeader.SortOrder sortOrder, boolean recordNeedSorting) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing finals?
@fleharty Thank you for the review. |
@fleharty The reason for the "read read" has to do with the javadocs. The first one is a flag indicating the argument name for documentation purposes. The second one is just the description, which should be reasonably formatted in the docs compared to how it looks here on github. |
@jamesemery I see, of course... for some reason my eye didn't catch that the first "read" was the parameter, and the second "read" was part of the description. Thanks for the clarification. |
Fixes #4777