-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated MarkDuplicates code to rely on the Picard OpticalDuplicatesFinder #4750
Conversation
@@ -86,6 +92,9 @@ public int getIndex() { | |||
|
|||
final JavaPairRDD<String, Iterable<IndexPair<GATKRead>>> keyedReads = getReadsGroupedByName(header, mappedReads, numReducers); | |||
|
|||
//TODO why in the world does a JavaRDD return a normal sparkcontext instead of the nice wrapped one...? | |||
final Broadcast<Map<String, Short>> headerReadGroupIndexMap = new JavaSparkContext(reads.context()).broadcast( getHeaderReadGroupIndexMap(header)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use JavaSparkContext.fromSparkContext
instead I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
prevIdx = i + 1; | ||
} | ||
} | ||
if (prevIdx < readName.length()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YAY deleting code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huzzah!
|
||
@Override | ||
public void setY(final short y) { this.y = y; } | ||
public void setY(final int y) { this.y = (short)y; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add checks here that it's not > maxshort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
/** | ||
* An intermediate class flagged as being serializable so that the Picard OpticalDuplicateFinder can be serialized despite | ||
* it not being marked as such. There is not state outside of construction at the time of creating this pr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the second sentence is more confusing than helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery A few minor changes and typos. Looks good though. Surprisingly little change.
@@ -142,11 +142,11 @@ | |||
/** | |||
* Little class to hold the sequence of a pair of reads and tile location information. | |||
*/ | |||
static class PairedReadSequence implements OpticalDuplicateFinder.PhysicalLocation { | |||
static class PairedReadSequence implements picard.sam.util.PhysicalLocation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the full path here?
@@ -1,7 +1,7 @@ | |||
package org.broadinstitute.hellbender.utils.read.markduplicates; | |||
|
|||
/** Little struct-like class to hold read pair (and fragment) end data for duplicate marking. */ | |||
public abstract class ReadEnds implements OpticalDuplicateFinder.PhysicalLocation { | |||
public abstract class ReadEnds implements picard.sam.util.PhysicalLocation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shorten the path?
@@ -20,7 +19,7 @@ | |||
* during the processing step of MarkDuplicatesSpark | |||
*/ | |||
@DefaultSerializer(Pair.Serializer.class) | |||
public final class Pair extends PairedEnds implements OpticalDuplicateFinder.PhysicalLocation { | |||
public final class Pair extends PairedEnds implements picard.sam.util.PhysicalLocation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
short path?
@@ -225,6 +230,11 @@ public void addMatePair(final String readName, | |||
record2.setAttribute("MC", null); | |||
} | |||
|
|||
if (readGorup!=null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public void testOpticalDuplicatesDifferentReadGroups() { | ||
final AbstractMarkDuplicatesTester tester = getTester(); | ||
tester.setExpectedOpticalDuplicate(0); | ||
tester.addMatePair("RUNID:7:1203:2886:82292", 19, 19, 485253, 485253, false, false, true, true, "42M59S", "59S42M", true, false, false, false, false, DEFAULT_BASE_QUALITY, "H0164.2"); // duplicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate these 600 parameter methods... but I'm not sure what to do about them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are a bunch of overloads that help the problem. For the most par every field is necessary to at least one of these tests.
@lbergelson Responded to your comments. |
Codecov Report
@@ Coverage Diff @@
## master #4750 +/- ##
===============================================
+ Coverage 80.052% 80.121% +0.069%
- Complexity 17413 17492 +79
===============================================
Files 1080 1080
Lines 63099 63440 +341
Branches 10196 10275 +79
===============================================
+ Hits 50512 50829 +317
- Misses 8597 8607 +10
- Partials 3990 4004 +14
|
abc1ca6
to
8971ef6
Compare
@lbergelson Is this branch good to go? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery Looks like there are a few leftover bits from the short era that need to be updated
@Override | ||
public void setX(final short x) { this.x = x; } | ||
public void setX(final int x) { this.x = (short)x; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you forgot to remove the casts to short when you updated this to int
@@ -24,7 +25,7 @@ | |||
// Needed for testing | |||
public void setupOpticalDuplicateFinder() { | |||
this.opticalDuplicateFinder = new OpticalDuplicateFinder(opticalDuplicatesArgumentCollection.READ_NAME_REGEX, | |||
opticalDuplicatesArgumentCollection.OPTICAL_DUPLICATE_PIXEL_DISTANCE, logger); | |||
opticalDuplicatesArgumentCollection.OPTICAL_DUPLICATE_PIXEL_DISTANCE,null);//TODO logger firgure out, logger); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this still has this typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lbergelson right, so i don't know the answer to this. Switiching over to the picard version of the optical duplicate finder means that we are no longer processing optical duplicates with our logger but the picard one, which happens to not be serializable. Should I just give it a null logger and call it a day then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null logger is fine for now . Could you open a ticket to enable us to pass a logger into it so we remember to look at it again?
|
||
@Override | ||
public void setY(final short y) { this.y = y; } | ||
public void setY(final int y) { this.y = (short) y; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover cast to short here
@@ -34,7 +34,7 @@ | |||
private final int score; | |||
|
|||
// Information used to detect optical dupes | |||
private transient short readGroup = -1; | |||
private short readGroupIndex = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a line break between this and the transient ones?
@@ -92,7 +92,7 @@ private Pair(Kryo kryo, Input input){ | |||
super(input.readInt(true), input.readString()); | |||
|
|||
// Information used to detect optical dupes | |||
readGroup = -1; | |||
readGroupIndex = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line should be removed?
@Override | ||
public void setY(final short y) { this.y = y; } | ||
public void setY(final int y) { this.y = (short)y; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@jamesemery We have test failures again https://storage.googleapis.com/hellbender-test-logs/build_reports/master_19079.2/tests/test/index.html |
…ances in that test should not have been considered close...
@lbergelson Yeah, so the test was broken because it was assuming that the pairedEnds y value was getting truncated. Now that test is running on two reads that are ACTUALLY close enough for comparison. |
final Short readGroup = headerReadGroupIndexMap.getValue().get(firstRead.getValue().getReadGroup()); | ||
final Pair pair = MarkDuplicatesSparkRecord.newPair(firstRead, secondRead.getValue(), header, secondRead.getIndex(), scoringStrategy); | ||
// Validate and add the read group to the pair | ||
final Short readGroup = headerReadGroupIndexMap.getValue().get(firstRead.getReadGroup()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add 2 simple tests that show that it explodes with the right errors on each of these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, merge when tests are green
…pringing up in new places
Fixes #4700