-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More refactoring PDHCE and preparing for joint detection #8492
Conversation
@jamesemery Looks like I got that "glorious commit" to work after all. Now the second-to-last commit will take some scrutiny; the rest are straightforward. |
@davidbenjamin see #8467 where i guess i reviewed this already? Is this substantially different? |
@jamesemery Okay, we'ce got some really good tests for EventGroup clustering and branching now. It's ready for you to take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor oversight in one of the added tests and then some nothing comments. Feel free to merge this when you are satisfied that you've responded to everything. I would recommend that you go through the exercise of testing this in the FE framework before merging it just to be really sure that there is nothing that is going to break on the full dataset.
...der/tools/walkers/haplotypecaller/PartiallyDeterminedHaplotypeComputationEngineUnitTest.java
Show resolved
Hide resolved
...der/tools/walkers/haplotypecaller/PartiallyDeterminedHaplotypeComputationEngineUnitTest.java
Show resolved
Hide resolved
...der/tools/walkers/haplotypecaller/PartiallyDeterminedHaplotypeComputationEngineUnitTest.java
Show resolved
Hide resolved
// TODO: this needs to generalize to a set of determined events or empty if ref is determined | ||
this.determinedEvents = determinedEvents; | ||
|
||
final int minStart = allEventsAtDeterminedLocus.stream().mapToInt(Event::getStart).min().orElse(determinedPosition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a warning for the next stage to be careful about this part of the code... the PDHMM has a few optimizations in it that make this change a little dangerous, first it doesn't actually run for reads that do not overlap the determined event in the PDhaplotype (which means that the arrays might have unpopulated events in them that you REALLY do not want to include in the math anywhere). In theory you can simply extend this to run the genotyper for any event that overlaps any position in the determined extent so this should probably not cause issue but be careful.
* @return | ||
*/ | ||
public List<Set<Event>> setsForBranching(final List<Event> locusEvents, final Set<Event> determinedEvents, final boolean disallowSubsets) { | ||
public List<Set<Event>> eventSetsForPDHaplotypes(final Set<Event> determinedEvents, final List<Event> locusEvents) { | ||
final SmallBitSet locusOverlapSet = overlapSet(locusEvents); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider overlapSet -> intersectVsEventGroup to make it a little more legible what is happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or perhaps just a comment
…hing more correct
…overlapping SNPs in PD haplotypes
5d13855
to
7ea5494
Compare
Github actions tests reported job failures from actions build 6243495413
|
Functional equivalence results are unchanged, as they should be. I will merge now. |
* streamlining EventGroup code * PartiallyDeterminedHaplotype can handle multiple determined events * Branching in terms of included events, not excluded events * unit tests for event group clustering and branching
@jamesemery here's another one for you.