-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify the behavior of getNextReferenceVertex for non-ref paths #4889
Modify the behavior of getNextReferenceVertex for non-ref paths #4889
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4889 +/- ##
===============================================
- Coverage 80.421% 80.164% -0.257%
+ Complexity 17820 17772 -48
===============================================
Files 1089 1089
Lines 64161 64161
Branches 10344 10344
===============================================
- Hits 51599 51434 -165
- Misses 8501 8672 +171
+ Partials 4061 4055 -6
|
@davidbenjamin Are you able to review this? |
@lbergelson Sure! |
final Optional<E> edge = outgoingEdges.stream().filter(e -> !blacklistedEdgeSet.contains(e)).findAny(); | ||
return edge.isPresent() ? getEdgeTarget(edge.get()) : null; | ||
final List<E> edges = outgoingEdges.stream().filter(e -> !blacklistedEdgeSet.contains(e)).limit(2).collect(Collectors.toList()); | ||
return edges.size() == 1 ? getEdgeTarget(edges.get(0)) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the first edge is still essentially random in that it depends on the order in which reads arrived. Why not be principled and traverse the edge with the greatest weight, which seems like a good heuristic if you're trying to arrive back at the reference?
final Optional<E> edge = outgoingEdges.stream().max(Comparator.comparingInt(BaseEdge::getMultiplicity).reversed())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misread the code. But I still think my suggestion to do a greedy search before giving up on finding the reference is reasonable and more robust to things like high depth where you might have lots of low-weight non-reference edges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the current behavior is to pick a random edge. The PR reverts to the 3.x behavior, which is to return a non-reference edge iff there 1 non-reference edge.
It would be great to try a more exhaustive search. However, my intuition is that the we are unlikely to make it back to the reference when the paths start forking and the computational cost can be substantial. We can use this PR to revert back to the deterministic logic and save implementation (and evaluation) of a greedy search for a future issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidbenjamin thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DonFreed I'm fine with the 3.x behavior. Anything fancier can wait for a later thorough revision of the dangling end merging code.
Thank you @DonFreed! @davidbenjamin for reviewing, if you feel we should make the changes you've suggested could you open an issue so we don't forget? |
There was a subtle change in the behavior of getNextReferenceVertex in the migration from GATK 3.x to the GATK 4.x The original 3.x behavior is described in a comment of the function, if @param allowNonRefPaths is true, allow sub-paths that are non-reference if there is only a single outgoing edge. The 4.x migration copied the comment, but changed the behavior to if @param allowNonRefPaths is true, allow sub-paths that are non-reference and pick a random outgoing edge. This pull request restores the 3.x behavior
There was a subtle change in the behavior of getNextReferenceVertex in the migration from GATK 3.x to the GATK 4.x that may be worth another look.
The original 3.x behavior is described in a comment of the function, if
@param allowNonRefPaths
is true, allow sub-paths that are non-reference if there is only a single outgoing edge. The 4.x migration copied the comment, but changed the behavior to if@param allowNonRefPaths
is true, allow sub-paths that are non-reference and pick a random outgoing edge.This pull request restores the 3.x behavior, which we believe is the design intent.