Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added IntervalMergingRule.NONE as an advanced use case #5887

Merged
merged 5 commits into from
Dec 20, 2019

Conversation

jamesemery
Copy link
Collaborator

Added the ability to specify IntervalMergingRule.NONE so so that no merging is performed. Also added the ability to request from IntervalArgumentCollection the unmerged user intervals. I have not solved the underlying issue that the GenomeLocSet requires non-overlapping intervals, though I acknowledge that replacing or refactoring that class is the long term solution to this problem.

@jamesemery jamesemery requested a review from droazen April 16, 2019 21:41
@codecov-io
Copy link

codecov-io commented Apr 16, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@dcff818). Click here to learn what that means.
The diff coverage is 92.708%.

@@            Coverage Diff             @@
##             master     #5887   +/-   ##
==========================================
  Coverage          ?   86.825%           
  Complexity        ?     32305           
==========================================
  Files             ?      1991           
  Lines             ?    149187           
  Branches          ?     16484           
==========================================
  Hits              ?    129531           
  Misses            ?     13641           
  Partials          ?      6015
Impacted Files Coverage Δ Complexity Δ
...titute/hellbender/utils/IntervalUtilsUnitTest.java 91.906% <100%> (ø) 146 <0> (?)
...nstitute/hellbender/utils/IntervalMergingRule.java 100% <100%> (ø) 1 <0> (?)
...broadinstitute/hellbender/utils/IntervalUtils.java 92.083% <100%> (ø) 192 <4> (?)
...rgumentcollections/IntervalArgumentCollection.java 89.063% <88.636%> (ø) 24 <1> (?)
...entcollections/IntervalArgumentCollectionTest.java 90.411% <94.118%> (ø) 26 <5> (?)

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to @jamesemery with my comments

@@ -1115,6 +1115,9 @@ public void testUnmergedIntervals(String unmergedIntervals) {
merged = IntervalUtils.mergeIntervalLocations(locs, IntervalMergingRule.ALL);
Assert.assertEquals(merged.size(), 1);

merged = IntervalUtils.mergeIntervalLocations(locs, IntervalMergingRule.NONE);
Assert.assertEquals(merged.size(), 2);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you choose to keep the IntervalUtils.loadIntervalsNonMerging() method around, you'll need to add direct unit tests for it here.

@droazen droazen assigned jamesemery and unassigned droazen Dec 19, 2019
@droazen
Copy link
Contributor

droazen commented Dec 19, 2019

@lbergelson In your opinion, how likely is this feature to cause problems? We do still call QueryInterval.optimizeIntervals() to merge intervals in ReadsDataSource before starting an iteration, and I think that's the main example of an HTSJDK query interface that can't handle overlapping intervals.

@droazen
Copy link
Contributor

droazen commented Dec 19, 2019

@jamesemery As discussed in person, I think we should remove NONE as a merging rule, and just keep the IntervalArgumentCollection.getSpecifiedIntervalsWithoutMerging() method around for tools that need it.

@lbergelson
Copy link
Member

I think this will not be exactly what people expect. I think if I saw this, I would expect that I would get apply calls for each position that was in the non-merged intervals. I.e. I would get multiple applies at the same position if I had overlapping intervals. I would like to have a MergingRule that does that, but if we don't take care to make the read queries individual to each interval then this will be confusing.

@jamesemery
Copy link
Collaborator Author

Alright, I think I am in agreement with you @lbergelson about this behavior. Furthermore, all I needed out of this branch was an exposed mechanism for getting back the un-merged intervals so that I can track them myself in subtools. To that end I think I'm going to keep this branch and its tests and get rid of the merging rule argument in favor of leaving the logic for merging in place to be accessed by tools.

@jamesemery jamesemery assigned droazen and unassigned jamesemery Dec 19, 2019
@jamesemery
Copy link
Collaborator Author

@droazen back to you

@jamesemery jamesemery force-pushed the je_allowUnmergedIntervals branch from d3c9ee4 to a88b534 Compare December 19, 2019 21:27
*/
public List<SimpleInterval> getIntervalsWithoutMerging(final SAMSequenceDictionary sequenceDict ) {
if (!intervalsSpecified()) {
throw new GATKException("Cannot call getIntervalsWithoutMerging() without specifying either intervals to include or exclude.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check whether getIntervalStrings() is non-empty to make sure that there are intervals to include. The error message needs to be adjusted as well:

"without specifying either intervals to include or exclude" -> " without specifying intervals to include"

Should the method also throw if there are exclusion intervals specified, or should it just silently ignore them? I leave that up to you...

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Found some lingering issues in the test code. We can merge this after they're addressed.

new Object[]{Arrays.asList("1:1-2", "2:1-2"), 0, intervalStringsToGenomeLocs("1:1-2", "2:1-2")},
new Object[]{Arrays.asList("1:1-10", "1:5-15"), 0, intervalStringsToGenomeLocs("1:1-10", "1:5-15")},
new Object[]{Arrays.asList("1:5-5"), 5, intervalStringsToGenomeLocs("1:1-10")},
new Object[]{Arrays.asList(severalIntervals), 0, intervalStringsToGenomeLocs("1:100-200", "2:20-30", "4:50")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one of these test cases has an overlapping interval, which seems inadequate given the nature of the method being tested.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is adequately tested in the other class (which is just directly calling into this method which does all of the actual work). I will add a few more redundant cases here but I am confident that those tests are checking what I actually want.

@droazen droazen assigned jamesemery and unassigned droazen Dec 19, 2019
@gatk-bot
Copy link

gatk-bot commented Dec 19, 2019

Travis reported job failures from build 28430
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28430.13 logs
unit openjdk8 28430.3 logs

@gatk-bot
Copy link

gatk-bot commented Dec 19, 2019

Travis reported job failures from build 28433
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28433.13 logs
unit openjdk8 28433.3 logs

@gatk-bot
Copy link

gatk-bot commented Dec 19, 2019

Travis reported job failures from build 28432
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28432.13 logs
unit openjdk8 28432.3 logs

@jamesemery
Copy link
Collaborator Author

@droazen I resolved your comments and fixed the test I broke. Can this be merged so we can move forward on that other branch?

@jamesemery jamesemery removed their assignment Dec 20, 2019
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final comment/question @jamesemery, then this is good to go.

*/
public List<SimpleInterval> getIntervalsWithoutMerging(final SAMSequenceDictionary sequenceDict ) {
if (getIntervalStrings().isEmpty() ) {
throw new GATKException("Cannot call getIntervalsWithoutMerging() without specifying intervals to include.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regular getIntervals() method uses the reference dictionary to construct a set of intervals when getIntervalStrings() is empty, rather than throwing an exception. Should this method do the same for the sake of consistency? (ie., return GenomeLocSortedSet.createSetFromSequenceDictionary(sequenceDict).toList() in that case?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could potentially, for the purposes of Depth of Coverage I do not need this functionality as it requires intervals be specified. If i need to change it perhaps I will switch it there? I would really like to get this in so Depth of Coverage can be looked at.

@jamesemery jamesemery merged commit a000a87 into master Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants