-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix and test for calculating CombineGVCFs intermediate band interval start locations. #4681
Conversation
b7cb45c
to
666cef0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment
@@ -157,8 +157,14 @@ void createIntermediateVariants(SimpleInterval intervalToClose) { | |||
|
|||
// Break up the GVCF according to the provided reference blocking scheme | |||
if ( multipleAtWhichToBreakBands > 0) { | |||
for (int i = ((intervalToClose.getStart())/multipleAtWhichToBreakBands)*multipleAtWhichToBreakBands; i <= intervalToClose.getEnd(); i+=multipleAtWhichToBreakBands) { | |||
sitesToStop.add(i-1); // Subtract 1 here because we want to split before this base | |||
// if the intermediate interval to close starts before the end of the first interval, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a perfectly reasonable fix but i would probably add an extra layer of protection here. Below where sitesToStop is converted into intervalLocs you should also add a check that the intervals generated by this or the other break code are not going to be invalid. Also i think this is still broken with multipleAtWhichToBreakBands = 1 because you get a start position = 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery I'm not clear on what kind of interval validation you think we should add ? If we ever do still create an invalid interval it would fail the same way it does now, which I think is the outcome we want. You're right about the start position 0 - I changed the banded stop site calculation, and wound up factoring it out into a separate method and added unit tests, since fine resolution can result in large test files, and I wanted to test a number of cases.
//Test that trailing reference blocks are emitted with correct intervals | ||
{new File[]{getTestFile("gvcfExample1WithTrailingReferenceBlocks.g.vcf"), getTestFile("gvcfExample2WithTrailingReferenceBlocks.g.vcf")}, getTestFile("gvcfWithTrailingReferenceBlocksExpected.g.vcf"), NO_EXTRA_ARGS, b38_reference_20_21}, | ||
// same test as the previous one, except with a band multiple specified | ||
{new File[]{getTestFile("gvcfExample1WithTrailingReferenceBlocks.g.vcf"), getTestFile("gvcfExample2WithTrailingReferenceBlocks.g.vcf")}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test with either/both of useBPResolution or bands of size 1 to demonstrate that there isn't a band being made at 0.
3fa25a2
to
22dc1b2
Compare
Codecov Report
@@ Coverage Diff @@
## master #4681 +/- ##
===============================================
+ Coverage 79.845% 79.966% +0.121%
- Complexity 17338 17444 +106
===============================================
Files 1074 1074
Lines 62956 63296 +340
Branches 10189 10290 +101
===============================================
+ Hits 50267 50615 +348
+ Misses 8707 8694 -13
- Partials 3982 3987 +5
|
667ba35
to
e33b5d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two very very minor comments that you could probably just ignore and merge.
{ new SimpleInterval("contig", 10, 10), 5, Arrays.asList(9) }, | ||
|
||
// TODO ?? is this the correct result ? | ||
{ new SimpleInterval("contig", 10, 10), 10, Arrays.asList(9) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks reasonably correct to me, how does it significantly differ from the above two? These all get filtered out after this method anyway before we actually build the intervals so its not an issue for the output list to have a result outside of the interval space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ehhh... i don't consider this critical or even important at all, but all of these tests that return Collections.EMPTY_LIST as a result are only ever failing on the first breakpoint interval, which technically goes through a new codepath. If you wanted you could add a test { new SimpleInterval("contig", 110, 120), 100, Collections.EMPTY_LIST }, to be more through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on the first comment. I view the purpose of this test as validating that we don't create invalid intermediate intervals at this stage of the calculation, since thats what was originally failing - I think the fact that they get filtered out later is ok. Ideally we'd have additional unit tests for the subsequent parts of the calculation, but they require much more state.
I think the additional test you mentioned should return {"99"}, not the empty list, for this method. Again, that stop site will get filtered later as its outside of the interval. Its testing the case where the band multiple is smaller than the start of the close interval, which is essentially the same as the { new SimpleInterval("contig", 10, 20), 5, Arrays.asList(9, 14, 19) }
test. Doesn't hurt to add it though.
e33b5d5
to
da3c898
Compare
da3c898
to
4efbeb4
Compare
Fixes #4672.