-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Contig stratification should defer to user-defined intervals #7238
Conversation
good luck on the test this time...passed first round |
); | ||
spec.executeTest(name, this); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two tests are identical except for the file name and interval, so they can be replaced with a single test that uses a @DataProvider
. That will also nicely document the test files contents. Also please add a test case with no intervals specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you've used data providers before - if not let me know and I'll add one here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i've used them. i reworked the test so it doesnt need the static test files anymore. i dont know what GATK's vision is in terms of having the full output checked in vs. testing specific features of the output. most of VariantEvalIntegration test uses the former since it was ported from GATK3, but i assume you dont want infinite test outputs checked in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems much more convoluted and error-prone than the previous version, which was much easier to comprehend.
|
||
return new ArrayList<>(contigs); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transformation seems pretty specific to the contig strat class, so I think it would make more sense to keep it there, and instead expose an intervals getter on VariantEvalEngine
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@cmnbroad OK. I believe this covers the comments. let's hope we dont get docker pull issues |
FWIW, this failure seems like an I/O issue. I cant restart, but unless I'm missing something maybe that would fix it: requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, 'ECONNRESET')")', OSError("(104, 'ECONNRESET')")) https://travis-ci.com/github/broadinstitute/gatk/jobs/502648703 |
@cmnbroad Thanks for restarting - we have clean tests now. |
); | ||
spec.executeTest(name, this); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems much more convoluted and error-prone than the previous version, which was much easier to comprehend.
tests.add(new Object[]{null, allContigs}); | ||
tests.add(new Object[]{"2", Collections.singletonList("2")}); | ||
return tests.toArray(new Object[][]{}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original version of these tests was much easier to understand - it just needed a data provider, which for the old version of the tests would just be:
@DataProvider(name = "testContigStratWithIntervals")
public Object[][] testContigStratWithIntervals() {
return new Object[][] {
{ "testContigStratWithUserSuppliedIntervals", "-L 1:1-1480226" },
{ "testContigStratWithUserSuppliedIntervals2", "-L 1" },
{ "testContigStratWithUserSuppliedIntervals3", null },
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed below - made the change
@bbimber I appreciate the attempt to update the tests, but I think your previous version was much better (modulo use of the data provider). |
Also, per your comment, static test files are fine, especially when the alternative is a bunch of parsing code. |
@cmnbroad I understand that I could have retained a bunch of single-use text files, but it seemed like the more permutations one adds, the less it makes sense to have a separate, very redundant, static text file to check each scenario. There's a ton of VariantContext-related tests that parse the output VCF to test some feature as opposed to checking in a bunch of VCF text files.... While I'll grant the 4th test case I added (where we pass chr 2) isnt especially compelling over just testing chr 1, one could argue more breadth is a good thing here. if you want clarity, pulling that VariantEval report parsing code into a method called extractUniqueContigsFromEvalReport(), or simply adding a comment line, supports this goal. Anyway, I'm checking in slightly clarified version of this now, simply to get tests running. If you respond to the above, maybe we go with that. In the interest of time, I'll stage and check in the version which restores the text files and goes that route. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now, once tests pass.
@bbimber In general I agree that keeping the number of one-off test files to a minimum is good, but unlike the text files we're using here, VCF files have a documented file format, a tested parser, and are intended to be machine readable. So I think including the expected txt files in this case is dramatically better than hand-written parsing code. |
I see that argument, but I think one could reasonably argue either way here. I'm really not that invested in this particular test, so I'm happy to check in the files. Thanks for approving the changes. As you might have seen we got a docker pull limit failure: https://travis-ci.com/github/broadinstitute/gatk/jobs/502703296 |
That build (34048) is irrelevant at this point (34049 is the build thats approved). We'll just let it die. Restarting it would just increase the likelihood that someone else hits the limit. |
@cmnbroad yes, you're right. anyway, tests are now clean |
Done. |
excellent - thank you again for the help getting these merged! |
You're welcome, and thanks for your contributions - we're happy to see external tools like VariantQC getting value out of the GATK framework. |
@cmnbroad I updated VariantQC and identified one minor difference in behavior associated with VariantEvalEngine. Contig stratification assigns level based on all the contigs. If user-supplied contigs are given, it should defer to these. This PR addresses this, and adds a test case.
Note: I put the getContigNames() method into VariantEvalEngine, but it would also be possible to keep this in Config, but expose a getter for userSuppliedIntervals. It seemed marginally better to keep that private.