Contig stratification should defer to user-defined intervals #7238

bbimber · 2021-04-30T13:10:51Z

@cmnbroad I updated VariantQC and identified one minor difference in behavior associated with VariantEvalEngine. Contig stratification currently assigns its levels based on all contigs. If user-supplied intervals are given, it should defer to the contigs in those intervals instead. This PR addresses this and adds a test case.

Note: I put the getContigNames() method into VariantEvalEngine, but it would also be possible to keep it in Config and expose a getter for userSuppliedIntervals instead. It seemed marginally better to keep that field private.
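A minimal sketch of the intended behavior, assuming an interval can be reduced to its contig name. `IntervalStub`, `ContigLevels` and `contigLevels` are hypothetical names for illustration, not GATK classes or methods: with no user intervals the strat keeps every contig, otherwise it keeps only the contigs the intervals touch, de-duplicated in first-seen order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an interval; only the contig name matters for the Contig strat.
final class IntervalStub {
    final String contig;
    final int start;
    final int end;

    IntervalStub(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }
}

final class ContigLevels {
    /**
     * Hypothetical helper returning the Contig strat levels:
     * null or empty user intervals -> all contigs; otherwise the distinct contigs of the intervals.
     */
    static List<String> contigLevels(List<String> allContigs, Collection<IntervalStub> userIntervals) {
        if (userIntervals == null || userIntervals.isEmpty()) {
            return new ArrayList<>(allContigs);
        }
        // LinkedHashSet drops duplicates (e.g. several intervals on chr 1) but keeps first-seen order.
        Set<String> contigs = new LinkedHashSet<>();
        for (IntervalStub interval : userIntervals) {
            contigs.add(interval.contig);
        }
        return new ArrayList<>(contigs);
    }

    public static void main(String[] args) {
        List<String> allContigs = List.of("1", "2", "3");
        System.out.println(contigLevels(allContigs, null)); // [1, 2, 3]
        System.out.println(contigLevels(allContigs,
                List.of(new IntervalStub("1", 1, 1480226), new IntervalStub("1", 2000000, 2100000)))); // [1]
    }
}
```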

bbimber · 2021-04-30T14:42:15Z

Good luck on the tests this time... they passed the first round.

cmnbroad · 2021-05-03T13:04:59Z

...java/org/broadinstitute/hellbender/tools/walkers/varianteval/VariantEvalIntegrationTest.java

+        );
+        spec.executeTest(name, this);
+    }
+


These two tests are identical except for the file name and interval, so they can be replaced with a single test that uses a @DataProvider. That will also nicely document the test files' contents. Also please add a test case with no intervals specified.

Not sure if you've used data providers before - if not let me know and I'll add one here.

Yes, I've used them. I reworked the test so it doesn't need the static test files anymore. I don't know what GATK's vision is in terms of checking in the full expected output vs. testing specific features of the output. Most of VariantEvalIntegrationTest uses the former since it was ported from GATK3, but I assume you don't want an ever-growing number of test outputs checked in.

cmnbroad · 2021-05-03T13:07:32Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/varianteval/VariantEvalEngine.java

+
+        return new ArrayList<>(contigs);
+    }
+


This transformation seems pretty specific to the contig strat class, so I think it would make more sense to keep it there, and instead expose an intervals getter on VariantEvalEngine.
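The suggested split could look roughly like the sketch below: the engine keeps its intervals private and exposes only a read-only getter, while the contig strat owns the interval-to-contig transformation. `RequestedInterval`, `EngineSketch`, `ContigStratSketch` and the getter names are hypothetical, not the actual GATK classes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical interval type; only the contig is needed by the Contig strat.
final class RequestedInterval {
    final String contig;

    RequestedInterval(String contig) {
        this.contig = contig;
    }
}

// Hypothetical engine: the user intervals stay private; only a read-only getter is exposed.
final class EngineSketch {
    private final List<String> allContigs;
    private final List<RequestedInterval> userSuppliedIntervals; // null when the user gave no intervals

    EngineSketch(List<String> allContigs, List<RequestedInterval> userSuppliedIntervals) {
        this.allContigs = List.copyOf(allContigs);
        this.userSuppliedIntervals = userSuppliedIntervals == null ? null : List.copyOf(userSuppliedIntervals);
    }

    List<String> getAllContigs() {
        return allContigs;
    }

    /** Unmodifiable list of user intervals, or null if none were supplied. */
    List<RequestedInterval> getUserSuppliedIntervals() {
        return userSuppliedIntervals;
    }
}

// Hypothetical contig stratifier: the interval-to-contig transformation lives here, not in the engine.
final class ContigStratSketch {
    static List<String> levels(EngineSketch engine) {
        List<RequestedInterval> intervals = engine.getUserSuppliedIntervals();
        if (intervals == null) {
            return engine.getAllContigs();
        }
        return intervals.stream()
                .map(interval -> interval.contig)
                .distinct() // ordered stream, so first-seen order is preserved
                .collect(Collectors.toList());
    }
}
```

Because `List.copyOf` returns an unmodifiable list, the strat can read the intervals through the getter without being able to change engine state.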

bbimber · 2021-05-03T14:56:07Z

@cmnbroad OK, I believe this covers the comments. Let's hope we don't get Docker pull issues.

bbimber · 2021-05-03T15:59:17Z

FWIW, this failure looks like an I/O issue. I can't restart the job myself, but unless I'm missing something, a restart might fix it:

requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, 'ECONNRESET')")', OSError("(104, 'ECONNRESET')"))

https://travis-ci.com/github/broadinstitute/gatk/jobs/502648703

bbimber · 2021-05-03T17:30:24Z

@cmnbroad Thanks for restarting - we have clean tests now.

cmnbroad · 2021-05-03T18:26:18Z

...java/org/broadinstitute/hellbender/tools/walkers/varianteval/VariantEvalIntegrationTest.java

+        );
+        spec.executeTest(name, this);
+    }
+


This seems much more convoluted and error-prone than the previous version, which was much easier to comprehend.

cmnbroad · 2021-05-03T18:29:50Z

...java/org/broadinstitute/hellbender/tools/walkers/varianteval/VariantEvalIntegrationTest.java

+        tests.add(new Object[]{null, allContigs});
+        tests.add(new Object[]{"2", Collections.singletonList("2")});
+        return tests.toArray(new Object[][]{});
+    }


The original version of these tests was much easier to understand - it just needed a data provider, which for the old version of the tests would just be:

    @DataProvider(name = "testContigStratWithIntervals")
    public Object[][] testContigStratWithIntervals() {
        return new Object[][] {
                { "testContigStratWithUserSuppliedIntervals", "-L 1:1-1480226" },
                { "testContigStratWithUserSuppliedIntervals2", "-L 1" },
                { "testContigStratWithUserSuppliedIntervals3", null },
        };
    }

Discussed below - made the change.
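With a data provider like the one above, a single test method receives each row in turn, and the row with a null interval covers the no-intervals case. The stdlib-only sketch below mimics that flow with a plain loop instead of TestNG; `ContigStratRows`, `buildArgs`, `expectedFileFor`, the base-argument placeholder and the `expected/` path are all hypothetical.

```java
// Stdlib-only stand-in for a data-provider-driven test: a loop feeds each row to one test body.
final class ContigStratRows {
    // Rows as in the suggested data provider: {test name, optional -L argument (null = none)}.
    static final String[][] ROWS = {
            {"testContigStratWithUserSuppliedIntervals", "-L 1:1-1480226"},
            {"testContigStratWithUserSuppliedIntervals2", "-L 1"},
            {"testContigStratWithUserSuppliedIntervals3", null},
    };

    /** Hypothetical: appends the interval argument only when the row supplies one. */
    static String buildArgs(String baseArgs, String intervalArg) {
        return intervalArg == null ? baseArgs : baseArgs + " " + intervalArg;
    }

    /** Hypothetical naming convention: one expected-output file per test name. */
    static String expectedFileFor(String testName) {
        return "expected/" + testName + ".txt";
    }

    public static void main(String[] args) {
        for (String[] row : ROWS) {
            // In a real test, the tool would run here and its output would be compared to the expected file.
            System.out.println(expectedFileFor(row[0]) + " <- " + buildArgs("<base VariantEval args>", row[1]));
        }
    }
}
```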

cmnbroad · 2021-05-03T18:35:11Z

@bbimber I appreciate the attempt to update the tests, but I think your previous version was much better (modulo use of the data provider).

cmnbroad · 2021-05-03T18:41:05Z

Also, per your comment, static test files are fine, especially when the alternative is a bunch of parsing code.

bbimber · 2021-05-03T18:50:19Z

@cmnbroad I understand that I could have retained a bunch of single-use text files, but the more permutations one adds, the less sense it makes to have a separate, highly redundant static text file for each scenario. There are plenty of VariantContext-related tests that parse the output VCF to test some feature rather than checking in a bunch of VCF text files.

While I'll grant that the fourth test case I added (where we pass chr 2) isn't especially compelling compared to just testing chr 1, one could argue that more breadth is a good thing here. If clarity is the concern, pulling the VariantEval report parsing code into a method called extractUniqueContigsFromEvalReport(), or simply adding a comment, would address it.
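For illustration only, a helper like the proposed extractUniqueContigsFromEvalReport() might look like the sketch below. The body is hypothetical and assumes a simplified report layout: lines starting with `#` are metadata, each table is a whitespace-delimited header row followed by data rows, tables are separated by blank lines, and the stratification column is headed `Contig`. The real GATKReport format may differ in its details.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class EvalReportContigs {
    /** Hypothetical sketch: collects the distinct values of every "Contig" column, in first-seen order. */
    static Set<String> extractUniqueContigsFromEvalReport(List<String> lines) {
        Set<String> contigs = new LinkedHashSet<>();
        int contigColumn = -1;       // index of the Contig column in the current table, -1 if absent
        boolean expectHeader = true; // the next non-metadata line is a table header
        for (String line : lines) {
            if (line.isBlank()) {    // a blank line ends the current table
                expectHeader = true;
                contigColumn = -1;
                continue;
            }
            if (line.startsWith("#")) {
                continue;            // metadata line, e.g. a table declaration
            }
            String[] fields = line.trim().split("\\s+");
            if (expectHeader) {
                contigColumn = List.of(fields).indexOf("Contig");
                expectHeader = false;
            } else if (contigColumn >= 0 && contigColumn < fields.length) {
                contigs.add(fields[contigColumn]);
            }
        }
        return contigs;
    }
}
```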

Anyway, I'm checking in a slightly clarified version of this now, simply to get tests running. Depending on your response to the above, maybe we go with that. In the interest of time, I'll stage and check in the version that restores the text files and goes that route.

cmnbroad

Looks good now, once tests pass.

cmnbroad · 2021-05-03T19:27:34Z

@bbimber In general I agree that keeping the number of one-off test files to a minimum is good, but unlike the text files we're using here, VCF files have a documented file format, a tested parser, and are intended to be machine readable. So I think including the expected txt files in this case is dramatically better than hand-written parsing code.
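With a checked-in expected file, the check reduces to a plain line-by-line comparison with no report-specific parsing. A minimal stdlib sketch of such a comparison; `ExpectedFileCheck.firstDifference` is a hypothetical helper, not the comparison the GATK test framework actually performs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class ExpectedFileCheck {
    /** Returns null when the lines match exactly, otherwise a message describing the first difference. */
    static String firstDifference(List<String> actual, List<String> expected) {
        int shared = Math.min(actual.size(), expected.size());
        for (int i = 0; i < shared; i++) {
            if (!actual.get(i).equals(expected.get(i))) {
                return "line " + (i + 1) + ": expected <" + expected.get(i) + "> but was <" + actual.get(i) + ">";
            }
        }
        if (actual.size() != expected.size()) {
            return "line count: expected " + expected.size() + " but was " + actual.size();
        }
        return null;
    }

    /** File-based wrapper: reads both files as UTF-8 and compares them line by line. */
    static String firstDifference(Path actualFile, Path expectedFile) throws IOException {
        return firstDifference(Files.readAllLines(actualFile), Files.readAllLines(expectedFile));
    }
}
```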

bbimber · 2021-05-03T19:32:55Z

I see that argument, but I think one could reasonably argue either way here. I'm really not that invested in this particular test, so I'm happy to check in the files. Thanks for approving the changes.

As you might have seen we got a docker pull limit failure: https://travis-ci.com/github/broadinstitute/gatk/jobs/502703296

cmnbroad · 2021-05-03T19:42:47Z

That build (34048) is irrelevant at this point (34049 is the build that's approved). We'll just let it die. Restarting it would only increase the likelihood that someone else hits the limit.

bbimber · 2021-05-03T20:52:04Z

@cmnbroad Yes, you're right. Anyway, tests are now clean.

cmnbroad · 2021-05-03T21:26:16Z

Done.

bbimber · 2021-05-03T21:28:52Z

Excellent, thank you again for the help getting these merged!

cmnbroad · 2021-05-03T21:39:31Z

You're welcome, and thanks for your contributions - we're happy to see external tools like VariantQC getting value out of the GATK framework.

bbimber added 2 commits April 30, 2021 05:54

Contig stratification should defer to user-defined intervals

c805743

Add test case

7874be6

cmnbroad requested changes May 3, 2021

Code review

75e470a

cmnbroad requested changes May 3, 2021

Split apart code to make intent clearer

7ca616d

bbimber added 2 commits May 3, 2021 12:01

Restore original test style with static files for expected output

f228a3d

No need to create ArrayList

61cd15b

cmnbroad approved these changes May 3, 2021

cmnbroad merged commit 8ba9aa4 into broadinstitute:master May 3, 2021
