Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contig stratification should defer to user-defined intervals #7238

Merged
merged 6 commits into from
May 3, 2021

Conversation

bbimber
Copy link
Contributor

@bbimber bbimber commented Apr 30, 2021

@cmnbroad I updated VariantQC and identified one minor difference in behavior associated with VariantEvalEngine. Contig stratification assigns level based on all the contigs. If user-supplied contigs are given, it should defer to these. This PR addresses this, and adds a test case.

Note: I put the getContigNames() method into VariantEvalEngine, but it would also be possible to keep this in Config, but expose a getter for userSuppliedIntervals. It seemed marginally better to keep that private.

@bbimber
Copy link
Contributor Author

bbimber commented Apr 30, 2021

good luck on the test this time...passed first round

);
spec.executeTest(name, this);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two tests are identical except for the file name and interval, so they can be replaced with a single test that uses a @DataProvider. That will also nicely document the test files contents. Also please add a test case with no intervals specified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you've used data providers before - if not let me know and I'll add one here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i've used them. i reworked the test so it doesnt need the static test files anymore. i dont know what GATK's vision is in terms of having the full output checked in vs. testing specific features of the output. most of VariantEvalIntegration test uses the former since it was ported from GATK3, but i assume you dont want infinite test outputs checked in.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems much more convoluted and error-prone than the previous version, which was much easier to comprehend.


return new ArrayList<>(contigs);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transformation seems pretty specific to the contig strat class, so I think it would make more sense to keep it there, and instead expose an intervals getter on VariantEvalEngine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@bbimber
Copy link
Contributor Author

bbimber commented May 3, 2021

@cmnbroad OK. I believe this covers the comments. let's hope we dont get docker pull issues

@bbimber
Copy link
Contributor Author

bbimber commented May 3, 2021

FWIW, this failure seems like an I/O issue. I cant restart, but unless I'm missing something maybe that would fix it:

requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, 'ECONNRESET')")', OSError("(104, 'ECONNRESET')"))

https://travis-ci.com/github/broadinstitute/gatk/jobs/502648703

@bbimber
Copy link
Contributor Author

bbimber commented May 3, 2021

@cmnbroad Thanks for restarting - we have clean tests now.

);
spec.executeTest(name, this);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems much more convoluted and error-prone than the previous version, which was much easier to comprehend.

tests.add(new Object[]{null, allContigs});
tests.add(new Object[]{"2", Collections.singletonList("2")});
return tests.toArray(new Object[][]{});
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original version of these tests was much easier to understand - it just needed a data provider, which for the old version of the tests would just be:

    @DataProvider(name = "testContigStratWithIntervals")
    public Object[][] testContigStratWithIntervals() {
        return new Object[][] {
                { "testContigStratWithUserSuppliedIntervals", "-L 1:1-1480226" },
                { "testContigStratWithUserSuppliedIntervals2", "-L 1" },
                { "testContigStratWithUserSuppliedIntervals3", null },
        };
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed below - made the change

@cmnbroad
Copy link
Collaborator

cmnbroad commented May 3, 2021

@bbimber I appreciate the attempt to update the tests, but I think your previous version was much better (modulo use of the data provider).

@cmnbroad
Copy link
Collaborator

cmnbroad commented May 3, 2021

Also, per your comment, static test files are fine, especially when the alternative is a bunch of parsing code.

@bbimber
Copy link
Contributor Author

bbimber commented May 3, 2021

@cmnbroad I understand that I could have retained a bunch of single-use text files, but it seemed like the more permutations one adds, the less it makes sense to have a separate, very redundant, static text file to check each scenario. There's a ton of VariantContext-related tests that parse the output VCF to test some feature as opposed to checking in a bunch of VCF text files....

While I'll grant the 4th test case I added (where we pass chr 2) isnt especially compelling over just testing chr 1, one could argue more breadth is a good thing here. if you want clarity, pulling that VariantEval report parsing code into a method called extractUniqueContigsFromEvalReport(), or simply adding a comment line, supports this goal.

Anyway, I'm checking in slightly clarified version of this now, simply to get tests running. If you respond to the above, maybe we go with that. In the interest of time, I'll stage and check in the version which restores the text files and goes that route.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, once tests pass.

@cmnbroad
Copy link
Collaborator

cmnbroad commented May 3, 2021

@bbimber In general I agree that keeping the number of one-off test files to a minimum is good, but unlike the text files we're using here, VCF files have a documented file format, a tested parser, and are intended to be machine readable. So I think including the expected txt files in this case is dramatically better than hand-written parsing code.

@bbimber
Copy link
Contributor Author

bbimber commented May 3, 2021

I see that argument, but I think one could reasonably argue either way here. I'm really not that invested in this particular test, so I'm happy to check in the files. Thanks for approving the changes.

As you might have seen we got a docker pull limit failure: https://travis-ci.com/github/broadinstitute/gatk/jobs/502703296

@cmnbroad
Copy link
Collaborator

cmnbroad commented May 3, 2021

That build (34048) is irrelevant at this point (34049 is the build thats approved). We'll just let it die. Restarting it would just increase the likelihood that someone else hits the limit.

@bbimber
Copy link
Contributor Author

bbimber commented May 3, 2021

@cmnbroad yes, you're right. anyway, tests are now clean

@cmnbroad cmnbroad merged commit 8ba9aa4 into broadinstitute:master May 3, 2021
@cmnbroad
Copy link
Collaborator

cmnbroad commented May 3, 2021

Done.

@bbimber
Copy link
Contributor Author

bbimber commented May 3, 2021

excellent - thank you again for the help getting these merged!

@cmnbroad
Copy link
Collaborator

cmnbroad commented May 3, 2021

You're welcome, and thanks for your contributions - we're happy to see external tools like VariantQC getting value out of the GATK framework.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants