-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a new suite of tools for variant filtering based on site-level annotations. #7954
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7954 +/- ##
===============================================
- Coverage 86.689% 86.659% -0.030%
- Complexity 38394 38771 +377
===============================================
Files 2308 2328 +20
Lines 180119 181659 +1540
Branches 19823 19946 +123
===============================================
+ Hits 156143 157424 +1281
- Misses 17036 17239 +203
- Partials 6940 6996 +56
|
* adding filtering wdl * renaming pipeline * addressing comments * added bash * renaming json * adding glob to extract for extra files * changing dollar signs * small comments
1cbd55b
to
3e00758
Compare
3e00758
to
1e0da0e
Compare
I still need to finish up the tool-level Javadocs for the TrainVariantAnnotationsModel tool. But since I'll be off on vacation until the end of the week, I wanted to go ahead and open this up for review. There's a lot here, but not too much of it is production code (<2k LOC). I've split things up into commits that should hopefully make it more easy to review. The first commit contains the WDL added in #7932 and has already been reviewed by me, although it may benefit from a second pass. The second commit updates that WDL to account for some changes I added after review. There are TODOs scattered throughout the code, but some of them are intentionally left as an exercise for future developers. See the meta issue linked above to get an idea of what might be appropriate to leave to future work. Also note that tools are marked BETA, so there’s certainly room for improvement or changes! There are also stubs throughout for the BGMM implementation, which will be added in a separate PR. Hopefully we can get some ML club reviewers then. @meganshand @droazen @davidbenjamin mind taking a look or suggesting other reviewers? I would hope that we can get this in by the next release after the other flow-based methods are released, since the IsolationForest filtering method added here is also used in that pipeline. It would also be nice to get this merged by the next release to keep us on track on the malaria side. |
1e0da0e
to
122cb18
Compare
Also, if the WDL-generation and tab-completion tests continue to fail, I’ll address it after I get back. I think this has something to do with non-ASCII characters in the Javadocs, but I thought I had gotten all of them… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @samuelklee! I thought the layout of classes was very clear and the testing was thorough. I had a few questions and comments. Also, I noticed a few places where there were temp files to read and write data within the tool and I was surprised that was necessary.
...ava/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/ExtractVariantAnnotations.java
Outdated
Show resolved
Hide resolved
...ava/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/ExtractVariantAnnotations.java
Outdated
Show resolved
Hide resolved
...ava/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/ExtractVariantAnnotations.java
Outdated
Show resolved
Hide resolved
...ava/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/ExtractVariantAnnotations.java
Outdated
Show resolved
Hide resolved
...institute/hellbender/tools/walkers/vqsr/scalable/ScoreVariantAnnotationsIntegrationTest.java
Show resolved
Hide resolved
...institute/hellbender/tools/walkers/vqsr/scalable/ScoreVariantAnnotationsIntegrationTest.java
Outdated
Show resolved
Hide resolved
...tute/hellbender/tools/walkers/vqsr/scalable/TrainVariantAnnotationsModelIntegrationTest.java
Outdated
Show resolved
Hide resolved
...tute/hellbender/tools/walkers/vqsr/scalable/TrainVariantAnnotationsModelIntegrationTest.java
Outdated
Show resolved
Hide resolved
...stitute/hellbender/tools/walkers/vqsr/scalable/ExtractVariantAnnotationsIntegrationTest.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really clean. The documentation is excellent.
src/main/java/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/data/VariantType.java
Outdated
Show resolved
Hide resolved
...g/broadinstitute/hellbender/tools/walkers/vqsr/scalable/LabeledVariantAnnotationsWalker.java
Outdated
Show resolved
Hide resolved
.../java/org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/ScoreVariantAnnotations.java
Outdated
Show resolved
Hide resolved
.../org/broadinstitute/hellbender/tools/walkers/vqsr/scalable/TrainVariantAnnotationsModel.java
Outdated
Show resolved
Hide resolved
32ce261
to
b3d6fec
Compare
OK, thanks for the thorough reviews, @meganshand and @davidbenjamin! I think I've addressed everything or left them as TODOs; I'll break these out into issues (or at least add them to the meta issue) later today. I also added the docs for the training tool. Apologies for the slight delay, I had to get my eyes dilated on Friday morning and was completely useless for the rest of the day! |
b3d6fec
to
d853337
Compare
d853337
to
9de8b1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks @samuelklee! The added documentation for the training tool looks great.
…annotations. (#7954) * Adds wdl that tests joint VCF filtering tools (#7932) * adding filtering wdl * renaming pipeline * addressing comments * added bash * renaming json * adding glob to extract for extra files * changing dollar signs * small comments * Added changes for specifying model backend and other tweaks to WDLs and environment. * Added classes for representing a collection of labeled variant annotations. * Added interfaces for modeling and scoring backends. * Added a new suite of tools for variant filtering based on site-level annotations. * Added integration tests. * Added test resources and expected results. * Miscellaneous changes. * Removed non-ASCII characters. * Added documentation for TrainVariantAnnotationsModel and addressed review comments. Co-authored-by: meganshand <mshand@broadinstitute.org>
* Added a new suite of tools for variant filtering based on site-level annotations. (#7954) * Adds wdl that tests joint VCF filtering tools (#7932) * adding filtering wdl * renaming pipeline * addressing comments * added bash * renaming json * adding glob to extract for extra files * changing dollar signs * small comments * Added changes for specifying model backend and other tweaks to WDLs and environment. * Added classes for representing a collection of labeled variant annotations. * Added interfaces for modeling and scoring backends. * Added a new suite of tools for variant filtering based on site-level annotations. * Added integration tests. * Added test resources and expected results. * Miscellaneous changes. * Removed non-ASCII characters. * Added documentation for TrainVariantAnnotationsModel and addressed review comments. Co-authored-by: meganshand <mshand@broadinstitute.org> * Added toggle for selecting resource-matching strategies and miscellaneous minor fixes to new annotation-based filtering tools. (#8049) * Adding use_allele_specific_annotation arg and fixing task with empty input in JointVcfFiltering WDL (#8027) * Small changes to JointVCFFiltering WDL * making default for use_allele_specific_annotations * addressing comments * first stab * wire through WDL changes * fixed typo * set model_backend input value * add gatk_override to JointVcfFiltering call * typo in indel_annotations * make model_backend optional * tabs and spaces * make all model_backends optional * use gatk 4.3.0 * no point in changing the table names as this is a POC * adding new branch to dockstore * adding in branching logic for classic VQSR vs VQSR-Lite * implementing the separate schemas for the VQSR vs VQSR-Lite branches, including Java changes necessary to produce the different tsv files * passing classic flag to indel run of CreateFilteringFiles * Update GvsCreateFilterSet.wdl cleaning up verbiage * Removed mapping error rate from estimate of denoised copy ratios output by gCNV and updated sklearn. (#7261) * cleanup up sloppy comment --------- Co-authored-by: samuelklee <samuelklee@users.noreply.github.com> Co-authored-by: meganshand <mshand@broadinstitute.org> Co-authored-by: Rebecca Asch <rasch@broadinstitute.org>
This adds the following tools, which supplant the VQSR workflow: ExtractVariantAnnotations, TrainVariantAnnotationsModel, and ScoreVariantAnnotations. See meta issue #7724.