Adds wdl that tests joint VCF filtering tools #7932
Conversation
Codecov Report
@@            Coverage Diff             @@
##   sl_sklearnvarianttrain_scalable   #7932   +/- ##
=====================================================
  Coverage     ?   87.031%
  Complexity   ?   37304
=====================================================
  Files        ?   2238
  Lines        ?   175124
  Branches     ?   18897
=====================================================
  Hits         ?   152412
  Misses       ?   16010
  Partials     ?   6702
Thanks so much for putting this together, @meganshand! Mostly minor comments and questions, but there are a few that get a little deeper into questions of tool/pipeline design that we might want to discuss. At the same time, we can talk about plans/timelines for getting this and the branch it's based off merged.
.github/workflows/gatk-tests.yml
Outdated
@@ -291,7 +291,7 @@ jobs:
    runs-on: ubuntu-latest
    strategy:
      matrix:
-       wdlTest: [ 'RUN_CNV_GERMLINE_COHORT_WDL', 'RUN_CNV_GERMLINE_CASE_WDL', 'RUN_CNV_SOMATIC_WDL', 'RUN_M2_WDL', 'RUN_CNN_WDL' ]
+       wdlTest: [ 'RUN_CNV_GERMLINE_COHORT_WDL', 'RUN_CNV_GERMLINE_CASE_WDL', 'RUN_CNV_SOMATIC_WDL', 'RUN_M2_WDL', 'RUN_CNN_WDL', 'RUN_FILTERING_WDL' ]
This is fine for now, but we may want to be a little more specific than "RUN_FILTERING_WDL" in the future. Any thoughts on alternatives?
(Just to be clear, I think we'd want to specify that this is INFO-annotation-based filtering, but not close the door on using this pipeline for single-sample, somatic, non-human, etc.)
**This directory is for GATK devs only**

This directory contains scripts for running CNN Variant WDL tests in the automated travis build environment.
Update "CNN Variant" as appropriate.
Perhaps also a little blurb about the test data?
@@ -0,0 +1,301 @@
version 1.0
Perhaps some basic docs here? Doesn't have to be lengthy. I'd also note, for example, that the input VCFs are sharded.
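To make the suggestion concrete, a minimal header-docs sketch might look like the following (the input names and comments are illustrative, not necessarily what the PR branch uses):

```wdl
version 1.0

# Filters a joint callset using site-level (INFO) annotations.
# Note: the input VCFs are sharded by genomic interval; the workflow
# scatters over the shards and emits one scored/filtered VCF per shard.
workflow JointVcfFiltering {
    input {
        Array[File] vcfs          # sharded input VCFs, one per interval
        Array[File] vcf_indices   # matching .tbi indices, in the same order
        String basename           # prefix used for all output filenames
    }
}
```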
        input_vcf_index = sites_only_vcf_index,
        mode = "SNP",
        annotations = snp_annotations,
        resources = "-resource:hapmap,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz -resource:omni,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz -resource:1000G,training=true,calibration=false gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz",
Let's extract resources here and for indels. Perhaps also use the `--resource` long name to be consistent with elsewhere.
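A sketch of what that extraction could look like, hoisting the string into a workflow-level input with a default and switching to the `--resource` long form (the input name `snp_resource_args` is an assumption, and this is a fragment, not a complete workflow):

```wdl
input {
    # Default SNP resources; callers can override with their own resource string.
    String snp_resource_args = "--resource:hapmap,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz --resource:omni,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz --resource:1000G,training=true,calibration=false gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz"
}

# The call site then just references the input:
call ExtractVariantAnnotations as ExtractSnpAnnotations {
    input:
        mode = "SNP",
        resources = snp_resource_args
}
```

An analogous `indel_resource_args` input would cover the indel branch.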
export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk_override}

gatk --java-options "-Xmx${command_mem}m" \
ExtractVariantAnnotations \
Perhaps an additional level of indentation here and in the following lines? And the same elsewhere.
runtime {
    docker: gatk_docker
    disks: "local-disk " + disk_size + " LOCAL"
    memory: "14 GB"
Do we want to expose memory and disk size throughout?
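If we do expose them, one common pattern is to make memory and disk task inputs with defaults, so callers can override per environment. A sketch (the task name, input names, and default values below are assumptions, not the PR's actual values):

```wdl
task ScoreVariantAnnotationsExample {
    input {
        String gatk_docker
        Int memory_mb = 14000    # overridable per call
        Int disk_size_gb = 100   # overridable per call
    }
    command <<<
        echo "memory and disk are now caller-controlled"
    >>>
    runtime {
        docker: gatk_docker
        disks: "local-disk " + disk_size_gb + " LOCAL"
        memory: memory_mb + " MB"
    }
}
```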
    --model-prefix ~{basename} \
    ~{annotations} \
    -mode ~{mode} \
    --resource:extracted-training,training=true,calibration=false ~{extracted_training_vcf} \
Note that the VCF output by the extract tool contains those training and calibration variants that were actually extracted to the HDF5 to be used as input to the training tool---i.e., only those that were in the genomic region over which the extract tool was run, and not any other variants in the training/calibration resources that fall outside of this region.
The purpose of providing this VCF to the score tool is simply to correctly propagate this information to the *.annot.hdf5 output of the score tool (which could be considered secondary to the primary output of the scored/filtered VCF, but is very useful if one wants to do downstream analysis of the annotations). We can represent this information by labeling such extracted variants in the HDF5 with an additional label of our choosing; e.g., for the label "extracted", we would have:
--resource:extracted,extracted=true ~{extracted_training_vcf}
The choice of label is arbitrary, as is the name for the resource (here, also chosen to be "extracted", which is probably preferable to "extracted-training" because variants from the "calibration" set are also included). However, it's incorrect to use this VCF to label everything contained in it as "training", as is currently implemented here.
Hope that all makes sense! Sorry if the details of this have changed from the version you might've pulled this line from.
Another note: unfortunately, this functionality currently only extends to the positive training data. Because the current implementation does reservoir sampling of the unlabeled data, should one choose to extract it for the positive-negative approach, we cannot write the retained/sampled unlabeled data to VCF as we traverse. Furthermore, even the HDF5 output containing the extracted sample of unlabeled data is not currently genomically sorted---we just dump the data in the order that the reservoir happens to end up in.
Ah ok, this wasn't super clear to me. I changed it to extracted as you suggested above.
output {
    File scorer = "~{basename}.~{mode_lc}.scorer.pkl"
    File training_scores = "~{basename}.~{mode_lc}.trainingScores.hdf5"
    File truth_scores = "~{basename}.~{mode_lc}.calibrationScores.hdf5"
When we add the BGMM, the scorer will have a different suffix (*.ser). Furthermore, when running the positive-negative approach, there is an additional *.unlabeledScores.hdf5 output here and an additional *.unlabeled.annot.hdf5 output in the extract (that is used as input here in training).
Do you think we could instead consider the tool input/output interfaces to be specified by the `output-prefix` and `model-prefix` basenames and use corresponding globs? Thus, the WDL task interfaces would better reflect the tool interfaces. Perhaps the only thing we would want to separate out is the extracted VCF (so we can pass it to the score task), in which case we could at least glob the labeled/unlabeled HDF5s output in the extract (hmm, although we might then have to check which HDF5s are present in the training step and build the command line accordingly---so perhaps we should glob everywhere except for the extract outputs...) Feel free to let me know if you need more pointers here!
Alternatively, we could have optional inputs/outputs where necessary to cover all the possible workflow branches, but this might be a pain to test and/or update in the future.
This all stems from the question of what the default/canonical workflow should be (i.e., BGMM or IsolationForest? positive-only or positive-negative? etc.) and how many branches we want to expose.
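For concreteness, the two interface styles under discussion might look roughly like this (the glob pattern reuses the basename/mode convention from the outputs above; the optional-output names are assumptions about the positive-negative branch):

```wdl
# Style 1: treat the prefix as the interface and glob everything the tool writes.
output {
    Array[File] model_files = glob("~{basename}.~{mode_lc}.*")
}

# Style 2: enumerate outputs, marking branch-dependent ones as optional
# so workflow branches that do not produce them still validate.
output {
    File training_scores = "~{basename}.~{mode_lc}.trainingScores.hdf5"
    File calibration_scores = "~{basename}.~{mode_lc}.calibrationScores.hdf5"
    File? unlabeled_scores = "~{basename}.~{mode_lc}.unlabeledScores.hdf5"
}
```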
I think glob makes sense here, but probably not for the Extract task. Does the `"~{basename}.~{mode}.annot.hdf5"` output from Extract change filenames as well?
It seems like eventually we'll want multiple test JSONs with different sets of inputs to test BGMM/IsolationForest and positive-only/positive-negative. But I'd suggest we make an issue for that and add them another day to keep this version on track.
The *.annot.hdf5 output of the extract tool, which contains the (positive) training and calibration data, remains unchanged. However, an additional *.unlabeled.annot.hdf5 output is generated, which contains the unlabeled data separately.
And yes, we can work towards testing a few workflow branches and adding functionality to this initial WDL as we go. See the somatic CNV WDL tests for a rough idea.
"JointVcfFiltering.basename": "test_10_samples",
"JointVcfFiltering.indel_sensitivity_threshold": 97.0,
"JointVcfFiltering.snp_sensitivity_threshold": 99.9,
"JointVcfFiltering.snp_annotations": "-A ReadPosRankSum -A FS -A SOR -A QD -A AVERAGE_TREE_SCORE -A AVERAGE_ASSEMBLED_HAPS -A AVERAGE_FILTERED_HAPS",
Following up on comments elsewhere: would it be possible to use the `AVERAGE_ASSEMBLED_HAPS` and `AVERAGE_FILTERED_HAPS` annotations for INDELs, eventually?
@samuelklee I tried to address your comments and tests are now back to passing so I think this is ready for re-review. Thanks!
Excellent, thanks again @meganshand! As discussed on Slack, just extremely minor comments, which I'm happy to handle after we 1) open, review, and merge the branch containing the tools, and then 2) do any necessary rebasing or adding of features here.
After that, I will also try to put something in CARROT that perhaps marginally expands the functionality of the test done here, just to familiarize myself with where things are at in that framework.
.github/workflows/gatk-tests.yml
Outdated
@@ -350,8 +350,8 @@ jobs:
          echo "Running CNN WDL";
          bash scripts/cnn_variant_cromwell_tests/run_cnn_variant_wdl.sh;

-       - name: "FILTERING_WDL_TEST"
        if: ${{ matrix.wdlTest == 'RUN_FILTERING_WDL' }}
+       - name: "VCF_SITE_LEVEL_FILTERING_WDL"
Thanks for picking this! Perhaps `*_WDL_TEST`, just to be consistent with the other tests.
@@ -28,9 +42,9 @@ workflow JointVcfFiltering {
        input_vcf_index = sites_only_vcf_index,
        mode = "SNP",
        annotations = snp_annotations,
-       resources = "-resource:hapmap,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz -resource:omni,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz -resource:1000G,training=true,calibration=false gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz",
+       resources = snp_training_resource_command_line,
Perhaps just `snp_resource_args` and `indel_resource_args`, so that 1) we don't implicitly focus on training resources (since we can include non-training calibration resources, or even just any resources we want to use to label variants), and 2) we are more consistent with the `*_args` WDL parameters in e.g. the M2 WDL.
@@ -160,6 +160,7 @@ task ExtractVariantAnnotations {
        File annots = "~{basename}.~{mode}.annot.hdf5"
        File extracted_training_vcf = "~{basename}.~{mode}.vcf.gz"
        File extracted_training_vcf_index = "~{basename}.~{mode}.vcf.gz.tbi"
+       Array[File] outputs = glob("~{basename}.~{mode}.*")
Just checking---so we are going to output both the glob and the individual files here?
Another thing I missed in my initial review: let's make sure all outputs of the workflow are exposed, and not just the final VCF.
Yeah, that was my intention (to both glob and have them listed out as individual files). This was to make it easier to import the individual files to other tasks and use them in the command line appropriately.
I addressed the other two easy comments above, but I think I'm going to leave adding all the outputs of the workflow to you. I don't think we want to highlight any outputs that are only intended for debugging or analysis. Those files will still be there since they are delocalized by the tasks, but I'm not sure if we want to keep them around long term (making them outputs of the WDL rather than just "intermediate files" that will eventually be cleaned up). If you think they're all worth keeping though then feel free to add them as final workflow outputs.
No problem, thanks!
We expose all outputs in the CNV WDLs, which makes it easier to use them as subworkflows downstream. I think it’s also worth encouraging the investigation of the intermediate outputs (which are relatively small here) as well.
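As a sketch, exposing everything at the workflow level just means re-exporting the task outputs in the workflow's `output` block (the call and output names here are illustrative, not the PR's actual identifiers):

```wdl
output {
    # Final VCFs plus intermediate outputs, so the workflow composes
    # cleanly as a subworkflow and intermediates remain inspectable.
    Array[File] scored_snp_vcfs = ScoreSnpVariantAnnotations.scored_vcf
    File snp_extracted_annotations = ExtractSnpAnnotations.annots
    File snp_training_scores = TrainSnpModel.training_scores
}
```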
Gotcha, makes sense!
@samuelklee would you like to merge this branch into yours? Or did you want to PR your branch first?
Hmm, I guess I can go ahead and merge into my branch once tests pass.
* adding filtering wdl
* renaming pipeline
* addressing comments
* added bash
* renaming json
* adding glob to extract for extra files
* changing dollar signs
* small comments
…annotations. (#7954)

* Adds wdl that tests joint VCF filtering tools (#7932)
* adding filtering wdl
* renaming pipeline
* addressing comments
* added bash
* renaming json
* adding glob to extract for extra files
* changing dollar signs
* small comments
* Added changes for specifying model backend and other tweaks to WDLs and environment.
* Added classes for representing a collection of labeled variant annotations.
* Added interfaces for modeling and scoring backends.
* Added a new suite of tools for variant filtering based on site-level annotations.
* Added integration tests.
* Added test resources and expected results.
* Miscellaneous changes.
* Removed non-ASCII characters.
* Added documentation for TrainVariantAnnotationsModel and addressed review comments.

Co-authored-by: meganshand <mshand@broadinstitute.org>
* Added a new suite of tools for variant filtering based on site-level annotations. (#7954)
* Adds wdl that tests joint VCF filtering tools (#7932)
* adding filtering wdl
* renaming pipeline
* addressing comments
* added bash
* renaming json
* adding glob to extract for extra files
* changing dollar signs
* small comments
* Added changes for specifying model backend and other tweaks to WDLs and environment.
* Added classes for representing a collection of labeled variant annotations.
* Added interfaces for modeling and scoring backends.
* Added a new suite of tools for variant filtering based on site-level annotations.
* Added integration tests.
* Added test resources and expected results.
* Miscellaneous changes.
* Removed non-ASCII characters.
* Added documentation for TrainVariantAnnotationsModel and addressed review comments.
* Added toggle for selecting resource-matching strategies and miscellaneous minor fixes to new annotation-based filtering tools.

Co-authored-by: meganshand <mshand@broadinstitute.org>
(#8049)

* Adding use_allele_specific_annotation arg and fixing task with empty input in JointVcfFiltering WDL (#8027)
* Small changes to JointVCFFiltering WDL
* making default for use_allele_specific_annotations
* addressing comments
* first stab
* wire through WDL changes
* fixed typo
* set model_backend input value
* add gatk_override to JointVcfFiltering call
* typo in indel_annotations
* make model_backend optional
* tabs and spaces
* make all model_backends optional
* use gatk 4.3.0
* no point in changing the table names as this is a POC
* adding new branch to dockstore
* adding in branching logic for classic VQSR vs VQSR-Lite
* implementing the separate schemas for the VQSR vs VQSR-Lite branches, including Java changes necessary to produce the different tsv files
* passing classic flag to indel run of CreateFilteringFiles
* Update GvsCreateFilterSet.wdl cleaning up verbiage
* Removed mapping error rate from estimate of denoised copy ratios output by gCNV and updated sklearn. (#7261)
* cleanup up sloppy comment

Co-authored-by: samuelklee <samuelklee@users.noreply.github.com>
Co-authored-by: meganshand <mshand@broadinstitute.org>
Co-authored-by: Rebecca Asch <rasch@broadinstitute.org>
This adds a small test case for the WDL of the filtering pipeline. This still has SNPs and indels separated out. I can combine them if needed, but we'd like to use different annotations for each mode. This also doesn't actually apply the final filtering (with a threshold), since we still need to add a step to determine the correct threshold. The final VCFs from this workflow should have SCORE INFO annotations for each site.
This takes in an array of VCFs (and outputs an array of VCFs) because this is an option for large callsets in the WARP joint genotyping WDL, which is where this WDL will eventually be integrated.
This test only ensures that the WDL runs and doesn't compare to expected results (the same as the other WDL tests in this repo).
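The array-in/array-out shape described above can be sketched as a WDL scatter. This is a minimal, self-contained illustration; the stub task stands in for the real scoring step, and all names are assumptions:

```wdl
version 1.0

workflow ShardedFiltering {
    input {
        Array[File] vcfs         # sharded input VCFs
        Array[File] vcf_indices  # parallel array of .tbi indices
    }

    # One filtering call per shard; outputs stay parallel to the inputs.
    scatter (i in range(length(vcfs))) {
        call ScoreShard {
            input:
                vcf = vcfs[i],
                vcf_index = vcf_indices[i]
        }
    }

    output {
        Array[File] scored_vcfs = ScoreShard.scored_vcf
    }
}

# Stub standing in for the real scoring/filtering task.
task ScoreShard {
    input {
        File vcf
        File vcf_index
    }
    command <<<
        cp ~{vcf} scored.vcf.gz
    >>>
    output {
        File scored_vcf = "scored.vcf.gz"
    }
}
```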