VS-390. Add precision and sensitivity wdl #7813
Make chromosome optional
Added the logic to select only input_vcfs on the specified chromosome.
Codecov Report
@@ Coverage Diff @@
## ah_var_store #7813 +/- ##
================================================
Coverage ? 86.295%
Complexity ? 35192
================================================
Files ? 2170
Lines ? 164837
Branches ? 17775
================================================
Hits ? 142246
Misses ? 16265
Partials ? 6326
File input_vcf
String output_basename

String docker = "us.gcr.io/broad-gotc-prod/imputation-bcf-vcf:1.0.5-1.10.2-0.1.16-1649948623"
this container may be better? but fyi bcftools is in the ah_var_store container (and we should probably add tabix if we haven't already)
I didn't realize ah_var_store had bgzip (and tabix?), so I went with this one. It's a lot smaller (394 MB vs 4.74 GB for ah_var_store), so the task will run faster. For simple tasks like this it probably makes sense to use the smaller image, but on the other hand standardizing on one Docker image might make sense too.
```
Now create single sample gVCFs for the control samples; in this example the sample names for the controls are "BI_HG-002", "UW_HG-002" and "BI_HG-003":
**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzin the samples in `sample_names`.
**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzin the samples in `sample_names`.
**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzing the samples in `sample_names`.
```
BASE_CMD="rtg vcfeval --region chr20 --roc-subset snp,indel --vcf-score-field=INFO.MAX_AS_VQSLOD -t human_REF_SDF"
SUFFIX="_roc_filtered"
**truth_beds** - A list of the bed files for the truth data used for analyzin the samples in `sample_names`.
gone analyzin'
**truth_beds** - A list of the bed files for the truth data used for analyzin the samples in `sample_names`.
**truth_beds** - A list of the bed files for the truth data used for analyzing the samples in `sample_names`.

if (false) {
String? none = "None"
}
not sure I understand this construct?
This is the trick for how you make an undefined value / 'None' in wdl.
I have heard of no other way to set a variable to undefined.
}

String? contig = if (chromosome == "all") then none else chromosome

nit---I found this bit (lines 18-22) a lil confusing without a comment. And why are we passing the contig to SelectVariants if we already split by contig??? hmm
I'll add the comment. 'none' is defined to be an undefined value, and this was the trick I used to allow the workflow to take a defined value 'all' for chromosome and translate that into an undefined value, which is then passed to SelectVariants.
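A minimal sketch of the idiom discussed in this thread, assuming WDL 1.0. The workflow name, the `EchoContigArgs` task, the `-L` argument prefix, and the `ubuntu:20.04` image are hypothetical stand-ins for illustration and are not taken from this PR:

```wdl
version 1.0

# Hypothetical workflow: maps the sentinel "all" to an undefined String?.
workflow OptionalContigExample {
  input {
    # "all" means "do not restrict to a single chromosome".
    String chromosome = "all"
  }

  # The body of `if (false)` never runs, so outside the block `none`
  # is a String? that is always undefined.
  if (false) {
    String? none = "None"
  }

  # Translate the sentinel into an undefined value; any other value passes through.
  String? contig = if (chromosome == "all") then none else chromosome

  call EchoContigArgs { input: contig = contig }

  output {
    String args = EchoContigArgs.args
  }
}

# Hypothetical task standing in for the real consumer of `contig`.
task EchoContigArgs {
  input {
    String? contig
  }

  command <<<
    # When `contig` is undefined, the whole placeholder renders as an empty string.
    echo args: ~{"-L " + contig}
  >>>

  output {
    String args = read_string(stdout())
  }

  runtime {
    docker: "ubuntu:20.04"  # assumption: any image with a POSIX shell would do
  }
}
```

With the default `chromosome = "all"`, `contig` stays undefined and the placeholder should contribute nothing to the command; with `chromosome = "chr20"` it should render as `-L chr20`.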
Co-authored-by: RoriCremer <6863459+RoriCremer@users.noreply.github.com>
Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
Converted the tie-out procedure for calculating precision and sensitivity to a WDL.
An example run (on chr20) is at: https://job-manager.dsde-prod.broadinstitute.org/jobs/03346ba0-94f8-4205-b72e-499d73de9d43
An example run (using all input VCFs and all chromosomes) is at: https://job-manager.dsde-prod.broadinstitute.org/jobs/a041f918-e72f-4a98-aa98-7f242bab0b03
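For reference, the two metrics named in the title are conventionally defined from the counts of true positives (TP), false positives (FP), and false negatives (FN). These are the standard textbook definitions, stated here as an assumption about what the workflow reports rather than something taken from the WDL itself:

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{sensitivity} = \frac{TP}{TP + FN}
\]
```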