Full scientific validation via end to end comparison of filtered results between WARP and BQ #7179
* add to dockstore.yml
* add optional model inputs to ngs_filter_extract.wdl
* add model input to indels VariantRecalibrator command
* add missing model_report variable
* fix indentation in .dockstore.yml
* simplify model arguments
* add approximate excess het calculation to feature extract and filter on it
.dockstore.yml
Outdated
branches:
- master
- ah_var_store
- kc_feature_tieout
we can remove any references to branches other than master and ah_var_store (here and for cohort extract)
bcftools view -O z SYNDIP.bq.all.vcf.gz chr20 > SYNDIP.bq.all.chr20.vcf.gz
tabix SYNDIP.bq.all.chr20.vcf.gz

gsutil -m cp gvs.bq.all.vcf.gz* gs://broad-dsp-spec-ops/scratch/bigquery-jointcalling/warp/
similar question as above - the NA12878 extraction uses gvs.bq.all.noeh.vcf.gz
- i suspect this is just naming inconsistency?
oops -- yeah naming consistency, I should have removed noeh
First, create a full cohort extract (as described in README.md) using the `gvs_tieout_acmg_v3` (baseline), or otherwise desired, filtering model.

```
~/gatk SelectVariants -V gvs.bq.all.noeh.vcf.gz --sample-name SM-G947Y --select-type-to-exclude NO_VARIATION -O NA12878.bq.all.noeh.vcf.gz
per other threads - remove noeh here
@@ -10,6 +10,7 @@
import org.broadinstitute.hellbender.tools.variantdb.CommonCode;
import org.broadinstitute.hellbender.tools.variantdb.SampleList;
import org.broadinstitute.hellbender.tools.variantdb.SchemaUtils;
import org.broadinstitute.hellbender.utils.bigquery.BigQueryUtils;
is this used?
## RESEARCH: How to create list of excess het sites from WARP
```
rm excess_het_sites.bed
WORKFLOW_ID="c3733f56-218d-4242-af11-6557f101c5e2" |
update for the good set:
WORKFLOW_ID="36e7547e-3253-4888-9b1c-2a7437401aee"
for f in `gsutil ls gs://broad-dsp-spec-ops-cromwell-execution/JointGenotyping/${WORKFLOW_ID}/call-HardFilterAndMakeSitesOnlyVcf/shard-*/*.sites_only.variant_filtered.vcf.gz `; do
echo "Processing $f"
gsutil cat $f | gunzip | awk '{ if ($7 == "ExcessHet") print $1"\t"($2-1)"\t"$2}' >> excess_het_sites.bed
done
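As a minimal sketch of what the `awk` step in the loop above does, the following runs the same FILTER test against a small made-up sites-only VCF. The file contents, the `tmpdir` variable, and the added `-F'\t'` separator and header-skip guard are assumptions for illustration, not part of the PR. Each matching record becomes a BED interval with a 0-based start (`POS-1`) and a 1-based end (`POS`).

```shell
# Hypothetical input: a tiny sites-only VCF written to a temp directory.
# None of these records come from the PR; they only exercise the filter.
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n' > "$tmpdir/demo.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmpdir/demo.vcf"
printf 'chr20\t100\t.\tA\tG\t50\tExcessHet\t.\n' >> "$tmpdir/demo.vcf"
printf 'chr20\t200\t.\tC\tT\t50\tPASS\t.\n' >> "$tmpdir/demo.vcf"
printf 'chr20\t300\t.\tG\tA\t50\tExcessHet\t.\n' >> "$tmpdir/demo.vcf"

# Same test as the loop: keep records whose FILTER column (field 7) is
# exactly "ExcessHet" and emit chrom, POS-1, POS as a BED interval.
# Assumption: tab is set explicitly as the separator and header lines are skipped.
awk -F'\t' '!/^#/ && $7 == "ExcessHet" { print $1 "\t" ($2 - 1) "\t" $2 }' \
  "$tmpdir/demo.vcf" > "$tmpdir/excess_het_sites.bed"

cat "$tmpdir/excess_het_sites.bed"
```

Because the comparison is an exact match on `ExcessHet`, a record whose FILTER column lists more than one filter would not be emitted by this sketch or by the original loop.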
a few questions and one update for getting the excess het sites, but looks good to me!
…lts between WARP and BQ (#7179)
* first pass
* fixed to support 1/0 1|0 genotypes
* updates
* updated workflow id
* qualapprox updates from mmt branch (#7130)
* handle multi-allelics and clean up diff output
* updated alt allele script
* excluding indels
* updated for 37 sample tieout
* full e2e tieout
* formatting
* output model/rscripts
* Add model inputs to ngs_filter_extract (#7163)
* add to dockstore.yml
* add optional model inputs to ngs_filter_extract.wdl
* add model input to indels VariantRecalibrator command
* add missing model_report variable
* fix indentation in .dockstore.yml
* simplify model arguments
* doc updates, tsv updates, fixed WARP dependencies to output model/RScript
* modified parameters to use WARP excess het
* doc updates
* AH- add excess het (approx) to feature extract (#7175)
* add approximate excess het calculation to feature extract and filter on it
* added hacked version of XL
* doc updates
* moved EH to site-level (#7178)
* cleanup of old VQSR feature input tieout
* PR comments
* PR comments
Co-authored-by: M. Morgan Taylor <marymorg@broadinstitute.org>
Co-authored-by: Andrea Haessly <ahaessly@broadinstitute.org>