Full scientific validation via end to end comparison of filtered results between WARP and BQ #7179

Merged
kcibul merged 23 commits into ah_var_store from kc_feature_tieout
Apr 2, 2021

@kcibul kcibul commented Apr 2, 2021

No description provided.

@kcibul kcibul requested a review from mmorgantaylor April 2, 2021 14:55
.dockstore.yml Outdated
branches:
- master
- ah_var_store
- kc_feature_tieout
Member

we can remove any references to branches other than master and ah_var_store (here and for cohort extract)

gatk-bot commented Apr 2, 2021

Travis reported job failures from build 33498
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
|---|---|---|---|
| cloud | openjdk8 | 33498.1 | logs |
| cloud | openjdk11 | 33498.14 | logs |

bcftools view -O z SYNDIP.bq.all.vcf.gz chr20 > SYNDIP.bq.all.chr20.vcf.gz
tabix SYNDIP.bq.all.chr20.vcf.gz

gsutil -m cp gvs.bq.all.vcf.gz* gs://broad-dsp-spec-ops/scratch/bigquery-jointcalling/warp/
Member

similar question as above - the NA12878 extraction uses gvs.bq.all.noeh.vcf.gz - i suspect this is just naming inconsistency?

Contributor Author

oops -- yeah naming consistency, I should have removed noeh

First, create a full cohort extract (as described in README.md) using the `gvs_tieout_acmg_v3` (baseline), or otherwise desired, filtering model.

```
~/gatk SelectVariants -V gvs.bq.all.noeh.vcf.gz --sample-name SM-G947Y --select-type-to-exclude NO_VARIATION -O NA12878.bq.all.noeh.vcf.gz
Member

per other threads - remove noeh here

@@ -10,6 +10,7 @@
import org.broadinstitute.hellbender.tools.variantdb.CommonCode;
import org.broadinstitute.hellbender.tools.variantdb.SampleList;
import org.broadinstitute.hellbender.tools.variantdb.SchemaUtils;
import org.broadinstitute.hellbender.utils.bigquery.BigQueryUtils;
Member

is this used?

## RESEARCH: How to create list of excess het sites from WARP
```
rm excess_het_sites.bed
WORKFLOW_ID="c3733f56-218d-4242-af11-6557f101c5e2"
Member

update for the good set:

WORKFLOW_ID="36e7547e-3253-4888-9b1c-2a7437401aee"
for f in $(gsutil ls "gs://broad-dsp-spec-ops-cromwell-execution/JointGenotyping/${WORKFLOW_ID}/call-HardFilterAndMakeSitesOnlyVcf/shard-*/*.sites_only.variant_filtered.vcf.gz"); do
    echo "Processing $f"
    gsutil cat "$f" | gunzip | awk '{ if ($7 == "ExcessHet") print $1"\t"($2-1)"\t"$2}' >> excess_het_sites.bed
done
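
The filtering step in the loop above can be tried locally without bucket access. The following is a minimal sketch, not code from this PR: `vcf_to_excess_het_bed` and `sample_vcf` are hypothetical names, and the records are made up for illustration. It applies the same rule as the `awk` one-liner (FILTER column exactly `ExcessHet`, BED start = POS - 1, BED end = POS), with two assumed additions: it splits on tabs and skips header lines explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): read uncompressed VCF text on
# stdin and print a 0-based, half-open BED interval for every record whose
# FILTER column (field 7) is exactly "ExcessHet".
vcf_to_excess_het_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ { next }                                # skip VCF header lines
        $7 == "ExcessHet" { print $1, $2 - 1, $2 }   # BED: chrom, POS-1, POS
    '
}

# Made-up sites-only records, used here only to exercise the helper.
sample_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr20\t1000\t.\tA\tG\t50\tExcessHet\t.\n'
    printf 'chr20\t2000\t.\tC\tT\t60\tPASS\t.\n'
    printf 'chr20\t3000\t.\tG\tA\t70\tExcessHet\t.\n'
}

sample_vcf | vcf_to_excess_het_bed
```

On real shards the helper would sit where the `awk` call is, for example `gsutil cat "$f" | gunzip | vcf_to_excess_het_bed >> excess_het_sites.bed`. Like the original one-liner, it emits a single-base interval per site, so it does not widen the interval for indels.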

Member

@mmorgantaylor mmorgantaylor left a comment

a few questions and one update for getting the excess het sites, but looks good to me!

@kcibul kcibul merged commit d2e190e into ah_var_store Apr 2, 2021
@kcibul kcibul deleted the kc_feature_tieout branch April 2, 2021 19:34
mmorgantaylor added a commit that referenced this pull request Apr 6, 2021
…lts between WARP and BQ (#7179)

* first pass

* fixed to support 1/0 1|0 genotypes

* updates

* updated workflow id

* qualapprox updates from mmt branch (#7130)

* handle multi-allelics and clean up diff output

* updated alt allele script

* excluding indels

* updated for 37 sample tieout

* full e2e tieout

* formatting

* output model/rscripts

* Add model inputs to ngs_filter_extract (#7163)

* add to dockstore.yml

* add optional model inputs to ngs_filter_extract.wdl

* add model input to indels VariantRecalibrator command

* add missing model_report variable

* fix indentation in .dockstore.yml

* simplify model arguments

* doc updates, tsv updates, fixed WARP dependencies to output model/RScript

* modified parameters to use WARP excess het

* doc updates

* AH- add excess het (approx) to feature extract (#7175)

* add approximate excess het calculation to feature extract and filter on it

* added hacked version of XL

* doc updates

* moved EH to site-level (#7178)

* cleanup of old VQSR feature input tieout

* PR comments

* PR comments

Co-authored-by: M. Morgan Taylor <marymorg@broadinstitute.org>
Co-authored-by: Andrea Haessly <ahaessly@broadinstitute.org>
This was referenced Mar 17, 2023
4 participants