Perform full WGS cohort extract scientific tieout for 35 ACMG59 samples #7106

kcibul · 2021-02-25T16:00:16Z

Resolves https://github.com/broadinstitute/dsp-spec-ops/issues/239

See README.md in this PR for full details

To make this easier to review, the changes break down into a few sections:

Docs -- the README.md. Does it make sense? Could you follow it?
Comparison script (compare_data.py) -- is it clear? Obviously, pointing out any bugs would be great. The GitHub issue for this PR describes what is compared.
WDL changes -- should be straightforward to review; just minor changes.
Code changes (Java) -- we can walk through this together if that's more effective.

mmorgantaylor

Covered your changes in Python so far. I still need to go through the full thing in more detail, but the changes all make sense.

scripts/variantstore/tieout/compare_data.py

mmorgantaylor · 2021-03-01T19:45:41Z

scripts/variantstore/tieout/compare_data.py

@@ -261,9 +271,9 @@ def compare_sample_data(e1, e2):
                log_difference(key, e1, e2) 

        # TODO: temporary until we decide what to do with spanning deletions


Is this TODO comment still applicable?

Yep -- it's tied to https://github.com/broadinstitute/dsp-spec-ops/issues/143; I'll add that link to the comment as well.

ahaessly

Looked at most of it except the Python.

ahaessly · 2021-03-01T20:56:32Z

scripts/variantstore/tieout/dig_reblocked.sh

+POS=$2
+
+GVCF=$(cat legacy_wdl/sample_set_membership_v6.tsv | grep $SAMPLE | cut -f2)
+gsutil cat $GVCF | gunzip | grep -C 10 $POS


There is a zgrep command you can use so you don't have to gunzip.

Good to know!
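In the shell, the suggestion could look like `gsutil cat $GVCF | zgrep -C 10 $POS`, since zgrep also reads compressed data from standard input when no file is given. If the same lookup were wanted from the Python side of the tieout tooling, a minimal sketch could stream the gzipped GVCF through Python's standard `gzip` module. This is illustrative only: the helper name `grep_gz_with_context` is hypothetical, it reads a local path rather than a `gs://` URL, and it matches a fixed substring (like grepping for a bare position) rather than a regular expression.

```python
import gzip
from collections import deque
from typing import Deque, Iterator, List


def grep_gz_with_context(path: str, pattern: str, context: int = 10) -> Iterator[List[str]]:
    """Yield blocks of lines surrounding each line that contains `pattern`.

    Hypothetical helper, roughly like `zgrep -F -C <context> <pattern> <path>`:
    the file is decompressed while streaming, so no separate gunzip step is needed.
    """
    before: Deque[str] = deque(maxlen=context)  # recent lines not yet in any block
    block: List[str] = []                       # block currently being built
    after_remaining = 0                         # trailing context lines still owed

    with gzip.open(path, "rt") as handle:
        for raw in handle:
            line = raw.rstrip("\n")
            if pattern in line:
                if not block:
                    block.extend(before)  # leading context for a fresh block
                before.clear()
                block.append(line)
                after_remaining = context  # a new match restarts trailing context
            elif after_remaining > 0:
                block.append(line)
                after_remaining -= 1
            else:
                if block:
                    yield block  # trailing context exhausted: emit the block
                    block = []
                before.append(line)
    if block:
        yield block  # file ended while a block was still open
```

A match that falls inside the previous match's trailing context extends the same block instead of starting a new one; unlike grep, no `--` separators are produced.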

ahaessly · 2021-03-01T21:16:45Z

src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractCohortEngine.java

        if((isIndel && totalAsQualApprox < INDEL_QUAL_THRESHOLD) || (!isIndel && totalAsQualApprox < SNP_QUAL_THRESHOLD)) {
-            // logger.info(contig + ":" + currentPosition + ": dropped for low QualApprox of  " + totalAsQualApprox);
+            if ( printDebugInformation ) {


A comment about totalAsQualApprox: it is the sum of the QualApprox values for every sample with a variant at this site (but it is not allele-specific). This seems strange; it seems like it should be an average or some other ratio.

It's just a straight sum. I agree it seems like you would want a single high-quality variant or some such, but this is exactly what is calculated in the warp pipeline.

ahaessly

👍
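To make the behaviour discussed above concrete: the drop condition quoted from ExtractCohortEngine compares a plain sum against a SNP or indel threshold. A minimal Python sketch of that logic, assuming per-sample QualApprox values have already been collected for the site; the function name and the example thresholds are hypothetical, not the real SNP_QUAL_THRESHOLD / INDEL_QUAL_THRESHOLD values from the Java code or the warp pipeline.

```python
from typing import Sequence


def passes_qual_approx_filter(
    per_sample_qual_approx: Sequence[float],
    is_indel: bool,
    snp_threshold: float,
    indel_threshold: float,
) -> bool:
    """Return True if the site survives the QualApprox filter.

    Mirrors the quoted drop condition: a site is dropped when its summed
    QualApprox is below the threshold for its variant type (indel vs. SNP).
    """
    # Straight sum over every sample with a variant at this site; no averaging.
    total_qual_approx = sum(per_sample_qual_approx)
    threshold = indel_threshold if is_indel else snp_threshold
    return total_qual_approx >= threshold


# Illustrative thresholds only.
site = [20.0, 20.0, 20.0]
# True: the sum (60) clears 50 even though no single sample (or the mean, 20) does.
print(passes_qual_approx_filter(site, is_indel=False, snp_threshold=50.0, indel_threshold=70.0))
```

This is exactly the effect ahaessly points out: many low-quality calls can add up to pass, whereas an average-based filter would drop this site.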

kcibul · 2021-03-02T17:01:47Z

src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractCohortEngine.java

@@ -297,24 +325,19 @@ private double getQUALapproxFromSampleRecord(GenericRecord sampleRecord) {

        String s = o.toString();

-        // TODO: KCIBUL -- unclear how QUALapproxes are summed from non-ref alleles... replicating what I saw but need to confirm with Laura
+        // Non-AS QualApprox (used for qualapprox filter) is simply the sum of the AS values (see GnarlyGenotyper)


Replace "is simply" with "is approximated by".
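For reference, the rule in the updated Java comment (the non-allele-specific QualApprox taken as the sum of the allele-specific values, which the reviewer notes is only an approximation) could be sketched in Python as follows. The pipe-delimited input format, with one entry per allele and empty entries skipped, is an assumption about how the stored AS value looks, and the function name is hypothetical; this is not the body of getQUALapproxFromSampleRecord.

```python
from typing import Optional


def qual_approx_from_as_string(as_qual_approx: Optional[str]) -> Optional[float]:
    """Approximate the non-AS QualApprox as the sum of the AS entries.

    Assumes a "|"-separated value such as "0|35|12"; empty entries are skipped.
    Returns None when there is no value to sum.
    """
    if as_qual_approx is None:
        return None
    entries = [part for part in as_qual_approx.split("|") if part.strip()]
    if not entries:
        return None
    return sum(float(part) for part in entries)
```

Under these assumptions, `qual_approx_from_as_string("0|35|12")` gives `47.0`.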

Marianie-Simeon approved these changes Mar 1, 2021

mmorgantaylor reviewed Mar 1, 2021

ahaessly reviewed Mar 1, 2021

ahaessly approved these changes Mar 2, 2021

kcibul commented Mar 2, 2021

kcibul force-pushed the ah_var_store branch from 76da027 to f3134df on March 9, 2021 17:41

kcibul added 5 commits March 9, 2021 16:39

tieout changes

84c2d29

WIP

00d037c

tidy up, updated README.md

f7fdda6

cleanup

20bbff4

PR comments

0293a21

kcibul force-pushed the kc_tieout branch from e0111e5 to 0293a21 on March 10, 2021 02:48

kcibul merged commit c4484d6 into ah_var_store Mar 10, 2021

kcibul deleted the kc_tieout branch March 10, 2021 13:56

mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021

Perform full WGS cohort extract scientific tieout for 35 ACMG59 samples (#7106)

dd2cb77

* tieout changes
* tidy up, updated README.md
* PR comments

This was referenced Mar 17, 2023

lb merge gvs branch #8248

Closed

testing something, please ignore #8251

Closed
