-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform full WGS cohort extract scientific tieout for 35 ACMG59 samples #7106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
covered your changes in python so far - still need to go through the full thing in more detail, but the changes all make sense
@@ -261,9 +271,9 @@ def compare_sample_data(e1, e2): | |||
log_difference(key, e1, e2) | |||
|
|||
# TODO: temporary until we decide what to do with spanning deletions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this #TODO comment still applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep -- tied to https://github.com/broadinstitute/dsp-spec-ops/issues/143, but I'll add that to the comment as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looked at most of it except python
POS=$2 | ||
|
||
GVCF=$(cat legacy_wdl/sample_set_membership_v6.tsv | grep $SAMPLE | cut -f2) | ||
gsutil cat $GVCF | gunzip | grep -C 10 $POS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a zgrep command that you can use so you don't have to gunzip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to know!
if((isIndel && totalAsQualApprox < INDEL_QUAL_THRESHOLD) || (!isIndel && totalAsQualApprox < SNP_QUAL_THRESHOLD)) { | ||
// logger.info(contig + ":" + currentPosition + ": dropped for low QualApprox of " + totalAsQualApprox); | ||
if ( printDebugInformation ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment about totalAsQualApprox. it is the sum of the QualApprox values for every sample with a variant at this site (but not allele specific). this seems strange. it seems like it should be an average or some other ratio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just the straight sum. I agree it seems like you would want a single high quality variant or some such... but this is exactly what is calculated in the warp pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -297,24 +325,19 @@ private double getQUALapproxFromSampleRecord(GenericRecord sampleRecord) { | |||
|
|||
String s = o.toString(); | |||
|
|||
// TODO: KCIBUL -- unclear how QUALapproxes are summed from non-ref alleles... replicating what I saw but need to confirm with Laura | |||
// Non-AS QualApprox (used for qualapprox filter) is simply the sum of the AS values (see GnarlyGenotyper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace "is simply" but "is approximated by"
…es (#7106) * tieout changes * tidy up, updated README.md * PR comments
Resolves https://github.com/broadinstitute/dsp-spec-ops/issues/239
See README.md in this PR for full details
To make this easier to review, the changes break down into a few sections
Docs -- the README.md. Does it make sense? Could you follow it?
Comparison Script (compare_data.py)-- is it clear? Obvs any bugs would be great. The Github Issue for this PR describes what is compared
WDL changes -- should be straightforward to review, just minor changes
Code changes (java) -- we can walk through this together if that's more effective