GenotypeGVCFs not outputting some variants it should with -stand-call-conf #5793
Comments
@lbergelson and/or @ldgauthier, your thoughts? Louis thought there was something fishy in the handling of |
This sounds really familiar... I feel like this was something I noticed a long time ago and it was explained away by someone.
`-stand-call-conf` is used to filter variants at several points in the code, and not always compared against the final GQ.
In particular I think there's this site here gatk/src/main/java/org/broadinstitute/hellbender/tools/walkers/genotyper/GenotypingEngine.java Line 389 in d8d06cd which decides if a variant isPlausible. If I remember correctly, this can filter sites that would end up with higher than stand-call-conf qual if they continued through.
This was years ago though, so my memory is hazy. I remember thinking something seemed really off about the parameter though. |
That threshold gets used for the final decision of whether to output a
site, but also for evaluating each allele. I'm going to point the finger at
the old AFCalculator because multi-allelics are its Achilles heel. Since
that version was before they made "new qual" the default, can you try with
`--use-new-qual-calculator`?
…On Wed, Mar 13, 2019 at 11:39 AM Louis Bergelson ***@***.***> wrote:
This sounds really familiar... I feel like this was something I noticed a
long time ago and it was explained away by someone.
-stand-call-conf is used to filter variants at several points in the
code, and not always compared against the final GQ.
In particular I think there's this site here
https://github.com/broadinstitute/gatk/blob/d8d06cd769959b595132cb0aab18ccd7fe913ffd/src/main/java/org/broadinstitute/hellbender/tools/walkers/genotyper/GenotypingEngine.java#L389
which decides if a variant isPlausible. If I remember correctly, this can
filter sites that would end up with higher than stand-call-conf qual if
they continued through.
--
Laura Doyle Gauthier, Ph.D.
|
@ldgauthier it looks like using
It looks like this is fixed in #5484 (and so 4.1.0.0). @tfenne are you ok closing this? |
Closing as resolved with the new QUAL calculator. |
I'm reopening this as I'm still having this same problem when using the latest release (4.1.1.0). The variant shown above is not emitted by |
It's kind of tricky because suppose, e.g., that we have three alt alleles each with an allele qual of 19, so that the overall variant qual is roughly 3x19 = 57. If we filter alleles with a confidence threshold of 20, we get no alleles and the variant qual changes to 0. Now, if instead of filtering by allele we only filter by overall variant qual, we then have to keep an arbitrary number of sketchy alleles. I mean, what if we have 30 alleles each with a qual of 1? The current behavior seems preferable to me because the usual question users would ask downstream is whether some allele is real, not whether some site exhibits variation. As long as we define |
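To make the trade-off in the comment above concrete, here is a minimal Python sketch of the two filtering strategies. It is not GATK code: the function names are hypothetical, and it assumes (as the 3x19 = 57 example does) that the site-level qual is roughly the sum of the per-allele quals.

```python
from typing import List


def site_qual(allele_quals: List[float]) -> float:
    """Approximate the site-level qual as the sum of per-allele quals.

    This additive model is an illustrative assumption taken from the
    "3x19 = 57" example, not the actual GATK calculation.
    """
    return sum(allele_quals)


def filter_per_allele(allele_quals: List[float], threshold: float) -> List[float]:
    """Keep only the alleles whose own qual meets the threshold."""
    return [q for q in allele_quals if q >= threshold]


def passes_per_site(allele_quals: List[float], threshold: float) -> bool:
    """Keep the whole site (all alleles) if the summed qual meets the threshold."""
    return site_qual(allele_quals) >= threshold


# Case 1: three alt alleles, each with qual 19, threshold 20.
three_alleles = [19.0, 19.0, 19.0]
kept = filter_per_allele(three_alleles, 20.0)
print(site_qual(three_alleles))            # summed evidence across alleles
print(kept, site_qual(kept))               # no allele survives per-allele filtering
print(passes_per_site(three_alleles, 20.0))

# Case 2: thirty alt alleles, each with qual 1, threshold 20.
thirty_alleles = [1.0] * 30
print(passes_per_site(thirty_alleles, 20.0))   # site passes, keeping 30 weak alleles
print(filter_per_allele(thirty_alleles, 20.0))
```

Under these assumptions, per-allele filtering drops a site whose combined evidence is well above the threshold, while per-site filtering keeps every weak allele. Neither outcome is obviously right, which is the tension being described.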
@davidbenjamin I'm not sure I follow your logic. But if you believe the current implementation is doing the expected thing I'd like to understand. In the example above, the site is multi-allelic in the gVCF. However, when run through GenotypeGVCFs it's reduced to being bi-allelic, and the QUAL of the bi-allelic site in the genotyped GVCF doesn't change - it's still To rephrase the issue - I find that if I run |
@tfenne would it make more sense to separate out the confidences into per-allele and per-site thresholds (where a multi-allelic site's QUAL accumulates evidence from all alleles)? The bi-allelic QUAL seems wrong if it's the same as when all the alleles are present. The genotype you give above is for the only sample in the VCF? |
@ldgauthier I think it's probably best to try and get to the bottom of why this variant's qual isn't being adjusted as it's reduced from 7 alleles in the gVCF to 2 alleles in the called VCF. This is, as you guessed, all single-sample. I've attached a reduced gVCF that just includes the variant in question, and the resulting genotyped VCF from running the command line below (and their indices)
A few observations from running the above command but varying the
Circling back to one of my original statements, I believe the least confusing way for this to work would be to think of it this way:
That said, it sounds like maybe the problem is less with the filtering on QUAL and more to do with the calculation of the final QUAL that ends up in the VCF? |
So we definitely don't update the QUAL if we drop alternate alleles: gatk/src/main/java/org/broadinstitute/hellbender/tools/walkers/genotyper/GenotypingEngine.java Line 259 in 9fce0b2
Note that the QUAL is based off of the AFResult that had alleles removed if they exceeded the output limit, but not if they had less evidence than the calling confidence threshold. @davidbenjamin I really hate to run the AF calculator again if we drop low quality alleles. Or maybe the new qual isn't as bad as I think? Would it be a decent approximation to add up the per-allele quals for the remaining alleles? |
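The approximation floated in the question above could be sketched as follows. This is a hypothetical Python illustration, not the logic in GenotypingEngine: the function name, the allele labels, and the per-allele qual values are all made up, and whether a plain sum is a good enough stand-in for rerunning the AF calculator is exactly the open question.

```python
from typing import Dict, Tuple


def requal_after_dropping(
    per_allele_qual: Dict[str, float], call_conf: float
) -> Tuple[Dict[str, float], float]:
    """Drop alleles below the calling confidence and recompute an approximate QUAL.

    The new QUAL is taken to be the sum of the per-allele quals of the
    alleles that remain (an assumed approximation, not GATK behavior).
    """
    kept = {allele: q for allele, q in per_allele_qual.items() if q >= call_conf}
    return kept, sum(kept.values())


# Made-up per-allele quals for a three-allele site.
quals = {"alt1": 50.0, "alt2": 5.0, "alt3": 2.0}

kept_alleles, new_qual = requal_after_dropping(quals, 19.0)
print(kept_alleles)   # only the allele with enough evidence of its own remains
print(new_qual)       # QUAL now reflects only the retained allele
```

With this approach the reported QUAL would shrink when weak alleles are dropped, instead of continuing to reflect evidence from alleles that no longer appear in the output record.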
Bug Report
Affected tool(s) or class(es)
GenotypeGVCFs 4.0.0.12
Affected version(s)
Description
I've run into a weird case where GenotypeGVCFs is doing something unexpected. I have a gVCF with the following entry in it:
It's a messy site for sure, an indel in a long homopolymer-T, but I think that's a separate issue. If I run the following on that gVCF:
then I get the following output to the VCF just like I'd expect:
QUAL is unchanged since I'm genotyping a single-sample gVCF. However, if I raise my `-stand-call-conf` threshold to 19.0, GenotypeGVCFs no longer outputs any variants. 565.73 >> 19.0, so I'm confused as to why that variant is no longer emitted.
Steps to reproduce
Run GenotypeGVCFs on the above example with `-stand-call-conf` values ranging up to ~550.
Expected behavior
The variant should be emitted into the VCF.
Actual behavior
The variant is not emitted if `-stand-call-conf` is set to 19 or higher.