Questions about how GQ does work #326
Comments
Hi @aderzelle, thank you for your question.
I believe this is coming from the cohort merging step (by GLnexus) assuming that you used The current version of the merging parameters uses I believe this is the reason why you are seeing the sharp drop at
I hope this helps and please let us know if you have any more questions/comments.
[1] https://doi.org/10.1101/2020.02.10.942086
Best,
Thanks for the very helpful answer. From GLnexus definition
That means we should be more stringent on quality filtering for singletons since they are not supported by observations in more than one individual, right?
We can be (that would imply Depending on how the cohort VCF is used, one may want to apply additional filters to the cohort. These can be applied either after merging, using a standard VCF modification tool (e.g. bcftools), or by changing the merging settings directly in the .yml file.
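As a rough illustration of the "filter after merging" route, here is a minimal Python sketch that works directly on the text lines of a merged VCF. It is not what bcftools or GLnexus do internally. The function names `parse_sample` and `filter_carriers_by_gq` and the default threshold of 20 are hypothetical choices for this example, and it assumes that each record has a FORMAT column containing `GT` and `GQ`. A record is kept only if at least one sample carrying a non-reference allele has a non-missing GQ at or above the threshold.

```python
def parse_sample(format_field, sample_field):
    """Return (is_carrier, gq) for one sample column of a VCF record.

    gq is None when GQ is absent or reported as '.'.
    """
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))
    gt = fields.get("GT", ".")
    alleles = gt.replace("|", "/").split("/")
    # A carrier has at least one allele that is neither reference nor missing.
    is_carrier = any(a not in ("0", ".") for a in alleles)
    raw_gq = fields.get("GQ", ".")
    gq = None if raw_gq == "." else int(raw_gq)
    return is_carrier, gq


def filter_carriers_by_gq(vcf_lines, min_gq=20):
    """Keep header lines, plus records where a carrier has GQ >= min_gq."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 9 (index 8) is FORMAT; sample columns follow it.
        parsed = [parse_sample(cols[8], sample) for sample in cols[9:]]
        if any(carrier and gq is not None and gq >= min_gq
               for carrier, gq in parsed):
            kept.append(line)
    return kept
```

Records whose carrier has a missing GQ (`.`) are dropped here; whether that is the right choice depends on how the cohort VCF will be used.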
FYI I submitted a pull request to the GLnexus repo for the "nomod" preset for merging (no filters or genotype revision). dnanexus-rnd/GLnexus#229 If this is accepted, you'll be able to try it out without downloading an external .yml file.
@aderzelle Please let me know if you have any questions/comments related to this issue. If not, please feel free to close it :)
Thanks for your detailed explanation! I think I have all I need ;)
Hello!
Some quick background: I am interested in SNPs that are unique (= private) to each sample in my cohort. I ran DeepVariant on each sample, then GLnexus as per the best practices recommendations.
I extracted unique variants using the bcftools --private option. Then, I wanted to do some filtering on GQ.
Here is the GQ distribution for one of the individual VCF files (so before joint calling). I don't have much to say about it; it makes sense.
However, here is the GQ distribution of the --private SNPs
First, there are some values that are NA because some SNPs have . as their GQ value. That's all right, errors in sequencing / mapping I guess. In fact, my reasoning is that the --private option will enrich the SNP set for errors that are unique to each sequencing data set. Therefore, I was expecting that the GQ distribution would be shifted to the left. However, what we see is that not only is it shifted to the left, but the shape of the distribution has also changed.
So I would like to know more about how exactly GQ is computed by DeepVariant. And why does the GQ seem to peak abruptly at 11-12?
I also read on your blog that you consider "high quality variants" to be the ones with a GQ of 20. Of course, given the distribution of GQ for the private set, setting a GQ threshold at 20 will make a big difference, as seen on this plot
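For context on what these thresholds mean numerically, here is a small sketch based on the general VCF convention that GQ is a Phred-scaled quality, i.e. GQ = -10 * log10(probability that the genotype call is wrong). This is only the generic definition, and it assumes the GQ values in the merged file still follow it; it does not describe how DeepVariant or GLnexus derive or revise GQ internally. The function names are hypothetical.

```python
import math


def gq_to_error_prob(gq):
    """Convert a Phred-scaled GQ to the implied probability of a wrong call."""
    return 10 ** (-gq / 10)


def error_prob_to_gq(p_err):
    """Convert an error probability (0 < p_err <= 1) to a Phred-scaled GQ."""
    return -10 * math.log10(p_err)
```

Under that assumption, a GQ of 20 corresponds to an error probability of about 1%, while a GQ of 11-12 corresponds to roughly 6-8%.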
Thanks a lot for your insight!