Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parent / child pairs in CalculateGenotypePosteriors #5409

Closed
gmagoon opened this issue Nov 14, 2018 · 27 comments
Closed

Parent / child pairs in CalculateGenotypePosteriors #5409

gmagoon opened this issue Nov 14, 2018 · 27 comments
Assignees
Milestone

Comments

@gmagoon
Copy link

gmagoon commented Nov 14, 2018

It is not clear to me from the docs whether parent/child pairs are intended to be supported by CalculateGenotypePosteriors, but a quick glance at the mentions in comments in FamilyLikelihoods.java makes me suspect that they are intended to be supported. (In any case, from my perspective, it would be a very nice feature as I have yet to find a tool that will robustly handle this use case.)

Here are the main issues that I'm encountering when trying to use CalculateGenotypePosteriors for a parent-child pair:

  1. If I supply a ped file with two individuals like the following, this check gets triggered, resulting in printing of the warning and skipping family priors.
FAM	MOM	0	0	2	0
FAM	CHILD	0	MOM	2	0
  1. If I add a father to the ped file to form a trio, like below, CalculateGenotypePosteriors proceeds without the warning that occurs in first approach, but the output doesn't appear to make any adjustments to genotypes, posteriors, etc. Note that there is no entry for "DAD" in the input VCF.
FAM	MOM	0	0	2	0
FAM	DAD	0	0	1	0
FAM	CHILD	DAD	MOM	2	0
@ldgauthier
Copy link
Contributor

CalculateGenotypePosteriors is only intended for trios. This is noted (admittedly not too clearly) in the output section of the docs (Per-site, per-trio joint likelihoods (JL) and joint posteriors (JL) -- which needs a fix to be JP for posteriors) In GATK3, CalculateGenotypePosteriors shares some code with PhaseByTransmission (namely the FamilyLikelihoods.java), which does support parent-child pairs, which is why you encountered comments relevant to pairs.

We can certainly update the docs for clarity.

@ldgauthier
Copy link
Contributor

@bhanugandham can you take a stab at a docs update? I'm happy to review.

@gmagoon
Copy link
Author

gmagoon commented Nov 14, 2018

I think I figured out my problem. Looking more closely at the FamilyLikelihoods.java source code, I noticed some indications that the analysis needed GTs. My initial input only contained PLs. Using the second approach with input with GTs added, it now appears that the pairs info is being taken into consideration.

@gmagoon
Copy link
Author

gmagoon commented Nov 14, 2018

Sorry, I was mistaken...I ran some further checks and the pairs info wasn't being taken into account with the input containing GTs.

Thanks for the explanation @ldgauthier .

@bhanugandham
Copy link
Contributor

Hi @ldgauthier
I will look into this. What's the timeline? How soon do we need this?

@ldgauthier
Copy link
Contributor

ldgauthier commented Nov 15, 2018 via email

@sooheelee
Copy link
Contributor

@ldgauthier, I've been assigned this doc update. I'm wondering if you have in mind updates to a specific existing document, e.g. tool doc, or if we are to create a new tutorial on CalculateGenotypePosteriors. I had already created some preliminary content covering the tool as an extension of the Variant Discovery workshop hands-on tutorial, which you reviewed at some point back last summer before the Cambridge UK workshop. What I can do is create a forum tutorial based on this content.

@ldgauthier
Copy link
Contributor

I didn't expect for this particular clarification to launch a tutorial. I think the tool docs (i.e. javadoc) should be updated and a new note about pairs in the workflow docs (https://software.broadinstitute.org/gatk/documentation/article?id=11074), in the intro and probably at the section at the end about "family priors". Incidentally I'm seeing that the LaTeX at the end of that doc isn't coming out right, so that would be a nice fix too.

@sooheelee
Copy link
Contributor

Sure @ldgauthier, I can update the javadoc and Article#11074. It might take me some time to figure out the LaTeX part. I remember @vdauwera mentioning this was something we will fix on the forum management side. Any help towards what needs to be fixed would be appreciated @vdauwera or I can contact Vanilla for help.

@sooheelee
Copy link
Contributor

@vdauwera, fyi here is a link to our SLACK discussion about the LaTeX issue: https://broadinstitute.slack.com/archives/C33EBDEAX/p1540481072000100.

To quote:

Yes we have a latex interpreter built into the forum and website styling code
First thing to check: is the formatting messed up on both the forum view and the doc website view? That will help narrow down the problem
We don’t want to use pictures of rendered equations

Then @bhanugandham confirms the issue is on both the forum view and the doc website view based on https://software.broadinstitute.org/gatk/documentation/article.php?id=8031. What is the next step?

@vdauwera
Copy link
Contributor

vdauwera commented Jan 15, 2019 via email

@sooheelee
Copy link
Contributor

sooheelee commented Jan 15, 2019

Great, thanks @vdauwera. I'm having trouble placing images into the Monday.com issue ticket so will note my updates here.

Studying the LaTeX issue, it appears that three of the equations do not render correctly but five do. This is the case for the https://software.broadinstitute.org/gatk/documentation/article?id=11074 view. For the forum view at https://gatkforums.broadinstitute.org/gatk/discussion/11074, none of the equations render.

CORRECT RENDERING:

screenshot 2019-01-15 10 24 43

DO NOT RENDER:

screenshot 2019-01-15 10 22 30

screenshot 2019-01-15 10 23 12


I will keep the LaTeX formatted text in HTML comment tags so the docs retain them and they remain untouched and also place in PNG images of the rendered equations, like so:

screenshot 2019-01-15 10 36 53

@sooheelee
Copy link
Contributor

sooheelee commented Jan 15, 2019

Here are the results of updating the three LaTeX equations:


screenshot 2019-01-15 10 56 05

screenshot 2019-01-15 10 56 20

Now I will move on to clarifying content.

@sooheelee
Copy link
Contributor

Towards updates:

@sooheelee
Copy link
Contributor

@ldgauthier, I made some comments in the linked GoogleDocs above towards updates and referenced you in a number of the comments. I hope you have a moment to review.

@sooheelee
Copy link
Contributor

Laura is OOO until tomorrow. I will start making changes that fall under the scope of 'a quick fix'.

@sooheelee
Copy link
Contributor

I have gone ahead and made some tracked changes to the tool doc. Waiting for @ldgauthier's perspective on proposed changes to Article#11074.

@sooheelee sooheelee added this to the Engine-Q12019 milestone Jan 17, 2019
@sooheelee
Copy link
Contributor

Testing the tool behavior when given an incomplete PED file.

PED
screenshot 2019-01-22 16 42 07

Command

gatk CalculateGenotypePosteriors \
-V precomputed/trioGGVCF.vcf.gz \
-ped duo.ped \
--skip-population-priors \
-O sandbox/duoCGP.vcf.gz

Results
screenshot 2019-01-22 16 44 01

The line of interest reads:

16:27:00.401 INFO  CalculateGenotypePosteriors - No PED file passed or no *non-skipped* trios found in PED file. Skipping family priors.

@sooheelee
Copy link
Contributor

The information you requested @ldgauthier ^^.

@sooheelee
Copy link
Contributor

@gmagoon, please feel free to comment in the linked GoogleDocs the portions of the documentation that you found unclear. Thanks.

@gmagoon
Copy link
Author

gmagoon commented Jan 23, 2019

Thanks @sooheelee.
The aspect that warranted clarification from my perspective was which sort of relationships are considered when calculating the posteriors.
So maybe an addition like the italicized text below?:

*(Optional) A PED pedigree file containing the description of the relationships between individuals. Only trio groups are considered in the calculations.

@sooheelee
Copy link
Contributor

Thank you @gmagoon. This is in the new draft now and these updates should be reflected in the next release of GATK.

@ldgauthier, the PR containing the updates is at #5601.

@sooheelee
Copy link
Contributor

@ldgauthier, please let me know if it is okay to close this or if you would like to make further changes to either document. To summarize, we have updated the tooldoc and the visualization of the LaTeX equations in https://software.broadinstitute.org/gatk/documentation/article?id=11074. We have some suggestions for further updates to Article#11074 commented in the linked GoogleDoc above but will not update the article further at this time.

@ldgauthier
Copy link
Contributor

Thanks for the equation fixes in the article. I only just saw the google doc and made some comments. I reviewed #5601 and have some comments there too.

@sooheelee
Copy link
Contributor

I am getting to these now @ldgauthier. Apologies for the delay--I had a tire blowout Sunday evening while in Vermont and just returned to work today.

@sooheelee
Copy link
Contributor

I've incorporated your feedback to #5601 @ldgauthier and the commit is undergoing tests. Please feel free to merge the PR if you accept the changes and have no further comments.

As for Article#11074, given we have addressed the original issues (e.g. Latex), I am going to consider the additional recommendations as something for the not-so-near future. Would that be okay with you @ldgauthier?

@sooheelee
Copy link
Contributor

Please see #5703 for a LaTeX rendering solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants