Add --high_confidence option for dual hybrid genomes #9

Closed
FelixKrueger opened this issue Feb 22, 2017 · 2 comments

@FelixKrueger

We have come across certain positions in the genome where different strains appear to have the same SNP (indicated by the GT/genotype field), but one of the strains failed the FI/FILTER criterion (1 is PASS, 0 is FAIL). Here is an example:

GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI
1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1 (129)
1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0 (Cast)

For single hybrid genomes we would include this position in the 129 genome (1/1 homozygous SNP, first line), but would ignore it for the Cast genome (also a 1/1 homozygous SNP, but it failed the high-confidence FI filter, second line). This seems like a reasonable approach.
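To make the single-hybrid decision concrete, here is a minimal Python sketch that pairs the FORMAT keys with each strain's sample column and checks GT and FI. This is an illustration only, not SNPsplit's actual (Perl) code: the function names `parse_sample` and `is_high_confidence_hom_snp` are hypothetical, and only the plain `1/1` genotype is treated as a homozygous SNP.

```python
# Sketch only: hypothetical helpers, not SNPsplit's implementation.
FORMAT = "GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI"


def parse_sample(format_str, sample_str):
    """Map each FORMAT key to the matching value of one strain's sample column."""
    keys = format_str.split(":")
    values = sample_str.split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT and sample column have different field counts")
    return dict(zip(keys, values))


def is_high_confidence_hom_snp(sample):
    """True if the strain is homozygous alternative (1/1) and passed the filter (FI=1)."""
    return sample["GT"] == "1/1" and sample["FI"] == "1"


# The two sample columns from the example above.
strain_129 = parse_sample(
    FORMAT, "1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1"
)
strain_cast = parse_sample(
    FORMAT, "1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0"
)

print(is_high_confidence_hom_snp(strain_129))   # True: used for the 129 genome
print(is_high_confidence_hom_snp(strain_cast))  # False: ignored for the Cast genome
```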

For dual hybrid genomes such positions might be a problem, though: when the 129 and Cast SNP lists are compared with each other, it looks as if there is a SNP between 129 and Cast, even though there was evidence that the genotype was the same (1/1) in 129 and Cast, only that it did not pass the threshold to count as a high-confidence SNP in Cast.

As a solution, could we change the SNPsplit genome preparation to store the FI value as well as the GT genotype, and only use a position for a dual-hybrid SNP list if it was measured with high confidence (i.e. FI=1) in both strains? Thanks to @nservant for helpful discussions in this regard.
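The difference between the two behaviours can be sketched as follows. This is a hypothetical Python illustration, not the SNPsplit code: `naive_dual_hybrid_snp` mimics comparing two per-strain SNP lists that were each filtered on FI, while `fi_aware_dual_hybrid_snp` applies the proposed rule. It assumes GT is the first and FI the last field of the sample column (as in the FORMAT string above), and it ignores heterozygous and multi-allelic calls.

```python
# Sketch only: hypothetical helpers illustrating the proposal.

def gt_and_fi(sample_str):
    """Return (GT, FI); GT is the first and FI the last colon-separated field."""
    parts = sample_str.split(":")
    return parts[0], parts[-1]


def naive_dual_hybrid_snp(sample_1, sample_2):
    """Compare per-strain SNP lists that were each filtered on FI separately."""
    def in_strain_list(sample_str):
        gt, fi = gt_and_fi(sample_str)
        return gt == "1/1" and fi == "1"
    # A position present in only one list looks like a strain-specific SNP.
    return in_strain_list(sample_1) != in_strain_list(sample_2)


def fi_aware_dual_hybrid_snp(sample_1, sample_2):
    """Only call a SNP between the strains if both were measured with FI=1."""
    gt_1, fi_1 = gt_and_fi(sample_1)
    gt_2, fi_2 = gt_and_fi(sample_2)
    if fi_1 != "1" or fi_2 != "1":
        return False
    return (gt_1 == "1/1") != (gt_2 == "1/1")


s129 = "1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1"
cast = "1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0"

print(naive_dual_hybrid_snp(s129, cast))     # True: spurious 129/Cast SNP
print(fi_aware_dual_hybrid_snp(s129, cast))  # False: position is skipped
```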

@FelixKrueger

I have now tried to add functionality to the --dual_hybrid mode to identify positions where both genomes had homozygous SNPs compared to the reference but where one strain did not pass the high-confidence filters. Rather than making this a new option, it is now the default behaviour, since I believe this is the right thing to do. Addressed in 210af81 and 1ab9048.

@FelixKrueger

In addition to high-confidence homozygous SNP positions, we also see some cases of low-confidence no-SNP positions, such as this one:
GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI
1/1:21:12:0:152,21,0:128,12,0:2:55:9:3,0,7,2:0:-0.662043:.:1
0/0:.:5:0:.,.,.:.,.,.:2:47:4:1,0,4,0:0:-0.556411:.:0

In line with only including high-confidence positions in the allele-specific analysis, I have now added an additional check so that both FI fields need to have passed the filter (i.e. FI=1), irrespective of the genotype (which may e.g. be 0/0, 0/1 or ./.). This check requires some additional memory compared to the original version but makes the genome preparation more robust.
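As a rough illustration of this genotype-independent check (a Python sketch, not the actual Perl changes in the commits; `fi_flag` and `keep_position` are hypothetical names, and FI is assumed to be the last field of each sample column), both example positions from this thread would be dropped:

```python
# Sketch only: hypothetical helpers, FI assumed to be the last FORMAT field.

def fi_flag(sample_str):
    """Return the FI value, i.e. the text after the last colon."""
    return sample_str.rsplit(":", 1)[-1]


def keep_position(sample_1, sample_2):
    """Keep a position only if both strains passed the filter, whatever their GT."""
    return fi_flag(sample_1) == "1" and fi_flag(sample_2) == "1"


# (129 column, Cast column) for the two examples quoted in this issue.
positions = {
    "both 1/1, Cast FI=0": (
        "1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1",
        "1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0",
    ),
    "129 1/1, Cast 0/0 with FI=0": (
        "1/1:21:12:0:152,21,0:128,12,0:2:55:9:3,0,7,2:0:-0.662043:.:1",
        "0/0:.:5:0:.,.,.:.,.,.:2:47:4:1,0,4,0:0:-0.556411:.:0",
    ),
}

kept = [name for name, (a, b) in positions.items() if keep_position(a, b)]
print(kept)  # []: neither position is used for the dual-hybrid SNP list
```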

Addressed in c9688d9 and 481a460.
