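We have come across certain positions in the genome where different strains appear to have the same SNP (indicated by the GT/genotype field), but one of the strains failed the FI/FILTER criterion (1 is PASS, 0 is FAIL). Here is an example:
GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI
1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1 (129)
1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0 (Cast)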
For single hybrid genomes we would include this position in the 129 genome (1/1 homozygous SNP, first line), but would ignore the position for the Cast genome (also 1/1 homozygous SNP, but failed the high confidence FI filter, second line). This seems like a reasonable approach.
For dual hybrid genomes such positions might be a problem, though: when the 129 and Cast SNP lists are compared with each other, it looks as if there is a SNP between 129 and Cast, even though there was evidence that the genotype was the same (1/1) in 129 and Cast. The call simply did not pass the threshold to count as a high confidence SNP in Cast.
As a solution, can we change the SNPsplit genome preparation to store the FI value as well as the GT genotype, and only use a position for a dual-hybrid SNP list if it was measured with high confidence (i.e. FI=1) in both strains? Thanks to @nservant for helpful discussions in this regard.
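To make the proposed rule concrete, here is a minimal Python sketch using the 129 and Cast sample columns from this issue. It is not the SNPsplit code; the function names `parse_sample` and `high_confidence_in_both` are hypothetical and only illustrate the idea of keeping FI alongside GT and requiring FI=1 in both strains.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (e.g. GT, FI) to the values of one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def high_confidence_in_both(format_field, strain_a, strain_b):
    """Return True only if both strains passed the filter (FI == '1')."""
    a = parse_sample(format_field, strain_a)
    b = parse_sample(format_field, strain_b)
    return a.get("FI") == "1" and b.get("FI") == "1"


# FORMAT string and sample columns copied from the example in this issue.
FORMAT = "GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI"
S129 = "1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1"
CAST = "1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0"

# Cast carries FI=0, so this position would be dropped from a dual-hybrid list.
print(high_confidence_in_both(FORMAT, S129, CAST))
```

Looking fields up by FORMAT key rather than by column index keeps the sketch independent of the order in which the FORMAT tags appear.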
I have now tried to add functionality for the --dual_hybrid mode to identify positions where both genomes had homozygous SNPs compared to the reference but where one strain did not pass the high confidence filters. Instead of making this a new option, this is now the default behaviour, since I believe this is the right thing to do. Addressed in 210af81 and 1ab9048.
In addition to high confidence homozygous SNP positions we also see some cases of low confidence no-SNP positions, such as this one:
GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI
1/1:21:12:0:152,21,0:128,12,0:2:55:9:3,0,7,2:0:-0.662043:.:1
0/0:.:5:0:.,.,.:.,.,.:2:47:4:1,0,4,0:0:-0.556411:.:0
In line with only including high-confidence positions for the allele-specific analysis, I have now added a further check so that both FI fields need to have passed the filter (i.e. FI=1), irrespective of the genotype (which may e.g. be 0/0, 0/1 or ./.). This check requires some additional memory compared to the original version but will make the genome preparation more robust.
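The difference between the two checks described in this thread can be sketched in Python as follows. This is an illustration of my reading of the comments, not the actual SNPsplit implementation; `fields`, `excluded_by_first_check` and `excluded_by_second_check` are hypothetical names.

```python
def fields(format_field, sample_field):
    """Map FORMAT keys to the values of one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def excluded_by_first_check(s1, s2):
    """Both strains are homozygous SNPs (1/1) but at least one failed FI."""
    both_hom_alt = s1["GT"] == "1/1" and s2["GT"] == "1/1"
    return both_hom_alt and (s1["FI"] != "1" or s2["FI"] != "1")


def excluded_by_second_check(s1, s2):
    """At least one strain failed FI, whatever the genotypes are."""
    return s1["FI"] != "1" or s2["FI"] != "1"


# The low confidence no-SNP position quoted in this thread.
FORMAT = "GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI"
STRAIN_1 = fields(FORMAT, "1/1:21:12:0:152,21,0:128,12,0:2:55:9:3,0,7,2:0:-0.662043:.:1")
STRAIN_2 = fields(FORMAT, "0/0:.:5:0:.,.,.:.,.,.:2:47:4:1,0,4,0:0:-0.556411:.:0")

# The second strain is 0/0, so the first check alone does not catch it;
# the FI-only check does, because the second strain has FI=0.
print(excluded_by_first_check(STRAIN_1, STRAIN_2))
print(excluded_by_second_check(STRAIN_1, STRAIN_2))
```

Under these assumptions, the position stays in the list after the first check alone, and is only dropped once FI=1 is required in both strains regardless of genotype.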