Error with find-sites #96
Comments
Hi, thanks for reporting. Will you give it a try with this binary? It will give us more information about exactly where the error is occurring.
My apologies. I didn't overwrite a previous |
Thanks for the prompt feedback! Here is what I get with the whole VCF file:
$ ./somalier_dbg find-sites batch1.Sentieon.biSNPs.vcf.gz
Thanks in advance!
I see the problem. It is fixed in this binary. I was assuming that the table would have entries for each chromosome, which is obviously not going to happen in every case, including yours.
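The failure described here (a table lookup for a chromosome that has no entry) can be illustrated with a small Python sketch. This is not somalier's code, which is written in Nim; the names `table` and `candidates_for` are hypothetical and only show the general pattern of falling back to an empty list instead of raising a key error.

```python
def candidates_for(table, chrom):
    """Return the candidate positions stored for `chrom`.

    `table` maps chromosome name -> list of positions. A chromosome
    with no entry yields an empty list instead of raising KeyError.
    """
    return table.get(chrom, [])


# Hypothetical table that only has entries for chr1.
table = {"chr1": [1000, 25000]}

print(candidates_for(table, "chr1"))   # positions stored for chr1
print(candidates_for(table, "chr21"))  # empty list, no exception
```

A plain `table[chrom]` would raise on the second call, which is the same kind of error as the `key not found: chr21 [KeyError]` in the original report.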
So now the error message is gone, but I'm not sure it's working properly. Most likely these issues were also present in my original run; I just didn't check thoroughly.
I seem to have SNPs that are less than 10000 bp apart from each other even after giving "--snp-dist=10000"; there are many cases like the one below. In addition, I don't have any SNPs from e.g. chr8, although I know there are many SNPs there matching the parameters --min-AN=1 --min-AF=0.4 --snp-dist=10000.
chr7 4620726 . A C 100 . AC=3;AF=0.5
Here is the run command/output:
$ ./somalier_dbg2 find-sites --min-AN=1 --min-AF=0.4 --snp-dist=10000 batch1.Sentieon.biSNPs.vcf.gz
As a more general alternative, I could also filter my VCF file for the AC values I choose and then use "plink --indep" to remove linked sites, right?
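One way to check the spacing concern is to scan the selected sites for neighbours on the same chromosome that are closer than the requested distance. The sketch below is only a verification helper, not somalier's selection logic; `close_pairs` is a hypothetical name, and the second chr7 position and the chr8 position are made up for illustration.

```python
def close_pairs(sites, min_dist):
    """Return adjacent site pairs on the same chromosome that are
    closer together than `min_dist` base pairs.

    `sites` is an iterable of (chrom, pos) tuples in any order.
    """
    ordered = sorted(sites)
    violations = []
    for (chrom_a, pos_a), (chrom_b, pos_b) in zip(ordered, ordered[1:]):
        # Only compare neighbours that share a chromosome.
        if chrom_a == chrom_b and pos_b - pos_a < min_dist:
            violations.append(((chrom_a, pos_a), (chrom_b, pos_b)))
    return violations


sites = [
    ("chr7", 4620726),  # position taken from the record quoted above
    ("chr7", 4625000),  # made-up neighbour, 4274 bp away
    ("chr8", 100),      # made-up site on another chromosome
]
for pair in close_pairs(sites, 10000):
    print(pair)
```

An empty result means every pair of neighbouring sites on the same chromosome respects the distance; any printed pair is a candidate violation worth reporting.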
I will also try using a VCF with only autosomes; I'm not sure whether the sex chromosomes are interfering with the overall calculations.
Hi, sorry for your trouble.
Thanks Brent, no worries and sounds good ;)
OK. I think this will work correctly for you.
see #96. this also replaces a vector of variants with a vector of integers, making for faster linear search and less memory.
Thanks, we are getting close but probably not fully there yet!
chrX 2782161 . A G 100 . AC=1;AF=0.167
$ ./somalier_dbg3 find-sites --min-AN=1 --min-AF=0.4 --snp-dist=50000 batch1.Sentieon.biSNPs.vcf.gz
Hi, it's intentionally getting more values on X to evaluate sex. This is too many, in your case, and I'll make a change for that, but it will not affect your relatedness calculations. I'll update this message:
to indicate Would these changes address your concerns? |
The other thing to clarify is that the sex chromosomes have different filtering: it's only for quality, and linkage doesn't matter.
indicate number of autosomal variants. limit number of X and y chromosomes as we don't need so many
And here's another binary with those changes in case you want to try it. It only updates the message and caps the number of X and Y variants it will use.
Super, thanks a lot Brent for the prompt response, I think it looks great! Yes, the X/Y chromosomes are now also filtered for quality. |
Hi Brent, I've been using the Somalier binary (somalier_dbg4.gz) that you shared with me earlier in this thread, and it reports Somalier version 0.2.16. However, I realised that the latest version in the releases and the Docker file is actually v0.2.15. I know that there are only minor changes between the versions, but I was wondering whether you will also push v0.2.16 to Docker and GitHub? Thanks, Ashot
Hi Brent,
Thanks for Somalier, it's quite fast and straightforward to use!
At the moment I am also trying to extract sites more tailored to my dataset with "find-sites". While it worked fine for one of my vcf files, it failed for another, which was produced from a different pipeline with slightly different INFO and FORMAT fields.
For the failed one I am getting the error below. I am running somalier from its Singularity image, and this is just a chr21 extract of my main VCF.
$ singularity exec ~/bin/somalier.sif somalier find-sites --min-AN=2 --min-AF=0.2 --snp-dist=10 test.vcf
INFO: Converting SIF file to temporary sandbox...
somalier version: 0.2.15
on chrom:chr21
290 candidate variants
tables.nim(262) []
Error: unhandled exception: key not found: chr21 [KeyError]
INFO: Cleaning up image...
Could you elaborate on which fields in the VCF "find-sites" looks for? My VCF seems to have the relevant ones, such as AC, AF, AN...
E.g., INFO and FORMAT fields:
AC=2;AF=0.333;AN=6;DP=16;ExcessHet=1.0474;FS=0;MLEAC=1;MLEAF=0.167;QD=22;SOR=2.303 GT:AD:DP:GQ:PL
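To sanity-check a record like the one above against the thresholds on the command line, a small Python sketch can parse the INFO string and compare AN and AF. The helper names `parse_info` and `passes_thresholds` are hypothetical, and the use of `>=` is an assumption; how somalier itself reads and compares these fields is not stated in this thread.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of key -> raw string value.

    Flag-style entries without '=' get an empty string as their value.
    """
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_thresholds(info, min_an, min_af):
    """Return True if AN and AF are present, numeric, and at least the
    given minimums (assumption: inclusive comparison)."""
    fields = parse_info(info)
    try:
        an = int(fields["AN"])
        af = float(fields["AF"])
    except (KeyError, ValueError):
        # Missing field, or a multi-allelic value such as "0.3,0.1".
        return False
    return an >= min_an and af >= min_af


info = "AC=2;AF=0.333;AN=6;DP=16;ExcessHet=1.0474;FS=0;MLEAC=1;MLEAF=0.167;QD=22;SOR=2.303"
print(passes_thresholds(info, min_an=2, min_af=0.2))
```

With the INFO string quoted above and the `--min-AN=2 --min-AF=0.2` values from the failing command, the record passes, so under these assumptions missing INFO fields would not explain the error.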
Thanks
Ashot