--freq-file not using all available SNPs present in both study and freq-file VCF #29
Comments
hmmm...hard to say what the problem is without an example. I did try this on the test data and it runs okay:
That indexing error is worrying. What does |
I'm thinking the error had something to do with duplicate entries in the frequencies VCF. When I normalized with |
FYI you can run stats on two files at once ( Duplicate entries would likely cause problems for |
Thanks for the tip on bcftools stats / multiple files. RE after removing dupes - yes, I did, and that worked. Also a note. I'm using |
Thanks for troubleshooting this. It seems like the best course of action here is to warn or maybe just exit with an error. It is on my to-do list!
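For anyone who wants to check a frequencies VCF for duplicate entries before running akt, here is a minimal Python sketch. It is not part of akt, and it assumes that "duplicate" means two or more records with identical CHROM, POS, REF and ALT; the thread does not say exactly which kind of duplicate caused the problem. The function name `find_duplicate_sites` is hypothetical.

```python
import gzip
from collections import Counter


def find_duplicate_sites(path):
    """Return {(chrom, pos, ref, alt): count} for sites listed more than once.

    Assumption: a duplicate is an exact repeat of CHROM/POS/REF/ALT.
    Works on plain-text VCF and on gzip/bgzip-compressed VCF (.gz).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            counts[(chrom, int(pos), ref, alt)] += 1
    return {site: n for site, n in counts.items() if n > 1}
```

An empty result means no exact duplicates were found under that assumption; a non-empty result lists the sites worth inspecting before passing the file to --freq-file.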
There doesn't seem to be any reason not to allow analysis from a freq_file while also restricting to certain regions. This also removes the auto-usage of the frq_file as the regions in bcf_sr, which works around some unused SNPs as described in Illumina#29
I'm having an issue supplying allele frequencies as answered in #28. Unfortunately I can't provide example data, but I'll try to provide as much as I can without supplying sensitive data.
background
I have a study VCF (bgzip+tabix indexed). I'm using your set of "reliable" SNPs that happen to be on my platform. Some stats on my study VCF:
Some stats on my frequencies VCF (also bgzip'd and tabix indexed):
Just to be sure, I'm using bcftools isec to check the number of SNPs intersecting between these files.
According to the readme produced with isec, I'm primarily interested in how many SNPs are in 0002 and 0003 - the SNPs shared between the files.
I have 16,137 overlapping.
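As a rough cross-check of that count, the records in the isec output files can be tallied directly. The sketch below is only an illustration: the function name `count_vcf_records` and the output directory name `isec_out` are hypothetical, and it assumes isec wrote its results as `0002.vcf` and `0003.vcf` (or their `.vcf.gz` equivalents) in that directory.

```python
import gzip


def count_vcf_records(path):
    """Count data lines (non-header, non-blank) in a VCF file.

    Handles plain-text VCF and gzip/bgzip-compressed VCF (.gz).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


if __name__ == "__main__":
    import os

    # Hypothetical output directory; adjust to wherever isec wrote its files.
    for name in ("isec_out/0002.vcf", "isec_out/0003.vcf"):
        if os.path.exists(name):
            print(name, count_vcf_records(name))
```

If the two files report the same number, that number is the count of records shared between the study VCF and the frequencies VCF.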
problem demonstration
However, when I run akt kin, supplying this VCF with frequency data, it's telling me that it's only using 9 SNPs in total. I'm using akt 0.3.2.
I also tried converting everything to BCF:
And running akt kin again, but this time I run into a different error. Not sure what this is about. I did ensure that the frequencies VCF and the study VCF are both sorted, but that didn't appear to be the issue. Any idea what could be happening here?