Inconsistent ancestry predictions between Somalier and Peddy #103
Hi, I know there are some issues with the somalier ancestry setup. You can trust the peddy results (as you note) much more.
Thanks Brent! Unfortunately, reducing the number of PCs to 4 still resulted in most samples being called as AMR. I should note that I did not provide any of the known ancestries for our samples when running Somalier. I can try this to see whether the samples with missing ancestry are imputed better, although Peddy is already giving results more in line with our expectations, so I may just stick with those results. Thanks again for your help!
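To make the effect of the number of PCs concrete, here is a minimal sketch of assigning super-group labels from principal-component coordinates, assuming a nearest-centroid rule: each labelled reference group is summarized by its mean position in the first `n_pcs` PCs, and a sample takes the label of the closest mean. This is a deliberately simple stand-in, not the classifier that somalier or peddy actually trains; the function names and toy coordinates are hypothetical.

```python
import math
from collections import defaultdict


def centroids(ref_pcs, ref_labels, n_pcs):
    """Mean position of each labelled reference group, using only the first n_pcs PCs."""
    sums = defaultdict(lambda: [0.0] * n_pcs)
    counts = defaultdict(int)
    for coords, label in zip(ref_pcs, ref_labels):
        for i in range(n_pcs):
            sums[label][i] += coords[i]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(sample_pcs, cents, n_pcs):
    """Assign the label of the nearest centroid (Euclidean distance in the first n_pcs PCs)."""
    best_label, best_dist = None, math.inf
    for label, centre in cents.items():
        dist = math.dist(sample_pcs[:n_pcs], centre)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical toy reference: two groups separated along PC1; PC3 is unrelated noise.
ref_pcs = [[1.0, 0.0, 5.0], [1.2, 0.1, -5.0], [-1.0, 0.0, 5.0], [-1.1, -0.1, -5.0]]
ref_labels = ["EUR", "EUR", "AFR", "AFR"]
cents = centroids(ref_pcs, ref_labels, n_pcs=2)
print(predict([0.9, 0.0, -5.0], cents, n_pcs=2))  # EUR
```

Truncating to fewer PCs simply removes the later axes, which typically capture less ancestry-related variance, from the distance. Whether that changes the calls depends on how well the study samples overlap the reference in the axes that remain.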
Is your data from sequencing? Exome? WGS? Or from a chip?
This is array data. I filtered on some basic QC metrics (e.g., genotype call rate) before running Somalier. However, I obtained similar results when running the software on SNP array data imputed with TOPMed, as well as on WES- and WGS-based datasets.
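The genotype call-rate filter mentioned above can be sketched as follows, assuming genotypes are available as VCF-style GT strings. The 0.95 threshold and the function names are hypothetical; in practice tools such as PLINK (`--geno`) apply the same idea as a maximum per-variant missingness.

```python
def is_missing(gt):
    """True if any allele in a VCF GT string is missing ('./.', '.|.', '.', './1', ...)."""
    return any(allele == "." for allele in gt.replace("|", "/").split("/"))


def call_rate(genotypes):
    """Fraction of samples with a fully called genotype at one variant."""
    if not genotypes:
        return 0.0
    called = sum(1 for gt in genotypes if not is_missing(gt))
    return called / len(genotypes)


def filter_by_call_rate(variants, min_rate=0.95):
    """Keep (variant_id, genotypes) pairs whose call rate is at least min_rate (hypothetical default)."""
    return [(vid, gts) for vid, gts in variants if call_rate(gts) >= min_rate]


variants = [
    ("rs1", ["0/0", "0/1", "./.", "1/1"]),  # 3 of 4 called
    ("rs2", ["0/0", "0/1", "1|1", "0|0"]),  # 4 of 4 called
]
print([vid for vid, _ in filter_by_call_rate(variants, min_rate=0.95)])  # ['rs2']
```

Partially missing calls such as `./1` are treated as missing here; that is a design choice of the sketch, not necessarily how a given QC tool counts them.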
Hi, I am experiencing the same issue. I am using the 1KGP dataset as the reference, and somalier labels most of my samples as AMR.
Hello,
I am currently using your Somalier software to check relatedness and to calculate ancestry PCs uniformly across several cohorts for a meta-analysis. Our cohorts consist mostly of individuals with known European ancestry; however, Somalier's ancestry function assigns most samples to the AMR super-group when using the 1000 Genomes dataset as the reference.
Other members of my lab have previously used your Peddy software for the same calculations, so I went back and checked the results with that software using the same underlying dataset (the only change was removing the "chr" prefix from the VCF file). Here, the results look as expected, with most samples labeled as the EUR super-group.
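Since chromosome naming was the only preprocessing difference between the two runs, here is a minimal sketch of that renaming applied to individual lines of an uncompressed VCF, assuming `chr`-prefixed names in both the `##contig` header lines and the CHROM column. The helper name is hypothetical. In practice `bcftools annotate --rename-chrs` with a two-column mapping file does the same job and keeps the output bgzip-compressed.

```python
def strip_chr(line):
    """Drop a leading 'chr' from a ##contig ID or from the CHROM field of a data line."""
    prefix = "##contig=<ID=chr"
    if line.startswith(prefix):
        return "##contig=<ID=" + line[len(prefix):]
    if line.startswith("#"):
        return line  # other header lines, including #CHROM, are left unchanged
    if line.startswith("chr"):
        return line[3:]  # e.g. 'chr1\t...' -> '1\t...'
    return line


for raw in ["##contig=<ID=chr1,length=248956422>\n", "#CHROM\tPOS\tID\n", "chr1\t10177\trs1\n"]:
    print(strip_chr(raw), end="")
```

Note that this turns `chrM` into `M`, whereas some references name the mitochondrial contig `MT`; a mapping file gives explicit control over such cases.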
For reference, here are the commands I used for each tool:
./somalier extract -d AMPAD_affy_preimpute/ --sites sites.hg38.vcf.gz -f Homo_sapiens_assembly38.fasta ROSMAP_affy_preimpute_hg38.vcf.gz
./somalier ancestry --labels ancestry-labels-1kg.tsv --n-pcs=10 -o AMPAD_affy_preimpute 1kg-somalier/*.somalier ++ AMPAD_affy_preimpute/*.somalier
python -m peddy --sites hg38 --plot --prefix AMPAD_affy_preimpute ROSMAP_affy_preimpute_hg38_nochr.vcf.gz ROSMAP_affy_genotypes_hg38_final.fam
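Because the question is whether the two tools agree on the same samples, a per-sample cross-tabulation of their predicted labels is a quick check. This sketch assumes both outputs can be read into sample-to-label dictionaries. `load_labels` takes the ID column, label column and delimiter as arguments because the exact headers of the somalier and peddy output files are not shown here and should be checked against the actual files; all names are hypothetical.

```python
import csv
from collections import Counter


def load_labels(path, id_col, label_col, delimiter):
    """Read a sample -> predicted-label mapping from a delimited file with a header row."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter=delimiter)
        return {row[id_col]: row[label_col] for row in reader}


def compare(labels_a, labels_b):
    """Cross-tabulate (label_a, label_b) pairs for shared samples and return the agreement rate."""
    shared = labels_a.keys() & labels_b.keys()
    table = Counter((labels_a[s], labels_b[s]) for s in shared)
    agree = sum(n for (a, b), n in table.items() if a == b)
    return table, (agree / len(shared) if shared else 0.0)


# Hypothetical toy labels standing in for the two tools' predictions.
somalier_calls = {"s1": "EUR", "s2": "AMR", "s3": "AMR"}
peddy_calls = {"s1": "EUR", "s2": "EUR", "s3": "AMR", "s4": "EUR"}
table, agreement = compare(somalier_calls, peddy_calls)
print(sorted(table.items()), round(agreement, 3))
```

Samples present in only one output (such as `s4` above) are ignored, so a large drop in the shared count is itself worth checking, for example after renaming chromosomes or samples.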
I was wondering whether you have come across this issue before, or have any insights into the different results? (I can send over an example VCF if you would like to troubleshoot; it will be from a different cohort than the plots above, as those are restricted data.) Thanks!