Issue when converting from plink to eigenstrat using convertf #86

Open
jaurbanChicago opened this issue Mar 7, 2023 · 4 comments

Comments

@jaurbanChicago

Hello,

I have a VCF file and a PLINK file with 5,758,371 positions. I converted the VCF to PLINK format with the plink software, and then used convertf to convert the PLINK (bed) file into an EIGENSTRAT file. After the conversion, my EIGENSTRAT file has 5,757,704 positions. The convertf run apparently had no issues, since the .err file came out empty. Why would I be losing 667 positions? Thanks in advance.

Best,
Jose Antonio

@jaurbanChicago jaurbanChicago changed the title Issue when converting from plint to eigenstrat using convertf Issue when converting from plink to eigenstrat using convertf Mar 7, 2023
@bumblenick

bumblenick commented Mar 7, 2023 via email

@jaurbanChicago

parameter file: plink2eigen_WGS_SriLankans.par
genotypename: /scratch/jaurban/SriLanka_HighCov_Genomes/vcf/Panel_dbWGS_SriLankaModAnc.SNPbiallelic.rel.maf5p.maxSNPmiss10p.bed
snpname: /scratch/jaurban/SriLanka_HighCov_Genomes/vcf/Panel_dbWGS_SriLankaModAnc.SNPbiallelic.rel.maf5p.maxSNPmiss10p.bim
indivname: /scratch/jaurban/SriLanka_HighCov_Genomes/vcf/Panel_dbWGS_SriLankaModAnc.SNPbiallelic.rel.maf5p.maxSNPmiss10p.fam
outputformat: EIGENSTRAT
genotypeoutname: /scratch/jaurban/SriLanka_HighCov_Genomes/eigenstrat/Panel_dbWGS_SriLankaModAnc.SNPbiallelic.rel.maf5p.maxSNPmiss10p.geno
snpoutname: /scratch/jaurban/SriLanka_HighCov_Genomes/eigenstrat/Panel_dbWGS_SriLankaModAnc.SNPbiallelic.rel.maf5p.maxSNPmiss10p.snp
indivoutname: /scratch/jaurban/SriLanka_HighCov_Genomes/eigenstrat/Panel_dbWGS_SriLankaModAnc.SNPbiallelic.rel.maf5p.maxSNPmiss10p.ind
familynames: NO
## convertf version: 5000
genetic distance set from physical distance
genotype file processed
numvalidind:   4884  maxmiss: 4884001
eigenstrat output
##end of convertf run


------------ Job WrapUp ------------

Job ID:            37825838.cri16sc001
User ID:           jaurban
Job Name:          plink2eigen_WGS_SriLankans.pbs
Queue Name:        mid
Working Directory: /scratch/jaurban/SriLanka_HighCov_Genomes/vcf
Resource List:     walltime=39:59:00,nodes=2:ppn=15,mem=32gb,neednodes=2:ppn=15
Resources Used:    cput=02:07:54,vmem=11390228kb,walltime=02:08:01,mem=10013956kb,energy_used=0
Exit Code:         0
Mother Superior:   cri16cn220

Execution Nodes: 
cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn220 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219 cri16cn219

This is what the log file says!

Best,
JA
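
Since the log reports no errors, one way to narrow this down is to list exactly which SNPs are in the input .bim but missing from the output .snp. The following is only a minimal sketch: it assumes SNP IDs are unique in both files (if the .bim IDs are all ".", a chromosome and position key would be needed instead), and the function names `read_ids` and `dropped_snps` are hypothetical helpers, not part of convertf or plink. It relies on the SNP ID being the second column of a .bim file and the first column of an EIGENSTRAT .snp file.

```python
def read_ids(path, column):
    """Return the whitespace-delimited field at index `column` of every non-empty line."""
    ids = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                ids.append(fields[column])
    return ids


def dropped_snps(bim_path, snp_path):
    """List IDs found in the PLINK .bim but not in the EIGENSTRAT .snp.

    Assumption: IDs are unique, so an ID lookup is enough to match sites.
    The .bim order is preserved in the returned list.
    """
    kept = set(read_ids(snp_path, 0))  # .snp: SNP ID is the 1st column
    return [snp_id for snp_id in read_ids(bim_path, 1)  # .bim: SNP ID is the 2nd column
            if snp_id not in kept]
```

Calling `dropped_snps("in.bim", "out.snp")` with the snpname and snpoutname paths from the parameter file should return the missing IDs, which can then be looked up in the .bim to see what they have in common (chromosome, alleles, position).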

@bumblenick

bumblenick commented Mar 7, 2023 via email

@janxkoci

janxkoci commented May 7, 2024

Another possibility is multiallelic SNPs: neither the Plink v1 nor the eigenstrat format supports those, while VCF has no problem with such positions.

Does your Plink file already show the reduction in the number of sites? You can check with e.g. wc -l in.bim. On the VCF side, you can use something like bcftools view -H -m3 in.vcf | wc -l or even just awk '/^#/ {next} $5 ~ /,/ {counter++} END {print counter+0}' in.vcf.
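
If neither bcftools nor awk is at hand, the same VCF-side check can be sketched in Python. This is just an illustration of the idea above, not a replacement for bcftools: the function name `count_multiallelic` is made up for this example, and it assumes a standard tab-delimited VCF (plain or gzip-compressed) where the ALT alleles are comma-separated in the fifth column.

```python
import gzip


def count_multiallelic(vcf_path):
    """Count VCF records whose ALT column (5th field) lists more than one allele."""
    # Assumption: a ".gz" suffix means the file is gzip/bgzip compressed.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    count = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.split("\t", 5)
            if len(fields) > 4 and "," in fields[4]:
                count += 1
    return count
```

For example, `count_multiallelic("in.vcf")` can be compared against the 667 missing positions; a match would point to multiallelic sites, while a mismatch suggests another cause.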
