how to subset .vcf.gz file to include only variants whose genomic coordinates are given in a list #2332
Comments
The command looks correct. This is very basic functionality, so it's strange that it doesn't work. Can you try upgrading to the latest version of bcftools? We are at 1.21 now. If there is something wrong with the input data, the newer version might give a more informative error message. The -T option does not require an index, so that is unlikely to be the problem. If upgrading does not help, can you provide a small test case so that we can reproduce the problem?
Hi,

Thanks for the fast reply. I downloaded and installed version 1.21, but now I get this error message:

Could not parse 2-th line of file snplist.txt, using the columns 1,2[,3]
Failed to read the targets: snplist.txt

Here is a head of snplist.txt:

The head of hbcs_sisu_b38.vcf.gz would be quite massive, so I copy-pasted here only the first seven columns of the output when I run

#CHROM POS ID REF ALT QUAL FILTER

Let me know if you need more information to be able to reproduce the problem!
Is your file tab-delimited as described in the documentation? http://samtools.github.io/bcftools/bcftools.html#common_options
Yes, I think it is tab-delimited. I'm not sure what the best way to verify this would be, but if I run
it prints
And I get the same column names by running
My snplist.txt is also tab-delimited.
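One possible way to verify the delimiters, as a minimal sketch using only standard tools: dump the first lines so that tabs become visible, then list every line that does not have a tab-separated chromosome and an integer position. The sample file below is a hypothetical stand-in for snplist.txt, and the rule applied (skip lines starting with `#`, require an integer in column 2) is an assumption based on the "columns 1,2[,3]" wording of the error message, not a copy of the bcftools parser.

```shell
# Hypothetical stand-in for snplist.txt: line 3 uses a space instead of a tab.
targets=$(mktemp)
printf '#CHROM\tPOS\nchr1\t12345\nchr1 67890\n' > "$targets"

# Make the delimiters visible: od -c prints a tab as \t and a space as a blank.
head -n 3 "$targets" | od -c

# Assumed rule: non-comment lines need at least two tab-separated columns,
# and column 2 must consist of digits only.
bad_lines=$(awk -F'\t' '!/^#/ && (NF < 2 || $2 !~ /^[0-9]+$/) { print NR ": " $0 }' "$targets")
echo "$bad_lines"

rm -f "$targets"
```

An empty result would mean every line passed this check. A position followed by a Windows carriage return would also be reported, since column 2 would then no longer be digits only.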
Hi,
I would like to create a subset of a large .vcf.gz file so that I can read it in R with read.vcfR from the vcfR package (I get memory issues if I try to read the non-subsetted .vcf.gz file). I only need certain variants given in a list. What I have tried:
~/bcftools-1.12/bcftools view -T snplist.txt hbcs_sisu_b38.vcf.gz -o hbcs_sisu_b38_subset.vcf.gz
The 'snplist.txt' is tab-delimited and includes columns '#CHROM' and 'POS' (not sure if they were required).
I have also tried option '-R' instead of '-T' for the 'view' command, and the command 'filter' instead of 'view' with both options '-T' and '-R'. But depending on which variants are included in snplist.txt, the subsetted file always contains either just one variant or no variants at all, even though
less -S hbcs_sisu_b38.vcf.gz | grep -f snplist.txt
prints lines for more variants.
I am not sure if a .csi file is required here, but I have created hbcs_sisu_b38.vcf.gz.csi like this:
~/bcftools-1.12/bcftools index hbcs_sisu_b38.vcf.gz
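A caveat on the `grep -f snplist.txt` check mentioned above, offered as a sketch and not as a diagnosis of this particular data set: grep treats each line of the list as a pattern and matches it anywhere within a line, so it can report more records than an exact comparison of chromosome and position would. The two small files below are hypothetical and only illustrate the difference.

```shell
# Hypothetical miniature data: two records whose positions share a prefix.
list=$(mktemp)
body=$(mktemp)
printf 'chr1\t100\n' > "$list"
printf 'chr1\t100\trs1\nchr1\t1000\trs2\n' > "$body"

# Substring matching: the pattern chr1<TAB>100 also hits the record at 1000.
grep_hits=$(grep -c -f "$list" "$body")

# Exact matching: compare columns 1 and 2 of each record against the list.
exact_hits=$(awk -F'\t' 'NR == FNR { want[$1 FS $2]; next }
                         ($1 FS $2) in want { n++ }
                         END { print n + 0 }' "$list" "$body")

echo "grep: $grep_hits, exact: $exact_hits"
rm -f "$list" "$body"
```

If the two counts differ on real data, the grep output overstates how many listed variants are actually present at those exact coordinates.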