bcftools +setGT does not consider major allele as new genotype #1800

Closed
twaddlac opened this issue Oct 7, 2022 · 3 comments

twaddlac commented Oct 7, 2022

I am trying to manually set a genotype based on Variant Allele Frequency (VAF) cutoffs.

Given the following VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1.sorted.bam
sample1       1       .       G       .       .       PASS    END=1566;MinDP=157      GT:DP   0/0:157
sample1       1567    .       C       A       13.9188 PASS    DP=363;VDB=0.885403;SGB=-0.693147;RPBZ=1.34407;MQBZ=0;MQSBZ=0;BQBZ=-0.875012;NMBZ=-0.631166;SCBZ=-0.445653;FS=0;MQ0F=0;AC=1;AN=2;DP4=154,133,25,32;MQ=60        GT:PL:DP:AD:VAF 0/0:49,0,255:344:287,57:0.165698
sample1       1568    .       C       .       .       PASS    END=1629;MinDP=320      GT:DP   0/0:320
sample1       1630    .       C       A       220.56  PASS    DP=496;VDB=0.0744208;SGB=-0.693147;RPBZ=-2.89068;MQBZ=0;MQSBZ=0;BQBZ=0.765252;NMBZ=0.891827;SCBZ=0.513809;FS=0;MQ0F=0;AC=1;AN=2;DP4=20,79,138,237;MQ=60 GT:PL:DP:AD:VAF 0/0:255,0,37:474:99,375:0.791139
sample1       1631    .       A       .       .       PASS    END=6536;MinDP=44       GT:DP   0/0:44

When I run:
bcftools +setGT sample1.flt.norm.vcf.gz -- --target-gt q --new-gt M --include "VAF>0.6"

I would expect the genotype at position 1630 to be 1/1 instead of 0/0, since REF=C, ALT=A, and VAF=0.791139.

Does this make sense or is there something I'm missing?

Thank you!

pd3 closed this as completed in 87bf159 on Oct 11, 2022

pd3 commented Oct 11, 2022

This is because -n M refers to something else. Here, the major allele means the allele observed most frequently in the population, not among the reads of that specific sample. I updated the usage page to make this clear. I also added a new --new-gt X option that does what you want: it fills in the allele with the greater read depth, as determined from FORMAT/AD.
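
To make the distinction concrete, here is a minimal Python sketch of the two notions of "major allele" described above, applied to the record at position 1630 from the report. It is only an illustration and not how bcftools implements either option: the function names are hypothetical, and the tie-breaking rule (lowest allele index wins) is an assumption made for this sketch.

```python
from collections import Counter


def population_major_allele(genotypes):
    """Allele index seen most often across all samples' called genotypes.

    `genotypes` is a list of per-sample tuples of allele indices,
    with None standing in for a missing allele.
    """
    counts = Counter(a for gt in genotypes for a in gt if a is not None)
    # Ties go to the lowest allele index (an assumption of this sketch).
    return min(counts, key=lambda a: (-counts[a], a))


def read_depth_major_allele(ad):
    """Allele index with the largest read depth in one sample's FORMAT/AD."""
    # Ties go to the lowest allele index (an assumption of this sketch).
    return max(range(len(ad)), key=lambda i: (ad[i], -i))


# Position 1630 from the report: single sample, GT=0/0, AD=99,375
genotypes_1630 = [(0, 0)]
ad_1630 = [99, 375]

print(population_major_allele(genotypes_1630))  # 0 (REF is the only allele called)
print(read_depth_major_allele(ad_1630))         # 1 (ALT has more supporting reads)
print(round(ad_1630[1] / sum(ad_1630), 6))      # 0.791139, matching FORMAT/VAF
```

With a single sample called 0/0, the population-level major allele is REF, so the genotype stays 0/0, while the read-depth view picks ALT, which is the 1/1 result the original report expected.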

@JosephLalli

Hi @pd3,

+setGT does not allow the use of X in custom genotypes. For example, --new-gt c:'0/M' is accepted, but c:'0/X' causes the following error:

Could not parse the genotype: c:0/X

Could this capability be added?

pd3 commented Dec 30, 2023

@JosephLalli This is now available, see #2065