Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSQ sample annotation error downstream of stop codon #1578

Closed
katiesevans opened this issue Sep 14, 2021 · 1 comment
Closed

CSQ sample annotation error downstream of stop codon #1578

katiesevans opened this issue Sep 14, 2021 · 1 comment
Labels

Comments

@katiesevans
Copy link

After annotating my VCF with CSQ (bcftools 1.12), I noticed a variant (I:4464670) that was annotated as *missense for sample JU2106 but I couldn't find a previous stop codon in this sample. It seems that a stop codon (I:4463998) in another sample (ECA2091) is causing this mis-annotation. If I remove ECA2091 from the VCF, JU2106 is now labeled missense at this position. Similarly, if I leave all samples in but remove the stop codon in ECA2091, JU2106 is again labeled properly as missense. See code below for a reproducible example.

#### download vcf, ref, and gff
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/eri6.vcf.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/chrI.fa.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/csq.gff3.gz

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6.vcf.gz > test.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou test.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > test.bcsq.samples.tsv


# look at annotation
cat test.bcsq.samples.tsv | grep 4464670 # missense variant
cat test.bcsq.samples.tsv | grep 4463998 # stop

#################################################
# remove ECA2091 with stop codon
# make vcf
bcftools view -s JU2106,CB4856,MY10 eri6.vcf.gz > eri6_noECA2091.vcf

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6_noECA2091.vcf > noECA2091.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou noECA2091.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > noECA2091.bcsq.samples.tsv


# look at annotation
cat noECA2091.bcsq.samples.tsv | grep 4464670 # missense variant
cat noECA2091.bcsq.samples.tsv | grep 4463998 # stop

###########################################################
# remove stop variant but keep ECA2091
# make vcf
bcftools view -t ^I:4463998 eri6.vcf.gz > eri6_nostop.vcf

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6_nostop.vcf > nostop.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou nostop.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > nostop.bcsq.samples.tsv


# look at annotation
cat nostop.bcsq.samples.tsv | grep 4464670 # missense variant
cat nostop.bcsq.samples.tsv | grep 4463998 # stop
@pd3 pd3 added the bug label Sep 17, 2021
pd3 added a commit that referenced this issue Sep 17, 2021
The upstream stop could be falsely assigned to all samples in
a multi-sample VCF even if the stop was relevant for a single sample
only.

Fixs #1578
@pd3
Copy link
Member

pd3 commented Sep 17, 2021

This is now fixed. Thank you for the bug report and the test case!

@pd3 pd3 closed this as completed Sep 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants