
bcftools 1.20 merge gvcfs results in more than one variant at the same location for complex polyallelic MNP variants #2333

Open
jcm6t opened this issue Dec 9, 2024 · 2 comments


jcm6t commented Dec 9, 2024

I'm not sure whether this is expected behavior or a bug.
bcftools 1.20; the file format is VCF 4.2.

We merge multiple gVCFs using the following pipeline. This is our test case on chr14, with a selected complex polyallelic variant (SNP + indels):

$BCFTOOLS merge --merge none -f PASS,. --gvcf $REFGENOME -i DP:sum,MQ:avg,DQUAL:min \
  --file-list $gvcflistfile -Ou | \
$BCFTOOLS norm -m -any --atom-overlaps . -Oz > test_chr14.vcf.gz

After the merge, we find multiple entries at the same position with the same pair of alleles. I suspect this happens at complex loci with overlapping indels and SNVs, but we see duplicated simple substitutions as well as duplicated indels.

We show only the first 9 columns. Look at chr14:53952388 as an example: notice that there are two A/C entries. We have read your response in the now-closed #2215 about there being no 'vertical' merging, but shouldn't these two records have been merged?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
chr14   53952385        .       A       <NON_REF>       .       PASS    .       GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT
chr14   53952386        .       C       <NON_REF>       .       PASS    .       GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT
chr14   53952387        .       CA      <NON_REF>       39.13   PASS    MQRankSum=4.805;ReadPosRankSum=1.526;FractionInformativeReads=0.941;DP=34;DQUAL=50;MQ=250       GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT:AF:F1R2:F2R1:DGQ:PRI:SB:MB
chr14   53952387        .       CA      C       39.13   PASS    MQRankSum=4.805;ReadPosRankSum=1.526;FractionInformativeReads=0.941;DP=34;DQUAL=50;MQ=250       GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT:AF:F1R2:F2R1:DGQ:PRI:SB:MB
chr14   53952388        .       A       C       56.72   PASS    MQRankSum=4.002;ReadPosRankSum=1.98;FractionInformativeReads=1;DP=203;DQUAL=33.53;MQ=248.644    GT:AD:AF:DP:F1R2:F2R1:GQ:DGQ:PL:SPL:ICNT:PRI:SB:MB:PS:MIN_DP
chr14   53952388        .       A       <NON_REF>       56.72   PASS    MQRankSum=4.002;ReadPosRankSum=1.98;FractionInformativeReads=1;DP=203;DQUAL=33.53;MQ=248.644    GT:AD:AF:DP:F1R2:F2R1:GQ:DGQ:PL:SPL:ICNT:PRI:SB:MB:PS:MIN_DP
chr14   53952388        .       A       ACCC    29.78   PASS    FractionInformativeReads=0.964;DP=28;DQUAL=49.7;MQ=250  GT:AD:AF:DP:F1R2:F2R1:GQ:DGQ:PL:SPL:ICNT:PRI:SB:MB:MIN_DP
chr14   53952388        .       A       C       29.78   PASS    FractionInformativeReads=0.964;DP=28;DQUAL=49.7;MQ=250  GT:AD:AF:DP:F1R2:F2R1:GQ:DGQ:PL:SPL:ICNT:PRI:SB:MB:MIN_DP
chr14   53952388        .       A       <NON_REF>       29.78   PASS    FractionInformativeReads=0.964;DP=28;DQUAL=49.7;MQ=250  GT:AD:AF:DP:F1R2:F2R1:GQ:DGQ:PL:SPL:ICNT:PRI:SB:MB:MIN_DP
chr14   53952389        .       C       <NON_REF>       .       PASS    .       GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT
chr14   53952390        .       C       <NON_REF>       .       PASS    .       GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT
chr14   53952391        .       C       <NON_REF>       .       PASS    .       GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT
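To list records like the duplicated A/C entries at chr14:53952388 across a whole output file, a small filter along the following lines may help. This is a minimal sketch, assuming an uncompressed VCF stream on stdin; the function name `find_dup_records` is hypothetical and not part of bcftools. It keys each record on CHROM, POS, REF and ALT and prints every key that appears more than once.

```shell
#!/bin/sh
# find_dup_records (hypothetical helper, not a bcftools command):
# reads an uncompressed VCF from stdin, skips header lines starting with '#',
# and prints each CHROM/POS/REF/ALT key that occurs in more than one record.
find_dup_records() {
  awk -F'\t' '!/^#/ { key = $1 "\t" $2 "\t" $4 "\t" $5; n[key]++ }
              END   { for (k in n) if (n[k] > 1) print k }' | sort
}

# Possible usage on the pipeline output (assumes the file name used above):
# gzip -dc test_chr14.vcf.gz | find_dup_records
```

Because keys are compared exactly, records such as `A/C` and `A/ACCC` at the same position are counted separately. Only identical allele pairs are reported.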

We have a small test case we can share separately when you are ready to review.

Thanks.

@jcm6t jcm6t changed the title bcftools 1.20 merge (--merge both) gvcfs results in more than one variant at the same location for complex polyallelic MNP variants bcftools 1.20 merge gvcfs results in more than one variant at the same location for complex polyallelic MNP variants Dec 9, 2024
pd3 (Member) commented Dec 16, 2024

This really depends on the input data, a small test case will be required, thank you.

jcm6t (Author) commented Dec 19, 2024

Petr,
I hope you received the test case via offline email.
