HaplotypeCaller emitting incorrect phasing when genotyping hom-het-het #6463

Closed
1 of 2 tasks
tfenne opened this issue Feb 21, 2020 · 3 comments

@tfenne
Contributor

tfenne commented Feb 21, 2020

Bug Report

Affected tool(s) or class(es)

HaplotypeCaller when emitting physical phasing.

Affected version(s)

  • Latest public release version [4.1.4.1]
  • Latest master branch as of [n/a]

Description

When three SNPs lie in close proximity, with the first genotyped as homozygous-alt and the other two being hets that are in trans, GATK incorrectly outputs genotypes and phasing indicating that the two hets are in cis. I haven't tested more broadly (e.g. with more than three variants, or with indels), but my suspicion is that the cause is the first variant in the phase set being homozygous.

This was seen happening on real data from a real sample, but I have also been able to reproduce this with synthetic test data that I can attach here.

Steps to reproduce

I've attached phasing.zip to this issue. It contains a BAM file of synthetic data where I've introduced two variant haplotypes at 50 locations each separated by about 1000 bases. My goal in doing this was just to have a number of different sequence contexts and variant alleles in case that affected anything. It also contains the resulting VCF from running this GATK command using 4.1.4.1:

gatk HaplotypeCaller -I phasing.bam -O phasing.g.vcf -ERC GVCF \
    -R hg19.fasta -L chr2:179390700-179672150

While the BAM clearly shows the two hets as in trans with one another:
[Image: hom_with_in_trans_hets]

The resulting variant calls are given as in-cis:

chr2  179393825  .  C  A,<NON_REF>  2686.03  .  DP=60;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=216000,60                                                            GT:AD:DP:GQ:PGT:PID:PL:PS:SB  1|1:0,60,0:60:99:0|1:179393825_C_A:2700,181,0,2700,181,2700:179393825:0,0,60,0
chr2  179393826  .  T  <NON_REF>    .        .  END=179393826                                                                                                                     GT:DP:GQ:MIN_DP:PL            0/0:60:99:60:0,120,1800
chr2  179393827  .  T  G,<NON_REF>  1386.60  .  BaseQRankSum=0.000;DP=60;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=216000,60;ReadPosRankSum=0.157   GT:AD:DP:GQ:PGT:PID:PL:PS:SB  0|1:25,35,0:60:99:0|1:179393825_C_A:1394,0,944,1470,1050,2519:179393825:25,0,35,0
chr2  179393828  .  A  <NON_REF>    .        .  END=179393828                                                                                                                     GT:DP:GQ:MIN_DP:PL            0/0:60:99:60:0,120,1800
chr2  179393829  .  A  C,<NON_REF>  936.60   .  BaseQRankSum=0.000;DP=60;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=216000,60;ReadPosRankSum=-0.217  GT:AD:DP:GQ:PGT:PID:PL:PS:SB  0|1:35,25,0:60:99:0|1:179393825_C_A:944,0,1394,1050,1470,2519:179393825:35,0,25,0
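
For anyone checking the records above by script, here is a minimal Python sketch that pulls the phasing-related FORMAT fields (GT, PID, PS) out of a single-sample record. The helper name `sample_fields` is hypothetical, the INFO column in the example string is abbreviated to `DP=60` for readability, and the split is on any whitespace because the records are shown space-aligned here (a real VCF is tab-delimited).

```python
def sample_fields(record: str) -> dict:
    """Map FORMAT keys to the sample's values for a single-sample VCF line."""
    # Whitespace split is an assumption that works for the records as pasted;
    # none of the columns in these records contain spaces.
    cols = record.split()
    keys = cols[8].split(":")    # FORMAT column, e.g. GT:AD:DP:...
    values = cols[9].split(":")  # the single sample column
    return dict(zip(keys, values))


# Second variant from the issue, with INFO shortened to DP=60.
rec = (
    "chr2 179393827 . T G,<NON_REF> 1386.60 . DP=60 "
    "GT:AD:DP:GQ:PGT:PID:PL:PS:SB "
    "0|1:25,35,0:60:99:0|1:179393825_C_A:"
    "1394,0,944,1470,1050,2519:179393825:25,0,35,0"
)

fields = sample_fields(rec)
print(fields["GT"], fields["PID"], fields["PS"])
```

Running the same helper on the record at 179393829 gives the same GT, PID and PS, which is how the two hets end up reported on the same haplotype.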

Expected behavior

A phase set of three variants should be emitted that shows the two het SNPs in-trans (i.e. the ref allele for one in phase with the alt allele for the other).

Actual behavior

A phase set of three variants is emitted with the two het SNPs in-cis (i.e. alt alleles in phase).
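
The difference between the expected and actual output can be stated as a small check on the phased GT strings of the two het sites. This is only an illustrative sketch, not GATK code: the function name `het_phase_relation` is hypothetical, and it assumes diploid ref/alt hets that already share a phase set (same PS).

```python
def het_phase_relation(gt_a: str, gt_b: str) -> str:
    """Return 'cis' or 'trans' for two phased het genotypes in one phase set."""
    a = gt_a.split("|")
    b = gt_b.split("|")
    if len(a) != 2 or len(b) != 2:
        raise ValueError("expected phased diploid genotypes such as '0|1'")
    if a[0] == a[1] or b[0] == b[1]:
        raise ValueError("both genotypes must be heterozygous")
    # Index (0 or 1) of the haplotype carrying the non-reference allele.
    # Assumes ref/alt hets; a '1|2' style genotype is not handled meaningfully.
    alt_hap_a = 0 if a[0] != "0" else 1
    alt_hap_b = 0 if b[0] != "0" else 1
    return "cis" if alt_hap_a == alt_hap_b else "trans"


# GTs emitted for 179393827 and 179393829 in the attached VCF: both 0|1.
print(het_phase_relation("0|1", "0|1"))
# One way an in-trans pair could be written: 0|1 at one site, 1|0 at the other.
print(het_phase_relation("0|1", "1|0"))
```

With the GTs from the attached VCF the function reports cis, whereas the reads in the BAM support the second form.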

@droazen droazen added this to the GATK-Priority-Backlog milestone Feb 21, 2020
@droazen droazen removed this from the GATK-Priority-Backlog milestone Jun 22, 2020
@droazen
Contributor

droazen commented Nov 9, 2020

@cwhelan Do your existing phasing PRs have any bearing on this ticket?

@droazen droazen assigned cwhelan and unassigned jamesemery Nov 9, 2020
@cwhelan
Member

cwhelan commented Dec 23, 2020

This doesn't really seem related to the other work I've been doing on the phasing code, but I've been taking a look since I've been working in that area. After some testing I can confirm that this does seem to be an error in the phasing algorithm logic that occurs when the first variant in the set of called variants is homozygous alt, as @tfenne suggests. I'll try to come up with a fix and either package it with my fix to #6845 or submit it as a separate PR.

@cwhelan
Member

cwhelan commented Jan 26, 2021

This should now be fixed via #7019
