Add support for breakend variants from VCF #1399

Merged

nuno-agostinho
Contributor

@nuno-agostinho nuno-agostinho commented Apr 18, 2023

ENSVAR-2086, fixes #437, #441 and #782

Logic

  1. When creating structural variants from VCF, parse breakend variants in VEP
    • Variants are considered breakends if they have SVTYPE=BND in the INFO field or <BND> in the ALT field, for instance:
chr1    234919885  MantaBND:1445:0:1:1:3:0:1     A  [chr1:17124942[A
chr22    17577704  ref_panel_1kg_v1_BND_chr22_6  N  <BND>             487 BOTHSIDES_SUPPORT;UNRESOLVED    ALGORITHMS=manta;CHR2=chr22;END=17577704;END2=20098034;EVIDENCE=PE,SR;PREDICTED_INTRONIC=DGCR8,SLC25A18;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;STRANDS=++;SVLEN=2520330;SVTYPE=BND;UNRESOLVED_TYPE=INVERSION_SINGLE_ENDER_++
  2. For each alternative allele (e.g., A[chr22:22893780[ in the ALT field or CHR2=chr22;END2=20098034 in the INFO field), create a hashref with the chromosome, position, placement (before/after) and whether the mate is inverted
    • This PR also supports multiple comma-separated alternative alleles in the ALT field, such as A[chr22:22893780[,CT[chrX:10932343[
  3. As usual, VEP creates an input buffer with a certain number of variants loaded into memory. For all variants in the input buffer:
    1. Get features (transcripts, regulatory regions, etc.) that overlap all variants currently in the input buffer (including features that overlap alternative alleles from breakend mates)
    2. Determine consequences using ensembl-variation#984
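The breakend ALT alleles handled in the steps above follow the VCF 4.x bracket notation (t[p[, t]p], ]p]t, [p[t, where p is chrom:pos). Below is a minimal Python sketch of detecting and parsing them, not VEP's Perl implementation. The field names `chrom`, `pos`, `placement`, `inverted` and `allele`, and the helpers `is_breakend` and `parse_breakend_alts`, are hypothetical. Here `placement` is assumed to mean whether the mate piece is joined after or before the sequence t, and `inverted` is set for the reverse-complemented forms t]p] and [p[t. Only bracket notation is parsed; a <BND> allele, whose mate comes from INFO (CHR2/END2), yields no mates in this sketch.

```python
import re
from dataclasses import dataclass
from typing import List

# VCF 4.x breakend ALT forms: t[p[  t]p]  ]p]t  [p[t, where p is "chrom:pos".
# (?P=b) forces both brackets around p to be the same character.
BND_ALT = re.compile(
    r"^(?P<pre>[A-Za-z]*)(?P<b>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)(?P=b)(?P<post>[A-Za-z]*)$"
)


@dataclass
class Mate:
    chrom: str
    pos: int
    placement: str  # hypothetical: "after" = mate piece joined after t, "before" = before t
    inverted: bool  # True for t]p] and [p[t (reverse-complemented mate piece)
    allele: str     # original ALT allele, kept so it can be reported in the output


def is_breakend(alt: str, info: str) -> bool:
    """Detection rule from the PR: SVTYPE=BND in INFO or <BND> in ALT."""
    fields = dict(kv.split("=", 1) if "=" in kv else (kv, "") for kv in info.split(";"))
    return fields.get("SVTYPE") == "BND" or "<BND>" in alt.split(",")


def parse_breakend_alts(alt: str) -> List[Mate]:
    """Return one Mate per comma-separated, bracket-notation ALT allele."""
    mates = []
    for allele in alt.split(","):
        m = BND_ALT.match(allele)
        if m is None:
            continue  # e.g. <BND>: mate is in INFO (CHR2/END2), not handled here
        if bool(m.group("pre")) == bool(m.group("post")):
            raise ValueError(f"malformed breakend allele: {allele}")
        joined_after = bool(m.group("pre"))  # t[p[ or t]p]
        # t]p] (after, "]") and [p[t (before, "[") are the inverted forms
        inverted = (m.group("b") == "]") == joined_after
        mates.append(Mate(m.group("chrom"), int(m.group("pos")),
                          "after" if joined_after else "before", inverted, allele))
    return mates


if __name__ == "__main__":
    for mate in parse_breakend_alts("A[chr22:22893780[,CT[chrX:10932343["):
        print(mate)
```

Using a backreference for the closing bracket rejects mixed forms such as A[chr1:10] without a separate check.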

Technical changes

  • get_source_chr_name() now works when $self->valid_chromosomes is a hashref (previously only an arrayref was supported)
  • min_max changed from an arrayref, $min_max = [$min, $max], to a hashref keyed by chromosome, $min_max->{$chr} = [$min, $max]
    • Allows storing min and max coordinates per chromosome
    • filter_features_by_min_max() updated to take the chromosome into account
  • Store the alternative allele and return it in the output
    • BND variants are the only SVs that display the original alternative allele
  • Fix VCF format detection to recognise alternative alleles such as A[chr22:22893780[ (fixes #782, ERROR: Can't detect input format)
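Per-chromosome ranges matter once breakend mates can sit on a different chromosome from the variant itself (e.g. a chr22 breakend whose mate is on chrX). The following is a rough Python sketch of the idea, not VEP's code: `min_max_by_chr` and `filter_by_chr_min_max` are hypothetical stand-ins for the Perl min_max hashref and filter_features_by_min_max(), coordinates are assumed 1-based and inclusive, and the features in the demo are made up.

```python
from typing import Dict, Iterable, List, Tuple

# (chromosome, start, end), 1-based inclusive; includes breakend mate positions
Interval = Tuple[str, int, int]
# (chromosome, start, end, name) for a transcript, regulatory feature, etc.
Feature = Tuple[str, int, int, str]


def min_max_by_chr(intervals: Iterable[Interval]) -> Dict[str, List[int]]:
    """Per-chromosome [min, max], analogous to $min_max->{$chr} = [$min, $max]."""
    min_max: Dict[str, List[int]] = {}
    for chrom, start, end in intervals:
        if chrom in min_max:
            lo, hi = min_max[chrom]
            min_max[chrom] = [min(lo, start), max(hi, end)]
        else:
            min_max[chrom] = [start, end]
    return min_max


def filter_by_chr_min_max(features: Iterable[Feature], min_max: Dict[str, List[int]]) -> List[str]:
    """Keep features on a chromosome present in min_max that overlap its range."""
    kept = []
    for chrom, start, end, name in features:
        span = min_max.get(chrom)
        if span is not None and start <= span[1] and end >= span[0]:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # chr22 breakend at 22120897 with its mate at chrX:126356858 (from the test VCF)
    ranges = min_max_by_chr([("chr22", 22120897, 22120897), ("chrX", 126356858, 126356858)])
    # Made-up features for illustration
    features = [
        ("chrX", 126350000, 126360000, "feature_on_chrX"),
        ("chr22", 126350000, 126360000, "same_coords_wrong_chr"),
        ("chr22", 22120000, 22121000, "feature_on_chr22"),
    ]
    print(filter_by_chr_min_max(features, ranges))
```

With a single chromosome-agnostic range from 22120897 to 126356858, the made-up `same_coords_wrong_chr` feature would be kept even though no variant touches it.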

Potential improvements and missing features

  • Unit tests (added)
  • Support shifting of alternative allele positions
  • Consider the alternative allele sequence in the ALT field; example: 11 94987872 MantaBND:0:0:1:0:0:0:0 T ACTCCT[8:107653411[
  • Consider mate placement and inversion when calculating consequences (relevant?)
  • Return transcript/regulatory fusion consequences (this may require checking consequences for pairs of transcripts?)
  • Report which breakend is associated with each transcript consequence (done in #1496, Report consequences for each breakend (e112))
  • Support contigs (and more complex breakend arrangements?)
  • Benchmark speed/memory?
  • Some code could/should be moved to ensembl-variation (done in #1496, Report consequences for each breakend (e112))

Testing

Run VEP with different examples and check whether the consequences make sense. You can use this VCF to test the changes:

#CHROM POS       ID           REF ALT    QUAL    FILTER INFO                              FORMAT
1       251748          gnomAD_v2_BND_1_4               N       <BND>                   432     UNRESOLVED            END=258100;SVTYPE=BND
chr1    17124941        MantaBND:1445:0:1:1:3:0:0       T       [chr1:234919886[T       999     PASS    SVTYPE=BND;MATEID=MantaBND:1445:0:1:1:3:0:1;CIPOS=0,1;HOMLEN=1;HOMSEQ=T;INV5;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=254;BND_DEPTH=107;MATE_BND_DEPTH=100 GT:FT:GQ:PL:PR:SR       0/1:PASS:999:999,0,999:65,8:15,51
chr1    17124948        MantaBND:1445:0:1:0:0:0:0       T       T]chr1:234919824]       999     PASS    SVTYPE=BND;MATEID=MantaBND:1445:0:1:0:0:0:1;INV3;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=109;MATE_BND_DEPTH=83      GT:FT:GQ:PL:PR:SR       0/1:PASS:999:999,0,999:60,2:0,46
chr1    234919824       MantaBND:1445:0:1:0:0:0:1       G       G]chr1:17124948]        999     PASS    SVTYPE=BND;MATEID=MantaBND:1445:0:1:0:0:0:0;INV3;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=83;MATE_BND_DEPTH=109      GT:FT:GQ:PL:PR:SR       0/1:PASS:999:999,0,999:60,2:0,46
chr1    234919885       MantaBND:1445:0:1:1:3:0:1       A       [chr1:17124942[A        999     PASS    SVTYPE=BND;MATEID=MantaBND:1445:0:1:1:3:0:0;CIPOS=0,1;HOMLEN=1;HOMSEQ=A;INV5;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=254;BND_DEPTH=100;MATE_BND_DEPTH=107 GT:FT:GQ:PL:PR:SR       0/1:PASS:999:999,0,999:65,8:15,51
chr1   3412098   MantaBND:2:0:1:0:0:0:1     A       A]chr13:79916623]      .      PASS    SVTYPE=BND;MATEID=MantaBND:2:0:1:0:0:0:0;IMPRECISE;CIPOS=-66,67;BND_PAIR_COUNT=2;PAIR_COUNT=2
chr1   3412098   MantaBND:2:0:1:0:0:0:1     A       A]chr13:79916621]      .      PASS    SVTYPE=BND;MATEID=MantaBND:2:0:1:0:0:0:0;IMPRECISE;CIPOS=-66,67;BND_PAIR_COUNT=2;PAIR_COUNT=2
1      37938377  normal_1     T   C              .      PASS                              .
1      37938377  BND_has_BRCA   T   C[17:43044184[ .      PASS                              SVTYPE=BND;CHR2=17
1      37938377  BND_nos_BRCA   T   ]17:43044184]C .      PASS                              SVTYPE=BND;CHR2=17
11      94975747        MantaBND:0:2:3:0:0:0:1  G       G]8:107653520]  .       PASS    SVTYPE=BND;MATEID=MantaBND:0:2:3:0:0:0:0;CIPOS=0,2;HOMLEN=2;HOMSEQ=TT;BND_DEPTH=216;MATE_BND_DEPTH=735  PR:SR   722,9:463,15
11      94975753        MantaDEL:0:1:2:0:0:0    T       <DEL>   .       PASS    END=94987865;SVTYPE=DEL;SVLEN=12112;IMPRECISE;CIPOS=-156,156;CIEND=-150,150        PR      161,13
11      94987872        MantaBND:0:0:1:0:0:0:0  T       T[8:107653411[  .       PASS    SVTYPE=BND;MATEID=MantaBND:0:0:1:0:0:0:1;BND_DEPTH=171;MATE_BND_DEPTH=830       PR:SR   489,4:520,19
17     43044184  BND_noBRCA1   T   C[2:7938377[  .      PASS                              SVTYPE=BND
17     43044184  BND_BRCA1   T   ]2:7938377]C  .      PASS                              SVTYPE=BND
17     43045682  normal_17    T   C              .      PASS                              .
17     43045682  BND_diff17   T   C[1:37938377[  .      PASS                              SVTYPE=BND;CHR2=1
chr22   10717890    ref_panel_1kg_v1_BND_chr22_1    N   <BND>   999 BOTHSIDES_SUPPORT;PESR_GT_OVERDISPERSION;UNRESOLVED ALGORITHMS=wham;CHR2=chr22;END=10717890;EVIDENCE=PE,SR;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=OR11H1;PREDICTED_NONCODING_BREAKPOINT=DNase;STRANDS=-+;SVLEN=5170;SVTYPE=BND;UNRESOLVED_TYPE=MIXED_BREAKENDS
chr22   17577704    ref_panel_1kg_v1_BND_chr22_6    N   <BND>   487 BOTHSIDES_SUPPORT;UNRESOLVED    ALGORITHMS=manta;CHR2=chr22;END=17577704;END2=20098034;EVIDENCE=PE,SR;PREDICTED_INTRONIC=DGCR8,SLC25A18;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;STRANDS=++;SVLEN=2520330;SVTYPE=BND;UNRESOLVED_TYPE=INVERSION_SINGLE_ENDER_++
chr22   17636024    ref_panel_1kg_v1_BND_chr22_7    N   <BND>   666 HIGH_SR_BACKGROUND;UNRESOLVED   ALGORITHMS=manta;CHR2=chr22;END=17636024;EVIDENCE=SR;PREDICTED_INTRONIC=BCL2L13;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;STRANDS=+-;SVLEN=10709;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-
chr22   22120897    ref_panel_1kg_v1_BND_chr22_14   N   <BND>   447 UNRESOLVED  ALGORITHMS=manta;CHR2=chrX;END=22120897;END2=126356858;EVIDENCE=PE;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=DCAF12L2,VPREB1;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;STRANDS=++;SVLEN=-1;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_++
chr22   22636515    ref_panel_1kg_v1_BND_chr22_27   N   <BND>   302 UNRESOLVED  ALGORITHMS=manta;CHR2=chr22;END=22636515;EVIDENCE=PE;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;PREDICTED_UTR=BCR;STRANDS=-+;SVLEN=679426;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_-+
chr22   22857058    ref_panel_1kg_v1_BND_chr22_33   N   <BND>   710 BOTHSIDES_SUPPORT;UNRESOLVED    ALGORITHMS=manta;CHR2=chr22;END=22857058;EVIDENCE=PE,SR;PREDICTED_BREAKEND_EXONIC=IGLL5;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;STRANDS=+-;SVLEN=36722;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-
chr22   22857058    ref_panel_1kg_v1_BND_chr22_33_M1    A   A[chr22:22893780[   710 BOTHSIDES_SUPPORT;UNRESOLVED    ALGORITHMS=manta;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_BND_chr22_33_M2;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=IGLL5;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;STRANDS=+-;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-
chr22   22893780    ref_panel_1kg_v1_BND_chr22_33_M2    G   ]chr22:22857058]G   710 BOTHSIDES_SUPPORT;UNRESOLVED    ALGORITHMS=manta;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_BND_chr22_33_M1;PREDICTED_BREAKEND_EXONIC=IGLL5;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;STRANDS=+-;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-

Some of the BND examples were retrieved from:

@nuno-agostinho nuno-agostinho changed the title Add support to breakend variants in VCF Add support for breakend variants in VCF Apr 18, 2023
@nuno-agostinho nuno-agostinho changed the title Add support for breakend variants in VCF Add support for breakend variants from VCF Apr 18, 2023
Contributor

@likhitha-surapaneni likhitha-surapaneni left a comment

Hi @nuno-agostinho, thank you for this. I tested parsing of ALT alleles with different examples and it seems to work as expected. The test cases look good. In this iteration, do we also plan to support ALT alleles like C[<ctg1>:1[?

@nuno-agostinho
Contributor Author

Thanks for testing @likhitha-surapaneni! I would say that support for contigs is not as urgent, but it is something we may want to think about in the future.

@jamie-m-a jamie-m-a self-assigned this Jun 13, 2023
@jamie-m-a jamie-m-a self-requested a review June 13, 2023 13:11
Contributor

@jamie-m-a jamie-m-a left a comment

Agree with the logic @nuno-agostinho, thanks for testing the examples @likhitha-surapaneni

@nuno-agostinho nuno-agostinho requested review from sarahhunt and removed request for dglemos June 21, 2023 17:07
@nuno-agostinho nuno-agostinho changed the base branch from postreleasefix/110 to postreleasefix/111 June 26, 2023 13:39
@nuno-agostinho
Contributor Author

To be backported to release/110.

Contributor

@likhitha-surapaneni likhitha-surapaneni left a comment

LGTM, thanks @nuno-agostinho

@nuno-agostinho nuno-agostinho merged commit 6426e75 into Ensembl:postreleasefix/111 Jun 28, 2023
@nuno-agostinho nuno-agostinho deleted the add/bnd-support branch June 28, 2023 12:44
@nuno-agostinho
Contributor Author

  • Merged to postreleasefix/111, release/111, main.
  • Backported to postreleasefix/110 and release/110.

5 participants