Brian Haas edited this page Sep 20, 2023 · 24 revisions

TrinityFusion - Fusion and foreign transcript detection via RNA-seq de novo assembly

TrinityFusion leverages chimeric and unmapped reads to assemble fusion transcripts and transcripts of likely foreign origin (microbes and viruses), as a way of facilitating analysis of cancer transcriptomes.

TrinityFusion performs de novo transcriptome assembly from RNA-seq data using Trinity, and uses CTAT-LR-fusion to identify candidate fusion transcripts. An overview of the process is illustrated below:

Note, as of release v0.4.0, CTAT-LR-fusion (which leverages minimap2) replaces the GMAP-fusion module.

TrinityFusion has three execution modes:

  • TrinityFusion-C uses only chimeric reads identified by the STAR aligner for de novo assembly and subsequent fusion detection.

  • TrinityFusion-UC uses both the chimeric reads and reads that do not map to the genome as per the STAR aligner for de novo reconstruction followed by fusion detection.

  • TrinityFusion-D uses all input reads for de novo assembly followed by fusion detection.

TrinityFusion-UC has been found to be the most generally useful mode, both for fusion detection and for exploring the assembled unmapped reads for potential transcripts of foreign origin, such as tumor viruses and microbes. Note, TrinityFusion-D is included for the sake of completeness, but TrinityFusion-C and TrinityFusion-UC were found to be far more effective, and in most cases one of these two modes should be used instead.

Installing TrinityFusion

TrinityFusion can be downloaded from the TrinityFusion Releases site. Simply unpack the code and it's ready to use (no compilation necessary). TrinityFusion does have several software dependencies, however, such as Trinity (see below). It's easiest to use our Docker or Singularity images to hit the ground running.

TrinityFusion Data dependencies

TrinityFusion is part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT), and as such, also leverages the CTAT Genome Library (as also used by STAR-Fusion and FusionInspector). We provide several alternative resources for human fusion transcript detection, depending on whether you want to use the GRCh37 or GRCh38 reference human genome and the corresponding Gencode annotation set. Options are available here: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/, so choose one, and below we refer to it as 'CTAT_resource_lib.tar.gz'. The 'plug-n-play' libs are just that: download one and unpack it (tar -zxvf filename.tar.gz). If you need to build a genome lib from the provided source data, see our companion CTAT Genome Lib Builder software and documentation.

If you already have a fully functioning version of STAR-Fusion installed, then you do not need to install any additional data resources. You're almost ready to hit the ground running with TrinityFusion.

Before running TrinityFusion, you must configure your CTAT Genome Lib for use with minimap2. You can index the ref genome and prep resources like so:

% TrinityFusion/CTAT-LR-fusion/ctat-LR-fusion --prep_reference_only -T ladeda --genome_lib_dir /path/to/ctat_genome_lib_build_dir

TrinityFusion software dependencies

If you can use our Docker or Singularity images, they come with everything needed. Otherwise, please ensure the following software is installed for use with TrinityFusion:

  • see the Dockerfile for the full software stack, including Trinity and STAR, and versioning info corresponding to the current release.

Running TrinityFusion

Before running TrinityFusion, you'll first need to run STAR in order to define the chimeric and unmapped reads:

 STAR --genomeDir ${star_index_dir} \
      --readFilesIn ${left_fq_filename} ${right_fq_filename} \
      --twopassMode Basic \
      --outReadsUnmapped None \
      --chimSegmentMin 12 \
      --chimJunctionOverhangMin 12 \
      --alignSJDBoverhangMin 10 \
      --alignMatesGapMax 100000 \
      --alignIntronMax 100000 \
      --chimSegmentReadGapMax 3 \
      --alignSJstitchMismatchNmax 5 -1 5 5 \
      --runThreadN ${THREAD_COUNT} \
      --outSAMstrandField intronMotif \
      --outSAMunmapped Within \
      --outSAMtype BAM Unsorted \
      --outSAMattrRGline ID:GRPundef \
      --chimMultimapScoreRange 10 \
      --chimMultimapNmax 10 \
      --chimNonchimScoreDropMin 10 \
      --peOverlapNbasesMin 12 \
      --peOverlapMMp 0.1 \
      --chimOutJunctionFormat 1 # required as of STAR v2.6.1

After running STAR, you'll have access to the STAR 'Chimeric.out.junction' and 'Aligned.out.bam' files. These can be used as input to TrinityFusion.
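
Before launching TrinityFusion it can help to confirm that both STAR outputs are actually present. The following is a minimal sketch, not part of TrinityFusion: the function name check_star_outputs is hypothetical, and it assumes STAR's default output names (Chimeric.out.junction and Aligned.out.bam) in a single directory. If STAR was run with --outFileNamePrefix, the names will carry that prefix and the list would need adjusting.

```shell
# Hypothetical helper (not shipped with TrinityFusion): verify that the STAR
# outputs needed by the -C and -UC modes exist and are non-empty.
# Assumes STAR's default file names; adjust if --outFileNamePrefix was used.
check_star_outputs() (
    dir="${1:-.}"          # directory holding the STAR outputs (default: cwd)
    status=0
    for f in Chimeric.out.junction Aligned.out.bam; do
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}" >&2
            status=1
        fi
    done
    exit "${status}"       # body runs in a subshell, so this only sets the return code
)

# Example: check_star_outputs /path/to/star_output_dir && echo "STAR outputs look OK"
```

The function body runs in a subshell, so its variables do not leak into the calling shell and a failed check does not terminate the caller.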

TrinityFusion usage is shown below:

   %    ./TrinityFusion


################################################################
#
#  Required:
#
#  --left_fq <string>    reads_1.fq
#
#  --right_fq <string>   reads_2.fq
#
#  (If just given the reads, runs Trinity de novo assembly first on all reads)
#
#  --output_dir STR_OUT_DIR          output directory
#
# Alternative TrinityFusion modes, using STAR outputs
#
#  --chimeric_junctions <string>  STAR Chimeric.out.junction file
#                        
#  (if given the chimeric junctions file, restricts to the chimeric junction reads alone)
#
#  --aligned_bam <string>         STAR aligned bam file
#
#  (if given the aligned_bam & the chimeric junctions), assembles the unmapped and chimeric reads, not all reads).
#                        
#
# Optional:
#
#  --genome_lib_dir <string>  directory for CTAT genome lib  (or use env var $CTAT_GENOME_LIB
#                                      current setting: (/Users/bhaas/DB/CTAT_GENOME_LIB/GRCh37_v19_CTAT_lib_Feb092018/ctat_genome_lib_build_dir)
#  --CPU <int>                     :number threads (default 4)
#
#  --show_full_usage_info     flag, shows all options available.
#
#  --version                   show TrinityFusion version info: 0.3.0
#
################################################################

An example TrinityFusion command leveraging both the genome aligned bam file and the chimeric junctions file (TrinityFusion-UC mode of execution) would be:

%   TrinityFusion --left_fq reads_1.fq --right_fq reads_2.fq \
       --chimeric_junctions Chimeric.out.junction \
       --aligned_bam Aligned.bam \
       --genome_lib_dir /path/to/ctat_genome_lib_build_dir/

For TrinityFusion-C mode, do not provide the Aligned.bam file. For TrinityFusion-D mode, provide just the fastq files as input.
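
The three modes differ only in which STAR outputs are passed on the command line, which can be sketched as a small command builder. This is an illustrative helper under stated assumptions, not part of TrinityFusion: the function name trinityfusion_cmd and the default file names are hypothetical, and only flags listed in the usage text above are used. It prints the command rather than running it, and it leaves out --genome_lib_dir, so it assumes the CTAT_GENOME_LIB environment variable is set (or that you append the flag yourself).

```shell
# Hypothetical helper: print (do not run) the TrinityFusion command for a mode.
# Usage: trinityfusion_cmd <C|UC|D> <left_fq> <right_fq> <output_dir> [junctions] [bam]
trinityfusion_cmd() (
    mode="$1"; left="$2"; right="$3"; out_dir="$4"
    junctions="${5:-Chimeric.out.junction}"   # placeholder default
    bam="${6:-Aligned.bam}"                   # placeholder default
    cmd="TrinityFusion --left_fq ${left} --right_fq ${right} --output_dir ${out_dir}"
    case "${mode}" in
        D)  ;;                                # all reads: no STAR outputs passed
        C)  cmd="${cmd} --chimeric_junctions ${junctions}" ;;
        UC) cmd="${cmd} --chimeric_junctions ${junctions} --aligned_bam ${bam}" ;;
        *)  echo "unknown mode: ${mode} (expected C, UC, or D)" >&2; exit 1 ;;
    esac
    echo "${cmd}"
)

# Example: trinityfusion_cmd UC reads_1.fq reads_2.fq TrinityFusion_UC_outdir
```

Printing the command first makes it easy to review or log the exact invocation before committing to a long assembly run.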

TrinityFusion output

TrinityFusion will generate a tab-delimited output file: TrinityFusion-*.fusion_predictions.tsv formatted like so:

#FusionName     num_LR  LeftGene        LeftLocalBreakpoint     LeftBreakpoint  RightGene       RightLocalBreakpoint    RightBreakpoint SpliceType      LR_accessions   LR_FFPM JunctionReadCount       SpanningFragCount       est_J   est_S   LeftGene_SR     RightGene_SR    LargeAnchorSupport      NumCounterFusionLeft    NumCounterFusionRight   FAR_left        FAR_right       LeftBreakDinuc  LeftBreakEntropy        RightBreakDinuc RightBreakEntropy       FFPM    microh_brkpt_dist       num_microh_near_brkpt   annots  max_LR_FFPM     frac_dom_iso    above_frac_dom_iso
THRA--AC090627.1        1.0     THRA    11793   chr17:40086853:+        AC090627.1      21580   chr17:48294347:+        ONLY_REF_SPLICE TRINITY_DN45_c0_g1_i1   107.759 92.0    102.0   92.0    98.63   THRA^ENSG00000126351.11 AC090627.1^ENSG00000235300.4    YES     28.0    12.0    6.72    15.0    GT      1.8892  AG      1.9656  8.8952  3112.0  0.0     ["INTRACHROMOSOMAL[chr17:8.20Mb]"]      107.759 1.0     True
ACACA--STAC2    1.0     ACACA   64929   chr17:37122531:-        STAC2   79395   chr17:39218173:-        ONLY_REF_SPLICE TRINITY_DN2714_c0_g1_i1 107.759 55.0    44.0    55.0    44.0    ACACA^ENSG00000278540.3 STAC2^ENSG00000141750.6 YES     255.0   6.0     0.39    14.29   GT      1.9656  AG      1.9656  4.6195  3457.0  0.0     ["Klijn_CellLines","ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:1.80Mb]"]       107.759 1.0     True
RPS6KB1--SNF8   1.0     RPS6KB1 1280    chr17:59893325:+        SNF8    26129   chr17:48943975:-        ONLY_REF_SPLICE TRINITY_DN378_c0_g3_i1  107.759 37.0    49.0    37.0    47.71   RPS6KB1^ENSG00000108443.12      SNF8^ENSG00000159210.8  YES     115.0   570.0   0.75    0.15    GT      1.3753  AG      1.8323  3.9528  1796.0  0.0     ["Klijn_CellLines","ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:10.95Mb]"]      107.759 1.0     True
VAPB--IKZF3     1.0     VAPB    1396    chr20:58389517:+        IKZF3   28254   chr17:39777767:-        ONLY_REF_SPLICE TRINITY_DN229_c0_g1_i1  107.759 21.0    40.0    21.0    27.56   VAPB^ENSG00000124164.14 IKZF3^ENSG00000161405.15        YES     399.0   12.0    0.15    4.77    GT      1.9656  AG      1.7819  2.2659  1848.0  0.0     ["Klijn_CellLines","DEEPEST2019","ChimerPub","ChimerSeq","CCLE_StarF2019","INTERCHROMOSOMAL[chr20--chr17]"]     107.759 1.0     True
MED1--STXBP4    1.0     MED1    1249    chr17:39451038:-        STXBP4  44835   chr17:55141310:+        ONLY_REF_SPLICE TRINITY_DN2790_c0_g1_i1 107.759 13.0    15.0    13.0    15.0    MED1^ENSG00000125686.10 STXBP4^ENSG00000166263.12       YES     249.0   11.0    0.12    2.42    GT      1.3996  AG      1.7968  1.3065  1519.0  0.0     ["Klijn_CellLines","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:15.52Mb]"]  107.759 1.0     True
AHCTF1--NAAA    1.0     AHCTF1  1401    chr1:246931578:-        NAAA    50972   chr4:75925811:- ONLY_REF_SPLICE TRINITY_DN4758_c0_g1_i1 107.759 8.0     32.0    8.0     25.58   AHCTF1^ENSG00000153207.13       NAAA^ENSG00000138744.13 YES     27.0    67.0    1.46    0.6     GT      1.7232  AG      1.8062  1.5669  2677.0  0.0     ["CCLE_StarF2019","INTERCHROMOSOMAL[chr1--chr4]"]       107.759 1.0     True
MED1--ACSF2     1.0     MED1    6915    chr17:39439165:-        ACSF2   36449   chr17:50471028:+        ONLY_REF_SPLICE TRINITY_DN422_c0_g1_i1  107.759 10.0    11.0    10.0    11.0    MED1^ENSG00000125686.10 ACSF2^ENSG00000167107.11        YES     277.0   250.0   0.08    0.09    GT      1.9656  AG      1.9656  0.9799  2386.0  0.0     ["CCLE_StarF2019","INTRACHROMOSOMAL[chr17:10.97Mb]"]    107.759 1.0     True
STX16--RAE1     1.0     STX16   1835    chr20:58652087:+        RAE1    18185   chr20:57354032:+        ONLY_REF_SPLICE TRINITY_DN89_c0_g1_i1   107.759 7.0     29.0    7.0     14.5    STX16^ENSG00000124222.20        RAE1^ENSG00000101146.11 YES     227.0   506.0   0.16    0.07    GT      1.9899  AG      1.9656  1.0032  2394.0  0.0     ["CCLE_StarF2019","INTRACHROMOSOMAL[chr20:1.27Mb]"]     107.759 1.0     True
STARD3--DOK5    1.0     STARD3  1167    chr17:39637231:+        DOK5    23084   chr20:54643458:+        ONLY_REF_SPLICE TRINITY_DN4629_c0_g1_i1 107.759 7.0     6.0     7.0     6.0     STARD3^ENSG00000131748.14       DOK5^ENSG00000101134.10 YES     547.0   0.0     0.03    14.0    GT      1.8892  AG      1.9656  0.6066  3274.0  0.0     ["CCLE_StarF2019","INTERCHROMOSOMAL[chr17--chr20]"]     107.759 1.0     True
SKA2--MYO19     1.0     SKA2    1139    chr17:59155131:-        MYO19   29653   chr17:36507512:-        ONLY_REF_SPLICE TRINITY_DN129_c0_g1_i2  107.759 5.0     6.0     5.0     2.14    SKA2^ENSG00000182628.11 MYO19^ENSG00000278259.3 YES     172.0   111.0   0.07    0.11    GT      1.9086  AG      1.9086  0.3332  773.0   0.0     ["Klijn_CellLines","ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:22.57Mb]"]      107.759 1.0     True
MED13--BCAS3    1.0     MED13   3547    chr17:62052537:-        BCAS3   94863   chr17:61391977:+        ONLY_REF_SPLICE TRINITY_DN142_c0_g1_i1  107.759 2.0     3.0     2.0     3.0     MED13^ENSG00000108510.8 BCAS3^ENSG00000141376.19        YES     16.0    69.0    0.35    0.09    GT      1.5546  AG      1.9086  0.2333  1338.0  0.0     ["CCLE_StarF2019","INTRACHROMOSOMAL[chr17:0.55Mb]"]     107.759 1.0     True
TRIM37--MYO19   1.0     TRIM37  7217    chr17:59084002:-        MYO19   52575   chr17:36507924:-        ONLY_REF_SPLICE TRINITY_DN129_c0_g1_i1  107.759 2.0     3.0     2.0     2.67    TRIM37^ENSG00000108395.12       MYO19^ENSG00000278259.3 NO      107.0   73.0    0.06    0.08    GT      1.7465  AG      1.7819  0.2179  3078.0  0.0     ["Klijn_CellLines","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:22.44Mb]"]  107.759 1.0     True
PIP4K2B--RAD51C 1.0     PIP4K2B 9121    chr17:38777687:-        RAD51C  32535   chr17:58732484:+        ONLY_REF_SPLICE TRINITY_DN279_c0_g1_i2  107.759 2.0     2.0     2.0     0.62    PIP4K2B^ENSG00000276293.3       RAD51C^ENSG00000108384.13       YES     451.0   99.0    0.01    0.05    GT      1.7968  AG      1.9329  0.1222  2067.0  0.0     ["CCLE_StarF2019","INTRACHROMOSOMAL[chr17:19.89Mb]"]    107.759 1.0     True
TRPC4AP--MRPL45 1.0     TRPC4AP 2387    chr20:35078046:-        MRPL45  30344   chr17:38322126:+        ONLY_REF_SPLICE TRINITY_DN35_c0_g2_i1   107.759 2.0     2.0     2.0     0.57    TRPC4AP^ENSG00000100991.10      MRPL45^ENSG00000278845.3        YES     368.0   726.0   0.01    0.01    GT      1.6895  AG      1.9086  0.1199  5534.0  0.0     ["Klijn_CellLines","CCLE_StarF2019","INTERCHROMOSOMAL[chr20--chr17]"]   107.759 1.0     True
DIDO1--TTI1     1.0     DIDO1   1157    chr20:62937796:-        TTI1    34646   chr20:38006397:-        ONLY_REF_SPLICE TRINITY_DN4033_c0_g1_i1 107.759 1.0     7.0     1.0     1.4     DIDO1^ENSG00000101191.15        TTI1^ENSG00000101407.11 NO      19.0    28.0    0.45    0.31    GT      1.6402  AG      1.8892  0.112   2916.0  0.0     ["ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr20:24.84Mb]"]        107.759 1.0     True
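
With more than thirty columns, the predictions file is easier to read after pulling out a few fields by name. The sketch below is an illustrative helper, not part of TrinityFusion: the function name select_fusion_columns is hypothetical, and it assumes the file is tab-delimited with a single header line beginning with '#FusionName', as in the example above.

```shell
# Hypothetical helper: print selected columns of a fusion predictions file,
# chosen by header name rather than by position.
# Usage: select_fusion_columns <predictions.tsv> <column> [<column> ...]
select_fusion_columns() (
    tsv="$1"; shift
    cols=$(IFS=,; echo "$*")                  # join requested names with commas
    awk -F'\t' -v want="${cols}" '
        NR == 1 {
            sub(/^#/, "", $1)                 # header starts with "#FusionName"
            for (i = 1; i <= NF; i++) idx[$i] = i
            n = split(want, names, ",")
            for (j = 1; j <= n; j++) {
                if (!(names[j] in idx)) {
                    printf "missing column: %s\n", names[j] > "/dev/stderr"
                    exit 1
                }
            }
        }
        {
            # emit the requested fields, tab-separated, for header and data rows
            line = $(idx[names[1]])
            for (j = 2; j <= n; j++) line = line "\t" $(idx[names[j]])
            print line
        }
    ' "${tsv}"
)

# Example (the path is a placeholder for your own predictions file):
#   select_fusion_columns predictions.tsv FusionName JunctionReadCount SpanningFragCount annots
```

Selecting by name rather than by column number keeps the command working if the column order changes between releases.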

A preliminary fusion report will also be included. The final fusions are the subset of the preliminary fusions that match perfectly with reference gene exon annotations at the fusion junction breakpoint.

In addition to the fusion report, you will have access to a Trinity.fasta file containing the de novo assembled transcripts. This can be used for further downstream analyses, such as exploring potential foreign transcripts (e.g., tumor viruses and microbes).
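
As a first look at the assembly before any downstream analysis, a quick summary of the FASTA file can be useful. The following is a minimal sketch using plain awk, not a TrinityFusion feature: the function name fasta_summary is hypothetical, and it assumes a standard FASTA file in which each record starts with a '>' header line.

```shell
# Hypothetical helper: report the number of sequences, total bases, and the
# longest sequence length in a FASTA file (sequences may span several lines).
fasta_summary() (
    awk '
        /^>/ {
            if (n) {                          # close out the previous record
                total += len
                if (len > max) max = len
            }
            n++
            len = 0
            next
        }
        { len += length($0) }                 # accumulate sequence line lengths
        END {
            if (n) {                          # close out the final record
                total += len
                if (len > max) max = len
            }
            printf "sequences=%d total_bases=%d longest=%d\n", n, total, max
        }
    ' "$1"
)

# Example: fasta_summary Trinity.fasta
```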

Example data

Example data are available for exploring the different execution modes of TrinityFusion.

Referencing TrinityFusion

Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Haas, Brian J.; Dobin, Alexander; Li, Bo; Stransky, Nicolas; Pochet, Nathalie; Regev, Aviv; Genome Biology; 2019 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1842-9

Questions, comments, etc?

Contact us via our google group: https://groups.google.com/forum/#!forum/trinity_ctat_users