SVAnnotator was developed to alleviate the cumbersome process of annotating structural variants (SVs). It takes the output of SV callers and translates SV events into human-interpretable categories such as exon skipping, tandem duplication, disruption, and fusion.
- Python 3.7
- GTF file compatible with your SV file
- COSMIC census gene file, or your own list of cancer genes as a tab-separated text file with the genes in the first column
- cytogenetic band file
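If you supply your own cancer gene list, it may help to confirm that the first column reads back as expected before building the database. The following is a minimal sketch and not part of SVAnnotator: the function name `read_gene_column` and the example gene rows are hypothetical, and it assumes the file has no header line.

```python
import csv
import io


def read_gene_column(handle):
    """Return the first-column values of a tab-separated stream, in file order."""
    genes = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and rows whose first field is empty.
        if not row or not row[0].strip():
            continue
        genes.append(row[0].strip())
    return genes


# Hypothetical two-gene list with an extra annotation column.
example = io.StringIO("ABL1\tchr9\nBCR\tchr22\n")
print(read_gene_column(example))
```

For a real file, pass an open file handle (for example `open("my_genes.txt", newline="")`) instead of the in-memory example.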
Select the byte-code-compiled Python files (.pyc) that match the Python version on your system.
To build an SVAnnotator database, you need a GTF file, a census gene file, and a cytogenetic band file for your chosen genome assembly version.
python3 SVAnnotator_create_DB_1.0.0.cpython-37.pyc <GTF file> <census gene file> <cytogenetic file>
This should generate SVAnnotator.db.
python3 SVAnnotator_1.0.0.cpython-37.pyc SVAnnotator.db <input file> <output file>
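The two commands above can be assembled programmatically so that the .pyc file names follow the running interpreter. This is a sketch under assumptions: the helper names `pyc_name` and `build_commands` are hypothetical, and it assumes that builds for other Python versions follow the same `<name>_1.0.0.<tag>.pyc` naming pattern as the cpython-37 files shown. It only constructs the argument lists and does not run anything.

```python
import sys

VERSION = "1.0.0"  # version string taken from the file names shown above


def pyc_name(stem, tag=None):
    """Build a .pyc file name; the tag defaults to the running interpreter's cache tag."""
    tag = tag or sys.implementation.cache_tag  # e.g. "cpython-37" on CPython 3.7
    return f"{stem}_{VERSION}.{tag}.pyc"


def build_commands(gtf, census, cytoband, sv_input, xml_output, tag=None):
    """Return the database-creation and annotation commands as argument lists."""
    create_db = ["python3", pyc_name("SVAnnotator_create_DB", tag), gtf, census, cytoband]
    annotate = ["python3", pyc_name("SVAnnotator", tag), "SVAnnotator.db", sv_input, xml_output]
    return create_db, annotate


# Hypothetical file names, for illustration only.
for command in build_commands("genes.gtf", "census.tsv", "cytoband.txt",
                              "input.tsv", "output.xml", tag="cpython-37"):
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` once the file names have been checked against what is actually in the directory.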
SVAnnotator takes tab-separated text files containing breakpoint coordinates, strand orientations, and event types as input. Header lines, if any, must start with '#'. Most SV callers report breakpoints as ranges rather than single chromosomal positions; SVAnnotator cannot handle ranges, so users need to pick one position for each breakpoint. It is also important to remove false positives, which can originate from alignment issues in low-complexity regions, among other causes.
- Breakpoint 1 chromosome
- Breakpoint 1 position
- Breakpoint 2 chromosome
- Breakpoint 2 position
- Strand 1 orientation: + or -
- Strand 2 orientation: + or -
- SV type: one of DEL, DUP, INV, or TRA
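The column requirements above can be checked before running SVAnnotator. The sketch below is illustrative only and makes several assumptions: the columns appear in the order listed, extra trailing columns are tolerated, and a breakpoint range is collapsed to its integer midpoint. SVAnnotator does not prescribe how to pick a position, so the midpoint is just one possible policy. The names `midpoint` and `check_line` are hypothetical.

```python
VALID_STRANDS = {"+", "-"}
VALID_TYPES = {"DEL", "DUP", "INV", "TRA"}


def midpoint(start, end):
    """Collapse a breakpoint range to one position (assumed policy: integer midpoint)."""
    return (start + end) // 2


def check_line(line):
    """Return a list of problems found in one input line (empty if none)."""
    if line.startswith("#"):
        return []  # header lines are allowed and not checked
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        return [f"expected at least 7 columns, found {len(fields)}"]
    # Assumed column order: the order of the list above.
    chrom1, pos1, chrom2, pos2, strand1, strand2, sv_type = fields[:7]
    problems = []
    for label, pos in (("position 1", pos1), ("position 2", pos2)):
        if not pos.isdigit():
            problems.append(f"{label} is not a single integer: {pos!r}")
    for label, strand in (("strand 1", strand1), ("strand 2", strand2)):
        if strand not in VALID_STRANDS:
            problems.append(f"{label} must be + or -: {strand!r}")
    if sv_type not in VALID_TYPES:
        problems.append(f"SV type must be DEL, DUP, INV, or TRA: {sv_type!r}")
    return problems


# Hypothetical event; chromosome naming must match your GTF file.
pos1 = midpoint(133589200, 133589300)
row = "\t".join(["chr9", str(pos1), "chr22", "23632600", "+", "+", "TRA"])
print(row)
print(check_line(row))
```

A line that still contains a range in a position column, or an SV type outside the four allowed values, is reported rather than silently passed through.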
SVAnnotator reports each breakpoint, with its coordinates and cytogenetic band, in an eXtensible Markup Language (XML) file. If a breakpoint falls within a transcript, the output indicates whether it lies in an exonic or intronic region. For fusions, the 5' and 3' genes are identified; for dysfunctional fusions, the genes are listed without an order assignment. For exon skipping events, internal tandem duplications (ITDs), deletions, and deletion-insertion events, the information is provided for each transcript. Further details are given by the XML schema in SVAnnotator.xsd.
A test input file and the corresponding output XML are in the directory.
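Because the output is XML, it can be inspected with a standard parser. The sketch below only counts how often each element name occurs, so it does not depend on the actual schema. The element names in the sample string are hypothetical placeholders; the real names are defined in SVAnnotator.xsd. The function name `count_tags` is also hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text):
    """Count occurrences of each element name in an XML document."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())


# Hypothetical structure, for illustration only.
sample = "<results><event><breakpoint/><breakpoint/></event></results>"
for tag, count in sorted(count_tags(sample).items()):
    print(tag, count)
```

If the output declares an XML namespace, ElementTree reports tags in the form `{namespace}name`, so the counts will be keyed accordingly.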
Takahiko Koyama tkoyama@us.ibm.com
Please refer to the license in the directory.
A paper is currently under submission. This section will be updated once it is published.