A curated list of papers and tools for keeping up with the structural variant field.
For each category, papers are sorted by year of publication and provided in the following format:
Title. Author Year, Journal (link_to_paper).
Tools include both the software/GitHub link and the original paper, as follows:
Software (link_to_software): Description. Author Year, Journal (link_to_paper).
1. Introduction to structural variation
2. Detecting SVs in whole genome data
- 2.1 Genome quality checks
- 2.2 Structural variant calling methods
- 2.3 Filtering false calls
- 2.4 Standardizing output
3. Identifying medically relevant SVs
- 3.1 Population frequencies
- 3.2 Gene annotation
- 3.3 Impact prediction
- 3.4 Visualization
- 3.5 Crowd curation
- A decade of structural variants: description, history and methods to detect structural variation. Escaramís et al 2015, Brief Funct Genomics.
- The functional impact of structural variation in humans. Hurles et al 2008, Trends Genet.
- How complete are ‘complete’ genome assemblies? An avian perspective. Peona et al 2018, Mol. Ecol. Resour.
- The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Jain et al 2016, Genome Biology.
- Resolving the complexity of the human genome using single-molecule sequencing. Chaisson et al 2015, Nature.
- Performance comparison of whole-genome sequencing platforms. Lam et al 2011, Nature Biotech.
The following tools were collated from recent reviews, e.g.:
- Recent advances in the detection of repeat expansions with short-read next-generation sequencing. Bahlo et al 2018, F1000Research.
- Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Tattini et al 2015, Front Bioeng Biotechnol.
- Transposable element detection from whole genome sequence data. Ewing et al 2015, Mobile DNA.
- Socrates: Identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads. Schroder et al 2014, Bioinformatics.
- MultiBreak-SV: Characterization of structural variants with single molecule and hybrid sequencing approaches. Ritz et al 2014, Bioinformatics.
- Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone. Trappe et al 2014, Bioinformatics.
- SVseq2: An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data. Zhang et al 2012, BMC Bioinformatics.
- Splitread: Detection of structural variants and indels within exome data. Karakoc et al 2012, Nature Methods.
- The CLEVER Toolkit: Clique-enumerating variant finder. Marschall et al 2012, Bioinformatics.
- VariationHunter: Simultaneous structural variation discovery among multiple paired-end sequenced genomes. Hormozdiari et al 2011, Genome Research.
- BreakDancer: An algorithm for high-resolution mapping of genomic structural variation. Chen et al 2009, Nature Methods.
- PEMer: A computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Korbel et al 2009, Genome Biology.
- MoDIL: Detecting small indels from clone-end sequencing with mixtures of distributions. Lee et al 2009, Nature Methods.
- CODEX: A normalization and copy number variation detection method for whole exome sequencing. Jiang et al 2015, Nucleic Acids Research.
- ONCOCNV: Multi-factor data normalization enables the detection of copy number aberrations in amplicon sequencing data. Boeva et al 2014, Bioinformatics.
- EXCAVATOR: Detecting copy number variants from whole-exome sequencing data. Magi et al 2013, Genome Biology.
- CoNIFER: Copy number variation detection and genotyping from exome sequence data. Krumm et al 2012, Genome Research.
- XHMM: Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Fromer et al 2012, Am. J. Hum. Genet.
- CNAnorm: Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Gusnanto et al 2012, Bioinformatics.
- cn.MOPS: Mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Klambauer et al 2012, Nucleic Acids Research.
- CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Abyzov et al 2011, Genome Research.
- ExomeCNV: Exome sequencing-based copy-number variation and loss of heterozygosity detection. Sathirapongsasuti et al 2011, Bioinformatics.
- JointSLM: Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm. Magi et al 2011, Nucleic Acids Res.
- ReadDepth: A parallel R package for detecting copy number alterations from short sequencing reads. Miller et al 2011, PLoS ONE.
- BIC-seq: Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Xi et al 2011, Proc. Natl. Acad. Sci.
- CNAseg: A novel framework for identification of copy number changes in cancer from second-generation sequencing data. Ivakhno et al 2010, Bioinformatics.
- SegSeq: High-resolution mapping of copy-number alterations with massively parallel sequencing. Chiang et al 2009, Nature Methods.
- CNV-seq: A new method to detect copy number variation using high-throughput sequencing. Xie et al 2009, BMC Bioinformatics.
- RDXplorer: Sensitive and accurate detection of copy number variants using read depth of coverage. Yoon et al 2009, Genome Research.
- Fast-SG: An alignment-free algorithm for hybrid assembly. Di Genova et al 2018, Gigascience.
- IDP-denovo: De novo transcriptome assembly and isoform annotation by hybrid sequencing. Fu et al 2018, Bioinformatics.
- TIGRA: A targeted iterative graph routing assembler for breakpoint assembly. Chen et al 2014, Genome Research.
- Cortex: De novo assembly and genotyping of variants using colored de Bruijn graphs. Iqbal et al 2012, Nature Genetics.
- Magnolya: De novo detection of copy number variation by co-assembly. Nijkamp et al 2012, Bioinformatics.
- LUMPY: A probabilistic framework for structural variant discovery. Layer et al 2014, Genome Biology.
- SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations. Hart et al 2013, PLoS ONE.
- MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels. Marschall et al 2013, Bioinformatics.
- DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Rausch et al 2012, Bioinformatics.
- PRISM: Pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants. Jiang et al 2012, Bioinformatics.
- Pindel: A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Ye et al 2009, Bioinformatics.
- GASVPro: An Integrative Probabilistic Model for Identification of Structural Variation in Sequencing Data. Sindi et al 2012, Genome Biology.
- inGAP-sv: A novel scheme to identify and visualize structural variation from paired end mapping data. Qi et al 2011, Nucleic Acids Res.
- CNVer: Detecting copy number variation with mated short reads. Medvedev et al 2010, Genome Research.
- SVDetect: A tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Zeitouni et al 2010, Bioinformatics.
- SVseq: An approach for detecting exact breakpoints of deletions with low-coverage sequence data. Zhang et al 2011, Bioinformatics.
- HYDRA-SV: Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. Quinlan et al 2010, Genome Research.
- NovelSeq: Detection and characterization of novel sequence insertions using paired-end next-generation sequencing. Hajirasouliha et al 2010, Bioinformatics.
- CREST: Maps somatic structural variation in cancer genomes with base-pair resolution. Wang et al 2011, Nature Methods.
- Genome STRiP (v2.0): A new CNV discovery and genotyping pipeline. Large multiallelic copy number variations in humans. Handsaker et al 2015, Nature Genetics.
- GRIDSS: Sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Cameron et al 2017, Genome Research.
- MELT: Population-scale mobile element discovery and biology. Gardner et al 2017, Genome Research.
- T-lex/T-lex2: Genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Fiston-Lavier et al 2015, Nucleic Acids Res.
- DD_DETECTION: Detecting dispersed duplications in high-throughput sequencing data using a database-free approach. Kroon et al 2015, Bioinformatics.
- Jitterbug: Somatic and germline transposon insertion detection at single-nucleotide resolution. Henaff et al 2015, BMC Genomics.
- ITIS: A bioinformatics tool for accurate identification of transposon insertion sites using next-generation sequencing data. Jiang et al 2015, BMC Bioinformatics.
- Mobster: Accurate detection of mobile element insertions in next generation sequencing data. Thung et al 2014, Genome Biology.
- Tangram: A comprehensive toolbox for mobile element insertion detection. Wu et al 2014, BMC Genomics.
- TE-Tracker: Systematic identification of transposition events through whole-genome resequencing. Gilly et al 2014, BMC Bioinformatics.
- TranspoSeq: Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Helman et al 2014, Genome Research.
- TraFiC: Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Tubio et al 2014, Science.
- TEMP: A computational method for analyzing transposable element polymorphism in populations. Zhuang et al 2014, Nucleic Acids Res.
- RetroSeq: Transposable element discovery from next-generation sequencing data. Keane et al 2013, Bioinformatics.
- GRIPper: Retrotransposition of gene transcripts leads to structural variation in mammalian genomes. Ewing et al 2013, Genome Biology.
- RelocaTE: The use of RelocaTE and unassembled short reads to produce high-resolution snapshots of transposable element generated diversity in rice. Robb et al 2013, G3.
- Tea: Landscape of somatic retrotransposition in human cancers. Lee et al 2012, Science.
- ngs_te_mapper: Whole Genome Resequencing Reveals Natural Target Site Preferences of Transposable Elements in Drosophila melanogaster. Linheiro et al 2012, PLoS ONE.
- TE-Locate: A Tool to Locate and Group Transposable Element Occurrences Using Paired-End Next-Generation Sequencing Data. Platzer et al 2012, Biology.
- STRetch: Detecting and discovering pathogenic short tandem repeat expansions. Dashnow et al 2018, Genome Biology.
- exSTRa: Detecting tandem repeat expansions in cohorts sequenced with short-read sequencing data. Tankard et al 2018, BioRxiv.
- ExpansionHunter: Detection of long repeat expansions from PCR-free whole-genome sequence data. Dolzhenko et al 2017, Genome Research.
- TREDPARSE: Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes. Tang et al 2017, Am J Hum Genet.
- Variation Graph: Sequence variation aware genome references and read mapping with the variation graph toolkit. Garrison et al 2017, BioRxiv.
- Graph Genome Suite (Commercial): Fast and Accurate Genomic Analyses using Genome Graphs. Rakocevic et al 2017, BioRxiv.
- Genome in a Bottle: A human genome standard. 2015, Nature Biotech.
- BCFtools/csq: Haplotype-aware variant consequences. Danecek et al 2017, Bioinformatics.
- BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Quinlan 2014, Curr. Protoc. Bioinformatics.
- GenomicRanges: Software for computing and annotating genomic ranges. Lawrence et al 2013, PLoS Comput Biol.
- FusorSV: An algorithm for optimally combining data from multiple structural variation detection methods. Becker et al 2018, Genome Biology.
- MetaSV: An accurate and integrative structural-variant caller for next generation sequencing. Mohiyuddin et al 2015, Bioinformatics.
- American College of Medical Genetics standards and guidelines for interpretation and reporting of postnatal constitutional copy number variants. Kearney et al 2011, Genet Med.
- gnomAD: Genome Aggregation Database, a coalition of investigators seeking to aggregate and harmonize exome and genome sequencing data from a variety of large-scale sequencing projects. Lek et al 2016, Nature.
- ExAC: Exome Aggregation Consortium. Lek et al 2016, Nature.
- Genome of the Netherlands Project: A high-quality human reference panel reveals the complexity and distribution of genomic structural variants. Hehir-Kwa et al 2016, Nat Commun.
- 1000 Genomes Project: A global reference for human genetic variation. The 1000 Genomes Project Consortium et al 2015, Nature.
- GenePANDA: A novel network-based gene prioritizing tool for complex diseases. Yin et al 2017, Sci Rep. Note: software link (from paper) appears to be broken!
- Vcfanno: Fast, flexible annotation of genetic variants. Pedersen et al 2016, Genome Biology.
- ENCODE: An integrated encyclopedia of DNA elements in the human genome. ENCODE Project Consortium 2012, Nature.
- GLAD4U: Deriving and prioritizing gene lists from PubMed literature. Jourquin et al 2012, BMC Genomics.
- UCSC Genome Browser: The Human Genome Browser at UCSC, gene tracks available. Kent et al 2002, Genome Research.
- DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders. Quinodoz et al 2017, Am J Hum Genet.
- SVScore: An impact prediction tool for structural variation. Ganel et al 2017, Bioinformatics.
- VEP: The Ensembl Variant Effect Predictor. McLaren et al 2016, Genome Biology.
- SnpEFF: A program for annotating and predicting the effects of single nucleotide polymorphisms. Cingolani et al 2012, Fly.
- ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Wang et al 2010, Nucleic Acids Res.
- Samplot: A command line tool for rapid, multi-sample structural variant visualization. Samplot takes SV coordinates and BAM files and produces high-quality images that highlight any alignment and depth signals that substantiate the SV. Ryan Layer 2018, GitHub.
- New Genome Browser: A web-based NGS data viewer with SV visualization capabilities, high performance, scalability, and cloud data support. EPAM Systems 2017, GitHub.
- Ribbon: For visualizing complex genome alignments and structural variation. Nattestad et al 2016, BioRxiv.
- CGDV: Another webtool for Circos genomics and transcriptomics data. Jha et al 2016, BMC Genomics.
- ClicO FS: An interactive web-based service of Circos. Cheong et al 2015, Bioinformatics.
- GASVPro: An integrative probabilistic model for identification of structural variation in sequencing data. Sindi et al 2012, Genome Biology.
- Integrative Genomics Viewer: A high-performance visualization tool for interactive exploration of large, integrated genomic datasets. Robinson et al 2011, Nature Biotech.
- Pairoscope: Quick and simple diagrams indicating the relationship of paired end sequencing reads. It functions by displaying multiple genomic regions, their read depth at each base in the region and arcs within or between regions to indicate pairing information. David Larson 2010, SourceForge.
- Savant: A genome browser for next generation data. Fiume et al 2010, Bioinformatics.
- Circos: A software package for visualizing data and information in a circular layout. Krzywinski et al 2009, Genome Research.
- SV-plaudit: A cloud-assisted framework for manually curating thousands of structural variants. Belyeu et al 2018, GigaScience.