This repository contains the commands and scripts used for "Genotyping structural variants in pangenome graphs using the vg toolkit" (2019, in press). They depend primarily on toil-vg, which can run most other dependencies via Docker. A wiki on genotyping SVs with toil-vg is also available. GitHub issues are the best place to raise questions or concerns.
Links to necessary data are also listed for each analysis.
Note that the code to reproduce the figures and tables in the manuscript is available in the manuscript's repository.
This repository is distributed under the MIT license terms.
The different methods were compared on simulated sequence and SVs.
Several sequencing depths were tested.
We also tested the effect of errors in the breakpoint location of the SVs.
Scripts are available and described in the `simulation` folder.
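To make the breakpoint-error experiment more concrete, here is a minimal sketch of how errors could be introduced into SV breakpoints. It is not the code from the `simulation` folder: the function name `shift_breakpoints`, the `(chrom, start, end, svtype)` record layout, and the uniform offset are all assumptions made for illustration.

```python
import random


def shift_breakpoints(svs, max_shift, seed=0):
    """Shift each SV by a random offset in [-max_shift, max_shift] bp.

    Hypothetical helper: `svs` is a list of (chrom, start, end, svtype)
    tuples with 1-based coordinates. Both breakpoints move by the same
    offset, so the SV length is preserved.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    shifted = []
    for chrom, start, end, svtype in svs:
        offset = rng.randint(-max_shift, max_shift)
        offset = max(offset, 1 - start)  # never move start below position 1
        shifted.append((chrom, start + offset, end + offset, svtype))
    return shifted


# Made-up records, for illustration only.
truth = [("chr1", 1000, 1500, "DEL"), ("chr1", 8000, 8300, "INV")]
noisy = shift_breakpoints(truth, max_shift=10)
for before, after in zip(truth, noisy):
    print(before, "->", after)
```

Genotyping against the shifted records and then evaluating against the unshifted ones is one way to measure how sensitive a method is to imprecise breakpoints.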
These were run on AWS via Toil. In theory, they could use any other framework that Toil supports, though the scripts will have to be modified accordingly.
In the `human` directory, there is one folder for each dataset, with the commands to download and prepare the data and to genotype SVs with vg and the other methods:
- Human Genome Structural Variation Consortium (HGSVC)
- Genome in a Bottle (GiaB)
- Pseudo-diploid CHM genome (CHMPD)
- SV catalog from Audano et al. Cell 2019 (SVPOP)
There is also a `toil-scripts` folder with helper scripts that were used to run the analysis on AWS.
The commands for the evaluation, using Snakemake, are available in the `sveval` folder.
The VCFs produced by vg and the other methods across these datasets are available at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=vgsv2019/vcfs/.
The yeast experiments are written as Snakemake pipelines. Each pipeline consists of a set of rules, each of which processes a set of input files into a set of output files.
In the `yeast` directory, there are several folders for the different phases of the experiment, as well as detailed descriptions of how to re-run them.
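The rule-based model described above can be illustrated with a small toy in plain Python. This is not Snakemake's API and not one of the yeast pipelines: the `Rule` class, the `plan` function, and every rule and file name below are hypothetical. It only sketches how rules that declare inputs and outputs let a tool work out what to run for a requested target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    inputs: tuple
    outputs: tuple


def plan(rules, target, available):
    """Return rule names, in execution order, needed to produce `target`.

    `available` is the set of files that already exist. Cycles between
    rules are not detected in this toy version.
    """
    producers = {out: rule for rule in rules for out in rule.outputs}
    order, done = [], set()

    def visit(path):
        if path in available:
            return  # nothing to run for files that already exist
        if path not in producers:
            raise ValueError(f"no rule produces {path!r}")
        rule = producers[path]
        if rule.name in done:
            return  # this rule is already scheduled
        for dep in rule.inputs:
            visit(dep)  # schedule upstream rules first
        done.add(rule.name)
        order.append(rule.name)

    visit(target)
    return order


# Hypothetical rules and file names, for illustration only.
RULES = [
    Rule("build_graph", ("assemblies.fa",), ("graph.vg",)),
    Rule("map_reads", ("graph.vg", "reads.fq"), ("aln.gam",)),
    Rule("genotype", ("graph.vg", "aln.gam"), ("calls.vcf",)),
]
order = plan(RULES, "calls.vcf", {"assemblies.fa", "reads.fq"})
print(order)
```

Requesting a final output is enough: the upstream rules are found by following the declared inputs backwards, which is the same idea that lets a workflow be re-run from its target files.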