Structural Variant Benchmarking

Manuscript available here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642177/

This project contains a collection of scripts used for SV benchmarking and visualisation of the results. The project structure is as follows:

Scripts for creating benchmarking VCFs

input.*: reference data for a given sample
data.*: directory containing the sequencing data and variant calls for a given sample
data./.metadata: generated file containing bash variables that hold metadata about the associated file (e.g. library fragment size, aligner, variant caller, ...)
alignbam.sh: script for running aligners (fastq to BAM)
common.sh: helper utility script. Contains generic functions such as loading/saving metadata and executing jobs.
call_*.sh: bash scripts for running the relevant variant caller (FASTQ/BAM to VCF)
bamtofastq.sh: conversion script that allows de novo assembly-based variant callers to be run on samples whose input is BAM files
*2vcf.py: helper scripts to convert the results of a variant caller using a custom format to VCF
clean*.sh: helper scripts to remove incomplete data
settings.*: settings file for a given sample
xcall_*.sh: variant caller scripts that do not work with the latest version of the variant caller
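
To illustrate what a *2vcf.py converter does, here is a minimal Python sketch. It is not one of the project's converters: it assumes a hypothetical tab-separated caller output with columns chrom, start, end and SV type, already in VCF coordinate conventions, and writes symbolic-allele VCF records. The function name calls_to_vcf is hypothetical, and REF is written as N because no reference genome is consulted.

```python
from typing import Iterable, Iterator

# Symbolic ALT alleles this sketch can emit (reserved IDs in VCF 4.x).
SYMBOLIC = {"DEL": "<DEL>", "DUP": "<DUP>", "INV": "<INV>"}

VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">',
    '##ALT=<ID=DEL,Description="Deletion">',
    '##ALT=<ID=DUP,Description="Duplication">',
    '##ALT=<ID=INV,Description="Inversion">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def calls_to_vcf(lines: Iterable[str]) -> Iterator[str]:
    """Convert hypothetical 'chrom<TAB>start<TAB>end<TAB>type' calls to VCF lines."""
    yield from VCF_HEADER
    for i, line in enumerate(lines):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, end, svtype = line.rstrip("\n").split("\t")[:4]
        if svtype not in SYMBOLIC:
            continue  # types this sketch cannot represent are dropped
        # Assumption: start is already the VCF POS (padding base before the
        # event) and end is the last affected base, so both are copied as-is.
        info = f"SVTYPE={svtype};END={int(end)}"
        # QUAL and FILTER are '.' (missing) because the input carries neither.
        yield "\t".join(
            [chrom, str(int(start)), f"call{i}", "N", SYMBOLIC[svtype], ".", ".", info]
        )
```

Usage: `for record in calls_to_vcf(open("calls.tsv")): print(record)`, where calls.tsv is a hypothetical input file. Real caller formats differ, which is why the project has one converter per caller.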

Scripts for processing benchmarking VCFs

R/global.R: contains constants and common data
R/contig.R: contains instance-dependent configuration data such as directory locations
R/install.R: helper script to install the required packages in a new R environment
R/lib*.R: collections of shared functions
R/manuscript_figures.R: script for generating non-simulation benchmarking paper figures
R/shiny.Rproj: Shiny app project file
R/server.R: Shiny app server
R/ui.R: Shiny app user interface
R/shinyCache2.R: caching infrastructure. We run out of memory if we load 5000+ VCFs at the same time, so we heavily cache intermediate results.
R/precache.R: script for pre-caching Shiny app results. Usability is poor when the app takes 4 hours to respond to a change in a drop-down, so we precompute all possible combinations of input elements on the cluster. This reduces both the CPU usage and the disk usage of the deployed app, since only the final cached results need to be deployed, not all raw and intermediate results. Use --plot to generate the simulation benchmarking paper figures.
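
The precaching idea (evaluate every combination of drop-down values once, store each result on disk, and serve later requests from the cache) can be sketched as follows. This is a minimal Python illustration of the technique, not the project's R implementation in R/shinyCache2.R and R/precache.R; the function names, the one-pickle-per-key layout and the example options are hypothetical.

```python
import hashlib
import itertools
import json
import pickle
from pathlib import Path
from typing import Any, Callable, Dict, List


def cache_key(params: Dict[str, Any]) -> str:
    """Stable key: SHA-256 of the parameters serialised with sorted keys."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def cached(cache_dir: Path, fn: Callable[..., Any], params: Dict[str, Any]) -> Any:
    """Return fn(**params), computing it only if no cached copy exists on disk."""
    path = cache_dir / f"{cache_key(params)}.pickle"
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = fn(**params)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


def precache(cache_dir: Path, fn: Callable[..., Any], options: Dict[str, List[Any]]) -> int:
    """Evaluate fn for every combination of input values; return how many were visited."""
    names = sorted(options)
    count = 0
    for values in itertools.product(*(options[name] for name in names)):
        cached(cache_dir, fn, dict(zip(names, values)))
        count += 1
    return count
```

Run precache once on the cluster with every value each drop-down can take, then deploy only the cache directory; the app calls cached with the user's current selection and never recomputes. The number of entries is the product of the option counts, so this only pays off while the input space stays enumerable.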

PapenfussLab/sv_benchmark
