This is a Snakemake project designed to facilitate the validation of de novo structural variation. The Snakefile is under workflow.
This pipeline starts with a simple setup of a pedigree file (e.g. .test/pedigree.tab) containing IDs for the parents and child. From there, it can take in raw reads, assemblies aligned to a reference, intersection files generated by SVPOP (https://github.com/EichlerLab/svpop), and regions bed files to determine de novo variants. The config structure is explained in config/README.md.
It uses a combination of SUBSEQ calls (extract reads and compare lengths) and multi-sequence alignments to determine whether variants proposed as de novo are supported by other callers or are also seen in the parents.
This is important because some variants escape detection by traditional callers, and manual inspection of the reads can be tedious and impractical for large callsets.
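The read-length comparison mentioned above can be illustrated with a small Python sketch. This is not the pipeline's actual code: the function name `reads_support_sv`, the `tolerance` and `min_reads` parameters, and the input representation (one length difference per read, read span minus reference span over the SV window) are all assumptions made for illustration.

```python
def reads_support_sv(length_diffs, sv_len, tolerance=0.5, min_reads=2):
    """Return True if enough reads show a length change consistent with the SV.

    length_diffs: hypothetical per-read values of (read span - reference span)
                  over the SV window; negative means the read is shorter.
    sv_len:       signed SV length (negative for a deletion, positive for an insertion).
    tolerance:    assumed fractional slack allowed around the SV length.
    min_reads:    assumed minimum number of supporting reads.
    """
    low = abs(sv_len) * (1 - tolerance)
    high = abs(sv_len) * (1 + tolerance)
    sign = 1 if sv_len >= 0 else -1
    # A read supports the SV if its length change has the same direction
    # and a magnitude within the tolerated range.
    supporting = [d for d in length_diffs if d * sign > 0 and low <= abs(d) <= high]
    return len(supporting) >= min_reads


# Toy values only: a 500 bp deletion seen in the child but in neither parent.
child = [-480, -510, 3, -2]
father = [1, -4, 0, 2]
mother = [0, 2, -1, 5]
sv_len = -500

in_child = reads_support_sv(child, sv_len)
in_parents = reads_support_sv(father, sv_len) or reads_support_sv(mother, sv_len)
looks_de_novo = in_child and not in_parents
```

In this toy case the variant would remain a de novo candidate, because the child's reads carry the length change and the parents' reads do not.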
The input bed file must have a header and be of the format:
#CHROM POS END ID SVTYPE
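As a quick pre-flight check, the header requirement can be tested before running the pipeline. The following is a minimal sketch, not part of the workflow; the helper name `check_bed_header` is hypothetical, and it assumes that only the first five column names matter (whether extra columns are allowed is not stated here).

```python
import io

EXPECTED_COLUMNS = ["#CHROM", "POS", "END", "ID", "SVTYPE"]


def check_bed_header(handle):
    """Return True if the first line starts with the five expected column names."""
    first_line = handle.readline()
    # Split on any whitespace so tab- and space-separated headers both pass.
    return first_line.split()[:len(EXPECTED_COLUMNS)] == EXPECTED_COLUMNS


# In-memory examples; for a real file use: with open(path) as fh: check_bed_header(fh)
good = io.StringIO("#CHROM\tPOS\tEND\tID\tSVTYPE\nchr1\t100\t200\tsv1\tDEL\n")
bad = io.StringIO("chr1\t100\t200\tsv1\tDEL\n")
good_ok = check_bed_header(good)
bad_ok = check_bed_header(bad)
```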
The output is a TSV containing the bed ID, a summary of the validation metrics for each step, and a final validation call in the VAL_DNSV column.
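To get a quick overview of the results, the final calls can be tallied with the Python standard library. This is a sketch under assumptions: only the VAL_DNSV column name comes from the description above; the function name `tally_validation_calls`, the `ID` column name, and the values in the toy data are placeholders, since the actual values the pipeline writes are not listed here.

```python
import csv
import io
from collections import Counter


def tally_validation_calls(handle, column="VAL_DNSV"):
    """Count how often each value appears in the final validation column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# Toy data with placeholder values; for a real file use:
#   with open(path, newline="") as fh: tally_validation_calls(fh)
example = io.StringIO(
    "ID\tVAL_DNSV\n"
    "sv1\tvalue_a\n"
    "sv2\tvalue_b\n"
    "sv3\tvalue_a\n"
)
counts = tally_validation_calls(example)
```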