A tool to quickly assemble SV breakpoints using long reads
The bp_assemble.py script uses samtools, minimap2 and racon to assemble and polish a list of candidate SV breakpoints. Taking a TSV list of breakpoint positions as input, along with the read FASTQ and BAM, the script follows 5 steps:
- Extract reads at the breakpoint positions
- Find reads that support and span the breakpoint on both chromosome copies
- Generate scaffold breakpoint sequences using the longest reads that support each arm
- Align all reads at breakpoint positions to the scaffolds
- Polish the scaffold sequence using racon
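Steps 2 and 3 can be illustrated with a minimal sketch. This is not the code in bp_assemble.py: the `Alignment` record, the `flank` parameter and both function names are assumptions made for illustration. For each breakpoint arm, it picks the longest read whose alignment covers the breakpoint position with some flanking sequence on both sides.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not a pysam object)."""
    read_name: str
    chrom: str
    start: int        # 0-based reference start
    end: int          # reference end, exclusive
    read_length: int  # full length of the read in bases


def spans(aln, chrom, pos, flank=100):
    """True if the alignment covers pos with at least `flank` bases on each side."""
    return aln.chrom == chrom and aln.start <= pos - flank and aln.end >= pos + flank


def longest_spanning_read(alignments, chrom, pos, flank=100):
    """Return the longest read spanning (chrom, pos), or None if no read does."""
    candidates = [a for a in alignments if spans(a, chrom, pos, flank)]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.read_length)


alignments = [
    Alignment("read1", "chr1", 900, 1500, 5000),
    Alignment("read2", "chr1", 500, 2500, 12000),
    Alignment("read3", "chr1", 1050, 3000, 20000),  # starts inside the flank, so it does not span
    Alignment("read2", "chr7", 40000, 52000, 12000),
]

arm_a = longest_spanning_read(alignments, "chr1", 1000)
arm_b = longest_spanning_read(alignments, "chr7", 45000)
print(arm_a.read_name, arm_b.read_name)
```

In a real run the records would come from the input BAM, and the selected reads would serve as scaffolds that the remaining reads are aligned to before polishing.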
The script assumes that the read FASTQ is compressed and indexed with bgzip.
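A quick way to confirm that a FASTQ was compressed with bgzip rather than plain gzip is to inspect the first block header. The sketch below is not part of bp_assemble.py, and the function names are hypothetical. It relies on the BGZF layout from the SAM specification: a gzip member with the extra-field flag set and a `BC` subfield. It checks the compression format only, not whether an index exists.

```python
import gzip
import struct

# The 28-byte empty BGZF block that bgzip appends as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def looks_like_bgzf(data):
    """Return True if the bytes start with a BGZF block header."""
    if len(data) < 12:
        return False
    # gzip magic, deflate method, and the FEXTRA flag bit must all be present.
    if data[0] != 0x1F or data[1] != 0x8B or data[2] != 8 or not data[3] & 4:
        return False
    (xlen,) = struct.unpack_from("<H", data, 10)
    extra = data[12:12 + xlen]
    # Walk the extra subfields looking for 'B', 'C' with a 2-byte payload.
    offset = 0
    while offset + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, offset)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        offset += 4 + slen
    return False


def fastq_is_bgzipped(path):
    """Check the first block of a file on disk (hypothetical helper)."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(12 + 0xFFFF))


plain_gzip = gzip.compress(b"@r1\nACGT\n+\n!!!!\n")
print(looks_like_bgzf(BGZF_EOF), looks_like_bgzf(plain_gzip))
```

A file that fails this check can be decompressed and recompressed with bgzip before running the script.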
# Dependencies:
- pysam & samtools
- mappy & minimap2
- bgzip
- racon
Installing the dependencies through a conda environment is recommended; however, installation from source will work as well.
# Setup:
git clone https://github.com/adcosta17/BreakPointAssembly.git
cd BreakPointAssembly
# Usage:
python bp_assemble.py --sniffles-input <sniffles_translocation_calls.tsv> \
--input-bam <input.bam> \
--input-fastq <input.fastq.gz> \
--output-folder <path/to/output/folder> \
--reference-genome <reference_genome.fa>
--sniffles-input A TSV of SV calls. 6 columns are needed: chromosome_A, start, end, chromosome_B, start, end
--input-bam A BAM file containing alignments of reads to the reference genome
--input-fastq A FASTQ of the reads, compressed and indexed with bgzip
--reference-genome A reference genome FASTA
--racon [Optional] The path to racon. By default assumes racon is in the PATH
--cleanup [Optional Flag] Clean up temporary files generated by the script
--output-bam [Optional Flag] Request that the assembled breakpoints be aligned to the reference genome and a BAM be written with their alignments.
--small-window [Optional Flag] Gets the sequence within a small window of the breakpoint rather than a full assembly of the region
--bp-window [Optional] The window size for the small window around the breakpoint if --small-window is specified [150 bp]
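The six-column layout expected by --sniffles-input can be read with a few lines of Python. This is a minimal sketch, not the parser used by bp_assemble.py. It assumes a tab-delimited file with no header line and integer coordinates, and the `Breakpoint` type and `read_breakpoints` function are hypothetical names.

```python
import csv
import io
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int


def read_breakpoints(handle):
    """Parse breakpoint rows from an open TSV file handle.

    Assumes no header line; blank lines and lines starting with '#' are skipped.
    Columns beyond the first six are ignored.
    """
    breakpoints = []
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or row[0].startswith("#"):
            continue
        if len(row) < 6:
            raise ValueError(f"line {line_number}: expected 6 columns, found {len(row)}")
        chrom_a, start_a, end_a, chrom_b, start_b, end_b = row[:6]
        breakpoints.append(
            Breakpoint(chrom_a, int(start_a), int(end_a), chrom_b, int(start_b), int(end_b))
        )
    return breakpoints


example = io.StringIO("chr1\t1000\t1001\tchr7\t45000\t45001\n")
print(read_breakpoints(example))
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `read_breakpoints`.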