BoostSV is a machine-learning-based tool that predicts the quality of structural variant (SV) breakpoints by assessing the alignment quality at each breakpoint location. The model is trained on transmitted breakpoints from a four-generation pedigree (David Porubsky et al., Nature), using both PacBio HiFi and ONT reads.
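To illustrate one kind of signal that "alignment quality at the breakpoint" can refer to, the sketch below computes the fraction of reads covering a breakpoint that carry a soft clip within a small window of it. This is a hypothetical feature chosen for illustration and is not BoostSV's actual feature set; the function names and the 10 bp window are assumptions. Reads are given as (reference_start, CIGAR tuples) pairs using the CIGAR operation codes that pysam reports, so the logic can be tried without a BAM file.

```python
from typing import Iterable, List, Sequence, Tuple

# CIGAR operation codes as used by pysam's AlignedSegment.cigartuples
BAM_CMATCH, BAM_CINS, BAM_CDEL, BAM_CREF_SKIP, BAM_CSOFT_CLIP = 0, 1, 2, 3, 4
BAM_CEQUAL, BAM_CDIFF = 7, 8
# Operations that advance the position on the reference
REF_CONSUMING = {BAM_CMATCH, BAM_CDEL, BAM_CREF_SKIP, BAM_CEQUAL, BAM_CDIFF}

Cigar = Sequence[Tuple[int, int]]


def reference_end(ref_start: int, cigar: Cigar) -> int:
    """Exclusive 0-based end of the aligned reference span."""
    return ref_start + sum(n for op, n in cigar if op in REF_CONSUMING)


def clip_positions(ref_start: int, cigar: Cigar) -> List[int]:
    """Reference coordinates at which soft clips attach to the alignment."""
    positions = []
    ref_pos = ref_start
    for op, n in cigar:
        if op == BAM_CSOFT_CLIP:
            positions.append(ref_pos)
        elif op in REF_CONSUMING:
            ref_pos += n
    return positions


def clipped_fraction(reads: Iterable[Tuple[int, Cigar]], breakpoint: int,
                     window: int = 10) -> float:
    """Hypothetical feature: fraction of reads covering `breakpoint` that
    carry a soft clip within `window` bp of it."""
    covering = clipped = 0
    for ref_start, cigar in reads:
        # Inclusive end so reads clipped exactly at the breakpoint count as covering it
        if not ref_start <= breakpoint <= reference_end(ref_start, cigar):
            continue
        covering += 1
        if any(abs(pos - breakpoint) <= window
               for pos in clip_positions(ref_start, cigar)):
            clipped += 1
    return clipped / covering if covering else 0.0


reads = [
    (900, [(BAM_CMATCH, 100), (BAM_CSOFT_CLIP, 50)]),  # clipped at position 1000
    (950, [(BAM_CMATCH, 200)]),                         # spans the breakpoint cleanly
    (2000, [(BAM_CMATCH, 100)]),                        # does not reach the breakpoint
]
print(clipped_fraction(reads, breakpoint=1000))  # 0.5
```

With a real BAM, the input pairs could be built from pysam's `AlignedSegment.reference_start` and `AlignedSegment.cigartuples` (the latter is None for unmapped reads, which should be skipped).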
Dependencies (Python packages):
pysam
xgboost
pandas
edlib
joblib
pyspoa
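A quick way to confirm that the Python packages listed above are installed is to look up each import name with importlib before running BoostSV. This is a minimal sketch that is not part of BoostSV; the package-to-import-name mapping is an assumption, the only non-obvious entry being pyspoa, which provides the `spoa` module. samtools, used to create the depth file, is a separate program and is not checked here.

```python
import importlib.util
from typing import Dict, List

# Package name -> module name used in `import` (assumed mapping)
BOOSTSV_DEPENDENCIES: Dict[str, str] = {
    "pysam": "pysam",
    "xgboost": "xgboost",
    "pandas": "pandas",
    "edlib": "edlib",
    "joblib": "joblib",
    "pyspoa": "spoa",
}


def missing_packages(deps: Dict[str, str]) -> List[str]:
    """Return the package names whose top-level module cannot be found."""
    return [pkg for pkg, module in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(BOOSTSV_DEPENDENCIES)
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All dependencies found.")
```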
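Collecting alignment features (BoostSV.py collect) requires:
{input.svs}: SVs in BED format with the columns chrom, start, end, type (INS/DEL), and length. The current model supports only insertions (INS) and deletions (DEL).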
{input.bam}: The long-read alignment file.
{input.depth}: Read coverage at each SV site, which can be generated with samtools depth.
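A minimal sketch for sanity-checking these inputs before running collect. It assumes a tab-separated BED file without a header whose first five columns are chrom, start, end, type, and length in standard BED coordinates, and depth lines in the default samtools depth layout (chrom, 1-based position, depth). The function names and the flank width are hypothetical and not part of BoostSV; how BoostSV itself reads the depth file is not described here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

SUPPORTED_TYPES = {"INS", "DEL"}  # the current model supports only these


@dataclass
class SV:
    chrom: str
    start: int
    end: int
    svtype: str
    length: int


def read_sv_bed(lines: Iterable[str]) -> List[SV]:
    """Parse tab-separated chrom/start/end/type/length records (assumed layout)."""
    svs = []
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"line {lineno}: expected 5 columns, found {len(fields)}")
        chrom, start, end, svtype, length = fields[:5]
        if svtype not in SUPPORTED_TYPES:
            raise ValueError(f"line {lineno}: unsupported SV type {svtype!r}")
        svs.append(SV(chrom, int(start), int(end), svtype, int(length)))
    return svs


def load_depth(lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Index samtools depth output (chrom, 1-based pos, depth) by (chrom, pos)."""
    depth = {}
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, dp = line.rstrip("\n").split("\t")[:3]
        depth[(chrom, int(pos))] = int(dp)
    return depth


def mean_depth(sv: SV, depth: Dict[Tuple[str, int], int], flank: int = 50) -> float:
    """Mean depth over the SV interval plus `flank` bp on each side.

    Assumes standard BED coordinates (0-based, half-open), i.e. 1-based
    positions start+1..end. Positions missing from `depth` count as zero.
    """
    lo = max(1, sv.start + 1 - flank)
    hi = max(sv.end + flank, lo)  # zero-width intervals still cover one position
    total = sum(depth.get((sv.chrom, pos), 0) for pos in range(lo, hi + 1))
    return total / (hi - lo + 1)
```

Missing positions are treated as zero coverage because samtools depth omits zero-depth positions unless it is run with -a.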
python ./BoostSV.py collect -i {input.svs} -b {input.bam} -t {threads} -o $( dirname {output.res} ) -p {params.info} -r {REFGENOME} -d {input.depth}
This will create the feature file sv_feats_v1.0.txt under the output directory specified by option -o.
Predicting SV quality (BoostSV.py predict) requires:
{input.svs}: The same SV BED file (chrom, start, end, type, length) passed to the collect step; only INS and DEL are supported.
{input.feat}: The alignment feature file sv_feats_v1.0.txt produced by the collect step for the SVs in {input.svs}.
{MODEL}: The model pretrained on the pedigree, models/final_trainsv_ALL.XGB.model.
python ./BoostSV.py predict -s {input.feat} -p {params.info} -i {input.svs} -o $( dirname {output.res} ) -m {MODEL}
The predicted quality for each SV is saved in pred_qual_v1.0.txt.
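Outside a Snakemake workflow, the brace placeholders in the two commands above ({input.svs}, {threads}, {params.info}, and so on) have to be replaced with concrete values. The sketch below chains both steps with Python's subprocess module. The function names and the `info` argument are hypothetical: `info` simply passes through whatever value the pipeline supplies as {params.info}, whose meaning is not documented here, and `ref` stands in for {REFGENOME}. It assumes BoostSV.py is in the current directory, as in the commands above.

```python
import subprocess
from pathlib import Path
from typing import List


def collect_cmd(svs: str, bam: str, depth: str, ref: str, outdir: str,
                info: str, threads: int = 4, script: str = "./BoostSV.py") -> List[str]:
    """Argument list mirroring the collect command in this README."""
    return ["python", script, "collect", "-i", svs, "-b", bam, "-t", str(threads),
            "-o", outdir, "-p", info, "-r", ref, "-d", depth]


def predict_cmd(svs: str, outdir: str, info: str, model: str,
                script: str = "./BoostSV.py") -> List[str]:
    """Argument list for predict; reads the feature file written by collect."""
    feats = str(Path(outdir) / "sv_feats_v1.0.txt")
    return ["python", script, "predict", "-s", feats, "-p", info, "-i", svs,
            "-o", outdir, "-m", model]


def run_boostsv(svs: str, bam: str, depth: str, ref: str, outdir: str,
                info: str, model: str, threads: int = 4,
                dry_run: bool = False) -> List[List[str]]:
    """Run collect, then predict; with dry_run=True only return the commands."""
    cmds = [collect_cmd(svs, bam, depth, ref, outdir, info, threads),
            predict_cmd(svs, outdir, info, model)]
    if not dry_run:
        Path(outdir).mkdir(parents=True, exist_ok=True)
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop if either step fails
    return cmds
```

Calling `run_boostsv(..., dry_run=True)` first is a cheap way to inspect the exact command lines before launching the feature collection, which is the slower of the two steps since it reads the BAM file.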
v1.0: Beta release for the autism long-read study. The trained model is models/final_trainsv_ALL.XGB.model.
The tool is still under development.