jiadong324/BoostSV

BoostSV

BoostSV is a machine-learning-based tool that predicts the quality of structural variant (SV) breakpoints by assessing the alignment quality at each breakpoint location. The model is trained on transmitted breakpoints from a four-generation pedigree (Porubsky et al., Nature) using both HiFi and ONT reads.

Packages

pysam
xgboost
pandas
edlib
joblib
pyspoa
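Before running the tool, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of BoostSV; the helper name missing_modules is hypothetical, and "spoa" as the import name for pyspoa is an assumption that may need adjusting for your installation.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
# ASSUMPTION: pyspoa is imported as "spoa"; change this if your install differs.
REQUIRED_MODULES = ["pysam", "xgboost", "pandas", "edlib", "joblib", "spoa"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]
```

Calling missing_modules() with no arguments returns an empty list when every dependency is available.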

Basic usage

Collect alignment features

Required inputs

{input.svs}: SVs in BED format with chrom, start, end, type (INS/DEL) and length. The current model supports only INS and DEL.

{input.bam}: The long-read alignment file.

{input.depth}: The read coverage for each SV site. It can be generated with samtools depth.
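As a sanity check on the coverage input, the sketch below averages per-base depth over each SV interval. It assumes the standard three-column samtools depth output (chrom, 1-based position, depth) restricted to the SV regions, for example with samtools depth -b. Whether BoostSV expects the raw samtools output or a summary of it is not stated here, so this is an illustration for inspection only; the function name mean_depth_per_sv is hypothetical.

```python
from collections import defaultdict


def mean_depth_per_sv(depth_lines, svs):
    """Average per-base depth over each SV interval.

    depth_lines: iterable of tab-separated rows: chrom, 1-based pos, depth
    svs: list of (chrom, start, end) in BED coordinates (0-based, end-exclusive)
    """
    depth = defaultdict(dict)
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, cov = line.rstrip("\n").split("\t")[:3]
        depth[chrom][int(pos)] = int(cov)

    means = {}
    for chrom, start, end in svs:
        # BED [start, end) maps to 1-based positions start+1 .. end.
        # A zero-length interval (start == end) is checked at position start+1.
        positions = range(start + 1, max(end, start + 1) + 1)
        # Positions absent from the depth rows count as zero coverage.
        total = sum(depth[chrom].get(p, 0) for p in positions)
        means[(chrom, start, end)] = total / len(positions)
    return means
```

The function holds all depth rows in memory, so it is meant for depth output limited to SV sites rather than a whole genome.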

python ./BoostSV.py collect -i {input.svs} -b {input.bam} -t {threads} -o $( dirname {output.res} ) -p {params.info} -r {REFGENOME} -d {input.depth}

This creates the feature file sv_feats_v1.0.txt in the output directory specified by -o.
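When the collect step is driven from a script rather than a workflow rule, the command above can be assembled as an argument list. This is a minimal sketch using only the flags shown in the command; the function names are hypothetical, and the value for -p is passed through unchanged because its meaning is not described here.

```python
import subprocess
from pathlib import Path


def build_collect_cmd(svs, bam, depth, ref, info, out_dir, threads=1,
                      script="./BoostSV.py"):
    """Assemble the `collect` command as an argument list (no shell quoting needed)."""
    return [
        "python", str(script), "collect",
        "-i", str(svs),      # SVs in BED format
        "-b", str(bam),      # long-read alignment file
        "-t", str(threads),
        "-o", str(out_dir),  # output directory
        "-p", str(info),     # passed through as-is
        "-r", str(ref),      # reference genome
        "-d", str(depth),    # read coverage per SV site
    ]


def run_collect(**kwargs):
    """Run the collect step and return the expected feature file path."""
    cmd = build_collect_cmd(**kwargs)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(kwargs["out_dir"]) / "sv_feats_v1.0.txt"
```

Passing a list to subprocess.run avoids shell interpolation of file paths that contain spaces or special characters.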

Predict quality

Required inputs

{input.svs}: SVs in BED format with chrom, start, end, type (INS/DEL) and length. The current model supports only INS and DEL.

{input.feat}: The alignment feature file sv_feats_v1.0.txt, created by the collect step for the SVs in {input.svs}.

{MODEL}: The model pretrained on the pedigree, models/final_trainsv_ALL.XGB.model.
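Because the SV input must list chrom, start, end, type and length, and only INS and DEL are supported, it can be worth validating the file before running either step. The sketch below assumes tab-separated columns in that order; the SVRecord type and parse_sv_line helper are hypothetical and are not part of BoostSV.

```python
from typing import NamedTuple


class SVRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str
    length: int


SUPPORTED_TYPES = {"INS", "DEL"}


def parse_sv_line(line):
    """Parse one SV row; raise ValueError if it is malformed or unsupported."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, svtype, length = fields[:5]
    if svtype not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported SV type: {svtype!r}")
    # int() raises ValueError on non-numeric coordinates or lengths.
    return SVRecord(chrom, int(start), int(end), svtype, int(length))
```

Rows that fail validation can be reported and dropped before the file is handed to the tool.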

python ./BoostSV.py predict -s {input.feat} -p {params.info} -i {input.svs} -o $( dirname {output.res} ) -m {MODEL}

The predicted quality for each SV is saved in pred_qual_v1.0.txt.
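The predict step consumes the feature file written by collect, so the two can be chained by deriving the -s argument from the collect output directory. A minimal sketch, assuming the flags shown in the predict command above; the function name is hypothetical.

```python
from pathlib import Path

DEFAULT_MODEL = "models/final_trainsv_ALL.XGB.model"


def build_predict_cmd(collect_dir, svs, info, out_dir, model=DEFAULT_MODEL,
                      script="./BoostSV.py"):
    """Assemble the `predict` command, pointing -s at the collect step's output."""
    feat = Path(collect_dir) / "sv_feats_v1.0.txt"  # written by `collect`
    return [
        "python", str(script), "predict",
        "-s", str(feat),     # alignment feature file
        "-p", str(info),     # passed through as-is
        "-i", str(svs),      # same SV BED file used for `collect`
        "-o", str(out_dir),
        "-m", str(model),    # pretrained model
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the feature file exists.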

Release note

v1.0: This is the beta release for the autism long-read study. The trained model is models/final_trainsv_ALL.XGB.model. The tool is still under development.

About

Long-read alignment evaluation and SV breakpoint quality prediction
