This R package builds a predictive model of regulatory sequences using supervised classification. The model scores unknown sequences based on their content of DNA motifs, next-generation sequencing (NGS) peaks and signals, and other numerical scores. The package provides a workflow based on the support vector machine (SVM) algorithm that maps features to sequences, optimizes the SVM parameters and the number of features, and creates a model that can be stored and used to score the regulatory potential of unknown sequences. The package is available in the Bioconductor release and Bioconductor devel repositories. We have also created a LedPreddata repository with data and example commands from the Bioinformatics paper:
- Bioinformatics. 2015 Dec 1. pii: btv705. [Epub ahead of print]
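
To make the workflow described above more concrete, here is a minimal sketch of its general shape: build a feature matrix, tune the SVM parameters by cross-validation, keep the best model, store it, and score new sequences. It does not use this package's own functions. It assumes the general-purpose `e1071` package as a stand-in, and the feature matrix, labels, parameter grid and variable names are all made up for illustration.

```r
# Illustrative sketch only: uses e1071 directly, not this package's API.
library(e1071)

set.seed(1)

# Hypothetical feature matrix: one row per sequence, one column per feature
# (for example motif content or NGS signal). Values here are random.
n_seq  <- 60
n_feat <- 5
features <- matrix(rnorm(n_seq * n_feat), nrow = n_seq,
                   dimnames = list(NULL, paste0("feature", seq_len(n_feat))))

# Hypothetical labels: positive (regulatory) versus negative sequences.
labels <- factor(ifelse(features[, 1] + features[, 2] > 0,
                        "positive", "negative"))

# Tune the SVM parameters (gamma and cost) by cross-validated grid search.
tuned <- tune.svm(features, labels, kernel = "radial",
                  gamma = 10^(-3:0), cost = 10^(0:2))

# Keep the model trained with the best parameter combination.
model <- tuned$best.model

# Store the model so that it can be reloaded and reused later.
model_file <- tempfile(fileext = ".rds")
saveRDS(model, model_file)
model <- readRDS(model_file)

# Score unknown sequences described with the same features.
unknown <- matrix(rnorm(3 * n_feat), nrow = 3,
                  dimnames = list(NULL, colnames(features)))
pred   <- predict(model, unknown, decision.values = TRUE)
scores <- attr(pred, "decision.values")
print(scores)
```

The decision values play the role of the regulatory potential score in this sketch: the further a value lies from zero, the more confidently the model assigns the sequence to one of the two classes. Feature ranking and selection of the number of features, which the workflow also covers, are left out here for brevity.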