- Regression dataset - HIV-1 TAR RNA QSAR study (Cai et al., 2022)
- Classification dataset - ROBIN repository (Yazdani et al., 2022)
- All the datasets provided have been pre-processed as per the pipeline explained in Dataset_preprocessing README file and provided here. They can be directly used for model evaluation.
- Anaconda or Miniconda with Python 3.8.
- A conda environment with the following libraries:
- Python (>v3.8)
- pandas
- numpy
- scipy
- scikit-learn
- pickle
- matplotlib
- seaborn
- Generic libraries: sys, re, csv, os, collections...
* python test_on_regression_dataset.py "./data/QSAR_dataset_final.csv" "../Feature_selection/data/Final_sample_dataset_v1.csv" "best_models.log" "test_results.csv"
* python test_on_classification_dataset.py <positive dataset> <negative dataset> "../Feature_selection/data/Final_sample_dataset_v1.csv" "best_models.log"
- The complete dataset can be downloaded for each RNA subtype from the R-SIM database
- All programs take inputs as command-line arguments only
- The same pipeline is followed to prepare all custom test datasets used in the study, in the format necessary for model evaluation
- The header fields or columns provided in the sample dataset must be present (column order can be random) in your custom dataset file for the programs to work in the intended fashion