casp-rna is the official CASP15 pipeline to assess the accuracy of submitted models for RNA structure prediction.
This repository contains the code for the casp-rna pipeline. The pipeline evaluates models with ZRNA, a weighted average of Z-scores computed from several assessment metrics. To capture the topology, local environment, and geometry of RNA, ZRNA incorporates metrics beyond RMSD, including TM-score, GDT-TS, INF scores, lDDT, and clashscore. The pipeline also provides a workflow for data wrangling, job parallelization, and ranking visualizations. Z-scores are computed in two passes: models whose first-pass Z-score falls below a tolerance threshold of -2 are discarded, and the Z-scores are then recomputed. For RNA targets with multiple conformations, the pipeline compares each submitted model to all available experimental structures.
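The two-pass Z-score procedure and the weighted ZRNA average can be sketched in Python as follows. This is a minimal illustration, not the pipeline's implementation: the function names, the use of the population standard deviation, the choice to score every model (including discarded ones) against the second-pass statistics, and the weights are all assumptions. Because lower clashscore and RMSD values indicate better models, the sketch negates those metrics before computing Z-scores.

```python
from statistics import mean, pstdev

# Hypothetical helpers illustrating a two-pass Z-score and a weighted ZRNA.
LOWER_IS_BETTER = {"clashscore", "rmsd"}  # smaller values mean better models


def z_scores(values, reference):
    """Z-score each value against the mean and population std of `reference`."""
    mu, sd = mean(reference), pstdev(reference)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def two_pass_z(values, higher_is_better=True, tolerance=-2.0):
    """First pass: drop models with Z < tolerance. Second pass (assumed behaviour):
    rescore all models against the mean/std of the models that were kept."""
    signed = list(values) if higher_is_better else [-v for v in values]
    first_pass = z_scores(signed, signed)
    kept = [v for v, z in zip(signed, first_pass) if z >= tolerance]
    return z_scores(signed, kept)


def zrna(metric_values, weights):
    """Weighted average of the per-metric two-pass Z-scores for each model."""
    n_models = len(next(iter(metric_values.values())))
    total = [0.0] * n_models
    for name, values in metric_values.items():
        zs = two_pass_z(values, higher_is_better=name not in LOWER_IS_BETTER)
        total = [t + weights[name] * z for t, z in zip(total, zs)]
    weight_sum = sum(weights[name] for name in metric_values)
    return [t / weight_sum for t in total]


# Example with made-up scores for three models and equal (hypothetical) weights.
scores = {"tm_score": [0.2, 0.5, 0.8], "clashscore": [30.0, 20.0, 10.0]}
print(zrna(scores, {"tm_score": 1.0, "clashscore": 1.0}))
```

Removing outliers before the second pass keeps a single very poor model from inflating the standard deviation and compressing the Z-scores of all other models.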
The CASP-RNA repository contains all scripts and code used to obtain and analyse scores for the CASP15 RNA category. The repository is organised as follows:
Clone the repository to your local machine:
$ git clone https://github.com/DasLab/casp-rna.git
$ cd casp-rna
python >= 3.7 (the NumPy, pandas and Matplotlib versions listed below require Python 3.7 or later)
py-numpy/1.20.3_py39
scipy
py-matplotlib/3.4.2_py39
py-pandas/1.3.1_py39
seaborn
biopython
viz
py-scipy/1.6.3_py39
US-align for TM-score calculations
LGA for GDT-TS calculations
PHENIX for MolProbity clashscore calculations
OpenStructure for lDDT calculations
rna-tools for ClaRNA-based INF score calculations
Streamlined installation of the five packages required for the metric calculations is provided by the setup.sh script, which installs them into the bins/ directory. The LGA and PHENIX binaries must first be obtained from their respective websites and placed in casp-rna's downloads/ folder. The script can then be run as follows:
$ bash setup.sh
Existing installations of the above packages can also be used: when run, the setup.sh script can create a symlink to an existing installation of each package or, alternatively, install the package into the bins/ directory.
Installation of PHENIX is described in the PHENIX documentation. PHENIX can be downloaded from the PHENIX website and placed in the project's downloads/ folder.
The LGA binary can be downloaded from the LGA website. It must be placed in the project's downloads/ folder before running the setup.sh script.
The scripts can be run from the command line or imported into a Python script.
Each target directory must contain two subdirectories: references/ (ground truth models) and models/ (predicted models). The references/ and models/ subdirectories must each contain at least one .pdb file; different states and configurations can be stored in separate files. The target directory can be placed anywhere on the system, but its path must be provided to the script.
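Before launching a run, the expected layout can be checked with a short Python helper. This is a hypothetical sketch (check_target_dir is not part of the repository) that encodes only the rules stated above: a references/ and a models/ subdirectory, each holding at least one .pdb file. It also assumes the lowercase .pdb extension.

```python
from pathlib import Path


def check_target_dir(target):
    """Return (reference PDBs, model PDBs), or raise ValueError if the layout is wrong."""
    target = Path(target)
    found = {}
    for sub in ("references", "models"):
        subdir = target / sub
        if not subdir.is_dir():
            raise ValueError(f"{target} is missing the {sub}/ subdirectory")
        pdbs = sorted(subdir.glob("*.pdb"))  # top-level .pdb files only
        if not pdbs:
            raise ValueError(f"{subdir} contains no .pdb files")
        found[sub] = pdbs
    return found["references"], found["models"]


# Usage (hypothetical path):
# references, models = check_target_dir("path/to/target")
```

Failing early on a malformed directory avoids submitting cluster jobs that would only error out later.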
The metric summary is exported to a .csv file in the scores/ directory, and graphs are exported to the figures/ directory. Intermediate files for debugging or further data exploration are stored in the runs/ directory.
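The columns of the exported .csv file are not documented here, so the following sketch assumes a hypothetical layout: one row per model, a model column holding the model name, and one column per metric (for example tm_score or clashscore). Under that assumption, models can be ranked by any single metric with the standard library's csv module.

```python
import csv


def rank_models(csv_path, column, higher_is_better=True):
    """Return (model, score) pairs sorted best-first by `column`.

    Assumes a hypothetical 'model' column; rows with an empty score are skipped.
    """
    with open(csv_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    scored = [(row["model"], float(row[column])) for row in rows if row.get(column)]
    return sorted(scored, key=lambda pair: pair[1], reverse=higher_is_better)


# Usage (hypothetical file name; lower clashscore is better):
# rank_models("scores/target_summary.csv", "clashscore", higher_is_better=False)
```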
This script is used to run the pipeline on a computing cluster. The script takes in a list of target directories and runs the pipeline on each target directory. The script can be run as follows:
bash parallel/parallel.sh {path_to_target} {metric}
where {path_to_target} is the relative path to the target directory containing the references/ and models/ subdirectories, and {metric} is the metric to calculate ("inf", "gdt", "tm_score", "lddt", "clashscore", "rmsd", or "all").
Running parallel.sh launches a large number of jobs. Minor modifications may be required to match the specifications of your institution's cluster job scheduler.
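When several targets need to be processed, parallel.sh can be called once per target directory. The sketch below builds and optionally runs those calls from Python. It assumes execution from the repository root; build_commands, launch, and the example target paths are hypothetical, while the metric names are the ones listed above.

```python
import shlex
import subprocess

# Metric names accepted by parallel/parallel.sh, as listed in this README.
VALID_METRICS = {"inf", "gdt", "tm_score", "lddt", "clashscore", "rmsd", "all"}


def build_commands(targets, metric="all"):
    """Return one `bash parallel/parallel.sh <target> <metric>` command per target."""
    if metric not in VALID_METRICS:
        raise ValueError(f"unknown metric {metric!r}; expected one of {sorted(VALID_METRICS)}")
    return [["bash", "parallel/parallel.sh", str(target), metric] for target in targets]


def launch(targets, metric="all", dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in build_commands(targets, metric):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing call


# Dry run for two hypothetical target directories.
launch(["targets/target_A", "targets/target_B"], metric="lddt")
```

The calls run sequentially because each parallel.sh invocation already submits its own batch of jobs; the dry-run default makes it easy to inspect the commands before anything is submitted.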
This notebook contains the code to generate the figures in the paper.
If you use this code, please cite the following paper:
@article{casp15_rna,
author = {TBD},
title = {Assessment of three-dimensional RNA structure prediction in CASP15},
journal = {TBD},
year = {TBD},
volume = {TBD},
number = {TBD},
pages = {TBD},
doi = {TBD},
url = {TBD}
}