This repository contains information from our preprint (Preprint Link). Data and results are available at https://zenodo.org/records/10993618
For any queries, please open an issue on GitHub or contact the CBBIO Group.
Author contact: a.rojas.m@csic.es or icases@gmail.com
This repository supports the analysis and evaluation of various prediction methods against a gold standard (UniProt) for different species. It includes scripts for collecting data, generating reports, and evaluating scores and semantic similarities.
- R (version 4.0 or higher)
- R libraries: tidyverse, furrr, GO.db, GOSemSim, knitr, rmarkdown
- Ensure you have the necessary permissions to execute the scripts and write outputs to the specified directories.
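
The requirements above can be checked from an R session before running anything. The following is a minimal sketch, not part of the repository: the helper names `missing_pkgs` and `install_missing` are hypothetical, and it assumes the usual split where tidyverse, furrr, knitr, and rmarkdown come from CRAN while GO.db and GOSemSim come from Bioconductor (installed through BiocManager).

```r
# Packages listed in the requirements, grouped by where they are distributed.
cran_pkgs <- c("tidyverse", "furrr", "knitr", "rmarkdown")
bioc_pkgs <- c("GO.db", "GOSemSim")

# Return the subset of `pkgs` that cannot be loaded in this R installation.
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install whatever is missing (hypothetical helper; nothing runs until it is called).
install_missing <- function() {
  to_install <- missing_pkgs(cran_pkgs)
  if (length(to_install) > 0) install.packages(to_install)

  to_install <- missing_pkgs(bioc_pkgs)
  if (length(to_install) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(to_install)
  }
}
```

Calling `install_missing()` once should be enough; `missing_pkgs(c(cran_pkgs, bioc_pkgs))` can be used afterwards to confirm that nothing is left to install.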
- Supplementary_tables1-9.xlsx:
  - An Excel file containing supplementary tables that support the findings presented in the preprint.
- collect_data.R:
  - An R script used for collecting data necessary for the analysis.
- report.Rmd:
  - An R Markdown file used to generate a report, containing analysis and results in a reproducible format. It uses rmarkdown::render to create a report for different species.
- scores.Rmd:
  - An R Markdown file used to score and evaluate predictions. It generates a report that compares predictions from different methods against a gold standard (UniProt) for different species.
- ss_report.Rmd:
  - An R Markdown file for generating a semantic similarity report. It calculates and visualizes the semantic similarity of predictions from various methods against a gold standard (UniProt) for different species.
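
Since the reports are created with rmarkdown::render for different species, a per-species loop might look like the sketch below. It is an illustration under stated assumptions, not the repository's actual invocation: the species identifiers, the `reports` output directory, the output file naming, and the `species` parameter (which would have to be declared under `params:` in the YAML header of report.Rmd) are all hypothetical.

```r
# Hypothetical species identifiers; replace with the ones used in your analysis.
species_list <- c("species_a", "species_b")

# Build the output file name for one species (naming scheme is an assumption).
report_filename <- function(species, ext = "html") {
  paste0("report_", species, ".", ext)
}

# Render the report once for a given species.
# Assumes the Rmd declares a `species` entry under `params:` in its YAML header.
render_species_report <- function(species, rmd = "report.Rmd", out_dir = "reports") {
  rmarkdown::render(
    input       = rmd,
    params      = list(species = species),
    output_file = report_filename(species),
    output_dir  = out_dir
  )
}

# Render all reports (requires the rmarkdown package and the collected data):
# for (sp in species_list) render_species_report(sp)
```

The same helper could be pointed at scores.Rmd or ss_report.Rmd through the `rmd` argument, provided those files accept a matching parameter.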
- Data Collection (collect_data.R):
  - Collects and preprocesses data required for the analysis.
  - Ensures data is in the correct format for subsequent scripts.
- Generating Reports (report.Rmd):
  - Produces a detailed report on the automatic annotation of a species.
  - Contains sections for:
    - Gold Standard (UniProt) annotations and statistics.
    - Predictions from various methods and their comparison to the gold standard.
    - Visualizations and summaries of annotations and predictions.
- Evaluating Scores (scores.Rmd):
  - Analyzes and compares scores from different prediction methods.
  - Visualizes the distribution of scores and their coverage across different ontologies (BP, CC, MF).
  - Includes functions to calculate and visualize the drop in hits, coverage in annotations, and information content.
- Semantic Similarity Report (ss_report.Rmd):
  - Generates a semantic similarity report.
  - Calculates semantic similarity between different prediction methods and the gold standard.
  - Uses parallel processing for efficient computations.
  - Visualizes semantic similarity scores using ggplot2.
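
To make the coverage comparison across ontologies (BP, CC, MF) mentioned under Evaluating Scores more concrete, here is a small base-R sketch. It does not reproduce the code in scores.Rmd: the toy data frames, the column names `protein` and `ontology`, the function name `coverage_by_ontology`, and the definition of coverage used here (share of gold-standard proteins that also receive at least one prediction in the same ontology) are all assumptions made for illustration.

```r
# Hypothetical toy inputs: one row per (protein, annotation), tagged with its ontology.
gold <- data.frame(
  protein  = c("P1", "P1", "P2", "P3", "P3"),
  ontology = c("BP", "MF", "BP", "CC", "BP"),
  stringsAsFactors = FALSE
)
pred <- data.frame(
  protein  = c("P1", "P2", "P3", "P4"),
  ontology = c("BP", "BP", "CC", "MF"),
  stringsAsFactors = FALSE
)

# Coverage per ontology (assumed definition): fraction of gold-standard proteins
# that have at least one prediction in the same ontology.
coverage_by_ontology <- function(gold, pred, ontologies = c("BP", "CC", "MF")) {
  vapply(ontologies, function(ont) {
    gold_prot <- unique(gold$protein[gold$ontology == ont])
    pred_prot <- unique(pred$protein[pred$ontology == ont])
    if (length(gold_prot) == 0) return(NA_real_)  # no gold annotations to cover
    mean(gold_prot %in% pred_prot)
  }, numeric(1))
}

coverage <- coverage_by_ontology(gold, pred)
print(coverage)
```

With one such vector per prediction method, the values can be bound into a single table and plotted side by side, which is roughly the kind of comparison the scores report describes.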
- Reports:
  - HTML and PDF reports generated by report.Rmd, scores.Rmd, and ss_report.Rmd.
  - Contain detailed analyses, visualizations, and summaries of predictions and annotations.
- Tables:
  - Supplementary tables in Supplementary_tables1-9.xlsx.
  - Additional output tables generated by the scripts, saved in the specified directories.