CBBIO/func-lm

Func-LM Repository

This repository accompanies our preprint: Preprint Link. Data and results are available at https://zenodo.org/records/10993618

For any queries, please open an issue on GitHub or contact the CBBIO Group.

Author contact: a.rojas.m@csic.es or icases@gmail.com


Overview

This repository supports the analysis and evaluation of various prediction methods against a gold standard (UniProt) for different species. It includes scripts for collecting data, generating reports, and evaluating scores and semantic similarities.

Prerequisites

  • R (version 4.0 or higher)
  • R libraries: tidyverse, furrr, GO.db, GOSemSim, knitr, rmarkdown
  • Ensure you have the necessary permissions to execute the scripts and write outputs to the specified directories.
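
One possible way to install the listed libraries is sketched below. It assumes the usual distribution channels: tidyverse, furrr, knitr, and rmarkdown from CRAN, and GO.db and GOSemSim from Bioconductor. BiocManager and the CRAN mirror URL are not named in the prerequisites; they appear here only as installation helpers.

```r
# CRAN packages named in the prerequisites
cran_pkgs <- c("tidyverse", "furrr", "knitr", "rmarkdown")

# Bioconductor packages named in the prerequisites
bioc_pkgs <- c("GO.db", "GOSemSim")

# Assumption: any CRAN mirror works; this one is only an example
cran_repo <- "https://cloud.r-project.org"

# Install only the CRAN packages that are not already present
missing_cran <- setdiff(cran_pkgs, rownames(installed.packages()))
if (length(missing_cran) > 0) install.packages(missing_cran, repos = cran_repo)

# Assumption: BiocManager is used as the Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager", repos = cran_repo)
}

# Install only the Bioconductor packages that are not already present
missing_bioc <- setdiff(bioc_pkgs, rownames(installed.packages()))
if (length(missing_bioc) > 0) BiocManager::install(missing_bioc)
```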

Repository Contents

  1. Supplementary_tables1-9.xlsx:

    • An Excel file containing supplementary tables that support the findings presented in the preprint.
  2. collect_data.R:

    • An R script used for collecting data necessary for the analysis.
  3. report.Rmd:

    • An R Markdown file used to generate a report, containing analysis and results in a reproducible format. It uses rmarkdown::render to create a report for different species.
  4. scores.Rmd:

    • An R Markdown file that evaluates prediction scores. It generates a report comparing predictions from different methods against a gold standard (UniProt) for different species.
  5. ss_report.Rmd:

    • An R Markdown file for generating a semantic similarity report. It calculates and visualizes the semantic similarity of predictions from various methods against a gold standard (UniProt) for different species.
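
The description of report.Rmd mentions rmarkdown::render being used to create a report per species. A minimal sketch of that pattern follows. The `species` parameter name, the species identifiers, and the `reports` output directory are hypothetical. The real parameter names are whatever report.Rmd declares under `params:` in its YAML header, which is not shown here.

```r
# Hypothetical species identifiers; replace with those used in your data
species_list <- c("species_a", "species_b")

# Hypothetical output directory for the rendered reports
out_dir <- "reports"

# Build the output file name for one species
report_file <- function(species) paste0("report_", species, ".html")

# Render one report for one species
render_species <- function(species, input = "report.Rmd") {
  rmarkdown::render(
    input       = input,
    output_file = report_file(species),
    output_dir  = out_dir,
    # Assumes the Rmd declares a `species` entry under `params:`;
    # render() stops with an error if the parameter is not declared
    params      = list(species = species),
    # Fresh environment so objects do not leak between species
    envir       = new.env()
  )
}

# Render only when the input file is present in the working directory
if (file.exists("report.Rmd")) {
  for (sp in species_list) render_species(sp)
}
```

Rendering each species in its own environment keeps one run from silently reusing objects created by the previous one.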

Script Workflow

  1. Data Collection (collect_data.R):

    • Collects and preprocesses data required for the analysis.
    • Ensures data is in the correct format for subsequent scripts.
  2. Generating Reports (report.Rmd):

    • Produces a detailed report for automatic annotation of a species.
    • Contains sections for:
      • Gold Standard (UniProt) annotations and statistics.
      • Predictions from various methods and their comparison to the gold standard.
      • Visualizations and summaries of annotations and predictions.
  3. Evaluating Scores (scores.Rmd):

    • Analyzes and compares scores from different prediction methods.
    • Visualizes the distribution of scores and their coverage across different ontologies (BP, CC, MF).
    • Includes functions to calculate and visualize the drop in hits, coverage in annotations, and information content.
  4. Semantic Similarity Report (ss_report.Rmd):

    • Generates a semantic similarity report.
    • Calculates semantic similarity between different prediction methods and the gold standard.
    • Uses parallel processing for efficient computations.
    • Visualizes semantic similarity scores using ggplot2.
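
The score evaluation step describes how hits and coverage drop as the score threshold rises. The sketch below illustrates one way such numbers could be computed. It is not taken from scores.Rmd: the column names, the toy values, and the thresholds are hypothetical, and base R is used instead of the tidyverse to keep the example dependency-free.

```r
# Toy predictions: one row per (protein, GO term) with a method score.
# Column names and values are hypothetical.
predictions <- data.frame(
  protein  = c("P1", "P1", "P2", "P3", "P3", "P4"),
  go_id    = c("GO:0003677", "GO:0005634", "GO:0003677",
               "GO:0006355", "GO:0005634", "GO:0003677"),
  ontology = c("MF", "CC", "MF", "BP", "CC", "MF"),
  score    = c(0.95, 0.40, 0.70, 0.55, 0.85, 0.20),
  stringsAsFactors = FALSE
)

# Toy gold standard: proteins with at least one reference annotation
gold_proteins <- c("P1", "P2", "P3", "P4", "P5")

# Number of predictions kept at a given score threshold
hits_at <- function(threshold, preds) {
  sum(preds$score >= threshold)
}

# Fraction of gold-standard proteins that keep at least one prediction
# with score >= threshold
coverage_at <- function(threshold, preds, gold) {
  kept <- unique(preds$protein[preds$score >= threshold])
  length(intersect(kept, gold)) / length(gold)
}

thresholds <- c(0, 0.5, 0.9)
summary_tbl <- data.frame(
  threshold = thresholds,
  hits      = vapply(thresholds, hits_at, integer(1), preds = predictions),
  coverage  = vapply(thresholds, coverage_at, numeric(1),
                     preds = predictions, gold = gold_proteins)
)
print(summary_tbl)
```

The same two functions can be applied to each ontology separately by first subsetting `predictions` on the `ontology` column.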

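The semantic similarity step combines GOSemSim with parallel processing. A minimal sketch of that combination is shown below, under several assumptions: the proteins and their term sets are toy data, the Wang measure with best-match-average combination is chosen only for illustration, and the worker count is arbitrary. Calling `godata` without an annotation database relies on GOSemSim accepting that form, which may depend on the installed version. The measure and settings used in ss_report.Rmd may differ.

```r
library(GOSemSim)
library(furrr)

# Two local worker processes; adjust to the available cores
future::plan(future::multisession, workers = 2)

# Semantic data for the MF ontology. Without an annotation database no
# information content is computed, so only the Wang measure is usable.
sem_mf <- godata(ont = "MF")

# Toy annotations per protein (hypothetical proteins, real MF terms)
gold <- list(P1 = c("GO:0003677", "GO:0003700"), P2 = c("GO:0005524"))
pred <- list(P1 = c("GO:0003677"), P2 = c("GO:0005524", "GO:0016301"))

# Similarity between predicted and gold-standard term sets,
# computed protein by protein in parallel
ss <- future_map2_dbl(pred, gold, function(p, g) {
  GOSemSim::mgoSim(p, g, semData = sem_mf, measure = "Wang", combine = "BMA")
})
print(ss)

# Return to sequential execution and release the workers
future::plan(future::sequential)
```

The resulting named vector holds one score per protein and can be passed to ggplot2 after conversion to a data frame.
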
Output

  • Reports:

    • HTML and PDF reports generated by report.Rmd, scores.Rmd, and ss_report.Rmd.
    • Contain detailed analyses, visualizations, and summaries of predictions and annotations.
  • Tables:

    • Supplementary tables in Supplementary_tables1-9.xlsx.
    • Additional output tables generated by the scripts, saved in the specified directories.
