GitHub - CBBIO/func-lm: Data from paper

Func-LM Repository

This repository contains information from our preprint: Preprint Link. Data and results are available at https://zenodo.org/records/10993618

For any queries, please open an issue on GitHub or contact the CBBIO Group.

Author contact: a.rojas.m@csic.es or icases@gmail.com

Overview

This repository supports the analysis and evaluation of several function prediction methods against a gold standard (UniProt) for different species. It includes scripts for collecting data, generating reports, and evaluating scores and semantic similarities.

Prerequisites

- R (version 4.0 or higher)
- R libraries: tidyverse, furrr, GO.db, GOSemSim, knitr, rmarkdown
- Permissions to execute the scripts and to write outputs to the specified directories.
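
The libraries listed above can be installed from within R. The following is a minimal sketch, not part of the repository: it assumes tidyverse, furrr, knitr, and rmarkdown are taken from CRAN and GO.db and GOSemSim from Bioconductor via BiocManager. The helper names `missing_pkgs` and `install_prerequisites` are hypothetical.

```r
# Packages listed under Prerequisites, split by the repository that hosts them.
cran_pkgs <- c("tidyverse", "furrr", "knitr", "rmarkdown")
bioc_pkgs <- c("GO.db", "GOSemSim")

# Return the subset of `pkgs` that is not installed yet (hypothetical helper).
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing; nothing is installed until this is called.
install_prerequisites <- function() {
  cran_missing <- missing_pkgs(cran_pkgs)
  if (length(cran_missing) > 0) install.packages(cran_missing)

  bioc_missing <- missing_pkgs(bioc_pkgs)
  if (length(bioc_missing) > 0) {
    # Bioconductor packages are installed through BiocManager.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(bioc_missing)
  }
  invisible(c(cran_missing, bioc_missing))
}

# install_prerequisites()  # run once before rendering the reports
```

Keeping the installation inside a function means that sourcing the snippet has no side effects; the final call is left commented out on purpose.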

Repository Contents

Supplementary_tables1-9.xlsx:
- An Excel file containing supplementary tables that support the findings presented in the preprint.
collect_data.R:
- An R script used for collecting data necessary for the analysis.
report.Rmd:
- An R Markdown file used to generate a report, containing analysis and results in a reproducible format. It uses rmarkdown::render to create a report for different species.
scores.Rmd:
- An R Markdown file that evaluates prediction scores. It generates a report comparing predictions from different methods against a gold standard (UniProt) for different species.
ss_report.Rmd:
- An R Markdown file for generating a semantic similarity report. It calculates and visualizes the semantic similarity of predictions from various methods against a gold standard (UniProt) for different species.
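
Since the R Markdown files are rendered once per species with rmarkdown::render, a driver loop might look like the sketch below. This is an illustration under assumptions, not the repository's own code: the species identifiers are placeholders, the helpers `report_jobs` and `render_reports` are hypothetical, and the `species` parameter is assumed to be declared in the `params` field of the Rmd header, which this README does not confirm.

```r
# Placeholder species identifiers; the real ones are not listed in this README.
species <- c("species_a", "species_b")

# Build one output file name per species for a given R Markdown file.
report_jobs <- function(rmd, species) {
  stem <- tools::file_path_sans_ext(basename(rmd))
  data.frame(
    species = species,
    output_file = paste0(stem, "_", species, ".html"),
    stringsAsFactors = FALSE
  )
}

# Render the Rmd once per species (hypothetical helper).
render_reports <- function(rmd, species) {
  jobs <- report_jobs(rmd, species)
  for (i in seq_len(nrow(jobs))) {
    rmarkdown::render(
      input = rmd,
      output_file = jobs$output_file[i],
      # Assumption: the Rmd declares a `species` parameter in its YAML header.
      params = list(species = jobs$species[i])
    )
  }
  invisible(jobs)
}

jobs <- report_jobs("report.Rmd", species)
print(jobs)

# render_reports("report.Rmd", species)  # run from inside a clone of the repository
```

The same loop would apply to scores.Rmd and ss_report.Rmd by changing the first argument, provided they accept the same parameter.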

Script Workflow

Data Collection (collect_data.R):
- Collects and preprocesses data required for the analysis.
- Ensures data is in the correct format for subsequent scripts.
Generating Reports (report.Rmd):
- Produces a detailed report on the automatic annotation of a species.
- Contains sections for:
  - Gold Standard (UniProt) annotations and statistics.
  - Predictions from various methods and their comparison to the gold standard.
  - Visualizations and summaries of annotations and predictions.
Evaluating Scores (scores.Rmd):
- Analyzes and compares scores from different prediction methods.
- Visualizes the distribution of scores and their coverage across different ontologies (BP, CC, MF).
- Includes functions to calculate and visualize the drop in hits, coverage in annotations, and information content.
Semantic Similarity Report (ss_report.Rmd):
- Generates a semantic similarity report.
- Calculates semantic similarity between different prediction methods and the gold standard.
- Uses parallel processing for efficient computations.
- Visualizes semantic similarity scores using ggplot2.
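
To make the score evaluation step (scores.Rmd) more concrete, the sketch below shows one plausible way to compute coverage per ontology and a frequency-based information content in base R. It is not the code from scores.Rmd: the toy tables, the column names, the term identifiers `T1` to `T6`, and the functions `coverage_by_ontology` and `information_content` are all assumptions made for illustration. Coverage is taken here as the fraction of gold-standard proteins in an ontology that receive at least one prediction in that ontology, and information content as the negative log2 of a term's relative frequency, without propagating annotations through the ontology graph.

```r
# Toy gold-standard annotations (hypothetical data, placeholder term IDs).
gold <- data.frame(
  protein  = c("P1", "P1", "P2", "P3", "P3", "P4"),
  term     = c("T1", "T2", "T1", "T3", "T4", "T5"),
  ontology = c("BP", "MF", "BP", "CC", "BP", "CC"),
  stringsAsFactors = FALSE
)

# Toy predictions from one method, each with a confidence score.
pred <- data.frame(
  protein  = c("P1", "P2", "P3", "P3"),
  term     = c("T1", "T1", "T3", "T6"),
  ontology = c("BP", "BP", "CC", "BP"),
  score    = c(0.9, 0.4, 0.7, 0.2),
  stringsAsFactors = FALSE
)

# Fraction of gold-standard proteins per ontology that have at least one
# prediction in the same ontology with score >= min_score.
coverage_by_ontology <- function(gold, pred, min_score = 0) {
  kept <- pred[pred$score >= min_score, ]
  sapply(c("BP", "CC", "MF"), function(ont) {
    gold_proteins <- unique(gold$protein[gold$ontology == ont])
    pred_proteins <- unique(kept$protein[kept$ontology == ont])
    if (length(gold_proteins) == 0) return(NA_real_)
    mean(gold_proteins %in% pred_proteins)
  })
}

# Frequency-based information content: IC(t) = -log2(relative frequency of t).
information_content <- function(terms) {
  freq <- table(terms) / length(terms)
  ic <- -log2(as.numeric(freq))
  names(ic) <- names(freq)
  ic
}

cov_all    <- coverage_by_ontology(gold, pred)                   # no score filter
cov_strict <- coverage_by_ontology(gold, pred, min_score = 0.5)  # stricter cutoff
ic         <- information_content(gold$term)

print(rbind(all = cov_all, strict = cov_strict))
print(round(ic, 3))
```

Comparing `cov_all` with `cov_strict` shows how raising the score cutoff reduces the number of proteins that still receive a prediction, which is one simple way to look at a drop in hits.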

Output

Reports:
- HTML and PDF reports generated by report.Rmd, scores.Rmd, and ss_report.Rmd.
- Contain detailed analyses, visualizations, and summaries of predictions and annotations.
Tables:
- Supplementary tables in Supplementary_tables1-9.xlsx.
- Additional output tables generated by the scripts, saved in the specified directories.
