michbur/AmpGram-analysis

This repository contains the data and code necessary to reproduce the analysis from the article: Burdukiewicz, M., Sidorczuk, K., Rafacz, D., Pietluch, F., Chilimoniuk, J., Rödiger, S. and Gagat, P. AmpGram: a proteome screening tool for prediction and design of antimicrobial peptides.

The analysis conducted in this article resulted in AmpGram, a predictor of antimicrobial peptides, available as an R package (https://cran.r-project.org/package=AmpGram) and a web server (www.smorfland.uni.wroc.pl/shiny/AmpGram/).

How to reproduce the main part of the analysis?

Source analysis.R. Be warned that the computations are time-consuming.

How to generate results and plots for publication?

Run generate_benchmark_data.R, then benchmark.R, and finally plots.R. The first script generates the AmpGram model and the files needed to run the benchmark. The plots depend on the benchmark results, so benchmark.R must finish before plots.R is run.
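
The ordering described above can be sketched as a small helper. This is only an illustration: run_in_order is a hypothetical function that is not part of this repository, and it assumes the three scripts sit in the current working directory (the repository root).

```r
# Hypothetical helper (not part of the repository): source scripts one after
# another in the given order, refusing to start if any of them is missing.
run_in_order <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  missing_paths <- paths[!file.exists(paths)]
  if (length(missing_paths) > 0) {
    stop("Script(s) not found: ", paste(missing_paths, collapse = ", "))
  }
  for (path in paths) {
    message("Running ", path)
    # source() evaluates in the global environment by default,
    # which mirrors sourcing each script by hand in an interactive session.
    source(path)
  }
  invisible(paths)
}

# Assumed location: the repository root. The guard keeps this snippet harmless
# when it is run from elsewhere.
pipeline <- c("generate_benchmark_data.R", "benchmark.R", "plots.R")
if (all(file.exists(pipeline))) {
  run_in_order(pipeline)
}
```

Checking that every script exists before starting avoids discovering a path problem only after the first, long-running step has finished.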

Repository structure

data

Processed data used in the study:

  • SuppTable1.tsv and SuppTable2.tsv - data sets and benchmark results from Gabere and Noble.
  • ampscanner_noble.csv - predictions of AMPScanner for data sets of Gabere and Noble.
  • apd_df.csv - data downloaded from APD3 database.
  • benchmark_all.csv - predictions of other AMP predictors on our benchmark data set.
  • benchmark_all_program_list - list of predictors considered in our benchmark.
  • benchmark_data.RData - object generated by the generate_benchmark_data.R script; contains the model, the important features and the counted n-grams for the benchmark data set.
  • dbamp_df.csv - data downloaded from dbAMP database, contains sequences used to train and benchmark AmpGram.
  • iAMPpred_benchmark.csv - predictions of iAMPpred on our benchmark data set.
  • bovine_lactoferrin.fasta and thrombin.fasta - sequences of bovine lactoferrin and human prothrombin used to generate plots with prediction results for publication.

functions

All functions necessary to repeat the analysis.

reports

Report summarizing the results obtained with the first random models that predict AMP properties in 10-mers.

results

  • AmpGram_model.rda - object containing AmpGram stacked random forest model and important features
  • AmpGram_model.rda - object containing AmpGram stacked random forest model trained on a smaller subset of important features
  • Nobles_datasets_benchmark_res.rds - results of AmpGram predictions for data sets from Gabere and Noble
  • benchmark.fasta - our benchmark data set generated by writing_benchmarks.R function.

R session information

All scripts used in this study are compatible with the following version of R:

  • R version 3.6.2 (2019-12-12)
  • Platform: x86_64-pc-linux-gnu (64-bit)

The necessary packages and the versions used in the analyses are listed in renv.lock.
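
As a minimal sketch of how the lockfile might be used, assuming the renv package is installed and the working directory is the repository root, the recorded package versions can be restored with renv::restore(). Running it under an R version other than the one listed above may not reproduce the environment exactly.

```r
# Assumption: renv is already installed; if not, run install.packages("renv") first.
# The guard below does nothing when renv or the lockfile is unavailable.
if (requireNamespace("renv", quietly = TRUE) && file.exists("renv.lock")) {
  # Install the package versions recorded in renv.lock into the project library.
  renv::restore(lockfile = "renv.lock", prompt = FALSE)
} else {
  message("renv or renv.lock not found; nothing restored.")
}
```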

Additional information

UniProt query used to obtain the sequences for the construction of the negative data set:

NOT antimicrobial NOT annotation:(type:transit) NOT antibacterial NOT antifungal NOT antiviral AND reviewed:yes