This repository contains the data and code necessary to reproduce the analysis from the article: Burdukiewicz, M., Sidorczuk, K., Rafacz, D., Pietluch, F., Chilimoniuk, J., Rödiger, S. and Gagat, P. AmpGram: a proteome screening tool for prediction and design of antimicrobial peptides.
The analysis conducted in this article resulted in AmpGram, a predictor of antimicrobial peptides, available as an R package (https://cran.r-project.org/package=AmpGram) and a web server (www.smorfland.uni.wroc.pl/shiny/AmpGram/).
Source analysis.R. Be warned that the computations are time-consuming.
Run generate_benchmark_data.R, then benchmark.R, and then plots.R. The first script generates the AmpGram model and the files necessary for running the benchmark. To generate the plots, you have to run the benchmark first.
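The run order described above can be sketched as a small helper. This is only an illustration: `run_in_order` is a hypothetical function that is not part of this repository, and it assumes the scripts sit in the directory passed as `dir` and that this directory is also the working directory the scripts expect.

```r
# Hypothetical helper (not part of the repository): source scripts one after
# another, refusing to start if any of them is missing.
run_in_order <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing script(s): ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    # Evaluated in the global environment, as if sourced by hand.
    source(p)
  }
  invisible(paths)
}
```

Calling `run_in_order(c("generate_benchmark_data.R", "benchmark.R", "plots.R"))` from the repository root would then run the benchmark scripts in the order given above. Checking for missing files up front avoids discovering a wrong path only after a long computation has finished.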
Processed data used in the study:
- SuppTable1.tsv and SuppTable2.tsv - data sets and benchmark results from Gabere and Noble.
- ampscanner_noble.csv - predictions of AMPScanner for data sets of Gabere and Noble.
- apd_df.csv - data downloaded from APD3 database.
- benchmark_all.csv - predictions of other AMP predictors on our benchmark data set.
- benchmark_all_program_list - list of predictors considered in our benchmark.
- benchmark_data.RData - object generated by generate_benchmark_data.R script, contains model, important features and counted n-grams for benchmark data set.
- dbamp_df.csv - data downloaded from dbAMP database, contains sequences used to train and benchmark AmpGram.
- iAMPpred_benchmark.csv - predictions of iAMPpred on our benchmark data set.
- bovine_lactoferrin.fasta and thrombin.fasta - sequences of bovine lactoferrin and human prothrombin used to generate plots with prediction results for publication.
All functions necessary to repeat the analysis.
Report summing up the results obtained with the first random models that predict AMP properties in 10-mers.
- AmpGram_model.rda - object containing AmpGram stacked random forest model and important features
- AmpGram_model.rda - object containing AmpGram stacked random forest model trained on a smaller subset of important features
- Nobles_datasets_benchmark_res.rds - results of AmpGram predictions for data sets from Gabere and Noble
- benchmark.fasta - our benchmark data set generated by writing_benchmarks.R function.
All scripts used in this study are compatible with the following version of R:
- R version 3.6.2 (2019-12-12)
- Platform: x86_64-pc-linux-gnu (64-bit)
The necessary packages and the versions used in the analyses are listed in renv.lock.
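One possible way to recreate the package set from renv.lock is sketched below. `restore_environment` is a hypothetical wrapper, not something shipped with this repository; it assumes the renv package is already installed and that renv.lock is in the working directory. The R version check only warns, since other versions may or may not work.

```r
# Hypothetical wrapper (not part of the repository) around renv::restore().
restore_environment <- function(lockfile = "renv.lock", expected_r = "3.6.2") {
  if (!file.exists(lockfile)) {
    stop("Lockfile not found: ", lockfile)
  }
  # Warn, but do not fail, when the R version differs from the one listed above.
  if (getRversion() != expected_r) {
    warning("Running R ", as.character(getRversion()),
            "; the study used R ", expected_r)
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("Package 'renv' is required; install it with install.packages('renv')")
  }
  # Install the package versions recorded in the lockfile without prompting.
  renv::restore(lockfile = lockfile, prompt = FALSE)
}
```

Running `restore_environment()` from the repository root should then install the recorded package versions before any of the analysis scripts are sourced.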
UniProt query used to obtain sequences for construction of the negative data set:
NOT antimicrobial NOT annotation:(type:transit) NOT antibacterial NOT antifungal NOT antiviral AND reviewed:yes