---
output: github_document
---
# NegativeDatasets
```{r echo = FALSE, results = 'asis'}
source("https://raw.githubusercontent.com/BioGenies/NegativeDatasets/main/docs/rmd_scripts.R")
cat(negative_sampling_citation())
```
## Getting started

This repository contains the data and code necessary to reproduce the results from the paper *Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data*. It uses the [renv](https://CRAN.R-project.org/package=renv) and [targets](https://CRAN.R-project.org/package=targets) packages to manage the workflow and ensure reproducibility.

Some data files are too large to store on GitHub, but they can be downloaded from the links below:

- [UniProt data](https://www.dropbox.com/sh/n7hcu1byp1izuwv/AAB6irXnv8S5dE-LEW4QkM-ya?dl=0) - Data directory with reviewed sequences and their annotations downloaded from UniProtKB release 2020_06. These sequences and annotations were used to create the negative data sets in our study.
- [Prediction results for architectures](https://www.dropbox.com/sh/iuytufcl92kd61a/AAArrO0P9XhZavDxfTpqjIhua?dl=0) - Results directory with prediction results for all 660 models trained and tested in our study. These files are needed to calculate model performance and to generate the plots and tables in the paper.

To reproduce the results, clone the repository, set the paths to the directories containing the downloaded data files, and run:

``` r
renv::restore()
targets::tar_make()
```
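
The data directories linked above have to be available locally before `targets::tar_make()` can use them. As a minimal sketch, they could be fetched from R with base `utils` functions. This assumes that Dropbox still serves a shared folder as a zip archive when the link's `dl=0` flag is switched to `dl=1`, and that the links have not expired. The helper names `as_direct_link()` and `fetch_dropbox_folder()` and the target directories `uniprot_data` and `prediction_results` are hypothetical. This README does not say how the pipeline reads the data paths, so unpack the archives wherever your path settings point.

``` r
# Sketch only: assumes Dropbox returns a zip archive for shared-folder links
# with "dl=1"; helper names and target directories are hypothetical.

# Turn a Dropbox preview link ("...?dl=0") into a direct-download link ("...?dl=1").
as_direct_link <- function(url) {
  sub("([?&])dl=0(&|$)", "\\1dl=1\\2", url)
}

# Download one shared folder as a zip file and unpack it into `exdir`.
fetch_dropbox_folder <- function(url, exdir) {
  # Large archives can exceed R's default 60-second download timeout.
  old_opts <- options(timeout = max(3600, getOption("timeout")))
  on.exit(options(old_opts), add = TRUE)

  zip_file <- tempfile(fileext = ".zip")
  on.exit(unlink(zip_file), add = TRUE)

  # mode = "wb" keeps the binary zip intact on Windows.
  utils::download.file(as_direct_link(url), destfile = zip_file, mode = "wb")
  utils::unzip(zip_file, exdir = exdir)
  invisible(exdir)
}

# Only download when run by hand, not when the script is sourced non-interactively.
if (interactive()) {
  # Hypothetical local directories; adjust them to your own setup.
  fetch_dropbox_folder(
    "https://www.dropbox.com/sh/n7hcu1byp1izuwv/AAB6irXnv8S5dE-LEW4QkM-ya?dl=0",
    exdir = "uniprot_data"
  )
  fetch_dropbox_folder(
    "https://www.dropbox.com/sh/iuytufcl92kd61a/AAArrO0P9XhZavDxfTpqjIhua?dl=0",
    exdir = "prediction_results"
  )
}
```

The results directory holds predictions for 660 models, so a manual download through the browser may be more reliable if Dropbox refuses to zip a folder of that size.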

## Content

- **\_targets.R** - reproducible pipeline that generates all data sets and processes the results
- **data** - data files used in the study, e.g. for creating the positive data set
- **drafts** - draft code used for initial exploratory analyses
- **functions** - all functions used to run the pipeline and obtain the results
- **presentations** - presentation files for this project
- **renv** - renv package files
- **reports** - reports with initial analyses
- **third-party** - third-party executables used in the pipeline

```{r echo = FALSE, results = 'asis'}
cat(negative_sampling_links())
```
```{r echo = FALSE, results = 'asis'}
cat(negative_sampling_contact())
```