-
Notifications
You must be signed in to change notification settings - Fork 0
/
cheatsheet.qmd
118 lines (91 loc) · 4.93 KB
/
cheatsheet.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
---
title: "Reproducible Research in R (and friends)"
subtitle: "Cheatsheet"
author:
- name: M. Rolland
degrees: MSc
orcid: 0000-0002-8076-5550
email: matthieu.rolland@univ-grenoble-alpes.fr
date: "2024-06-08"
date-modified: today
format: html
editor: visual
bibliography: bib_mr.bib
nocite: |
@marwick_packaging_2018
toc: true
number-sections: true
---
This cheatsheet provides essential guidelines and best practices for conducting reproducible research using R and related tools. It covers project organization, version control, data management, documentation, environment management, workflow automation, and more. For any suggestions or feedback, please feel free to email me.
## Project Organization
- Use a consistent folder structure:
- `data/` - Analysis data files
- `scripts/` - Analysis scripts
- `outputs/` - Results (figures, tables)
- `docs/` - Documentation and reports
- Use RStudio Projects to facilitate project management and environment isolation
- Maintain a project log to document progress, changes, and key decisions throughout the analysis
- Reference:
- [The concept of research compendium](https://peerj.com/preprints/3192/)
- [Using RStudio projects](https://support.posit.co/hc/en-us/articles/200526207-Using-RStudio-Projects)
## Version Control
- Use Git to track changes in scripts and documents
- Commit regularly with meaningful messages
- One repository per analysis
- Make sure your `data/` folder is in the `.gitignore` file
- Make sure there is no sensitive information in your code
- Reference:
- [Happy Git with R](https://happygitwithr.com/)
## Data Management
- Store raw data in `data/raw/` and never modify it directly
- Produce a README describing the source data
- Use scripts to clean and process data, save the cleaned data in `data/processed/`
- Document each step of data cleaning
- Keep data cleaning separate from analysis
- Organize your data in a tidy format: each variable is a column, each observation is a row
- Reference:
- [Principles of tidy data](https://www.jstatsoft.org/article/view/v059i10)
## Documentation
- Comment code extensively to explain steps and logic
- Create README files to explain project structure and instructions for running the analysis
- Document all functions clearly, including input parameters, output, and purpose
- Reference:
- [Example README file](https://gricad-gitlab.univ-grenoble-alpes.fr/iab-env-epi/rolland_effects_2022)
- [{roxygen2} for function documentation](https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html)
## Environment Management
- Use `sessionInfo()` or `devtools::session_info()` to capture the R session information
- Use `{renv}` to manage package versions
- Reference:
- [Introduction to renv](https://rstudio.github.io/renv/articles/renv.html)
- [Example on how to show the session info (scroll to bottom)](https://gricad-gitlab.univ-grenoble-alpes.fr/iab-env-epi/rolland_effects_2022)
## Workflow Automation
- Organize your analysis into a series of numbered and ordered scripts to create a clear and reproducible workflow (e.g., 01-data-cleaning.R, 02-data-analysis.R, 03-visualization.R)
- Create a master script (e.g., run_all.R) that sequentially runs each numbered script
OR
- Use Makefile or `{targets}` package to automate and document the workflow
- Reference:
- [{targets} Package](https://books.ropensci.org/targets/)
- [Example Project using {targets}](https://mjrolland.github.io/ed-neuro-hpa/)
## Analysis Scripts
- Break analysis into small, reusable functions
- Use meaningful and consistent naming conventions such as provided by the [Tidyverse Naming Conventions](https://style.tidyverse.org/syntax.html#object-names) for variables and functions and by [data carpentry](https://datacarpentry.org/rr-organization1/01-file-naming/index.html) for folders and files
- Style your code according to standardized recommendations from the [Tidyverse Style Guide](https://style.tidyverse.org/)
- Reference:
- [Embrace functional programming](https://tidyverse.tidyverse.org/articles/manifesto.html#embrace-functional-programming)
- [Tidyverse Style Guide](https://style.tidyverse.org/)
## Computational reproducibility
- Set seeds to ensure reproducibility when using randomness in your analysis
- Document all warnings
- Reference:
- [Random number seed in R](https://makemeanalyst.com/r-programming/random-number-seed/)
## Reporting
- Use RMarkdown (.Rmd) or Quarto (.Qmd) files to combine code, results, and narrative for creating dynamic reports
- Reference:
- [Quarto Documentation](https://quarto.org/)
## Validation
- Get your code reviewed prior to publication
- Reference:
- [Code Review Practices](https://mtlynch.io/human-code-reviews-1/)
## Sharing Code And Data
- Use repositories like [GitHub](https://github.com/) or [GitLab](https://gitlab.com/) for sharing code
- Use repositories like [Zenodo](https://zenodo.org/) or [data.gouv.fr](https://www.data.gouv.fr/fr/) if you have data sets to share