-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
179 lines (143 loc) · 5.31 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# makepipe <img src='man/figures/logo.png' align="right" height="139"/>
<!-- badges: start -->
[![Codecov test coverage](https://codecov.io/gh/kinto-b/makepipe/branch/master/graph/badge.svg)](https://app.codecov.io/gh/kinto-b/makepipe?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/makepipe)](https://CRAN.R-project.org/package=makepipe)
[![R-CMD-check](https://github.com/kinto-b/makepipe/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/kinto-b/makepipe/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The goal of `makepipe` is to allow for the construction of make-like pipelines in R with very minimal overheads. In contrast to `targets` (and its predecessor `drake`) which offers an opinionated pipeline framework that demands highly functionalised code, `makepipe` is easy-going, being adaptable to a wide range of data science workflows.
A minimal example can be found here: https://github.com/kinto-b/makepipe_example
## Installation
You can install the released version of `makepipe` from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("makepipe")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("kinto-b/makepipe")
```
## Building a pipeline
To construct a pipeline, one simply needs to chain together a number of `make_with_*()` statements. When the pipeline is run through, each `make_with_*()` block is evaluated if and only if the `targets` are out-of-date with respect to the `dependencies` (and `source` file). But, whether or not the block is evaluated, a segment will be added to the Pipeline object behind the scenes. At the end of the script, once the entire pipeline has been run through, one can display the accumulated Pipeline object to produce a flow-chart visualisation of the pipeline. For example:
``` r
make_with_source(
note = "Clean raw survey data and do derivations",
source = "one.R",
targets = "data/1 data.Rds",
dependencies = c("data/raw.Rds", "lookup/concordance.csv")
)
make_with_recipe(
label = "Merge it!",
note = "Merge demographic variables from population data into survey data",
recipe = {
dat <- readRDS("data/1 data.Rds")
pop <- readRDS("data/pop.Rds")
merged_dat <- merge(dat, pop, by = "id")
saveRDS(merged_dat, "data/2_data.Rds")
},
targets = c("data/2 data.Rds"),
dependencies = c("data/1 data.Rds", "data/pop.Rds")
)
make_with_source(
note = "Convert data from 'wide' to 'long' format",
source = "three.R",
targets = "data/3 data.Rds",
dependencies = "data/2 data.Rds"
)
show_pipeline()
```
```{r out.width = "75%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("man/figures/pipeline_nomnoml_uptodate.png")
```
We can also get an interactive visNetwork widget:
```r
show_pipeline(as = "visnetwork")
```
```{r out.width = "75%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("man/figures/pipeline_visnetwork_uptodate.png")
```
Or a text summary (which can be saved to a .md file),
```r
show_pipeline(as = "text")
#> # Pipeline
#>
#> ## one.R
#>
#> Clean raw survey data and do derivations
#>
#> * Source: 'one.R'
#> * Targets: 'data/1 data.Rds'
#> * File dependencies: 'data/raw.Rds', 'lookup/concordance.csv'
#> * Executed: FALSE
#> * Environment: 0x0000015399acfeb8
#>
#> ## Merge it!
#>
#> Merge demographic variables from population data into survey data
#>
#> * Recipe:
#>
#> {
#> dat <- readRDS("data/1 data.Rds")
#> pop <- readRDS("data/pop.Rds")
#> saveRDS(dat, "data/2_data.Rds")
#> }
#>
#> * Targets: 'data/2 data.Rds'
#> * File dependencies: 'data/1 data.Rds', 'data/pop.Rds'
#> * Executed: TRUE
#> * Execution time: 0.00103879 secs
#> * Result: 0 object(s)
#> * Environment: 0x0000015390c6c568
#>
#> ## three.R
#>
#> Convert data from 'wide' to 'long' format
#>
#> * Source: 'three.R'
#> * Targets: 'data/3 data.Rds'
#> * File dependencies: 'data/2 data.Rds'
#> * Executed: FALSE
#> * Environment: 0x00000153928570f8
```
Once you've constructed a pipeline, you can 'clean' it (i.e. delete all registered targets):
```r
p <- get_pipeline()
p$clean()
```
Then, when you look again at the visualisation, the target nodes will be red not green since they're out-of-date:
```r
show_pipeline()
```
```{r out.width = "75%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("man/figures/pipeline_nomnoml_outofdate.png")
```
And then you can 'rebuild' to re-execute the entire pipeline and re-create the cleaned targets:
```r
p <- get_pipeline()
p$build()
```
Another way to build a pipeline is to add a roxygen header into your .R scripts containing a special `@makepipe` tag along with the `@targets`, `@dependencies`,
and so on. For example, at the top of script `one.R` you might have
```r
#'@title Load
#'@description Clean raw survey data and do derivations
#'@dependencies "data/raw.Rds", "lookup/concordance.csv"
#'@targets "data/1 data.Rds"
#'@makepipe
NULL
```
You can then call `make_with_dir()`, which will construct a pipeline using all
the scripts it finds in the provided directory containing the `@makepipe`
tag.