-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
244 lines (167 loc) · 11.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(crayon.enabled=NULL)
```
# mpipe <img src='man/figures/logo.png' align="right" height="139" />
<!-- badges: start -->
[![R build status](https://github.com/MyKo101/mpipe/workflows/R-CMD-check/badge.svg)](https://github.com/MyKo101/mpipe/actions)
[![Codecov test coverage](https://codecov.io/gh/MyKo101/mpipe/branch/main/graph/badge.svg)](https://codecov.io/gh/MyKo101/mpipe?branch=main)
[![Version Badge](https://img.shields.io/badge/Version-0.0.0.9016-orange.svg)](https://github.com/MyKo101/mpipe)
<!-- badges: end -->
The `mpipe` package is designed to add to extra functionality to the "pipelining" of `tidyverse` style processes. When running a pipeline, it is common to break out of a pipeline to perform other actions, and then to commence a new pipeline. `mpipe` aims to permit a stronger use of pipelining, whilst reducing the need to break out. This piece of software is still under very active development and suggestions for improvements are welcome.
## Installation
You can install the development version of `mpipe` from [GitHub](https://github.com/MyKo101/mpipe) with:
```{r install_mpipe}
# install.packages("devtools")
# devtools::install_github("MyKo101/mpipe")
library(mpipe)
```
The `mpipe` package is not currently available on CRAN.
### Examples
The `palmerpenguins` package provides data on three species of penguins (Gentoo, Adelie and Chinstrap) over three islands (Biscoe, Dream and Torgersen). We'll use the length and depth of their bills. A lot of the examples in this page are rather convoluted (and for many, there are other easier ways to get the same results, but they're here to demonstrate `mpipe` functionality)
```{r load_packages, echo=T, results='hide'}
library(dplyr)
library(ggplot2)
library(magrittr)
penguins <- palmerpenguins::penguins %>%
select(-body_mass_g,-flipper_length_mm) %>%
filter_all(~!is.na(.))
```
## fseq
A little used feature of the `magrittr` package's `%>%` pipe is that is can be used to create functions. By using a `.` as the initial argument, the output will be a function and can be applied to other pieces of data. These kinds of functions are called `fseq` functions (short for functional sequences).
```{r fseq_define_f}
summarise_my_data <- . %>%
group_by(species,island,sex) %>%
summarise_all(mean,.groups="keep")
summarise_my_data
summarise_my_data(penguins)
```
The `mpipe` package includes a couple of extra things that can be done to these kinds of functions, including composition of `fseq` by adding them together. We'll define another `fseq` to filter out some penguins and variables.
```{r fseq_define_g}
filter_my_data <- . %>%
filter(bill_length_mm > 43) %>%
select(species,island,sex,bill_length_mm)
filter_my_data
```
Then we can create the composite functional sequence
```{r fseq_add}
filter_my_data + summarise_my_data
```
We can either assign this to a new function name, or apply it to the data implicitly (using brackets)
```{r fseq_add_apply}
(filter_my_data + summarise_my_data)(penguins)
```
The `mpipe` package also includes a `length()` method for `fseq` which returns the number of elements in the functional sequence, as well as a pair of functions to check whether something is an `fseq`: `is.fseq()` and `is_fseq()` (depending on your own preference). More functions for using `fseq` to their full potential will be added in future releases.
## Side Effects
A large reason for breaking out of a pipeline is to do actions such as plotting or providing feedback to the user. the `mpipe` package provides two functions that are particularly useful for avoiding leaving a pipeline. The way this is done is that the functions side effects are activated, but the functions return an untouched version of the data it was originally provided. This is similar to using the `%T>%` pipe in magrittr, but can be done with the traditional `%>%` pipe and is explicit in the name of the function.
* `pipe_qplot()` - allows the user to create `ggplot2` style plots using `qplot()` syntax (akin to the base R `plot()` syntax). This is essentially a wrapper function that also permits saving and theming of the plots.
* `pipe_cat()` - allows the user to output information to the console (or any other sink) in the much the same way that `cat()` does. Any calls/functions will be evaluated appropriately for this output. Grouped tibbles will be parsed separately.
Here we will create a plot of Bill length against Bill depth stratified by the three grouping variables: `species`, `sex` and `island.` We'll colour-code `species` using `col = species`, and `facets = sex ~ island` will create facets based on these two variables. We will then focus on the Biscoe island penguins (using `filter()`), and output the average Bill length, then stratify it by sex and species. Finally, we will return a dataset that contains the mean of Bill length and depth across these groups.
```{r side_effects_example}
penguins %>%
pipe_qplot(bill_length_mm, bill_depth_mm, col = species,
xlab = "Bill Length (mm)", ylab = "Culmun Depth (mm)",
theme = "light", facets = sex ~ island) %>%
filter(island == "Biscoe") %>%
pipe_cat("Biscoe Average Culmun Length (mm):", mean(bill_length_mm), "\n\n") %>%
group_by(species,sex) %>%
pipe_cat(sex, species, "Average Culmun Length (mm):", mean(bill_length_mm), "\n") %>%
summarise_if(is.numeric, mean)
```
## Control Flow
One of the key aspects of programming is being able to manipulate which lines of code are ran and when. This manipulation is called "Control Flow" (i.e. you control the flow of the processes through your code). The two main methods of this are via branches and loops. Branches force our code to make a choice, where loops allow repetition of the same lines of code.
In traditional R (as in many other languages), Branching is performed using `if()` or `switch()` statements, and can also be vectorised with functions such as `if_else()` and `case_when()`. Loops are implemented using the `for()`, `while()` and `repeat()` statements and can essentially be vectorised using the `apply()` and `map()` families of functions.
the `mpipe` package allows us to integrate this kind of control flow into a pipe by supplying different paths for data to follow based on a predicate, case or condition. Branches are implemented via the `if_branch()` and `switch_branch()` functions, and loops are implemented via the `while_pipe()` function.
### Branches
#### if_branch()
The `if_branch()` works similar to an `if` statement. The function chooses whether to data should proceed down the `fun` branch or the `elsefun` branch (if supplied) depending on how `predicate` is evaluated.
First, we'll define an `fseq` style function, and apply it to the data twice. The `fseq` function will first shuffle the data and then if a male penguin ends up at the top of our list, we'll pick just the Biscoe island penguins and extract the bill length summary statistics, if a female penguin ends up on top, we'll pick out the Biscoe & Dream island penguis and extract the bill depth summary statistics.
```{r if_example_def}
f <- . %>%
sample_n(nrow(.)) %>% #Shuffle the data
if_branch(sex[1] == "MALE", #predicate: Is the top row a male penguin?
. %>% #TRUE path
pipe_cat("Male penguin on top, extracting Bill Length for Biscoe penguins\n") %>%
filter(island == "Biscoe") %>%
pull("bill_length_mm") %>%
summary(), # End of TRUE Path
. %>% #FALSE path
pipe_cat("Female penguin on top, extracting Bill Depth for Biscoe & Dream penguins\n") %>%
filter(island != "Torgersen") %>%
pull("bill_length_mm") %>%
summary()) # End of FALSE Path & end of if_branch()
```
```{r if_example_One}
set.seed(1)
f(penguins)
```
```{r if_example_Two}
set.seed(2)
f(penguins)
```
#### switch_branch()
Similarly, the `switch_branch()` function allows us to expand on this by following a different path depending on the evaluation of `case`. The `case` argument will be evaluated in the context of `data` and shoudl return either a character value or a numeric value. Characters will be matched to the other arguments provided to the `switch_branch()` function and numerics will be matched by position.
This time, we will do similar to the previous example except checking which island ends up on top. We can see from the below (using the `%$%` exposure pipe), that if Torgersen is chosen as our random island, only Adelie penguins will be in our output.
```{r table_species_island}
penguins %$%
table(species,island)
```
This means we can drop this level and output only based on sex. `switch_branch()` will output the data untouched if `case` doesn't match any of the other arguments supplied. So we will adapt our function for this:
```{r switch_example_setup}
f <- . %>%
sample_n(nrow(.)) %>%
pipe_cat(as.character(island[1]),"island is on top, so it will be chosen.\n") %>%
filter(island == island[1]) %>%
group_by(species,sex) %>%
switch_branch(. %>% # What island ends up at the top row?
slice(1) %>% #This time, we write case as an `fseq` as well.
pull("island") %>%
as.character,
Torgersen = . %>%
ungroup %>%
pipe_cat("\t\tSpecies will be dropped.\n") %>%
group_by(sex) %>%
select(-species)) %>%
summarise_if(is.numeric,mean)
```
```{r switch_example1}
## Force the "random" shuffle to put a "Biscoe" species is at the top
set.seed(100)
f(penguins)
```
```{r switch_example2}
## Force the "random" shuffle to put a "Torgersen" species is at the top
set.seed(1000)
f(penguins)
```
### Loops
In terms of loops, we introduce the `while_pipe()` function which acts similar to a `while()` loop, but within a pipeline. It will repeatedly apply the `fun` argument to the `data` until `cond` is not `TRUE`. `cond` will be evaluated within the context of `data` at each iteration.
For this example, we're simply going to shuffle until the first penguin is a female and then output the top 5 results.
```{r while_example1}
penguins %>%
while_pipe(sex[1] != "female",
. %>%
sample_n(nrow(.))) %>%
slice(1:5)
```
The `while_pipe()` function also provides a `.counter` pronoun to keep track of how many times the loop as been run. This can be used within `cond` and within `fun` and even allows for simple leves of tidy evaluation.
```{r while_example2}
tibble(x = runif(5)) %>%
while_pipe(.counter <= 5,
. %>%
mutate(!!paste0("x_",.counter) := x - x[.counter]))
```
## Code of Conduct
Please note that the mpipe project is released with a [Contributor Code of Conduct](https://michaelbarrowman.co.uk/mpipe/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
## References
`palmerpenguins`
Gorman KB, Williams TD, Fraser WR (2014) Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9(3): e90081. [https://doi.org/10.1371/journal.pone.0090081](https://doi.org/10.1371/journal.pone.0090081)