generated from r4ds/bookclub-template
# Evaluation
**Learning objectives:**
- Learn evaluation basics
- Learn about **quosures** and **data mask**
- Understand tidy evaluation
```{r message=FALSE,warning=FALSE}
library(rlang)
library(purrr)
```
## A bit of a recap
- Metaprogramming: separating the description of an action from the action itself, i.e. separating code from its evaluation.
- Quasiquotation: combine code written by the *function's author* with code written by the *function's user*.
- Unquotation: gives the *user* the ability to selectively evaluate parts of an otherwise quoted argument.
- Evaluation: gives the *developer* the ability to evaluate quoted expressions in custom environments.
**Tidy evaluation**: quasiquotation, quosures and data masks
## Evaluation basics
We use `eval()` to evaluate, run, or execute expressions. It takes two key arguments:
- `expr`: the object to evaluate, either an expression or a symbol.
- `envir`: the environment in which to evaluate the expression, i.e. where to look up the values of its symbols. Defaults to the current environment.
```{r}
sumexpr <- expr(x + y)
x <- 10
y <- 40
eval(sumexpr)
```
```{r}
eval(sumexpr, envir = env(x = 1000, y = 10))
```
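As a small sketch of the same idea, one quoted expression can be evaluated in several environments and give a different result in each. The names `area_expr`, `small_room` and `large_room` are hypothetical, chosen only for illustration.

```{r}
library(rlang)

# Hypothetical names, used only for this illustration
area_expr <- expr(width * height)
small_room <- env(width = 2, height = 3)
large_room <- env(width = 20, height = 30)

eval(area_expr, small_room)  # 6
eval(area_expr, large_room)  # 600
```

The expression itself never changes; only the environment that supplies `width` and `height` does.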
## Application: reimplementing `source()`
What do we need?
- Read the file being sourced.
- Parse it into a list of expressions.
- Evaluate each expression in turn, keeping the result.
- Return the result of the last expression (invisibly).
```{r}
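source2 <- function(path, env = caller_env()) {
  # Read the whole file into a single string
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # Parse the string into a list of expressions
  exprs <- parse_exprs(file)

  # Evaluate each expression in `env`, keeping only the last result
  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }

  invisible(res)
}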
```
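The real `source()` is considerably more complicated: it can echo input and output and has many other settings that control its behaviour.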
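The parse-then-evaluate steps can also be tried on a string rather than a file. This is only a sketch: the script contents and the names `script` and `scratch` are made up for illustration, and every result is kept rather than just the last one.

```{r}
library(rlang)

# Hypothetical script contents, kept in a string instead of a file
script <- "a <- 1\nb <- a + 1\nb * 10"
exprs <- parse_exprs(script)

# Evaluate in a scratch environment so nothing leaks into the global one
scratch <- env()
results <- vector("list", length(exprs))
for (i in seq_along(exprs)) {
  results[[i]] <- eval(exprs[[i]], scratch)
}

length(exprs)       # one expression per statement in the script
results[[3]]        # value of the last expression
env_names(scratch)  # bindings created by the script
```

Passing a fresh environment makes it easy to see exactly which bindings the script created.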
## Quosures
**Quosures** are a data structure from `rlang` containing both an expression and an environment.
The name blends *quoting* + *closure*, because a quosure quotes the expression and encloses the environment.
Three ways to create them:
- Used mostly for learning: `new_quosure()`, creates a quosure from its components.
```{r}
q1 <- rlang::new_quosure(
  expr(x + y),
  env(x = 1, y = 10)
)
```
With a quosure, we can use `eval_tidy()` directly.
```{r}
rlang::eval_tidy(q1)
```
And get its components
```{r}
rlang::get_expr(q1)
rlang::get_env(q1)
```
Or set them
```{r}
q1 <- set_env(q1, env(x = 3, y = 4))
eval_tidy(q1)
```
- Used in the real world: `enquo()` or `enquos()`, to capture user-supplied expressions. The resulting quosure records the environment in which the user wrote the expression, so we don't need to supply it ourselves.
```{r}
foo <- function(x) enquo(x)
quo_foo <- foo(a + b)
```
```{r}
get_expr(quo_foo)
get_env(quo_foo)
```
- Almost never used: `quo()` and `quos()`, which exist mainly as counterparts to `expr()` and `exprs()`.
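To see why the environment captured by `enquo()` matters, here is a minimal sketch in which the expression refers to a variable that only exists inside the calling function. `capture()`, `make_quo()` and `local_value` are hypothetical names.

```{r}
library(rlang)

capture <- function(arg) enquo(arg)  # hypothetical helper

make_quo <- function() {
  local_value <- 5          # only exists inside make_quo()
  capture(local_value * 2)
}

q_local <- make_quo()
get_expr(q_local)
env_has(get_env(q_local), "local_value")
eval_tidy(q_local)  # 10
```

Because the quosure carries the execution environment of `make_quo()`, it can still be evaluated after that function has returned.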
## Quosures and `...`
Quosures are usually just a convenience, but they are essential when working with `...`, because each argument passed through `...` can be associated with a different environment.
```{r}
g <- function(...) {
  ## Creating our quosures from ...
  enquos(...)
}

createQuos <- function(...) {
  ## symbol from the function environment
  x <- 1
  g(..., f = x)
}
```
```{r}
## symbol from the global environment
x <- 0
qs <- createQuos(global = x)
qs
```
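Evaluating each captured quosure makes the point concrete: the same symbol resolves to a different value depending on where it was written. This is a sketch with hypothetical names (`collect()`, `wrapper()`, `offset`) that mirrors the structure of the example above.

```{r}
library(rlang)

collect <- function(...) enquos(...)

wrapper <- function(...) {
  offset <- 1                # defined inside wrapper()
  collect(..., inner = offset)
}

offset <- 100                # defined in the calling environment
quos_list <- wrapper(outer = offset)

# Each quosure is evaluated in its own environment
vals <- vapply(quos_list, eval_tidy, numeric(1))
vals
```

Here `outer` evaluates to 100 and `inner` to 1, even though both expressions are the bare symbol `offset`.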
## Other facts about quosures
Formulas were the inspiration for quosures because they also capture an expression and an environment:
```{r}
f <- ~runif(3)
str(f)
```
There was an early version of tidy evaluation with formulas, but there's no easy way to implement quasiquotation with them.
Under the hood, quosures (like formulas) are call objects
```{r}
q4 <- new_quosure(expr(x + y + z))
class(q4)
is.call(q4)
```
with an attribute to store the environment
```{r}
attr(q4, ".Environment")
```
**Nested quosures**
With quasiquotation we can embed quosures in expressions.
```{r}
q2 <- new_quosure(expr(x), env(x = 1))
q3 <- new_quosure(expr(x), env(x = 100))
nq <- expr(!!q2 + !!q3)
```
And evaluate them
```{r}
eval_tidy(nq)
```
But for printing it's better to use `expr_print(x)`
```{r}
expr_print(nq)
nq
```
## Data mask
A data mask is a data frame in which the evaluated code looks first for its variable definitions.
Used in packages like dplyr and ggplot2.
To use it we need to supply the data mask as a second argument to `eval_tidy()`
```{r}
q1 <- new_quosure(expr(x * y), env(x = 100))
df <- data.frame(y = 1:10)
eval_tidy(q1, df)
```
Everything together, in one function.
```{r}
with2 <- function(data, expr) {
  expr <- enquo(expr)
  eval_tidy(expr, data)
}
```
Variables that are not part of the data mask (here `x`) are looked up in the quosure's environment, so they need to be defined there:
```{r}
x <- 100
with2(df, x * y)
```
This is also doable with `base::eval()` instead of `rlang::eval_tidy()`, but we have to use `base::substitute()` instead of `enquo()` (like we did for `enexpr()`), and we need to specify the environment ourselves.
```{r}
with3 <- function(data, expr) {
  expr <- substitute(expr)
  eval(expr, data, caller_env())
}
```
```{r}
with3(df, x*y)
```
## Pronouns: .data$ and .env$
**Ambiguity!!**
A variable's value can come from either the environment or the data mask; when both define it, the data mask wins.
```{r}
q1 <- new_quosure(expr(x * y + x), env = env(x = 1))
df <- data.frame(y = 1:5, x = 10)
eval_tidy(q1, df)
```
We use pronouns:
- `.data$x`: `x` from the data mask
- `.env$x`: `x` from the environment
```{r}
q1 <- new_quosure(expr(.data$x * y + .env$x), env = env(x = 1))
eval_tidy(q1, df)
```
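The pronouns also guard against silent fallbacks. In this sketch (with a hypothetical data frame `df_no_x` that has no `x` column), the bare `x` is quietly taken from the environment, while `.data$x` raises an error because the column is missing. The error is caught with `tryCatch()` so the chunk still runs.

```{r}
library(rlang)

df_no_x <- data.frame(y = 1:3)  # hypothetical data without an `x` column

# Without a pronoun, `x` is silently taken from the environment
eval_tidy(new_quosure(expr(x * y), env(x = 10)), df_no_x)

# With `.data$x`, the missing column is reported as an error
strict <- new_quosure(expr(.data$x * y), env(x = 10))
outcome <- tryCatch(
  eval_tidy(strict, df_no_x),
  error = function(e) "error"
)
outcome
```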
## Application: reimplementing `base::subset()`
`base::subset()` works like `dplyr::filter()`: it selects rows of a data frame given an expression.
What do we need?
- Quote the expression to filter
- Figure out which rows in the data frame pass the filter
- Subset the data frame
```{r}
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))

  data[rows_val, , drop = FALSE]
}
```
```{r}
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
# Shorthand for sample_df[sample_df$b == sample_df$c, ]
subset2(sample_df, b == c)
```
## Using tidy evaluation
Most of the time we won't call `eval_tidy()` directly; instead we call a function that uses it, which makes us both developer AND user.
**Use case**: resample and subset
We have a function that resamples a dataset:
```{r}
resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}
```
```{r}
resample(sample_df, 10)
```
But we also want to subset, so we want a function that allows us to resample and subset (with `subset2()`) in a single step.
First attempt:
```{r}
subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}
```
```{r error=TRUE}
subsample(sample_df, b == c, 10)
```
What happened?
`subsample()` doesn't quote any arguments, so `cond` is evaluated normally (not in a data mask), and we get an error when R tries to find a binding for `b`.
So we have to quote `cond` and unquote it when we pass it to `subset2()`:
```{r}
subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)
  df <- subset2(df, !!cond)
  resample(df, n)
}
```
```{r}
subsample(sample_df, b == c, 10)
```
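The quote-then-unquote pattern can also be written with rlang's embrace operator, `{{ }}`, which is shorthand for `!!enquo()`. The sketch below assumes rlang 0.4.0 or later; `keep_rows()` is a hypothetical stand-in for `subset2()`, and `keep_and_count()` and `scores` are made up for illustration.

```{r}
library(rlang)

# Hypothetical stand-in for subset2(), so the chunk runs on its own
keep_rows <- function(data, rows) {
  rows <- enquo(rows)
  data[eval_tidy(rows, data), , drop = FALSE]
}

# `{{ cond }}` quotes `cond` and unquotes it in a single step
keep_and_count <- function(df, cond) {
  nrow(keep_rows(df, {{ cond }}))
}

scores <- data.frame(a = 1:5, b = 5:1)
keep_and_count(scores, a > b)  # 2
```

Only the last two rows satisfy `a > b`, so the count is 2.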
**Be careful!**, potential ambiguity:
```{r}
threshold_x <- function(df, val) {
  subset2(df, x >= val)
}
```
What would happen if `x` exists in the calling environment but doesn't exist in `df`? Or if `val` also exists in `df`?
So, as developers of `threshold_x()` and users of `subset2()`, we have to add some pronouns:
```{r}
threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}
```
Just remember:
> As a general rule of thumb, as a function author it’s your responsibility
> to avoid ambiguity with any expressions that you create;
> it’s the user’s responsibility to avoid ambiguity in expressions that they create.
## Base evaluation
Check 20.6 in the book!
## Meeting Videos
### Cohort 1
`r knitr::include_url("https://www.youtube.com/embed/4En_Ypvtjqw")`
### Cohort 2
`r knitr::include_url("https://www.youtube.com/embed/ewHAlVwCGtY")`
### Cohort 3
`r knitr::include_url("https://www.youtube.com/embed/0K1vyiV8_qo")`
### Cohort 4
`r knitr::include_url("https://www.youtube.com/embed/kfwjJDuyN8U")`
### Cohort 5
`r knitr::include_url("https://www.youtube.com/embed/WzfD9GK6nCI")`
### Cohort 6
`r knitr::include_url("https://www.youtube.com/embed/8FT2BA18Ghg")`
<details>
<summary> Meeting chat log </summary>
```
01:00:42 Trevin: They just want to help you present that’s all
```
</details>
### Cohort 7
`r knitr::include_url("https://www.youtube.com/embed/g77Jfl_xrXM")`
<details>
<summary>Meeting chat log</summary>
```
00:55:22 collinberke: https://rlang.r-lib.org/reference/embrace-operator.html?q=enquo#under-the-hood
```
</details>
`r knitr::include_url("https://www.youtube.com/embed/wPLrafScijE")`