-
Notifications
You must be signed in to change notification settings - Fork 14
/
function-vector.Rmd
198 lines (133 loc) · 5.75 KB
/
function-vector.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
# Vector functions
```{r message = FALSE, warning = FALSE}
library(tidyverse)
```
**Vector functions** take a vector as input and produce a vector of the same length as output.
```{r echo=FALSE}
knitr::include_graphics("images/vector-function.png", dpi = image_dpi)
```
Vector functions make working with vectors easy. For example, `log10()`, like most mathematical functions in R, is a vector function, which allows you to take the log of each element in a vector all at once.
```{r}
x <- c(5, 2, 1)
log10(x)
```
The simple mathematical operators are also vector functions:
```{r}
y <- c(1, 2, 4)
x + y
x * y
```
In contrast, functions that can only take a length one input and produce a length one output are called **scalar functions**.
```{r echo=FALSE}
knitr::include_graphics("images/scalar-function.png", dpi = image_dpi)
```
As you'll see in the next section, the distinction between scalar and vector functions is important when working with tibbles.
## Temperature recommendations
A common way to create a scalar function is by using an if-else statement. For example, you might write the following function that tells you what to do based on the temperature outside:
```{r}
recommendation_1 <- function(x) {
if (x >= 90) {
"locate air conditioning"
} else if (x >= 60) {
"go outside"
} else if (x >= 30) {
"wear a jacket"
} else if (x >= 0) {
"wear multiple jackets"
} else {
"move"
}
}
```
This works well when applied to single values,
```{r}
recommendation_1(92)
recommendation_1(34)
recommendation_1(-15)
```
but fails when applied to a vector with more than one element.
```{r}
temps <- c(1, 55, 101)
recommendation_1(temps)
```
`if` can only handle a single value, but we gave `recommendation_1()` a vector. Instead of producing an error, `if` just processed the first element of that vector and gave us a warning.
```{r echo=FALSE}
knitr::include_graphics("images/scalar-function-vector.png", dpi = image_dpi)
```
## Vector functions and `mutate()`
Recall that a tibble is a list of vectors. Each column of the tibble is a vector, and all these vectors have to be the same length. New columns must also be vectors of the same length, which means that when you use `mutate()` to create a new column, `mutate()` has to create a new vector of the correct length.
If you want, you can actually explicitly give `mutate()` a vector of the correct length.
```{r}
set.seed(128)
df <- tibble(temperature = sample(x = -15:110, size = 5, replace = TRUE))
df %>%
mutate(new_column = c(1, 2, 3, 4, 5))
```
You can also give `mutate()` a single value, and it will repeat that value until it has a vector of the correct length.
```{r}
df %>%
mutate(one_value = 1)
```
However, if you try to give `mutate()` a vector with a length other than 1 or `nrow(df)`, you'll get an error:
```{r, error=TRUE}
df %>%
mutate(two_values = c(1, 2))
```
`mutate()` doesn't know how to turn a length 2 vector into a vector that has a value for each row in the tibble.
As you already know, you usually create new columns by applying functions to existing ones. Say we want to convert our temperatures from Fahrenheit to Celsius.
```{r}
fahrenheit_to_celsius <- function(degrees_fahrenheit) {
(degrees_fahrenheit - 32) * (5 / 9)
}
df %>%
mutate(temperature_celsius = fahrenheit_to_celsius(temperature))
```
When you reference a column inside `mutate()`, you reference the entire vector. So when we pass `temperature` to `fahrenheit_to_celsius()`, we pass the entire `temperature` vector.
```{r echo=FALSE}
knitr::include_graphics("images/mutate.png", dpi = image_dpi)
```
Mathematical operations are vectorized, so `fahrenheit_to_celsius()` takes the `temperature` vector and return a vector of the same length.
```{r echo=FALSE}
knitr::include_graphics("images/mutate-2.png", dpi = image_dpi)
```
`mutate()` then takes this new vector and successfully adds a column to the tibble.
```{r echo=FALSE}
knitr::include_graphics("images/mutate-3.png", dpi = image_dpi)
```
You can probably predict now what will happen if we try to use our scalar function, `recommendation_1()`, in the same way:
```{r}
df %>%
mutate(recommendation = recommendation_1(temperature))
```
`mutate()` passes the entire `temperature` vector to `recommendation_1()`, which can't handle a vector and so only processes the first element of `temperature`.
```{r echo=FALSE}
knitr::include_graphics("images/mutate-scalar.png", dpi = image_dpi)
```
However, because of how `mutate()` behaves when given a single value, the recommendation for the first temperature is copied for every single row.
```{r echo=FALSE}
knitr::include_graphics("images/mutate-scalar-2.png", dpi = image_dpi)
```
This isn't very helpful, because now our tibble gives the same recommendation for every temperature.
## Vectorizing if-else statements
There are several ways to vectorize `recommendation_1()` so that it gives an accurate recommendation for each temperature in `df`.
First, there's a vectorized if-else function called `if_else()`:
```{r}
x <- c(1, 3, 4)
if_else(x == 4, true = "four", false = "not four")
```
However, in order to rewrite `recommendation_1()` using `if_else()`, we'd need to nest `if_else()` repeatedly and the function would become difficult to read. Another vector function, `case_when()`, is a better option.
```{r}
recommendation_2 <- function(x) {
case_when(
x >= 90 ~ "locate air conditioning",
x >= 60 ~ "go outside",
x >= 30 ~ "wear a jacket",
x >= 0 ~ "wear multiple jackets",
TRUE ~ "move"
)
}
recommendation_2(temps)
df %>%
mutate(recommendation = recommendation_2(temperature))
```
For other helpful vector functions, take a look at the _Vector Functions_ section of the [dplyr cheat sheet](https://github.com/rstudio/cheatsheets/blob/master/data-transformation.pdf).