-
Notifications
You must be signed in to change notification settings - Fork 1
/
00_02_programming_practical.qmd
263 lines (178 loc) · 8.66 KB
/
00_02_programming_practical.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
---
title: "P02. Programming Skills"
---
In this practical we will gain some experience using
1. control statements (for loops, if/else statements etc.),
2. Functions,
3. Installing a package
## (1) Control Statements
There are often multiple ways in R to calculate what you want - what you choose will be down to personal preference but there will be some guiding principles:
1. how fast is the code?
2. how easy is it to read - can other people understand what you've done and can you debug it easily?
Here we'll do the same tasks using three different ways to demonstrate the logics behind these guiding principles. The task involves calculating the R0 for a range of infectiousness durations and transmission rates (i.e. beta) and calculate the probability of an epidemic.
First, let's install and load our packages we will need
```{r eval = F}
install.packages("tictoc")
install.packages("purrr")
library(tictoc)
library(purrr)
```
```{r echo = F}
library(tictoc)
library(purrr)
```
Notice that at the start and end of "ways" we implement below, we have `tic()` and `toc()` - these are commands to tell R to, respectively, start and stop a timer and output how long each code chunk took. Therefore, to compare the time taken to run each script, you will need to select everything from `tic()` to `toc()` and press Cmd-Enter
**Question (a) Why do you initialise variables at the start of the script?**
Answer:
### The first way - using loops
```{r}
tic("loop") # start a timer called 'loop' to see how quickly it runs
# intitialise variables
infectiousness.duration <- 1:10
beta <- seq(0.1, 0.5, by = 0.1)
epidemic <- matrix(NA,nrow=length(infectiousness.duration), ncol=length(beta))
# set our loop over infectious duration and transmission rate (beta)
for (index_i in 1:length(infectiousness.duration)){
for (index_j in 1:length(beta)){
#calculate R0 for each combination
R0 <- beta[index_j] * infectiousness.duration[index_i]
# evaluate whethere there is an epidemic - R0>=1
if (R0 >= 1){
epidemic[index_i, index_j] <- 1
} else{
epidemic[index_i, index_j] <- 0
}
}
}
# calculate the proportion of the values that leads to an epidemic
mean(epidemic)
toc() # output the time that the script took
```
Instead of accessing each element of a matrix via for loops, R can also apply operations to matrices or dataframe as chunks. This can speed up the code, reduce the amount of code, and can make it easier to read.
### The second way - vectorisation
here is an example of using "expand.grid" to enumerate all the combinations of the two parameters
```{r}
tic("vectorised") # start a timer called 'vectorised' to see how quickly it runs
infectiousness.duration <- 1:10
beta <- seq(0.1, 0.5, by = 0.1)
epidemic <- expand.grid(beta.val = beta, id.val = infectiousness.duration)
# the '*' operator can then be used on the columns of the data.frame to create another column called 'R0'
epidemic[, "R0"] <- epidemic$beta.val * epidemic$id.val
# we can also check the proportion of the combinations that give rise to epidemics
mean(epidemic$R0 >= 1)
toc()
```
### The third way - "mapping"
Let's calculate the same thing finally using the `map` function in the package `purrr`
```{r}
tic("map") # start a timer called 'vectorised' to see how quickly it runs
infectiousness.duration <- 1:10
beta <- seq(0.1, 0.5, by = 0.1)
parametervals <- expand.grid(beta.val = beta, id.val = infectiousness.duration)
epidemic <- unlist(purrr::map2(.x = parametervals[,"id.val"],
.y = parametervals[,"beta.val"],
.f = ~(.x * .y)))
mean(epidemic >= 1)
toc()
```
How did each of the three scripts do according to our 2 criteria?
i) how fast is the code?
ii) how easy is it to read - can other people understand what you've done and can you debug it easily?
Which would you choose and why?
## (2) Functions
For a simple "SIR" model, we can calculate the fraction of a population at the end of an epidemic that remain susceptible. We can calculate this via what is termed the 'final size' equation and this is written as:
$$
ln(S_{inf}) = R0*(S_{inf} - 1)
$$
That is,the natural log of the proportion susceptible at the end of the epidemic is equal to R0 multiplied by 1 minus the proportion susceptible at the end of the epidemic. How do we find the solution i.e. What is $S_{inf}$ for each value of R0? Let's turn this expression into our own R function so we can solve it.
```{r}
final.size.root <- function(s.inf) {
final.size <- R0*(s.inf - 1) - log(s.inf)
return(final.size)
}
```
We have set the final.size.root such that whenever it evaluates 0, we have found a solution. R has some in built functions to help you solve this equation. Let's use `uniroot` to find a solution. We want an answer bigger or equal to 0 and less than 1. First have a look what uniroot does, by running
```{r eval = F}
?uniroot
```
Take a look at the 'Value' that uniroot provides: 'A list with at least four components'. To display the value of the root, we need to tell R to use the output called \`root\` of the function. We can do this by using the '\$' notation below:
```{r}
sol.root <- uniroot(final.size.root, c(0,0.9999))$root
print(sol.root)
```
**Question (b) Did you expect R to give you an answer?**
Answer:
**Question (c) What variables did it use to evaluate the function?**
Answer:
We have to make sure that we either
(i) define all the variables that a function needs within the function itself or
(ii) pass these variables as arguments. Let's try both ways.
First, rewrite the function so that R0 is defined within the function:
```{r}
final.size.root <- function(s.inf) {
R0 <- 2
final.size <- R0*(s.inf - 1) - log(s.inf)
return(final.size)
}
```
This looks like it could be a useful function, perhaps we don't want to have to 'hard code' R. Let's try it a second way so that R0 is passed as an argument to the function. Start by defining a function that takes both R0 and `s.inf` as arguments:
```{r}
final.size.root.twoargs <- function(R0, s.inf){
final.size <- R0*(s.inf - 1) - log(s.inf)
return(final.size)
}
```
Question (d) What are the arguments of this new function?
Answer:
Question (e) What is the output of this new function?
Answer:
Now let's pick a number for R0 that we can easily change, let's call it rep.num
```{r}
rep.num <- 2
```
Question (f) Why have we called this rep.num and not R0?
Answer:
`Uniroot` takes only one argument so we need to wrap our function inside another function that only has one argument. This is how we do it:
```{r}
sol.root <- uniroot(function(s.inf){
return(final.size.root.twoargs(rep.num, s.inf))
},
c(0,0.9999))$root
```
Notice that we have used the keyword 'function' without assigning the function a name (e.g. like we did with final.size). This type of function is called an 'anonymous function' and they are used when you can write a simple function on one line that you do not need to keep using. The output value of the function is the output value of the two args function.
Question (g) What are all the functions that we have used to calculate the root of the equation and how many arguments do they have?
Answer:
`uniroot` only gives us one root (unsurprisingly). `rootSolve` is a package that has functions to solve for multiple roots. You can install this package by:
- Tools \> InstallPackage \> \<type rootSolve\>
- type `install.packages("rootSolve")` in R
Now, simply load the package so your work environment has access to all its functions
```{r}
library("rootSolve")
```
We will now use the `rootSolve` function `uniroot.all` to find all the solutions of the final size equation. Let's use the same syntax as we did before, remembering to define `rep.num` again
```{r}
rep.num <- 2
sol.all.roots <- uniroot.all(function(s.inf){
return(final.size.root.twoargs(rep.num, s.inf))
},
c(0,1))
print(sol.all.roots)
```
Question (h) What is the epidemiological interpretation of these two roots?
Answer:
Finally, let's see how our R0 changes our solution values. We can do this with a loop as above
```{r}
r0.vector <- 1:10
sol.all.roots <- vector(length = length(r0.vector))
for (r0 in r0.vector){
sol.all.roots[r0] <- min(uniroot.all(function(s.inf) final.size.root.twoargs(r0, s.inf), c(0,1)))
}
par(new=FALSE)
plot(r0.vector, sol.all.roots, type= "b", xlim=c(1,10), ylim=c(0,1), ylab = "Proportion of population uninfected", xlab = "R0")
```
Can you rewrite the for loop calculation above in a different way?
**Hint:** use the `purrr::map` function - but note that previously we had 2 variables and used `map2` . Now we only have one variable, so we will use `purrr::map`
```{r eval = F}
sol.all.roots <- YOUR CODE HERE
```
Solutions can be accessed [here](00_02_programming_solutions.qmd).