-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
476 lines (367 loc) · 22.7 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
---
author: Emilio Berti
output: github_document
bibliography: references.bib
---
[![R-CMD-check](https://github.com/emilio-berti/assembly/workflows/R-CMD-check/badge.svg)](https://github.com/emilio-berti/assembly/actions)
[![DOI](https://zenodo.org/badge/454057297.svg)](https://zenodo.org/badge/latestdoi/454057297)
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
fig.width = 4,
fig.height = 4,
fig.align='center',
dpi = 300,
out.width = "50%"
)
```
# Getting started
## Background
Following food web terminology, I talk here about _resources_ and _consumers_. In the case of plant-pollinator networks, this is analogous as replacing _resources_ with _plants_ and _consumers_ with _pollinators_. In this case, however, some filtering steps may be unnecessary or unwanted. __More about this in a separate section.__ I also talk about metawebs, which are the regional food web networks that emerge considering all interactions that occur in local communities within the region of interest.
The main scope of _assembly_ is to simulate top-down assembly processes. Top-down assembly means that all species are introduced into a local community at once. The opposite, i.e. communities are built by sequential introductions of one species, is called bottom-up assembly. Bottom-up assembly is problematic for community composed of many species, as the number of unique assembly sequences is _S!_, where _S_ is the number of species in the metaweb. If priority effects (historical contigencies) are important, @song2021untangling found that this number is (much) higher, roughly:
$$
\frac{2^{2^{S - 2}S^2}e^S}{\sqrt{2\pi}S^{S + 0.5}}
$$
Thus, for a conservative scenario where priority effects are not important in a 100 species community, there are $\sim 10^{157}$ unique sequences, making computations (and replications) virtually impossible. With priority effects, "already with 6 species, the diversity is significantly greater than the total number of atoms in the entire universe" (@song2021untangling).
Strikingly, @servan2021tractable showed that bottom-up and top-down assembly are equivalent under some specific conditions. Far from assuming this is the case in _assembly_, for which more research is needed, I simply want to highlight that _assembly_ only implements top-down assembly and that none of the procedures in _assembly_ can be considered steps in an ecological sequence. Whenever I talk about _steps_, _moves_, and _sequences_ in _assembly_, I always refer to _procedures steps_, _procedures moves_, and _procedures sequences_. Always keep that in mind and don't be fooled by the terminology; there is no bottom-up assembly in _assembly_.
At this point, if you're like me, you will ask _why not bottom-up assembly?_ The simple answer is: because it is computational unfeasible for large communities. To make it feasible, it is tempting to restrict the possible sequences to a (very) narrow space, only to 1,000,000 sequences ($10^{-152}$ of the total fraction of possible sequences for 100 species; much _much_ smaller than considering only a grain of sand for simulating the whole Earth). As this seems quite dangerous, potentially leading to biased conclusions, I completed neglected bottom-up assembly processes.
## General workflow of _assembly_
In general, _assembly_ implements this pipeline:
1. Draw random species from a metaweb.
2. Impose resource filtering, i.e. each basal species must be consumed and each consumer must feed on a resource.
3. Impose limiting similarity filtering, i.e. consumers are replaced by others, depending on a probability distribution proportional to their similarity of interactions.
These steps are not necessarily sequential, i.e. limiting similarity can be imposed independently from the resource filtering, but this is programmatically harder. Because ecological communities are likely filtered by resource availability before interaction competition, I wrote _assembly_ in a way that is straightforward to pass the output of the resource filtering procedure to the limiting similarity one. It is, however, possible to omit/invert procedures by performing extra-checks on input/output data and implementing few pre-procedure steps. I will show how to implement some of them at the end.
# Draw random species from a metaweb
Loading _assembly_ and set a random seed
```{r init}
library(assembly)
library(igraph)
set.seed(123)
```
Load the dataset _adirondack_ that comes with _assembly_:
```{r adirondack}
data(adirondack)
```
_adirondack_ is the Adirondack Lakes metaweb as obtained from the GATEWAy database.
`draw_random_species(n, sp.names)` draws random species from a metaweb and requires as input the number of species to draw (*n*) and the species names (*sp.names*). If the metaweb does not have species names, you can assign random ones with `name_metaweb(metaweb)`:
```{r name}
colnames(name_metaweb(matrix(as.numeric(runif(100) > .5), 10, 10)))
```
If names are already present, this will throw an error:
```{r name_error}
tryCatch(name_metaweb(adirondack),
error = function(e) print(e))
```
When the metaweb has names, define the number of species for the local community:
```{r S}
S <- 50 #species richness
```
And draw a random community, use the function `draw_random_species()`:
```{r random}
sp <- draw_random_species(S, colnames(adirondack))
show_graph(sp, adirondack)
```
## Hidden functions
There are several hidden functions in _assembly_. The reason there are hidden functions is because there is no need to call them directly. Hidden functions can be accessed by prefixing the `assembly:::` (three colon, not two). All hidden functions start with a dot `.`, e.g. `assembly:::.basals()`.
In general, you should not be bothered by hidden functions and should not call them directly, unless you have a good understanding of how they operate. Nevertheless, I summarize them for clarity.
`assembly:::.basals()` get all basal species in the metaweb, and is equivalent to subset the names of the metaweb where `colSums(metaweb) == 0`:
```{r basals}
identical(
sort(intersect(assembly:::.basals(adirondack), sp)),
sort(intersect(colnames(adirondack)[colSums(adirondack) == 0], sp))
)
```
`assembly:::.consumers()` and `assembly:::.top()` return the consumers and top consumers of the metaweb, respectively.
`assembly:::.find_isolated()` returns the species that are isolated in the local community:
```{r find_isolated}
assembly:::.find_isolated(sp, adirondack)
```
`assembly:::.find_replacements()` find suitable replacement for the isolated species:
```{r find_replacements}
assembly:::.find_replacements(sp,
assembly:::.find_isolated(sp, adirondack),
adirondack,
keep.n.basal = TRUE)
```
If _keep.n.basal_ is TRUE (default = FALSE), then the original number of basal species will not change.
`assembly:::.move()` performs a move in the limiting similarity procedure (more about this later):
```{r move_fail}
tryCatch(assembly:::.move(sp, adirondack, t = 1),
error = function(e) print(e))
```
This call to `assembly:::.move()` fails because isolated species are detected in the input. This is a desired property of the function, i.e. it fails when there is an unexpected behavior. All hidden functions have some kind of behavior-check, which is a safety net to assure the code is doing what you asked for.
Finally, `assembly:::.components()` returns the number of connected components in the graph of the local community:
```{r components}
assembly:::.components(sp, adirondack)
```
Usually, a proper food web has only one component, i.e. all species are connected by a path. Having more than one component means that the food web is actually made of several disconnected communities. In the case above, it also means that at least one of this disconnected communities is composed of only one isolated species.
# Resource filtering
To impose the resource filtering, call the function `resource_filtering()`. This takes as input the species names (*sp.names*), the metaweb (*metaweb*), and an optional argument _keep.n.basal_ to specify weather the original number of basal species should be kept constant (default = `FALSE`). **NOTE this may not be implemented correctly**
Behind the curtain, `resource_filtering()` calls the hidden functions as a way to compress code and make it consistent. That's why you shouldn't bother too much about hidden functions: they're there because they're useful in the development of the package, rather than for your usage. If they're useful for you and you understand how they work, use them.
```{r resource}
sp_resource <- resource_filtering(sp, adirondack, keep.n.basal = TRUE)
show_graph(sp_resource, adirondack)
```
Now the local community is fully connected, i.e. basal species always have a consumer and consumers always have an available resource. It's possible to check this manually calling the hidden functions and working on the adjacency matrix of the local community:
```{r check_connected}
bas <- intersect(sp_resource, assembly:::.basals(adirondack))
cons <- intersect(sp_resource, assembly:::.consumers(adirondack))
all(rowSums(adirondack[bas, cons]) > 0)
all(colSums(adirondack[union(bas, cons), cons]) > 0)
```
Usually you don't need to perform these checks, as I implemented them within `resource_filtering()`. I also implemented a check for disconnected components, to make sure that the resulting community has no isolated species and only one actual community.
Bonus: because of these checks, now it is safe to perform a move of the limiting similarity procedure:
```{r move}
assembly:::.move(sp_resource, adirondack, t = 1)
```
# Limiting similarity filtering
The limiting similarity filtering is a series of individuals moves. Each move is composed of three steps:
1. A metric ($J$) representing the similarity of interaction is calculated for each species in the community.
2. One species is removed with probability proportional to their $J$.
3. The species removed is replaced by another species selected at random from the metaweb.
Each move can then be accepted of discarded based on a probabilistic acceptance criterion, the Metropolis-Hasting algorithm (see below for details). If the move is accepted, the new community will differ from the starting one only in the removed/replaced species. If the move is discarded, the new community is identical to the starting one. Each move takes as input the community of the previous move (or the initial community for the first move) and returns as output the new community (or the original one).
The limiting similarity procedure is simply a series of moves (accepted or not). The starting community for the limiting similarity procedure is usually a community that has undergone already a resource filtering, imposing thus that trophic competition comes after resource availability. However, it is possible to skip the resource filtering step; the only requirement of the starting community for the limiting similarity filtering is that it does not have isolated species and has only one connected components.
To impose the limiting similariy filtering, call the function `similarity_filtering()`. This function requires the species names as input (*sp.names*), the metaweb (*metaweb*), the argument _t_ (default = 0), which is the temperature of the Metropolis-Hastings algorithm, and _max.iter_ (default = 1,000), which is the maximum number of moves allowed.
```{r limiting}
sp_sim <- similarity_filtering(sp_resource, adirondack, t = 1, max.iter = 10)
show_graph(sp_sim, adirondack)
```
## Metropolis-Hasting algorithm
To accept a move, it must pass a Metropolis-Hasting algorithm. If the similarity of the new community is lower than the similarity of the old one, the move is always accepted. When the new similarity is higher than the old similarity, the move is accepted if:
$$
exp \left[ \left( 1 - \frac{similarity_{new}}{similarity_{old}} \right) \frac{1}{t} \right]> \mathcal{U}(0, 1)
$$
This means that, even when the new similarity is higher than the old one and the new community has species with increased similarity of interaction, the move can still be accepted as valid depending on a probability density function. The probability of acceptance depends on how much the new similarity is higher than the old one and by the temperature parameter _t_. For increasing _t_, it is more likely to accept a non-favorable move:
```{r metro, fig.height=5, fig.width=6, out.width="80%"}
temp <- 10 ^ seq(-2, 1, by = .1)
ratio <- 10 ^ seq(-1, 2, by = .1)
move <- rep(NA, length(temp) * length(ratio))
i <- 1
for (t in temp){
for (x in ratio) {
move[i] <- metropolis.hastings(1, x, t)
i <- i + 1
}
}
d <- data.frame(Temp = rep(temp, each = length(ratio)),
Ratio = rep(ratio, length(temp)),
Move = as.numeric(move))
cols <- colorRampPalette(c("steelblue", "tomato"))
cols <- cols(length(unique(d$Temp)))
pal <- adjustcolor(cols, alpha.f = .2)
pal <- pal[sapply(d$Temp, \(x) which(unique(d$Temp) == x))]
plot(log10(d$Ratio), jitter(d$Move, factor = .2),
xlab = "Similiarity ratio (new / old)",
ylab = "Accepted move",
col = pal, pch = 20, frame = FALSE)
for (t in unique(d$Temp)) {
x <- log10(d$Ratio[d$Temp == t])
y <- d$Move[d$Temp == t]
fit <- loess(y ~ x)
lines(fit$x, fit$fitted, col = cols[which(unique(d$Temp) == t)])
}
legend(-1, .7, legend = seq(-2, 1, by = .5),
fill = colorRampPalette(c("steelblue", "tomato"))(7),
title = "log10(t)")
```
The reason to include this algorithm is to avoid a purely deterministic procedure and include some stochasticity in the process. However, if this is unwanted, it can be removed (and the process made purely deterministic), by specifying a very high temperature parameter _t_:
```{r tweak_t}
table(sapply(seq_len(1000), \(x) metropolis.hastings(1, 1e3, t = 1e9)))
```
## Similarity trend by move
```{r sim-trend}
get_similarity <- function(sp) {
g <- graph_from_adjacency_matrix(adirondack[sp, sp])
consumers <- intersect(sp, assembly:::.consumers(adirondack))
J <- similarity(
g,
vids = which(sp %in% consumers)
)
diag(J) <- NA
J <- sum(J, na.rm = TRUE)
return(J)
}
temps <- 10 ^ seq(-3, 0, by = 1)
sim <- matrix(rep(NA, 300 * length(temps)), ncol = length(temps))
for (t in seq_along(temps)) {
sp <- sp_resource
for (i in seq_len(nrow(sim))) {
sim[i, t] <- get_similarity(sp)
sp <- assembly:::.move(sp, adirondack, t = temps[t])
}
}
```
```{r sim-trend-plot, fig.height=5, fig.width=6, out.width="80%"}
pal <- colorRampPalette(colors = c("steelblue", "tomato"))(length(temps))
plot(seq_len(nrow(sim)), sim[, 1], frame = FALSE,
xlab = "Move number", ylab = "Similarity",
type = "l", col = pal[1], lwd = 2,
ylim = c(min(sim), max(sim) * 1.25))
for (i in 2:length(temps)) {
lines(seq_len(nrow(sim)), sim[, i], col = pal[i], lw = 2)
}
legend(x = 70, y = 130, fill = pal, legend = temps,
title = "Temperature", horiz = TRUE,
box.lwd = 0, bg = NA)
```
## Verbose output
`similarity_filtering()` has the options to return verbose output. In this case, the returned object is not the vector of the name of the species, but a list with:
- _species_, the vector with the species names
- _mean.niche_, a vector with containing the mean of the trophic levels of resources for each consumers, averaged over the whole community
- _var.niche_, a vector with containing the variance of the trophic levels of resources for each consumers, averaged over the whole community
- _similarity_ the similarity statistics used to evaluate the moves.
Each vector, except _species_, has length = _max.iter_. In the case a $i$ move was rejected, the `similarity[i] = 0`.
```{r limiting_verbose}
sp_sim <- similarity_filtering(sp_resource, adirondack, t = 1, max.iter = 10,
output.verbose = TRUE)
class(sp_sim)
names(sp_sim)
```
This can be useful to check trends in niche packing when species similarity changes. For instance, for increasing similarity, species niche variance should decrease, implying that niches of different species overlap less.
```{r limiting_verbose_plot_var, fig.height=5, fig.width=6, out.width="80%", echo=FALSE}
plot(sp_sim$var.niche, pch = 20, col = "grey20",
xlab = "Move", ylab = "Niche variance",
frame = FALSE, axes = FALSE)
axis(1, at = seq(1, 10))
axis(2, at = seq(0.023, 0.037, by = 0.004))
grid()
```
Average niche, however, may change more subtly. For example, decreasing similarity may imply decreasing average niche in the case that lower trophic levels are largely empty. However, it may also imply increasing average niche, e.g. by a new top consumer.
```{r limiting_verbose_plot_mean, fig.height=5, fig.width=6, out.width="80%", echo=FALSE}
plot(sp_sim$mean.niche, pch = 20, col = "grey20",
xlab = "Move", ylab = "Niche average",
frame = FALSE, axes = FALSE)
axis(1, at = seq(1, 10))
axis(2, at = seq(0.023, 0.037, by = 0.004))
grid()
```
# Filtering algorithms and modularity
Jaccard similarity is often used to detect modules in community detection problems. The rationale is that similarity is high within modules and low between modules. This implies that the limiting similarity filtering, in its purely deterministic version ($t = 0$), it will converge to food webs that are highly modular, as this minimize the total Jaccard similarity of the network. This effect is more pronounced in sparser networks, that is in food webs with lower connectance.
To illustrate this, I generate 200 random networks for five different connectance levels and calculate the average similarity of the network as well as its modularity metric:
```{r modularity}
REPS <- 200
CONNECTANCE <- seq(.1, .3, by = .05)
modularity <- matrix(rep(NA, REPS * length(CONNECTANCE)), nrow = REPS)
similarity <- matrix(rep(NA, REPS * length(CONNECTANCE)), nrow = REPS)
for (C in CONNECTANCE) {
for (rep in seq_len(REPS)) {
m <- matrix(as.numeric(runif(1e4) < C), 1e2, 1e2)
g <- graph_from_adjacency_matrix(m)
members <- membership(cluster_fast_greedy(as.undirected(g)))
modularity[rep, which (CONNECTANCE == C)] <- modularity(as.undirected(g), members)
sim <- similarity(g, mode = "in", method = "jaccard")
similarity[rep, which (CONNECTANCE == C)] <- mean(sim, na.rm = TRUE)
}
}
```
```{r modularity-plot, fig.height=5, fig.width=6, out.width="80%", echo=FALSE}
d <- data.frame(
ID = rep(seq_len(REPS), length(CONNECTANCE)),
Connectance = rep(CONNECTANCE, each = REPS),
Similarity = as.vector(similarity),
Modularity = as.vector(modularity)
)
pal <- colorRampPalette(c("steelblue", "goldenrod"))(length(CONNECTANCE))
cols <- pal[sapply(d$Connectance, function(x) which(CONNECTANCE == x))]
cols <- adjustcolor(cols, alpha.f = .2)
plot(d$Modularity ~ d$Similarity,
xlab = "Average Jaccard similarity",
ylab = "Modularity coefficient",
col = cols, pch = 20, frame = FALSE)
grid()
for (C in CONNECTANCE) {
m <- lm(Modularity ~ Similarity, data = d[d$Connectance == C, ])
segments(x0 = min(m$model$Similarity),
x1 = max(m$model$Similarity),
y0 = m$coefficients[[1]] + m$coefficients[[2]] * min(m$model$Similarity),
y1 = m$coefficients[[1]] + m$coefficients[[2]] * max(m$model$Similarity),
lw = 3, col = "grey20")
}
```
# Integrating food web dynamics
Until now, we focused on assembly processes and how this influence the topology of the network. It is possible to integrate the assembly with food web dynamics, e.g. comparing dynamics between filtering processes. I collaborated to the _ATNr_ package to solve food web dynamical systems.
Create a synthetic metaweb with virtual species:
```{r traits}
library(ATNr)
S <- 200
traits <- data.frame(
species = sapply(seq_len(S), \(x){
paste(sample(letters, 10, replace = TRUE), collapse = "")
}),
masses = 10 ^ runif(S, 0, 2), #log-uniform
biomasses = runif(S, 2, 5),
role = sapply(seq_len(S), \(x) ifelse(runif(1) > .7, "basal", "consumer"))
)
traits <- traits[order(traits$masses), ]
metaweb <- create_Lmatrix(traits$masses,
sum(traits$role == "basal"),
Ropt = 10,
th = .1)
sum(colSums(metaweb) == 0) == sum(traits$role == "basal")
metaweb[metaweb > 0] <- 1
colnames(metaweb) <- traits$species
rownames(metaweb) <- traits$species
show_fw(colnames(metaweb), metaweb)
```
Draw random species for a local community of 30 species and impose sequentially the resource filtering and the limiting similarity filtering:
```{r simulations}
sp <- draw_random_species(30, colnames(metaweb))
length(setdiff(sp, assembly:::.basals(metaweb)))
sp_resource <- resource_filtering(sp, metaweb, keep.n.basal = TRUE)
length(setdiff(sp_resource, assembly:::.basals(metaweb)))
sp_limiting <- similarity_filtering(sp_resource, metaweb, t = 1e6, max.iter = 1e2)
length(setdiff(sp_limiting, assembly:::.basals(metaweb)))
show_graph(sp, metaweb, title = "Random")
show_graph(sp_resource, metaweb, title = "Resource filtering")
show_graph(sp_limiting, metaweb, title = "Limiting similarity")
```
Create and initialize the ATNr models:
```{r atnr}
nb_b <- length(setdiff(sp, assembly:::.basals(metaweb)))
nb_s <- 30
dyn_random <- create_model_Unscaled(nb_s, nb_b,
traits$masses[traits$species %in% sp],
metaweb[sp, sp])
dyn_res <- create_model_Unscaled(nb_s, nb_b,
traits$masses[traits$species %in% sp_resource],
metaweb[sp_resource, sp_resource])
dyn_lim <- create_model_Unscaled(nb_s, nb_b,
traits$masses[traits$species %in% sp_limiting],
metaweb[sp_limiting, sp_limiting])
# default parameters
initialise_default_Unscaled(dyn_random)
initialise_default_Unscaled(dyn_res)
initialise_default_Unscaled(dyn_lim)
# initialize C++ fields
dyn_random$initialisations()
dyn_res$initialisations()
dyn_lim$initialisations()
```
And call the solver to obtain the dynamics:
```{r solver, fig.height=5, fig.width=6, out.width="80%"}
times <- seq(1, 1e9, 1e7)
sol_random <- lsoda_wrapper(times, traits$biomasses[traits$species %in% sp],
dyn_random)
sol_res <- lsoda_wrapper(times, traits$biomasses[traits$species %in% sp_resource],
dyn_res)
sol_lim <- lsoda_wrapper(times, traits$biomasses[traits$species %in% sp_limiting],
dyn_lim)
plot_odeweb(sol_random, nb_s)
plot_odeweb(sol_res, nb_s)
plot_odeweb(sol_lim, nb_s)
```
The number of extinct species can be extracted by:
```{r extinctions}
sum(sol_random[length(times), -1] < 1e-6)
sum(sol_res[length(times), -1] < 1e-6)
sum(sol_lim[length(times), -1] < 1e-6)
```
As well as the final total biomass of all consumers combined:
```{r biomasses}
sum(sol_random[length(times), (nb_b + 2) : nb_s])
sum(sol_res[length(times), (nb_b + 2) : nb_s])
sum(sol_lim[length(times), (nb_b + 2) : nb_s])
```
# References