01_Data_simulation.Rmd

---
title: "Bacterial EF across Scale - Simulations"
output:
  html_notebook: default
---

```{r, message=FALSE}
install_if <- function(x) {
  
  # install the package first if it is not available yet
  if (!x %in% rownames(installed.packages())) {
    message(paste(x, "is required but not installed. Installing now"))
    Sys.sleep(1)
    install.packages(x)
  }
  # x holds the package name as a string, so character.only = TRUE is needed
  library(x, character.only = TRUE)
}

install_if("dplyr")
install_if("tidyr")
install_if("purrr")
install_if("ggplot2")
install_if("ggbeeswarm")
install_if("broom")
install_if("here")
install_if("pbapply")
install_if("viridis")

source(here("function_plotting_theme.R"))
```

This script exports: 

+ the subplots for Figure 2 (Fig 2 a and b as well as the insets)
+ Fig S6

See script `Multipanel_Fig_2.Rmd` for the assembly and export of Figure 2. 

Create the folder for exported figures:
```{r}

if(! dir.exists(here("figures"))){
  dir.create(here("figures"))
}

```


# set up simulations

+ We want to simulate 5 strains of bacteria growing in 5 different habitats. 
+ Each strain differs in its ability to grow in each habitat.
+ We grow the bacteria in every possible combination (monocultures and all possible species mixtures)
+ We grow them on every possible combination of habitats
+ We measure EF at different scales going from 1 to 5 habitats  

Define the number of species and habitats:
```{r}

specnum <- 5
habnum <- 5
 
```

## strains

We create 5 species (for `specnum = 5`), named strain_1 to strain_5, and list every possible combination of 1, 2, 3, 4 and 5 species (corresponding to richness levels 1:5 in a biodiversity experiment).

```{r}
spec <- paste("strain_", c(1:specnum), sep = "")
spec_comb <- lapply(1:specnum, function(x) combn(spec, x)) # list with one matrix per richness level
names(spec_comb) <- paste("richness", c(1:specnum))
```
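
As a minimal sketch of the structure built above, the example below uses three hypothetical strains (`toy_spec` and `toy_comb` are illustrative names, not part of the analysis) and base R only. Each list element is a matrix whose columns are the species combinations of one richness level.

```r
# Toy example: all combinations of three hypothetical strains
toy_spec <- paste0("strain_", 1:3)

# one matrix per richness level; each column is one species combination
toy_comb <- lapply(seq_along(toy_spec), function(k) combn(toy_spec, k))
names(toy_comb) <- paste("richness", seq_along(toy_spec))

# number of combinations per richness level: 3, 3, 1
n_per_level <- sapply(toy_comb, ncol)

# the three two-species mixtures as readable labels
pairs <- apply(toy_comb[["richness 2"]], 2, paste, collapse = " ")
```

Using `lapply()` in the sketch guarantees a list even when all matrices happen to have the same number of elements.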

## environments

We create 5 habitats (for `habnum = 5`) named Habitat_1 to Habitat_5 and get every possible combination of the five habitats for all 1, 2, 3, 4 and 5 habitat combinations (corresponding to the different scales)

```{r}
hab <- paste("Habitat_", c(1:habnum), sep = "")
hab_comb <- lapply(1:habnum, function(x) combn(hab, x)) # list with one matrix per scale
names(hab_comb) <- paste("Heterogeneity", c(1:habnum))
```

Number of combinations:

```{r}
tibble(spec_combs = 
         sum(choose(specnum,1:specnum)),
       hab_combs = 
         sum(choose(habnum,1:habnum)))
```
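
The counts above follow from the fact that a set of n elements has 2^n - 1 non-empty subsets. A quick check in base R, with a hypothetical `n_items` standing in for `specnum` or `habnum`:

```r
n_items <- 5  # hypothetical stand-in for specnum or habnum

# sum of "n choose k" over all levels k = 1..n
n_comb <- sum(choose(n_items, 1:n_items))

# equals the number of non-empty subsets of n_items elements
n_comb == 2^n_items - 1   # TRUE (31 combinations)
```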

# step-by-step

In the following we take a single draw of fitness values for each strain and develop the code step by step. The code that generates the results and figures presented in the paper relies on 1000 different draws and is found in the section `multiple draws from uniform`.

### fitness values

We assign each strain a fitness value for each habitat. The values are drawn from a uniform distribution (0,1) for each strain and each environment. 

```{r}

set.seed(1346)

Traits <- matrix(
  nrow = length(hab),
  ncol = length(spec),
  dimnames = list(hab, spec))


Traits[1:nrow(Traits), 1:ncol(Traits)] <-
  runif( n = prod( dim( Traits)), 0, 1)

Traits
```

### calculate plot values

Here we calculate the outcome (biomass / 'functioning') of every species combination in each habitat. We calculate the mixture performance as:

**Weighted mean**: The mixture value equals the mean performance of all species, weighted by their relative performance in the habitat

*what the code does*

For each habitat (`sapply(hab, ...)`) it cycles through all richness levels (`lapply(spec_comb, ...)`), and for each richness level it cycles through all species combinations (`apply(..., 2, ...)`).
It then calculates the mixture performance of that species combination in that habitat from the species performances stored in `Traits`.

```{r}

plot_values <- sapply(hab, #for each environment
                      function(x) {
                        lapply(spec_comb, #for each richness level
                               function(y) {
                                 apply(y, 2, function(z){ #for each species combination
                                   mean(sum(Traits[x,z]^2) / abs(sum(Traits[x,z]))) #weighted mean
                                   }
                                 )
                                 }
                        )
                        }
                      )

plot_values_df <- 
  tibble(
    richness = rep( rep(1:specnum, unlist(lapply(spec_comb, ncol))), habnum),
    spec_comb = rep(lapply(spec_comb, 
                           function(x) apply(x, 2, function(y) 
                             paste(y, collapse = " "))) %>% unlist(),
                    habnum),
    habitat = rep(hab, each = sum(choose(specnum,1:specnum))),
    functioning = unlist(plot_values)
    )

```
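
The expression `sum(t^2) / sum(t)` used above is the mean of the monoculture values, weighted by each strain's share of the summed performance. A minimal sketch with made-up fitness values (`toy_fit` is hypothetical) shows the equivalence with `weighted.mean()`:

```r
toy_fit <- c(strain_1 = 0.2, strain_2 = 0.5, strain_3 = 0.9)  # made-up values

# form used in the simulation
mix_a <- sum(toy_fit^2) / abs(sum(toy_fit))

# same quantity written as an explicit weighted mean
rel_perf <- toy_fit / sum(toy_fit)             # weights sum to 1
mix_b <- weighted.mean(toy_fit, w = rel_perf)

all.equal(mix_a, mix_b)   # TRUE
mix_a > mean(toy_fit)     # TRUE: better performers count more
mix_a <= max(toy_fit)     # TRUE: never above the best monoculture
```

Because a weighted mean cannot exceed its largest value, a mixture calculated this way never outperforms the best monoculture within a single habitat. Values above 1 for transgressive overyielding can therefore only arise when functioning is averaged across several habitats.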

### plot plot-values

Here we plot the functioning of each monoculture and mixture in each habitat. The mixture performance is calculated as the weighted mean of the monoculture performances. 

We show a power fit (with a ±2 SE band) and a linear fit; both show the positive richness ~ functioning relationship, which follows inherently from the way we calculate the mixture values.

```{r}

plot_values_df %>% 
  na.omit() %>% 
  group_by(habitat) %>%
  nest() %>% 
  mutate(power_fit = map(data, ~lm(log(functioning) ~ log(richness), data = .))) %>% 
  mutate(power_fit = map(power_fit, augment, se_fit = TRUE)) %>% 
  unnest(cols = c(data, power_fit)) %>% 
  ggplot(aes(x = richness, y = functioning))+
  geom_point(position = position_jitter(width = 0.1, height = 0), 
             alpha = 1, size = 1, colour = "#377EB8")+
  geom_line(aes(x = richness, y = exp(.fitted)), 
            colour = "black", size = 0.7)+
  geom_smooth(method = "lm", se = F, linetype = "dotted", size = 0.7, colour = "darkred")+
  geom_ribbon(aes(x = richness, 
                  ymin = exp(.fitted - 2*.se.fit), 
                  ymax = exp(.fitted + 2*.se.fit),
                  group = habitat), fill = "grey", alpha = 0.2)+
  facet_wrap(~habitat, scales = "free_y")+
  theme_bw()

```
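
The power fit above is a linear model on the log-log scale, back-transformed with `exp()`. A minimal sketch on noise-free, made-up data (`toy_df`, with a = 2 and b = 0.5 chosen purely for illustration):

```r
# made-up data that follow y = a * x^b exactly (a = 2, b = 0.5)
toy_df <- data.frame(richness = 1:5)
toy_df$functioning <- 2 * toy_df$richness^0.5

# log(y) = log(a) + b * log(x), so fit a straight line on the log-log scale
toy_fit_pw <- lm(log(functioning) ~ log(richness), data = toy_df)

a_hat <- exp(coef(toy_fit_pw)[["(Intercept)"]])  # recovers a
b_hat <- coef(toy_fit_pw)[["log(richness)"]]     # recovers b

# fitted curve on the original scale
pred <- exp(fitted(toy_fit_pw))
```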

## Transgressive overyielding

We calculate the transgressive overyielding for all combinations of habitats. 

+ for all levels of habitat heterogeneity (1:5 habitats)
+ for all possible unique combinations of habitats

We calculate the functioning of a given species mixture (or monoculture) in that combination of habitats at that level of habitat heterogeneity as the average functioning of that species mixture in the single habitats. 

**Transgressive overyielding** is calculated as:

+ functioning of the full mixture (all `specnum` strains) / functioning of the best monoculture


```{r, warning=FALSE}

plot_values_wide <- 
plot_values_df %>% 
  spread(habitat, functioning)

#calculate average functioning for each env combination at each scale and store results in dataframe
hab_values <- 
lapply(hab_comb, function(x) { #for all scales (1:5)
  apply(x, 2, function(y) { #for all environmental combinations
   
     df <- data.frame( #average functioning for this environment combination
       hab_comb = paste0(y, collapse = " "),
       habnum = length(y),
       richness = plot_values_wide$richness,
       functioning = rowMeans(plot_values_wide[,y]), #mean of functioning across habitats
       stringsAsFactors = FALSE #keep hab_comb as character, as in the later chunks
                     )
  })
  })

hab_values <- bind_rows(unlist(hab_values, recursive = FALSE))

# calculate BEF metrics for each environment combination
hab_metrics <- 
  hab_values %>% 
  group_by(hab_comb, habnum) %>%
  nest() %>%
  mutate(mix = map_dbl(data, function(x) {x[x$richness == specnum,]$functioning}),
         max_mono = map_dbl(data, function(x) {max(x[x$richness == 1,]$functioning)}),
         transgressive = mix / max_mono)
```
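
A worked toy example of the calculation above, using two hypothetical strains and two habitats with made-up fitness values. Within each habitat the mixture stays below the best monoculture, but averaged over both habitats it exceeds every monoculture because the strains specialise on different habitats.

```r
# made-up fitness values: rows are habitats, columns are strains
toy_traits <- matrix(c(0.9, 0.1,
                       0.1, 0.9),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("Habitat_1", "Habitat_2"),
                                     c("strain_1", "strain_2")))

# weighted-mean mixture value of both strains in each habitat
toy_mix <- apply(toy_traits, 1, function(f) sum(f^2) / sum(f))

# transgressive overyielding within a single habitat (scale 1)
to_single <- toy_mix / apply(toy_traits, 1, max)

# across both habitats (scale 2): average over habitats first
to_both <- mean(toy_mix) / max(colMeans(toy_traits))

to_single  # about 0.91 in each habitat
to_both    # 1.64
```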

### plot power-fits 

```{r}

hab_values %>%
  group_by(hab_comb, habnum) %>% 
  nest() %>% 
  mutate(power_fit = map(data, ~nls(functioning ~ a*richness^b, data = ., start = list(a = 1, b = 1)))) %>%
  mutate(power_fit = map(power_fit, augment)) %>% 
  unnest(cols = power_fit) %>% 
  ggplot(aes(x = richness, y = functioning, colour = as.factor(habnum)))+
  geom_point(aes(group = hab_comb), 
             size = 1, alpha = 1, 
              position = position_dodge(width = 0.7)
              )+
  geom_line(aes(y = .fitted, group = hab_comb),
            position = position_dodge(width = 0.7)
            )+
  facet_wrap(~habnum)+
  scale_color_manual(values = viridis(habnum+1)[1:habnum], name = "scale")+
  theme_classic()+
  labs(y = "production", x = "species richness")+
  theme(legend.position = "none")
  

```


### plot transgressive overyielding by habitat heterogeneity
```{r}

hab_metrics %>% 
  select(-data) %>% 
  gather(metric, value, -habnum, -hab_comb) %>% 
  filter(metric == "transgressive") %>% 
  ggplot(aes(x = habnum, y = value))+
  geom_point( size = 2, alpha = 1, position = position_jitter(width = 0.1), colour = viridis(6)[3])+
  stat_smooth(method = "lm", se = F, size = 0.5, colour = "darkred")+
  theme_classic()+
  labs(y = "transgressive overyielding", x = "Habitat heterogeneity")+
  theme(legend.position = "none",
        axis.ticks.length=unit(-0.25, "cm"),
        axis.text.x = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")),
        axis.text.y = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")) )+
  scale_y_continuous(limits = c(0, 1.1))+
  NULL

```

```{r}

hab_metrics %>% 
  select(-data) %>% 
  gather(metric, value, -habnum, -hab_comb) %>% 
  #filter(metric == "transgressive") %>% 
  ggplot(aes(x = habnum, y = value, colour = metric))+
  geom_point( size = 2, alpha = 1, position = position_jitter(width = 0.1))+
  stat_smooth(method = "lm", se = F, size = 0.5)+
  theme_classic()+
  labs(y = "transgressive overyielding", x = "Habitat heterogeneity")+
  theme(axis.ticks.length=unit(-0.25, "cm"),
        axis.text.x = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")),
        axis.text.y = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")) )+
  scale_y_continuous(limits = c(0, 1.1))+
  scale_color_viridis_d(option = "C", end = 0.8)+
  NULL

```


# multiple draws from uniform

In the example above we took a single draw of trait values from the uniform distribution. In the code below, we repeat this 1000 times to obtain a distribution of BEF results across draws.

### fitness values
```{r}
Traits <- matrix(
  nrow = length(hab),
  ncol = length(spec),
  dimnames = list(hab, spec))

# take N random draws

set.seed(159)

N <- 1000

Trait_vals <- runif( n = prod( dim( Traits)) * N, 0, 1)

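n_cells <- prod(dim(Traits)) # values per draw (habitats x strains), avoids hard-coding 25

Trait_list <- 
lapply(1:N, function(x) {
  # fill the matrix with the x-th block of n_cells values
  Traits[1:nrow(Traits), 1:ncol(Traits)] <- Trait_vals[((x - 1) * n_cells + 1):(x * n_cells)]
  return(Traits)})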

```
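
A minimal sketch of the slicing step above, with small hypothetical dimensions (2 habitats, 3 strains, 4 draws; all names are illustrative). Each draw takes the next block of values from one long vector.

```r
set.seed(1)

n_hab   <- 2   # hypothetical dimensions, smaller than in the analysis
n_spec  <- 3
n_draw  <- 4
n_block <- n_hab * n_spec            # values needed per draw

vals <- runif(n_block * n_draw, 0, 1)

# draw i uses elements ((i - 1) * n_block + 1) : (i * n_block)
toy_list <- lapply(seq_len(n_draw), function(i) {
  idx <- ((i - 1) * n_block + 1):(i * n_block)
  matrix(vals[idx], nrow = n_hab, ncol = n_spec)
})

length(toy_list)      # 4 matrices
dim(toy_list[[1]])    # 2 x 3
```

Deriving the block size from the matrix dimensions keeps the indexing correct if the number of strains or habitats changes.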


### plot values
```{r}

plot_values_df_list <- 
pblapply(Trait_list, function(TM) {

plot_values <- sapply(hab, #for each environment
                      function(x) {
                        lapply(spec_comb, #for each richness level
                               function(y) {
                                 apply(y, 2, function(z){ #for each species combination
                                         mean(sum(TM[x,z]^2) / abs(sum(TM[x,z])))
                                   }
                                 )
                                 }
                        )
                        }
                      )


plot_values_df <- 
  tibble(
    richness = rep( rep(1:specnum, unlist(lapply(spec_comb, ncol))), habnum),
    spec_comb = rep(lapply(spec_comb, 
                           function(x) apply(x, 2, function(y) 
                             paste(y, collapse = " "))) %>% unlist(),
                    habnum),
    habitat = rep(hab, each = sum(choose(specnum,1:specnum))),
    functioning = unlist(plot_values)
    )

return(plot_values_df)
})
```


### Transgressive overyielding
```{r}

hab_metrics_list_oy <- 
  pblapply(plot_values_df_list, function(PV) {
    
    plot_values_wide <-
      PV %>% 
      spread(habitat, functioning)

#calculate average functioning for each env combination at each scale and store results in dataframe
hab_values <- 
lapply(hab_comb, function(x) { #for all scales (1:5)
  apply(x, 2, function(y) { #for all environmental combinations
   
     df <- data.frame( #average functioning for this environment combination
       hab_comb = paste0(y, collapse = " "),
       habnum = length(y),
       richness = plot_values_wide$richness,
       functioning = rowMeans(plot_values_wide[,y]),
       stringsAsFactors = FALSE
                     )
  })
  })

hab_values <- bind_rows(unlist(hab_values, recursive = FALSE))

# calculate BEF metrics for each environment combination
hab_metrics <- 
  hab_values %>% 
  group_by(hab_comb, habnum) %>%
  nest() %>% 
  mutate(mix = map_dbl(data, function(x) {x[x$richness == specnum,]$functioning}),
         max_mono = map_dbl(data, function(x) {max(x[x$richness == 1,]$functioning)}),
         transgressive = mix / max_mono)
})

  
hab_metrics_list <- 
lapply(hab_metrics_list_oy, function(x){
  x %>% 
  select(-data) %>% 
  gather(metric, value, -habnum, -hab_comb) %>%
  ungroup() %>% 
  nest(data = -metric) %>% 
  mutate(lm = map(data, ~lm(value ~ habnum, data = .x))) %>% 
  select(-data)
})


```

```{r}

Results <- 
 bind_rows(hab_metrics_list) %>% 
  mutate(slope = map_dbl(map(.$lm, coef), "habnum")) %>% 
  mutate(Intercept = map_dbl(map(.$lm, coef),"(Intercept)")) %>% 
  select(-lm) %>% 
   mutate(metric = factor(metric, 
                         levels = c("mix", "max_mono", "transgressive"),
                         labels = c("average\nmixture", "highest\nmonoculture", 
                                    "transgressive\noveryielding")))

```
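
The slopes collected in `Results` come from one linear model per draw and metric. A minimal base-R sketch of that step with made-up values (`toy_runs` is hypothetical and covers a single metric):

```r
# made-up metric values for two hypothetical draws at scales 1 to 3
toy_runs <- data.frame(
  run    = rep(c("draw_1", "draw_2"), each = 3),
  habnum = rep(1:3, times = 2),
  value  = c(0.8, 0.9, 1.0,    # increases by 0.1 per habitat
             1.0, 1.0, 1.0)    # no change with scale
)

# one linear model per draw, then extract intercept and slope
toy_coefs <- sapply(split(toy_runs, toy_runs$run), function(d) {
  coef(lm(value ~ habnum, data = d))
})

toy_coefs["habnum", ]        # slopes: 0.1 and 0
toy_coefs["(Intercept)", ]   # intercepts: 0.7 and 1
```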

## Fig 2 - slopes and histograms 

### Fig 2a: Transgressive overyielding
```{r}
# calculate the median slope and the 5% and 95% quantiles (90% interval)
Results %>% 
  group_by(metric) %>% 
  summarise(median = median(slope),
            q05 = quantile(slope, 0.05),
            q95 = quantile(slope, 0.95)) %>% 
  mutate(across(where(is.numeric), ~ round(.x, 3)))
   
```


```{r}
# predict new data
hab_pred = seq(1,5,0.1)

newdata = tibble(rep = rep(1:(nrow(Results)/3), each = length(hab_pred)),
                 hab_pred = rep(hab_pred, (nrow(Results)/3)))

Results_pred <- 
  Results %>% 
  group_by(metric) %>% 
  mutate(rep = 1:n()) %>% 
  left_join(newdata, by = "rep") %>% 
  mutate(predict = slope*hab_pred+Intercept)

med_slope <- 
Results %>% 
  group_by(metric) %>% 
  mutate(rep = 1:n()) %>% 
  slice(1:999) %>% #has to be an odd number to find the exact median
  mutate(slope_q = 
  case_when(abs(slope - median(slope)) == min(abs(slope - median(slope))) ~ "median",
            abs(slope - quantile(slope, 0.025)) == min( abs(slope - quantile(slope, 0.025))) ~ "low_q",
            abs(slope - quantile(slope, 0.975)) == min( abs(slope - quantile(slope, 0.975))) ~ "high_q")) %>% 
  filter(!is.na(slope_q)) %>% 
  group_by(metric, slope_q) %>% 
  dplyr::slice(1)


Fig_2a <- 
ggplot(Results_pred, aes(x = hab_pred, y = predict, group = paste(rep, metric), colour = metric))+
 # geom_line(size = 0.5, alpha = 0.05)+
  geom_line(data = Results_pred %>% 
              left_join(med_slope) %>% 
              filter(slope_q == "median"), size = 1)+
 # facet_wrap(~metric)+
  geom_hline(yintercept = 1, colour = "black", linetype = "dashed") +
  theme_meta()+
  labs(y = "Transgressive overyielding/Function", x = "Habitat heterogeneity")+
  theme(legend.position = "none",
        axis.ticks.length=unit(-0.25, "cm"),
        axis.text.x = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")),
        axis.text.y = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")) )+
  scale_y_continuous(limits = c(-0.2, NA))+
  scale_color_viridis_d(option = "C", end = 0.8)


save("Fig_2a", file = here("figures", "Fig_2a.RData"))


Fig_2a
```

## Fig 2ai inset
```{r}

quantile_res <- med_slope %>%
                   filter(slope_q != "median") %>% 
                   select(-Intercept, -rep) %>% 
                   pivot_wider(names_from = slope_q, values_from = slope)

Fig_2a_i <- 
ggplot(Results, aes(fill = metric, colour = metric))+
  geom_density(aes(x = slope, ..scaled..))+
  geom_vline(xintercept = 0, linetype = "dashed")+
  geom_errorbarh(data = quantile_res,
                 aes(xmax = high_q, xmin = low_q, y=0.1), colour = "white", height = 0)+ 
  geom_point(data = med_slope %>%  filter(slope_q == "median"), 
           aes(x = slope, y = 0.1), fill = "white", colour = "black", 
           shape = 21, size = 0.8)+
  facet_wrap(~metric)+
  theme_meta()+
  theme(legend.position = "none", 
        title = element_text(size = 10))+
  labs(y = paste("N draws (total ", N, ")", sep = ""),
       title = "Change in transgressive overyielding with scale",
       x = "Transgressive overyielding ~ scale (est.)")+
 # scale_x_continuous(limits = c(-0.07, 0.157)) +
  theme(text = element_text(size = 8),
        axis.text.x = element_text(size = 8),
        axis.text.y = element_text(size = 8),
        axis.title.x = element_text(size = 8),
        axis.title.y = element_text(size = 8))+
  scale_fill_viridis_d(option = "C", end = 0.8)+
  scale_color_viridis_d(option = "C", end = 0.8)


  save("Fig_2a_i", file = here("figures", "Fig_2a_i.RData"))
 
Fig_2a_i
```

In what percentage of cases does the performance of the mixture exceed the performance of the highest monoculture at the highest level of habitat heterogeneity?

```{r}

hab_metrics_transgressive <- 
  pblapply(plot_values_df_list, function(PV) {
    
    plot_values_wide <-
      PV %>% 
      spread(habitat, functioning)

#calculate average functioning for each env combination at each scale and store results in dataframe
hab_values <- 
lapply(hab_comb, function(x) { #for all scales (1:5)
  apply(x, 2, function(y) { #for all environmental combinations
   
     df <- data.frame( #average functioning for this environment combination
       hab_comb = paste0(y, collapse = " "),
       habnum = length(y),
       richness = plot_values_wide$richness,
       functioning = rowMeans(plot_values_wide[,y]),
       stringsAsFactors = FALSE
                     )
  })
  })

hab_values <- bind_rows(unlist(hab_values, recursive = FALSE))

# calculate BEF metrics for each environment combination
hab_metrics <- 
  hab_values %>% 
  group_by(hab_comb, habnum) %>%
  nest() %>% 
  mutate(transgressive = map_dbl(data, function(x) {
    x[x$richness == specnum,]$functioning / 
      max(x[x$richness == 1,]$functioning)})) 

return(hab_metrics)
})


mean_transgressive <- 
lapply(hab_metrics_transgressive, function(df) {
  
  df %>% 
    group_by(habnum) %>% 
    summarise(transgressive = mean(transgressive), .groups = "drop")
}) %>% 
  bind_rows(.id = "run") 

mean_transgressive %>% 
  mutate(transgressive = transgressive > 1) %>% 
  group_by(habnum) %>% 
  summarise(`trans. OY > 1 (%)` = 100 * sum(transgressive) / n()) #percentage, as stated in the column name

```
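
The share of draws with transgressive overyielding above 1 is the mean of a logical vector. A minimal sketch with made-up values (`toy_to` is hypothetical); multiplying by 100 gives a percentage:

```r
toy_to <- c(0.95, 1.02, 1.10, 0.99, 1.01)  # made-up values, one per draw

prop_above <- mean(toy_to > 1)    # proportion of draws above 1
pct_above  <- 100 * prop_above    # same value as a percentage

prop_above  # 0.6
pct_above   # 60
```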

## Fig S6 - species specialisation

This code investigates the relationship between the change in transgressive overyielding with habitat heterogeneity (the slope estimated above) and the species specialisation index.

From the article: 

>[Species specialisation is] calculated as the number of species in monoculture that maximise ecosystem function in at least one habitat divided by the number of habitats and scaled between 0 and 1. For example, if two different species maximise function in two different environments, the index is one. If one species maximises ecosystem function in both environments, the index is zero.


```{r}
Species_specialisation <- 
  lapply(Trait_list, function(x) {
    apply(x, 1, function(y){
      which(y == max(y))
    })
  }) %>% 
  lapply(., function(x) {
    ssp <- length(unique(x))/length(x)
    (ssp-(1/habnum))/(1-(1/habnum))
}) %>% unlist()

sppT <-   
tibble(spec_spec = Species_specialisation, 
       transgressive = filter(Results, metric == "transgressive\noveryielding")$slope)

ggplot(sppT, aes(x = spec_spec, y = transgressive))+
  geom_beeswarm(fill = "black", alpha = 0.1)+
  geom_smooth(se = TRUE, method = "lm", colour = "black", size = 0.5)+
  theme_bw()+
  labs(x = "Species specialisation index", y = "Trans. overyield. ~ heterogeneity (est.)"  )+
  theme_meta()

ggsave(filename = here("figures", "Fig_S6.pdf"), width = 6, height = 4)


```
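
The index defined in the quoted passage can be sketched for the two cases it describes. The helper `spec_index()` and the fitness values are hypothetical; the sketch uses `which.max()`, which picks the first strain in case of ties, and assumes one row per habitat and one column per strain.

```r
# hypothetical helper: specialisation index from a habitat x strain matrix
spec_index <- function(traits) {
  n_hab <- nrow(traits)
  best  <- apply(traits, 1, which.max)     # best strain in each habitat
  ssp   <- length(unique(best)) / n_hab    # between 1 / n_hab and 1
  (ssp - 1 / n_hab) / (1 - 1 / n_hab)      # rescaled to the range 0 to 1
}

# two strains, two habitats (made-up values)
specialists <- matrix(c(0.9, 0.1,
                        0.1, 0.9), nrow = 2, byrow = TRUE)
generalist  <- matrix(c(0.9, 0.1,
                        0.8, 0.2), nrow = 2, byrow = TRUE)

spec_index(specialists)  # 1: a different strain is best in each habitat
spec_index(generalist)   # 0: the same strain is best in both habitats
```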


### statistic

```{r}

sppT %>% 
  lm(transgressive~spec_spec, data = .) %>% 
  tidy()

```