02c_Experiment_analysis.nb.Rmd

---
title: "Bacterial EF across Scale - Experiments"
output: html_notebook
---


```{r, message=FALSE}

install_if <- function(x) {
  
  if(x %in% rownames(installed.packages()) == FALSE) {
  message(paste(x, "is required but not installed. Installing now"))
  Sys.sleep(1)
  install.packages(x)
  library(x)
  } else{ 
    library(x, character.only=TRUE)}
  }

install_if("here")
install_if("dplyr")
install_if("tidyr")
install_if("broom")
install_if("forcats")
install_if("readr")
install_if("ggplot2")
install_if("purrr")
install_if("gridExtra")
install_if("Hmisc")

# load the plotting theme function
source(here("function_plotting_theme.R"))
```

This script exports: 

+ the subplots for Fig. 3 (a and b)
+ Fig. 2b
+ Fig S3
+ Fig S4
+ the subplots for Figure S7

See script `03_Plot_Figs_2_3_S7.Rmd` for the assemblage and export of figure 2, 3 and S7. 

make folder to export figures
```{r}

if(! dir.exists(here("figures"))){
  dir.create(here("figures"))
}

```


This script analyses the data from the experiments. The analysis of the data follow the same structure as the analysis of the simulations. 

## download data

The raw data are archived on [Figshare](https://doi.org/10.6084/m9.figshare.12279884.v1)

>Gamfeldt, Lars; Roger, Fabian; Palm, Martin; Hagan, James; Warringer, Jonas; Farewell, Anne (2020): Biodiversity and ecosystem functioning across gradients in spatial scale. figshare. Dataset. https://doi.org/10.6084/m9.figshare.12279884.v1

Loading the data will throw 10 parsing failures caused by unexpected values in variables that we do not include in our analysis. These can be safely ignored. 

```{r}
Data <- read_tsv(url("https://ndownloader.figshare.com/files/22626299"))
```

```{r}
path1 <- here("figures/exp_setup.jpg")
path2 <- here("figures/exp_plate_setup.jpg")
```

example of agar plate with bacterial colonies - each dot represents a unique species combination

![example Plate](`r path1`)

Each large plate has 1536 spots, corresponding to 12 replicated 96-Well plates with a control plate on every 4th position. 

The position of the strain combinations on each 96-well plate is identical and as follows:

![Strain positions](`r path2`)

We are interested in one response variables from the raw data:

+ growth yield 

## clean data
+ exclude unnecessary variables
+ rename variables
+ change type to numeric
+ filter spots with control strain ("ATCC") as data are already normalized
+ add column with richness of species combination

```{r}
Data <- 
Data %>% 
  select(Plate, Antibiotic, Replicate,
         Row, Column, gene, 
         Phenotypes.ExperimentGrowthYield,
         Phenotypes.GenerationTime) %>% 
  rename(GrowthYield = Phenotypes.ExperimentGrowthYield) %>% 
  mutate(GrowthYield = as.numeric(GrowthYield)) %>% 
  rename(GenTime = Phenotypes.GenerationTime) %>% 
  mutate(GenTime = as.numeric(GenTime)) %>% 
  mutate(Replicate = as.character(Replicate)) %>% 
  mutate(withinPlateRep = gsub("\\w+\\.(\\d)", "\\1", gene)) %>% 
  mutate(gene = gsub("(\\w+)\\.\\d", "\\1", gene)) %>% 
  mutate(Antibiotic) %>% 
  mutate(Antibiotic = fct_relevel(Antibiotic, "Con", after = Inf)) %>% 
  filter(!grepl("ATCC", gene)) %>%
  filter(!grepl("control", gene)) %>% 
  mutate(Richness = nchar(gene)) 
```

## sensitivity analysis 

This part is a sensitivity analysis, producing Fig S4 and Fig S9.
It excludes strain "a" which has strong negative effects on the other strains in mixtures. 
To run the code and produce Fig S4 and S9, set `Sens_analysis = TRUE`

Note that all the code will be executed when `Sens_analysis = TRUE` but the other figures will not be exported. 
```{r}

Sens_analysis = FALSE
# Sens_analysis = TRUE


if(Sens_analysis){
  
Data <- 
Data %>% 
  filter(!grepl("a", gene))

}

```

### species specialisation

This calculates the species specialisation in our experiment, as defined in the main text. Note that it doesn't change with the exclusion of strain a, as this strain doesn't have the highest productivity in any antibiotic habitat

```{r}

#which strains dominate
dom_strains <- 
Data %>% 
  filter(Richness == 1) %>% 
  group_by(gene, Antibiotic) %>% 
  summarise(GrowthYield = mean(GrowthYield)) %>% 
  group_by(Antibiotic) %>% 
  filter(GrowthYield == max(GrowthYield)) %>% 
  pull(gene)

#species specialistaion
ssp <- length(unique(dom_strains)) / length(dom_strains)

#standardize between 0 and 1
(ssp-(1/length(dom_strains)))/(1-(1/length(dom_strains)))

```

## Fig. 3A 
Production of each strain in each habitat.

In this graph we look at the measured production of each strain in each antibiotic treatment. Strains have been selected to maximise the variance in growth performance such that different strains are expected to perform well under different antibiotic treatments and no strain is supposed to perform well in all conditions. 

NOTE: for the figure we rename gene c-f to b-e to match the text in the manuscript. 

```{r}
Fig_3A <- 
Data %>% 
  filter(Richness ==1) %>% 
  rowwise() %>% 
  mutate(gene = ifelse(gene == "a", "a", letters[which(letters == gene)-1])) %>% 
  ggplot(aes(x = gene, 
             y = GrowthYield/1e6, 
             colour = Antibiotic))+
  geom_point(position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.05),
             size = 0.75, alpha = 0.5, shape = 19)+
  geom_pointrange(data = . %>% 
                   group_by(gene, Antibiotic) %>% 
                   summarise(mean_cl_boot(GrowthYield)),
                  aes(x = gene, y = y/1e6, ymin = ymin/1e6, ymax = ymax/1e6, group = Antibiotic),
                  position = position_dodge(width = 0.5),
                  colour = "black",
               fill = "white", 
               shape = 21, 
               stroke = 0.4, 
               size = 1, 
               fatten = 2
               )+
  scale_y_log10()+
  theme_meta()+
  guides(colour = guide_legend(override.aes = list(size = 4, alpha = 1)))+
  scale_color_viridis_d(option = "C", end = 0.9) +
  labs(y = "Production (millions of cells)", x = "Strain", colour = "Habitat")

if(!Sens_analysis){
  save("Fig_3A", file = here("figures", "Fig_3A.RData"))
}

Fig_3A
```

+ Test if monoculture values vary significantly between species in each habitat (such that transgressive overyielding doesn't just vary with noise)

```{r}

Data %>% 
  filter(Richness ==1) %>% 
  rowwise() %>% 
  mutate(gene = ifelse(gene == "a", "a", letters[which(letters == gene)-1])) %>% 
  group_by(Antibiotic) %>% 
  nest() %>% 
  mutate(linear_fit = map(data, ~lm(GrowthYield ~ gene, data = .))) %>% 
  mutate(linear_fit = map(linear_fit, glance)) %>% 
  unnest(cols = linear_fit) %>% 
  select(Antibiotic, r.squared, p.value, df)

```


## Fig. 3B

Production-richness relationships.

In this graph we explore the biodiversity - ecosystem functioning (BEF) relationship in each antibiotic habitat. Community production is the metric of functioning and initial strain richness is the measure of biodiversity. 

```{r}

Fig_3B <- 
Data %>% 
  na.omit() %>% 
  group_by(Antibiotic) %>%
  nest() %>% 
  mutate(power_fit = map(data, ~lm(log(GrowthYield) ~ log(Richness), data = .))) %>% 
  mutate(power_fit = map(power_fit, augment, se_fit = TRUE)) %>% 
  unnest(cols = c(data, power_fit)) %>% 
  ggplot(aes(x = Richness, y = GrowthYield/1e6, colour = Antibiotic))+
  geom_point(position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.05),
             size = 0.75, alpha = 0.25, shape = 19)+
  geom_line(aes(x = Richness, 
                  y = exp(.fitted)/1e6, 
                  group = Antibiotic), 
            position = position_jitterdodge(jitter.width = 0.3, jitter.height = 0))+
  theme_meta()+
  labs(y = "production", x = "strain richness")+
  scale_y_log10()+
  scale_color_viridis_d(option = "C", end = 0.9)

if(!Sens_analysis){
  save("Fig_3B", file = here("figures", "Fig_3B.RData"))
}

Fig_3B

```
### power-fit statistics
```{r}

Data %>% 
  na.omit() %>% 
  group_by(Antibiotic) %>%
  nest() %>% 
  mutate(power_fit = map(data, ~lm(log(GrowthYield) ~ log(Richness), data = .))) %>% 
  mutate(power_fit = map(power_fit, tidy)) %>% 
  unnest(cols = power_fit) %>% 
  filter(term == "log(Richness)") %>% 
  select(Antibiotic, estimate, statistic, p.value) %>% 
  mutate_if(is.numeric, signif, digits = 3)


```

### Fig. S3/S4. 

Highlighting (in green) which strain is included in which combination in each environment.

In Figure 3B, we show the BEF relationship in each environment. Each point in that figure represents a replicate of either a single strain (Richness = 1) or a combination of strains (Richness 2-5). All possible species combinations are present. 

In this figure we highlight for each strain what species combination it is part of. This highlights some obvious patterns of positive and negative selection effects were the mixture performance depends strongly on the presence or absence of a particular strain in a particular environment. 

NOTE: for the figure we rename gene c-f to b-e to match the text in the manuscript. 
```{r, warning=FALSE}

strain <- filter(Data, Richness == 1) %>% pull(gene) %>% unique %>% sort

plots <- as.list(rep(NA, length(strain)))
names(plots) <- strain

for(i in seq_along(strain)){
  
plots[[i]] <-   
  Data %>% 
  na.omit() %>% 
  mutate(has_strain = grepl(strain[i], gene)) %>% 
  group_by(Antibiotic, has_strain) %>%
  nest() %>% 
  mutate(power_fit = map(data, ~lm(log(GrowthYield) ~ log(Richness), data = .))) %>% 
  mutate(power_fit = map(power_fit, augment)) %>% 
  unnest(cols = c(data, power_fit)) %>% 
  ggplot(aes(x = Richness, y = GrowthYield, colour = has_strain))+
  geom_point(position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0, dodge.width = 0.5), 
             alpha = 1, size = 0.4)+
  geom_line(aes(x = Richness, y = exp(.fitted), group = has_strain), 
            size = 0.5)+
  facet_wrap(~Antibiotic, scales = "free_y")+
  scale_color_manual(values = c("grey", "#4DAF4A"), name = "Strain in mixture")+
  theme_meta()+
  theme(legend.position = if(i == 1){c(0.9, 0.2)} else{"none"})+
  labs(title = paste("Strain", ifelse(strain[i] == "a", "a", letters[which(letters == strain[i])-1])),
       y = "Production") +
  theme(title = element_text(size = 10))
  
}

#Fig S3 (with all strains)
if(!Sens_analysis){
ggsave(file = here("figures", "Fig_S3.pdf"), 
       arrangeGrob(grobs = plots, ncol = 2), 
       width = 16, height = 12 )
}

#Fig S4 (without strain a)
if(Sens_analysis){
ggsave(file = here("figures", "Fig_S4.pdf"), 
       arrangeGrob(grobs = plots, ncol = 2), 
       width = 16, height = 12 )
}

plots
```

##Transgressive overyielding

We calculate the transgressive overyielding for all combinations of habitats. 

+ for all leveks of habitat heterogeneity (1:5 habitats)
+ for all possible unique combinations of habitats 

We calculate the functioning of a given species mixture (or monoculture) in that combination of habitats at each level of habitat heterogeneity as the average functioning of that species mixture in the single environments. 

**Transgressive overyielding** is calculated as:

+ highest mixture value / maximum monoculture value


In a first step we calculate the mean functioning for each strain and strain combinations for all possible combinations of habitats. 

As there is no sensible way how the replicated measurements should be matched across environments, we first calculate the average for each strain combination in each environment. 

```{r, warning=FALSE}

#calculate the average growth yield of each of the 32 strain combinations in each habitat and reshape the data to one combination per row and the 5 columns for the 5 habitats. Note that habitats are different antiobiotics. 

Data_wide <- 
  Data %>% 
  select(Richness, gene, Antibiotic, GrowthYield) %>%
  #filter(Richness == 1 & gene == "a") %>% 
  group_by(Richness, gene, Antibiotic) %>% 
  summarise(GrowthYield = mean(GrowthYield, na.rm = T)) %>% 
  spread(Antibiotic, GrowthYield)

# vector of environemnts
ab <- as.character(unique(Data$Antibiotic))

# list of all possible combinations of environments
ab_comb <- sapply(1:length(ab), function(x) combn(ab, x))

# number of species
specnum <- Data %>% filter(Richness == 1) %>% pull(gene) %>% unique %>% length()

# calculate the average performance or each strain combination in each environment combination
hab_values <- 
lapply(ab_comb, function(x) { #for all levels of environmental richness (scale: 1:5)
  apply(x, 2, function(y) { #for all combination of environments
   
     df <- data.frame(
       hab_comb = paste0(y, collapse = " "), # which combination of habitats
       habnum = length(y), # what scale
       richness = Data_wide$Richness, 
       gene = Data_wide$gene,
       functioning = rowMeans(Data_wide[,y])) #mean of functioning across habitats
  })
  })

# bind results to one dataframe
hab_values <- bind_rows(unlist(hab_values, recursive = FALSE))
```


### plot power-fits 

The relationship between average strain community production and richness for the 5 levels of habitat heterogeneity. 
```{r}

hab_values %>%
  group_by(hab_comb, habnum) %>% 
  nest() %>% 
  mutate(log_fit = map(data, ~lm(log(functioning) ~ log(richness), data = .))) %>% 
  mutate(log_fit = map(log_fit, tidy)) %>% 
  unnest(log_fit) %>% 
  select(-std.error, -statistic, -p.value) %>% 
  spread(term, estimate) %>% 
  mutate(power_fit = map(data, ~nls(functioning ~ a*richness^b, data = ., 
                                    start = list(a = exp(`(Intercept)`) , b = `log(richness)`)))) %>%  
  mutate(power_fit = map(power_fit, augment)) %>% 
  unnest(cols = power_fit) %>% 
  ggplot(aes(x = richness, y = functioning/1e8, colour = as.factor(habnum)))+
  geom_point(aes(group = hab_comb), size = 1, alpha = 1, position = position_dodge(width = 0.7))+
  geom_line(aes(y = .fitted/1e8, group = hab_comb), position = position_dodge(width = 0.7))+
  theme_meta()+
  facet_wrap(~habnum)+
  labs(y = paste("Production x", expression(1e8)), x = "Species richness")+
  guides(colour = guide_legend(ncol = 3))+
  theme(legend.position = c(0.85,0.25), 
        legend.background = element_rect(fill = "transparent"),
        axis.ticks.length=unit(-0.25, "cm"),
        axis.text.x = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")),
        axis.text.y = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")))+
  NULL

```


Now we calculate transgressive overyielding:

```{r}

hab_metrics <- 
  hab_values %>% 
  group_by(hab_comb, habnum) %>%
  nest() %>%
  mutate(transgressive = map(data, function(x) {
   x %>% 
      group_by(gene) %>% 
      filter(richness %in% c(1, specnum)) %>% 
      summarise(functioning = mean(functioning, na.rm = T), 
                richness = unique(richness), .groups = "drop") %>% 
      group_by(richness) %>% 
      summarise(functioning = max(functioning, na.rm = T)) %>% 
  nest(data=everything())
    })) %>% 
  select(-data) %>% 
  unnest(transgressive) %>% 
  unnest(data) %>% 
  mutate(richness = case_when(richness == 1 ~ "max_mono",
                              TRUE ~ "mix")) %>% 
  pivot_wider(names_from = "richness", values_from = "functioning") %>% 
  mutate(transgressive = mix / max_mono)

```

```{r}

hab_metrics %>%
  group_by(habnum) %>%
  summarise(m.transgressive = mean(transgressive))

```


### Fig 2b / S7 - transgressive overyielding
```{r}
Fig_2b <- 
hab_metrics %>% 
 # select(-data) %>% 
  gather(metric, value, -habnum, -hab_comb) %>% 
  filter(metric == "transgressive") %>% 
  ggplot(aes(x = habnum, y = value)) +
  geom_point( size = 2, alpha = 1, position = position_jitter(width = 0.1) )+
  stat_smooth(method = "lm", se = T, size = 0.5, colour = "black")+
  geom_hline(yintercept = 1, colour = "black", linetype = "dashed") +
  theme_meta()+
  labs(y = "Transgressive overyielding/Standardised function", x = "Habitat heterogeneity")+
  theme(legend.position = "none",
        axis.ticks.length=unit(-0.25, "cm"),
        axis.text.x = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")),
        axis.text.y = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")))+
  scale_y_continuous(limits = c(0, 1))+
  NULL

if(!Sens_analysis){
save("Fig_2b", file = here("figures", "Fig_2b.RData"))
}

if(Sens_analysis){
save("Fig_2b", file = here("figures", "Fig_S7.RData"))
}
  
Fig_2b
```

```{r}
Fig_2b <- 
hab_metrics %>% 
  ungroup() %>% 
  #bring to common scale
  mutate(mix = mix/max(max_mono),
      max_mono = max_mono / max(max_mono)
         ) %>% 
  gather(metric, value, -habnum, -hab_comb) %>% 
   mutate(metric = factor(metric, 
                         levels = c("mix", "max_mono", "transgressive"))) %>% 
  ggplot(aes(x = habnum, y = value, colour = metric)) +
  geom_point( size = 2, alpha = 1, position = position_jitterdodge(jitter.width =  0.05, dodge.width = 0.4) )+
  stat_smooth(method = "lm", se = T, size = 0.5, position = position_dodge(width = 0.4))+
  geom_hline(yintercept = 1, colour = "black", linetype = "dashed") +
  theme_meta()+
  labs(y = "Transgressive overyielding/Standardised function", x = "Habitat heterogeneity")+
  theme(legend.position = "none",
        axis.ticks.length=unit(-0.25, "cm"),
        axis.text.x = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")),
        axis.text.y = element_text(margin=unit(c(0.5,0.5,0.5,0.5), "cm")))+
  scale_y_continuous(limits = c(0, 1))+
  scale_color_viridis_d(option = "C", end = 0.8)
  NULL

if(!Sens_analysis){
save("Fig_2b", file = here("figures", "Fig_2b.RData"))
}

if(Sens_analysis){
save("Fig_2b", file = here("figures", "Fig_S7.RData"))
}
  
Fig_2b
```


### transgressive overyielding - statistics
```{r}
 
hab_metrics %>% 
  ungroup() %>% 
  mutate(across(one_of("max_mono", "mix"), function(x) x/ max(max_mono))) %>% 
  gather(metric, value, -habnum, -hab_comb) %>%
  nest(data = !one_of("metric")) %>% 
  mutate(lm = map(data, ~lm(value ~ habnum, data = .x))) %>% 
  mutate(stat = (map(.$lm, tidy))) %>% 
  unnest(stat) %>% 
  filter(term == "habnum") %>% 
  mutate(across(is.numeric, round, 3)) %>%
  print()
  
```