Giulia Piazza
This is an example of the pipeline followed for the analysis in "Unweaving the web: Polygenic influences on networks of psychopathology symptoms". The example is focused only on the network with the polygenic score for ADHD for demonstration purposes. All parts of the analysis were run via a computer cluster due to the computing power required. Although this example is not directly reproducible (as it requires raw data), the resulting objects can be found in pipeline_example.RData in this Github repo.
The preregistration of the confirmatory analysis of this project can be found here in the respective OSF project.
Loading imputed and cleaned data from ALSPAC and TEDS (see paper for more details). Polygenic scores were all calculated with this script, which follows this LDpred2 tutorial (accessed in September 2022). The ADHD polygenic score was derived from effect sizes in Demontis et al. (2019).
library(qgraph)
library(bootnet)
library(dplyr)
library(psychonetrics)
library(readxl)
Estimating the ADHD polygenic score network in ALSPAC with unregularised model search with the package 'bootnet'.
selection = c("DEP.2","DEP.3", "DEP.5","DEP.6", "DEP.7",
"DEP.8","DEP.9","DEP.10","DEP.11","DEP.12","DEP.13","PRO.1",
"HYP.1","EMO.1","PRO.2","COND.1","PEER.1","COND.2","EMO.2",
"PRO.3","HYP.2","PEER.2","COND.3","EMO.3","PEER.3","HYP.3",
"EMO.4","PRO.4","COND.4","PEER.4","PRO.5","HYP.4","COND.5",
"PEER.5","EMO.5","HYP.5") # select relevant variables
adhd.network <- ALSPAC %>%
dplyr::select(selection, ADHD.ld) %>%
estimateNetwork(., default = "ggmModSelect", stepwise = TRUE) # estimate network with 'ggmModSelect', selected nodes and ADHD PRS calculated with LDPred2
Plotting the resulting network with the package 'qgraph' for a quick preliminary visualisation.
qgraph(adhd.network$graph, layout = "spring", theme = "colorblind")
Calculating non-parametric bootstraps (this is time intensive) with 'bootnet'.
adhd.boot <- bootnet(adhd.network, nBoots = 1000, default = "ggmModSelect", type = "nonparametric", nCores = 20)
Model 1 tests the fit of the network estimated in ALSPAC in TEDS. Here, I take the adjacency matrix of the ADHD polygenic score network calculated above (a matrix with ones/zeros for presence/absence of edges) and test its fit in TEDS data.
# extract structure
adhd.structure = 1*(adhd.network$graph != 0)
# test how well structure fits to teds data
adhd.model = TEDS %>% dplyr::select(DEP.2:HYP.5, ADHD) %>%
ggm(., estimator = "FIML", omega = adhd.structure) %>% runmodel()
adhd.model %>% fit() # inspect model fit
## Measure Value
## logl -222605.98
## unrestricted.logl -221910.32
## baseline.logl -242816.22
## nvar 37
## nobs 740
## npar 280
## df 460
## objective 28.26
## chisq 1391.31
## pvalue ~ 0
## baseline.chisq 41811.79
## baseline.df 666
## baseline.pvalue ~ 0
## nfi 0.97
## pnfi 0.67
## tli 0.97
## nnfi 0.97
## rfi 0.95
## ifi 0.98
## rni 0.98
## cfi 0.98
## rmsea 0.021
## rmsea.ci.lower 0.020
## rmsea.ci.upper 0.022
## rmsea.pvalue 1
## aic.ll 445771.96
## aic.ll2 445808.18
## aic.x 471.31
## aic.x2 1951.31
## bic 447574.94
## bic2 446685.21
## ebic.25 448586.00
## ebic.5 449597.05
## ebic.75 450405.90
## ebic1 451619.17
Model 2 involves multi-group testing and assesses whether edges have comparable weights in ALSPAC and TEDS. Here, I create a combined data frame with ALSPAC and TEDS data, with a column indicating the cohort. I then use the package 'psychonetrics' to run a model where edge weights in ALSPAC and TEDS are set to be equal, and compare it to a model where they are set free to vary.
alspac.data = ALSPAC %>% dplyr::select(selection, BMI.ld:ANX.P.ld, ADHD.ld) # create dataframe with relevant vars for alspac
colnames(alspac.data) = c(selection, "BMI", "EA", "DEP", "ANX", "ADHD")
alspac.data$group = "alspac"
teds.data = TEDS %>% dplyr::select(selection, "BMI", "EA", "DEP", "ANX", "ADHD") # select relevant vars for teds
teds.data$group = "teds"
comb.data = rbind(alspac.data, teds.data) # create dataset of combined teds and alspac
comb.model.adhd <- comb.data %>% dplyr::select(DEP.2:HYP.5, ADHD, group) %>%
ggm(., estimator = "FIML", omega = adhd.structure, groups = "group") %>%
groupequal("omega") %>% runmodel # run a model where edges are equal in alspac and teds
comb.model.adhd.fit = comb.model.adhd %>% fit()
## Measure Value
## logl -485929.72
## unrestricted.logl -484280.22
## baseline.logl -532672.39
## nvar 37
## nobs 1480
## npar 354
## df 1126
## objective 27.79
## chisq 3298.99
## pvalue ~ 0
## baseline.chisq 96784.32
## baseline.df 1332
## baseline.pvalue ~ 0
## nfi 0.97
## pnfi 0.82
## tli 0.97
## nnfi 0.97
## rfi 0.96
## ifi 0.98
## rni 0.98
## cfi 0.98
## rmsea 0.020
## rmsea.ci.lower 0.019
## rmsea.ci.upper 0.020
## rmsea.pvalue 1
## aic.ll 972567.43
## aic.ll2 972593.10
## aic.x 1046.99
## aic.x2 4006.99
## bic 975125.03
## bic2 974000.06
## ebic.25 976403.29
## ebic.5 977681.56
## ebic.75 978704.17
## ebic1 980238.09
free.model.adhd = comb.data %>% dplyr::select(DEP.2:HYP.5, ADHD, group) %>%
ggm(., estimator = "FIML", omega = adhd.structure, groups = "group") %>% runmodel # run a model where edges can vary
knitr::kable(compare("free" = free.model.adhd,
"constrained" = comb.model.adhd), row.names = FALSE)
model | DF | AIC | BIC | RMSEA | Chisq | Chisq_diff | DF_diff | p_value |
---|---|---|---|---|---|---|---|---|
free | 920 | 971926.3 | 975972.2 | 0.0168545 | 2245.815 | NA | NA | NA |
constrained | 1126 | 972567.4 | 975125.0 | 0.0195041 | 3298.985 | 1053.17 | 206 | 0 |
Model 3 tests the significance of all edges connected to the polygenic score for ADHD at the same time. Here, I compare a model where all edges between the polygenic score for ADHD are set to zero to the original network model in a combined dataset of TEDS and ALSPAC.
# set ALL edges connecting the PGS to zero
adhd.structure.3 = adhd.structure
adhd.structure.3[37,] = adhd.structure.3[,37] = 0
adhd.model.3 = comb.data %>% dplyr::select(DEP.2:HYP.5, ADHD, group) %>%
ggm(., estimator = "FIML", omega = adhd.structure.3, groups = "group") %>%
groupequal("omega") %>% runmodel
knitr::kable(compare("original network" = comb.model.adhd,
"network with no pgs edges" = adhd.model.3), row.names = FALSE)
model | DF | AIC | BIC | RMSEA | Chisq | Chisq_diff | DF_diff | p_value |
---|---|---|---|---|---|---|---|---|
original network | 1126 | 972567.4 | 975125 | 0.0195041 | 3298.985 | NA | NA | NA |
network with no pgs edges | 1128 | 972701.9 | 975245 | 0.0200892 | 3437.405 | 138.4199 | 2 | 0 |
Model 4 tests the significance of a single edge connected to the ADHD polygenic score at a time, for all edges connected to the polygenic score. Here, for every edge connected to the ADHD polygenic score in the original network (edges connecting nodes HYP.3 and COND.4), I create a model where they are set to zero with a function, in a combined dataset of TEDS and ALSPAC. I then compare both models to the original network model.
# Function to set ONE edge connecting the PGS to zero progressively
model4 = function(alspac.structure,node1,node2) {
# alspac.structure = structure of alspac network
# node 1 and node 2 = nodes of interest as strings;
# node 1 is the PGS
one = grep(paste(node1), colnames(alspac.structure)) # index of node 1 (PGS)
nodes = which(alspac.structure[,one] ==1) # list of edges connecting node 1
two = nodes[which(names(nodes) == paste(node2))] # index of node 2
new.structure = alspac.structure # duplicate the original alspac structure
new.structure[one,two] = new.structure[two,one] = 0 # set the edge between node 1 and 2 to zero
model = comb.data %>% dplyr::select(DEP.2:HYP.5, paste(node1),group) %>%
ggm(., estimator = "FIML", omega = new.structure, groups = "group") %>%
groupequal("omega") %>% runmodel # create and run the model
return(model)
}
## HYP.3
adhd.model.4.HYP = model4(adhd.structure,"ADHD", "HYP.3")
knitr::kable(compare("original network" = comb.model.adhd,
"without edge adhd-hyp3" = adhd.model.4.HYP), row.names = FALSE)
model | DF | AIC | BIC | RMSEA | Chisq | Chisq_diff | DF_diff | p_value |
---|---|---|---|---|---|---|---|---|
original network | 1126 | 972567.4 | 975125.0 | 0.0195041 | 3298.985 | NA | NA | NA |
without edge adhd-hyp3 | 1127 | 972650.6 | 975200.9 | 0.0198693 | 3384.125 | 85.13986 | 1 | 0 |
## COND.4
adhd.model.4.COND= model4(adhd.structure,"ADHD", "COND.4")
knitr::kable(compare("original network" = comb.model.adhd,
"without edge adhd-cond4" = adhd.model.4.COND), row.names = FALSE)
model | DF | AIC | BIC | RMSEA | Chisq | Chisq_diff | DF_diff | p_value |
---|---|---|---|---|---|---|---|---|
original network | 1126 | 972567.4 | 975125 | 0.0195041 | 3298.985 | NA | NA | NA |
without edge adhd-cond4 | 1127 | 972591.6 | 975142 | 0.0196081 | 3325.169 | 26.18379 | 1 | 3e-07 |
Lastly, model 5 tests whether ALSPAC and TEDS have comparable edge weights in a single edge connected to the ADHD polygenic score. Here, I create model where equality constrains are lifted between the edges connecting the polygenic score to nodes HYP.3 and COND.4, and compare them to a model where they are set to be equal.
model5 = function(alspac.structure,node1,node2,eq.model) {
# alspac.structure = structure of alspac network
# node 1 and node 2 = nodes of interest as strings;
# node 1 is the PGS;
# eq.model = model with equality constrains to modify
one = grep(paste(node1), colnames(alspac.structure)) # index of node 1 (PGS)
nodes = which(alspac.structure[,one] ==1) # list of edges connecting node 1
two = nodes[which(names(nodes) == paste(node2))] # index of node 2
model = eq.model %>%
groupfree("omega",one,two) %>%
runmodel() # modify equality constrains model, lifting one edge constrain
return(model)
}
# set edges connecting the PGS to free to vary, lift equality constrains
adhd.model.5.HYP <- model5(adhd.structure, "ADHD", "HYP.3", comb.model.adhd)
knitr::kable(compare("original network" = comb.model.adhd,
"free edge adhd-hyp3" = adhd.model.5.HYP), row.names = FALSE)
model | DF | AIC | BIC | RMSEA | Chisq | Chisq_diff | DF_diff | p_value |
---|---|---|---|---|---|---|---|---|
free edge adhd-hyp3 | 1125 | 972569.3 | 975134.1 | 0.0195166 | 3298.820 | NA | NA | NA |
original network | 1126 | 972567.4 | 975125.0 | 0.0195041 | 3298.985 | 0.1651108 | 1 | 0.6844941 |
adhd.model.5.COND <- model5(adhd.structure, "ADHD", "COND.4", comb.model.adhd)
knitr::kable(compare("original network" = comb.model.adhd,
"free edge adhd-cond4" = adhd.model.5.COND), row.names = FALSE)
model | DF | AIC | BIC | RMSEA | Chisq | Chisq_diff | DF_diff | p_value |
---|---|---|---|---|---|---|---|---|
free edge adhd-cond4 | 1125 | 972569.1 | 975133.9 | 0.0195156 | 3298.608 | NA | NA | NA |
original network | 1126 | 972567.4 | 975125.0 | 0.0195041 | 3298.985 | 0.3777227 | 1 | 0.5388245 |
Here, I plot the network focusing on the polygenic score by placing it in the centre of the graph, using custom functions which can be found here.
# load networks and legend ----
quest.sub = read_excel("mfq_sdq_key_11_subscale.xlsx")
quest.sub = quest.sub[-c(1,4),] # removing items not in teds and in these reduced networks
# load functions ----
source("qgraph.alt.R") # modified qgraph function: plots by avoiding highlighted edges crossing under other ones
source("support.functions.R") # support functions to qgraph.alt
source("unfade.centre.nolegend.R") # unfading function
# load average layout with PGS in the centre ----
avglay = as.matrix(read.csv("avglay.csv",
header = FALSE))
node.one = which(colnames(adhd.network$graph) == "ADHD.ld")
node.two = which(colnames(adhd.network$graph) == "COND.4")
curve.adhd = matrix(0, ncol = 37, nrow = 37)
curve.adhd[node.one, node.two] = curve.adhd[node.two, node.one] = 0.7
graph = unfade.nolegend(adhd.network$graph, PGSnode = "ADHD", PGSdesc = "Polygenic Score for ADHD - ALSPAC", PGScolor = "#46BAC8",
avglay = avglay, curve.matrix = curve.adhd)
plot(graph)