Dynamic gene regulatory network reconstruction from scRNA-seq data.
Epoch leverages single-cell transcriptomic data, single-cell analysis methods, and graph theoretic approaches to reconstruct dynamic GRNs. Additionally, Epoch contains functionality for top regulator prediction, network comparision, signaling pathway integration, amongst others. Here we show some examples of Epoch in action.
For a more in depth look at Epoch, and to see how we applied it to elucidate signaling-induced GRN topology changes in early mouse embryonic stem cell (ESC) directed differentiation, check out our publication here.
- Setup
- Example 1: Reconstruction
- Example 2: Network Comparison
- Example 3: Signaling Pathway Integration
devtools::install_github("pcahan1/epoch")
library(epoch)
Epoch reconstructs networks from single-cell RNA-sequencing data.
This data is sampled from day 0 through day 4 mESC directed differentiation toward mesodermal fate guided by Wnt3a, Activin A, and GSKi. It has already been normalized, and the varying genes have been identified. It has also been clustered, and analyzed with RNA Velocity.
library(epoch)
list12<-loadDataFromLoom("data/sampled_mesoderm_WAG.loom")
expDat<-list12[['expDat']]
sampTab<-list12[['sampTab']]
mmTFs<-utils_loadObject("data/mmTFs.rda")
mmTFs<-intersect(rownames(expDat),mmTFs)
# Note: 'loadDataFromLoom' uses loomR, and is sensitive to the version
# when calling lfile$get.attributes.df(). Versions of loomR that accept
# parameters 'attributes' or 'attribute.names' in get.attributes.df()
# are supported here. Future versions may require some adjustment.
Now, expDat contains the normalized and log-transformed gene-by-cell expression data, and sampTab contains the corresponding metadata, including pseudotime annotation.
Reconstruction occurs in three steps:
- Find dynamically expressed genes
- Infer edges across dynamic genes using CLR (or other supported method)
- Perform optional cross-weighting to refine network structure
# Find dynamically expressed genes
xdyn <- findDynGenes(expDat, sampTab, group_column="cluster", pseudotime_column="latent_time")
pThresh<-0.05
dgenes<-names(xdyn$genes)[xdyn$genes<pThresh]
# Reconstruct and perform crossweighting
grnDF <- reconstructGRN(expDat, mmTFs, dgenes, method="pearson", zThresh=3)
grnDF <- crossweight(grnDF,expDat,xdyn,filter_thresh=0)
# Example alternative: to reconstruct using GENIE3, run reconstructGRN_GENIE3 prior to crossweighting.
# grnDF <- reconstructGRN_GENIE3(expDat, mmTFs, dgenes, weightThresh=.5)
Note: To use Mutual Information, set method to "MI". Set zThresh and filter_thresh to tune interactions via removing edges below the threshold.
The object grnDF contains the reconstructed network. TG and TF refer to target gene and transcription factor respectively. The column "zscore" is the network prior to crossweighting. The column "weighted_score" is the network after crossweighting:
grnDF[5:15,]
# TG TF zscore corr offset weighted_score
# 5 Prdm14 Nfib 3.846711 0.2371545 3.0895522 3.846711
# 6 Prdm14 Zfp990 3.390523 0.2574008 3.7761194 3.390523
# 7 Prdm14 Rest 3.102711 0.2485766 1.7761194 3.102711
# 8 Prdm14 Nanog 4.598733 0.2892227 11.4776119 2.896073
# 9 Prdm14 Tead2 3.114365 -0.2491080 7.1492537 3.114365
# 10 Prdm14 Zfp42 5.641972 0.3185379 -0.6865672 5.641972
# 11 Prdm14 Klf2 3.344418 0.2563452 0.9104478 3.344418
# 12 Prdm14 Rhox6 3.648716 0.2528058 -8.4477612 3.648716
# 13 Eya1 Hoxb4 3.133878 0.1504617 11.7313433 1.947923
# 14 Eya1 Gli3 4.176600 0.1663259 5.5671642 4.176600
# 15 Eya1 Foxc1 3.976296 0.1660566 3.3582090 3.976296
We can further explore changes in the network across time by defining "epochs" or time periods in our trajectory, assigning genes to these epochs, and extracting a dynamic network across time.
Defining epochs can be done in a number of ways. Here we show an example with method="pseudotime". This will partition cells based on pseudotime (pseudotime will be divided evenly, unless specified with parameter psuedotime_cuts). The following sets of parameters are alternative methods of defining epochs:
- method = "pseudotime", num_epochs = n : This will divide pseudotime evenly, and paritition the cells into n epochs.
- method = "pseudotime", pseudotime_cuts = c(a,b) : This will partition by cells by pseudotime, cutting pseudotime at a and b. The resulting number of epochs is length(pseudotime_cuts)+1.
- method = "cell_order", num_epochs = n : This will evenly partition cells based on raw cell order.
- method = "group", group_assignment = L : This will partition cells into pre-defined epochs. L is a list describing the assignment. names(L) correspond to epoch names; elements in L are vectors containing groups in dynRes$cells$group.
- method = "con_similarity" : This will automatically detect epochs based on consecutive similarity (i.e. measures correlation between groups of cells along a sliding window across pseudotime). This method does not require specifying num_epochs.
- method="kmeans", num_epochs = n : This will automatically detect epochs based on k-means clustering.
- method="hierarchical", num_epochs = n : This will automatically detect epochs based on hierarchical clustering.
"con_similarity", "kmeans", and "hierarchical" call functions "find_cuts_by_similarity" and "find_cuts_by_clustering". You can refine the epoch definitions by employing these two functions separately and using the psuedotime cuts they return in the second method described above.
Additionally, define_epochs takes as input the expression matrix. For smoother epoch definitions, you can optionally pass a smoothed expression matrix instead. Smoothing can be done using the function grnKsmooth. See plotting dynamically expressed genes for an example.
For a simpler approach, assign_epoch_simple() will define and assign epochs based on maximum mean expression of a gene. However, unlike the standard approach, this approach assumes genes cannot belong to more than one epoch.
xdyn<-define_epochs(xdyn,expDat,method="pseudotime",num_epochs=3)
epoch_assignments<-assign_epochs(expDat,xdyn)
dynamic_grn<-epochGRN(grnDF,epoch_assignments)
# from to name
# 1 epoch1 epoch2 epoch1..epoch2
# 2 epoch2 epoch3 epoch2..epoch3
# 3 epoch1 epoch1 epoch1..epoch1
# 4 epoch2 epoch2 epoch2..epoch2
# 5 epoch3 epoch3 epoch3..epoch3
# Example alternative:
# assignment_list<-assign_epochs_simple(expDat[dgenes,],xdyn)
# Another example alternative:
# ccells <- xdyn$cells
# expSmoothed <- grnKsmooth(expDat, ccells, BW=0.02)
# xdyn <- define_epochs(xdyn,expSmoothed,method="kmeans",num_epochs=3)
The object dynamic_grn stores the dynamic network across epochs. The list includes active subnetworks at each epoch (in this example, "epoch1..epoch1","epoch2..epoch2", etc.) as well as potential transition networks (in this example, "epoch1..epoch2" and "epoch2..epoch3") describing how network topology transitions from one epoch to another.
We can use Epoch to identify the most influential regulators in the reconstructed dynamic (or static) network. Here's an example of accomplishing this via a PageRank approach on the dynamic network.
gene_rank<-compute_pagerank(dynamic_grn,weight_column="weighted_score")
The object gene_rank now contains a list of rankings for each epoch and transition network:
head(gene_rank$epoch1..epoch1)
# gene page_rank is_regulator
# Utf1 Utf1 0.03967890 TRUE
# Zfp42 Zfp42 0.03151797 TRUE
# Klf2 Klf2 0.03036168 TRUE
# Sox2 Sox2 0.02456489 TRUE
# Tead2 Tead2 0.01598365 TRUE
# Arid5b Arid5b 0.01581432 TRUE
head(gene_rank$epoch3..epoch3)
# gene page_rank is_regulator
# Tead2 Tead2 0.02687836 TRUE
# Sox11 Sox11 0.02512396 TRUE
# Rest Rest 0.01709808 TRUE
# Peg3 Peg3 0.01599903 TRUE
# Basp1 Basp1 0.01265988 TRUE
# Meis2 Meis2 0.01253277 TRUE
We can also use betweenness and degree.
another_gene_rank<-compute_betweenness_degree(dynamic_grn,weight_column="weighted_score")
The object another_gene_rank now contains a list of rankings for each epoch and transition network:
head(another_gene_rank$epoch1..epoch1)
# gene betweenness degree is_regulator
# Utf1 Utf1 0.27817182 0.42617801 TRUE
# Sox2 Sox2 0.27811474 0.39581152 TRUE
# Klf2 Klf2 0.15267323 0.45863874 TRUE
# Zfp42 Zfp42 0.12816798 0.45026178 TRUE
# Tead2 Tead2 0.09685535 0.26387435 TRUE
# Tgif1 Tgif1 0.16497086 0.04397906 TRUE
Here are some useful functions for understanding details of reconstructed networks.
First, we can add in the interaction type (also automatically added in plotting functions). This works on both static and dynamic GRNs.
dynamic_grn<-add_interaction_type(dynamic_grn)
#grnDF<-add_interaction_type(grnDF)
head(dynamic_grn$epoch2..epoch2)
# TG TF zscore corr offset weighted_score interaction
# 4 Prdm14 Basp1 3.096282 -0.2482408 5.164179 3.096282 repression
# 7 Prdm14 Rest 3.102711 0.2485766 1.776119 3.102711 activation
# 9 Prdm14 Tead2 3.114365 -0.2491080 7.149254 3.114365 repression
# 18 Ptp4a1 Lhx1 3.288639 0.1531601 -6.074627 3.288639 activation
# 19 Ptp4a1 Gata4 3.975216 0.1649048 13.432836 2.252686 activation
# 23 Sema4c Sox4 3.009982 0.2526573 7.701493 3.009982 activation
We can apply community detection to these networks. This is particularly useful when examining mean module expression (see below). This works on both static and dynamic GRNs.
community_assignments<-find_communities(dynamic_grn)
#community_assignments<-find_communities(grnDF)
head(community_assignments$epoch3..epoch3)
# gene communities
# 1 Prdm14 2
# 2 Eya1 3
# 3 Ptp4a1 3
# 4 Sema4c 2
# 5 1700019D03Rik 5
# 6 Fzd5 1
Epoch contains various plotting tools to visualize dynamic activity of genes and networks.
This is particularly useful for verifying epoch assignments, and gauging how many epochs should occur in a trajectory. Here we plot the expression of 50 TFs along pseudotime.
# First, smooth expression for a cleaner plot
ccells <- xdyn$cells
expSmoothed <- grnKsmooth(expDat, ccells, BW=0.03)
# Plot a heatmap of the dynamic TFs
tfstoplot<-intersect(dgenes,mmTFs)
dynTFs<-xdyn
dynTFs$genes<-dynTFs$genes[names(dynTFs$genes) %in% tfstoplot]
hm_dyn(expSmoothed,dynTFs,topX=50)
# Plot a heatmap of all dynamic TFs and target genes
# dyngenes<-xdyn
# dyngenes$genes<-dyngenes$genes[names(dynTFs$genes) %in% dgenes]
# hm_dyn(expSmoothed,dyngenes,topX=100)
plot_dynamic_network(dynamic_grn,mmTFs,only_TFs=TRUE,order=c("epoch1..epoch1","epoch2..epoch2","epoch3..epoch3"))
To plot all targets, we can set only_TFs=FALSE. We can specify which epochs to plot using the "order" parameter. These coincide with names(dynamic_grn). If left empty, all epoch networks and transition networks will be plotted.
The same function can be used to plot a static network.
plot_dynamic_network(list(mesoderm_network=grnDF),mmTFs,only_TFs=TRUE)
These may be cumbersome, so instead it may be useful to limit which interactions to plot. For example, we can limit the plot to edges with weight above a certain threshold, using the "thresh" parameter. Alternatively, it may be more informative to look at top regulators and their top targets:
This function will color the nodes by community and optionally fade by betweenness. If the 'communities' parameter is not specified, 'find_communities' will be run to detect community structure.
plot_dynnet_detail(dynamic_grn,tfs=mmTFs,only_TFs=TRUE,order=c("epoch1..epoch1","epoch2..epoch2","epoch3..epoch3"),communities=NULL,compute_betweenness=TRUE)
In the same fashion as above, we can do this for both the dynamic network and static network. Limiting visualization to top regulators and their targets is less cumbersome than the above plots.
plot_top_regulators(dynamic_grn, gene_rank, mmTFs, numTopTFs=6, only_TFs=FALSE,order=c("epoch1..epoch1","epoch3..epoch3"))
# We can specify additional parameters including the number of top TFs and targets:
# plot_top_regulators(dynamic_grn, gene_rank, tfs, numTopTFs=3, numTargets=5, only_TFs=TRUE, order=c("epoch1..epoch1","epoch1..epoch2","epoch2..epoch2"))
If we are interested in specific target genes, we can use one of the following functions to plot those and their top regulators as follows.
interesting_targets<-c("Mesp1","Foxc1","Tbx6","Meis2")
# plot_targets_with_top_regulators(dynamic_grn,interesting_targets,weight_column="zscore")
plot_targets_with_top_regulators_detail(dynamic_grn,interesting_targets,epoch_assignments,weight_column="weighted_score",numTopRegulators=3,declutter=FALSE,order=c("epoch2..epoch2","epoch3..epoch3"))
We can use Epoch to compare networks. Here's an example of doing so at the edge level. In this instance we use Epoch to extract "differential networks".
Starting with the network we reconstructed in Example 1, we can compare it to a network reconstructed using data collected from mESC directed differentiation toward mesoderm guided by a separate treatment. Alternatively, such a method may be used to compare in vitro networks with in vivo networks.
In this section, Epoch requires at least two reconstructed networks (in order to carry out the comparison) and the epoch assignments for these networks. These inputs are derived from the reconstruction in the previous section.
First, load in the data. The reconstructed network and epoch assignments from the previous section are provided here as 'net1' and epochs1':
# load in the reconstructed network and epochs
net1<-utils_loadObject("data/example1_net.rda")
epochs1<-utils_loadObject("data/example1_epochs.rda")
# load in a second network and epochs
net2<-utils_loadObject("data/example2_net2.rda")
epochs2<-utils_loadObject("data/example2_epochs2.rda")
mmTFs<-utils_loadObject("data/mmTFs.rda")
We can compute the differential network between network1 (GRN reconstructed in Example 1) and network2 (GRN just loaded in).
# Run edge_uniqueness to tally differences in edges
res<-edge_uniqueness(list(network1=net1,network2=net2),mmTFs,weight_column="weighted_score")
# Run dynamic_difference_network to extract the dynamic differential network
network1_on<-dynamic_difference_network(res, list(network1=epochs1,network2=epochs2), "network1", type="on", diff_thresh=7.5, condition_thresh=10)
# Add interaction type
network1_on<-add_type(network1_on,"on",net1,list(x=net2))
The edges in the resulting differential network are those that are differentially active in network 1. We can tune the threshold "diff_thresh" to increase or decrease the difference threshold at which an edge is considered differentially active. We can tune the threshold "condition_thresh" to change the threshold at which an edge is considered active in a given network.
The weight_column parameter in edge_uniquness can be changed to reflect the proper edge weight. For example, other metrics of importance can be used in place of the crossweighted score, such as degree product.
This is what the differential network looks like:
head(network1_on$epoch2)
# TG TF network1 diff corr interaction
# 1 1700003F12Rik Nfia 4.666154 4.666154 0.2032498 activation
# 2 1700019D03Rik Peg3 3.126144 2.746125 -0.3131408 repression
# 3 1700019D03Rik Sp5 3.168844 3.003888 -0.3071091 repression
# 4 1700097N02Rik Basp1 4.037780 2.194030 -0.4412966 repression
# 5 1700097N02Rik Sox4 3.409004 2.396285 -0.4235951 repression
# 6 1700097N02Rik Zfp428 4.664102 2.319699 0.3700874 activation
head(network1_on$epoch3)
# TG TF network1 diff corr interaction
# 1 1700003F12Rik Mllt11 3.737788 3.513406 0.1830396 activation
# 2 1700019D03Rik Peg3 3.126144 2.746125 -0.3131408 repression
# 3 1700019D03Rik Sp5 3.168844 3.003888 -0.3071091 repression
# 4 1700097N02Rik Basp1 4.037780 2.194030 -0.4412966 repression
# 5 1700097N02Rik Id2 3.039247 2.389706 -0.2079808 repression
# 6 1700097N02Rik Sox4 3.409004 2.396285 -0.4235951 repression
We can find community structure within the differential network (or other dynamic network). (Also mentioned above)
communities<-find_communities(network1_on)
We can plot the differential network, with or without some details (e.g. colored by cluster, faded by betweenness). Tune threshold parameters when calling dynamic_difference_network to change strictness of including or not including an edge. Here's what the third epoch looks like (with relatively strict thresholds for readability), using two separate methods:
# plot_dyn_diffnet(network1_on,mmTFs,only_TFs=TRUE)
plot_diffnet_detail(network1_on,tfs=mmTFs,order=c("epoch3"),weight_column="network1")
The above plotting will re-find clusters via Louvain clustering. Alternatively, we can use the earlier plotting function 'plot_dynnet_detail'. This function is more general, and allows us to specify a community assignment to use. For example:
plot_dynnet_detail(network1_on,tfs=mmTFs,order=c("epoch3"),weight_column="network1",communities=communities,compute_betweenness=TRUE)
We can use Epoch to integrate signaling activity and trace paths through the network. Starting with the dynamic network we constructed in Example 1.
In this section, Epoch requires a reconstructed network (which can be derived from Example 1: Reconstruction. As described below, Epoch will also require pre-computed effector targets.
list12<-loadDataFromLoom("data/sampled_mesoderm_WAG.loom")
expDat<-list12[['expDat']]
sampTab<-list12[['sampTab']]
dynamic_grn<-utils_loadObject("data/example1_dyanmicGRN.rda")
xdyn <- findDynGenes(expDat, sampTab, group_column="cluster", pseudotime_column="latent_time")
Effector targets of major signaling pathways are pre-computed and available within Epoch (mouse: see 'data/effectortargets_mouse.rda'). These lists were computed by: (1) aquiring binding score (MACS2) data for 18 signaling effector TFs from the ChIP-Atlas (Oki et al., 2018), (2) target genes were ranked by maximum binding score, (3) the top 2000 targets were retained (or all retained, if less than 2000 targets).
Alternatively, here's how we can derive new effector target lists:
effectors<-load_SP_effectors(path="data/mouse_signaling_binding")
effectors<-score_targets(effectors)
effectortargets<-find_targets(effectors,column="max_score",by_rank=TRUE,n_targets=2000)
# Instead of "max_score", we could have also ranked by "mean_score" or "percent_freq".
# Instead of retaining the top n_targets, we could have also specified a cutoff threshold by specifying the 'threshold' parameter.
We can estimate signaling activity over time by quantifying the expression of effector targets. As an example, let's look at Wnt signaling. Unsurprisingly, we see strong activity as cells progress toward mesodermal fate.
Alternatively, we could instead look at all pathways to understand differences in major signaling pathways between treatments, conditions, etc. (See manuscript for details)
# Wnt signaling
wnt_targets<-effectortargets[c("WNT_Ctnnb1","WNT_Lef1","WNT_Tcf7","WNT_Tcf7l2")]
# lets separate by pre-treatment and treatment
# adding in the treatment information
ccells<-xdyn$cells
ccells$treatment<-sampTab$treatment[match(as.character(ccells$cell_name),as.character(sampTab$obs_names))]
# compute mean module expression over time
pre_exp<-mean_module_expression(expDat[,rownames(ccells[ccells$treatment=='pre',])],wnt_targets)
treatment_exp<-mean_module_expression(expDat[,rownames(ccells[ccells$treatment=='WAG',])],wnt_targets)
heatmap_by_treatment_group(list(pre=pre_exp,treatment=treatment_exp),sampTab,pseudotime_column="latent_time",limits=c(0,3),order_by="treatment")
Now that we have reconstructed a dynamic network and predicted effector target genes, we can integrate signaling pathway activity with the GRN. For example, we can trace the effects of Wnt signaling in activating a mesoderm-like program (or simply, some genes of interest).
# Some interesting mesoderm TFs, plus Sox17 (an endoderm TF)
interesting_genes<-c("Mesp1","Foxc1","Foxc2","Tbx6","Sox17")
# Paths from b-catenin to interesting genes
bcat_to_meso<-dynamic_shortest_path_multiple(dynamic_grn,effectortargets$WNT_Ctnnb1,interesting_genes)
bcat_to_meso<-cor_and_add_action(bcat_to_meso,expDat)
# Paths from Tcf7 to interesting genes
tcf7_to_meso<-dynamic_shortest_path_multiple(dynamic_grn,effectortargets$WNT_Tcf7,interesting_genes)
tcf7_to_meso<-cor_and_add_action(tcf7_to_meso,expDat)
# Paths from Tcf7l2 to interesting genes
tcf7l2_to_meso<-dynamic_shortest_path_multiple(dynamic_grn,effectortargets$WNT_Tcf7l2,interesting_genes)
tcf7l2_to_meso<-cor_and_add_action(tcf7l2_to_meso,expDat)
Here's what this looks like:
head(tcf7l2_to_meso)
# from to path distance action distance_over_average action_by_corr
# 1 Zfp703 Mesp1 Zfp703--Peg3--Mesp1 1.815449 1 0.6427952 1
# 2 Zfp703 Foxc1 Zfp703--Hey1--Foxc1 1.784579 1 0.6318649 1
# 3 Zfp703 Foxc2 Zfp703--Notch1--Foxc2 1.755712 1 0.6216439 1
# 4 Zfp703 Tbx6 Zfp703--Lmo2--Tbx6 1.794707 1 0.6354509 1
# 5 Mtf2 Mesp1 Mtf2--Utf1--Mesp1 1.907707 -1 0.6754609 -1
# 6 Mtf2 Foxc1 Mtf2--Utf1--Mesp1--Foxc1 2.747748 -1 0.9728935 -1
TF's in the column 'from' are targets of the effector TF Tcf7l2. Genes in the column "to" are targets from interesting_genes. The column 'path' describes the shortest path through the reconstructed network. The column 'distance_over_average' is the normalized distance (paths with values < 1 are shorter than the average path length).
Overly long paths and infinite-length paths (i.e. no path exists), indicate a network topology in which activation or inhibition of the given signaling pathway cannot easily control the given target. For example, unsurprisingly, Sox17 (an endoderm marker) is not reachable from any Wnt effector in this mesodermal network:
bcat_to_meso[tcf7l2_to_meso$to=="Sox17",]
# [1] from to path distance
# [5] action distance_over_average action_by_corr
# <0 rows> (or 0-length row.names)
tcf7_to_meso[tcf7l2_to_meso$to=="Sox17",]
# [1] from to path distance
# [5] action distance_over_average action_by_corr
# <0 rows> (or 0-length row.names)
tcf7l2_to_meso[tcf7l2_to_meso$to=="Sox17",]
# [1] from to path distance
# [5] action distance_over_average action_by_corr
# <0 rows> (or 0-length row.names)
By applying this method to dynamic networks reconstructed from different datasets, it's possible to elucidate topologies unsuitable or restrictive to specific cell fates. By apply this method more broadly via exhaustive search of paths from all major signaling effectors to specific sets of targets, it's possible to predict the necessary signaling activity required for directing specific cell fates.
(For more info on how we used this method to elucidate network topologies favoring mesodermal fate, to compare how activation and suppression of different signaling pathways impacts network topology, and a more in depth analysis on tracing paths from signaling effectors to target genes, see our manuscript).