SpaceFlow: Identifying Multicellular Spatiotemporal Organization of Cells using Spatial Transcriptome Data
SpaceFlow is a Python package for identifying spatiotemporal patterns and spatial domains from Spatial Transcriptomic (ST) data. Based on a deep graph network, SpaceFlow provides the following functions:
- Encodes the ST data into low-dimensional embeddings that reflect both the expression similarity and the spatial proximity of cells in ST data.
- Incorporates spatiotemporal relationships of cells or spots in ST data through a pseudo-Spatiotemporal Map (pSM) derived from the embeddings.
- Identifies spatial domains with spatially-coherent expression patterns.
Check out our paper (Ren et al., Nature Communications, 2022) for the detailed methods and applications.
SpaceFlow was developed in Python 3.7 with PyTorch 1.9.0. Specific package versions are listed in requirements.txt. The marker gene identification analysis is performed using the Scanpy 1.8.1 package. The cell-cell communication inference is performed with CellChat v1.1.3 in an R v4.1.2 environment.
To install SpaceFlow, we recommend using the Anaconda Python Distribution and creating an isolated environment, so that SpaceFlow and its dependencies do not conflict or interfere with other packages or applications. To create the environment, run the following command:
conda create -n spaceflow_env python=3.7
After creating the environment, activate the spaceflow_env environment by running:
conda activate spaceflow_env
Please first install the PyTorch build that matches your machine and environment by following the instructions at https://pytorch.org/get-started/locally/
Note that if you want to install PyTorch on a GPU machine, you need to install CUDA first; see the CUDA installation guide at https://developer.nvidia.com/cuda-downloads.
After successfully installing PyTorch with version >=1.9.0, install the SpaceFlow package using pip:
pip install SpaceFlow
If the installation still fails, try installing the required packages listed in requirements.txt by running:
pip install -r requirements.txt
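Before installing SpaceFlow, it can help to confirm that the installed PyTorch satisfies the >=1.9.0 requirement. The sketch below is not part of SpaceFlow or PyTorch: meets_minimum is a hypothetical helper that compares dotted version strings numerically, which avoids the pitfall of plain string comparison (where "1.10.0" would sort before "1.9.0").

```python
import re


def meets_minimum(installed, minimum="1.9.0"):
    """Return True if the dotted version `installed` is >= `minimum`.

    Hypothetical helper for illustration. Only the first three numeric
    components are compared, and local build suffixes such as "+cu111"
    are ignored.
    """
    def parse(version):
        # Drop any local suffix, then keep up to three numeric components.
        numbers = [int(n) for n in re.findall(r"\d+", version.split("+")[0])[:3]]
        # Pad short versions such as "1.9" to three components.
        return tuple(numbers + [0] * (3 - len(numbers)))

    return parse(installed) >= parse(minimum)
```

Once PyTorch is installed, calling meets_minimum(torch.__version__) after import torch should return True for any build that satisfies the requirement.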
Quick Start by Example (Jupyter Notebook)
We will use the mouse organogenesis ST data from (Lohoff, T. et al. 2022) generated by seqFISH to demonstrate the usage of SpaceFlow.
The data is available in the squidpy package, so we first import squidpy and load the data. If squidpy is not installed, run pip install squidpy to install it.
import squidpy as sq
import scanpy as sc
from SpaceFlow import SpaceFlow
adata = sq.datasets.seqfish()
sc.pp.filter_genes(adata, min_cells=3)
We can create a SpaceFlow object from either an anndata.AnnData object or a count matrix.
To construct a SpaceFlow object from an anndata.AnnData object:
sf = SpaceFlow.SpaceFlow(adata=adata)
Parameters:
- adata: the annotated data object holding the gene expression count matrix of size (# of cells, # of genes), type anndata.AnnData; see https://anndata.readthedocs.io/en/latest/ for more information about anndata.
To construct a SpaceFlow object from a raw count matrix:
sf = SpaceFlow.SpaceFlow(count_matrix=adata.X, spatial_locs=adata.obsm['spatial'], sample_names=adata.obs_names, gene_names=adata.var_names)
Parameters:
- count_matrix: the count matrix of gene expression, 2D numpy array of size (# of cells, # of genes), type numpy.ndarray, optional
- spatial_locs: spatial locations of cells (or spots) matched to the rows of the count matrix, 1D numpy array of size (n_locations,), type numpy.ndarray, optional
- sample_names: list of sample names in a 1D numpy str array of size (n_cells,), optional
- gene_names: list of gene names in a 1D numpy str array of size (n_genes,), optional
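Because the four inputs must agree in size, a quick consistency check before constructing the object can catch mismatches early. The sketch below is a hypothetical helper (check_inputs is not part of SpaceFlow) that only relies on len(), so it works with nested lists and dense numpy arrays; a sparse matrix would need to be converted or checked through its shape attribute instead.

```python
def check_inputs(count_matrix, spatial_locs, sample_names=None, gene_names=None):
    """Return a list of human-readable size mismatches (empty if consistent).

    Hypothetical helper for illustration; rows are cells, columns are genes.
    """
    n_cells = len(count_matrix)
    n_genes = len(count_matrix[0]) if n_cells else 0
    problems = []
    if len(spatial_locs) != n_cells:
        problems.append(
            f"spatial_locs has {len(spatial_locs)} entries, expected {n_cells}"
        )
    if sample_names is not None and len(sample_names) != n_cells:
        problems.append(
            f"sample_names has {len(sample_names)} entries, expected {n_cells}"
        )
    if gene_names is not None and len(gene_names) != n_genes:
        problems.append(
            f"gene_names has {len(gene_names)} entries, expected {n_genes}"
        )
    return problems


# Two cells, three genes: every input agrees, so no problems are reported.
ok = check_inputs([[1, 2, 3], [4, 5, 6]], [(0, 0), (1, 1)], ["a", "b"], ["g1", "g2", "g3"])
# One gene name is missing, so exactly one problem is reported.
bad = check_inputs([[1, 2, 3], [4, 5, 6]], [(0, 0), (1, 1)], ["a", "b"], ["g1", "g2"])
```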
Next, we preprocess the ST data by running:
sf.preprocessing_data(n_top_genes=3000)
Parameters:
- n_top_genes: the number of top highly variable genes to select.
The preprocessing includes normalization and log-transformation of the expression count matrix, selection of highly variable genes, and construction of a spatial proximity graph from the spatial coordinates. (For details, see the preprocessing_data function in SpaceFlow/SpaceFlow.py.)
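To make two of these preprocessing steps concrete, here is a minimal pure-Python sketch of library-size normalization with a log transform, and of a k-nearest-neighbor graph over 2D coordinates. Both functions are illustrative assumptions: the names, the target_sum and k parameters, and the choice of a k-nearest-neighbor graph are not taken from SpaceFlow, whose actual implementation lives in preprocessing_data and may differ.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in row])
    return normalized


def knn_spatial_graph(coords, k=1):
    """Return sorted undirected edges (i, j), i < j, linking each 2D point
    to its k nearest neighbors by Euclidean distance."""
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        # Distances from point i to every other point, nearest first;
        # ties are broken by the smaller index.
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


edges = knn_spatial_graph([(0, 0), (1, 0), (2, 0), (10, 0)], k=1)
```

The edge list is symmetric by construction: an edge is kept if either endpoint lists the other among its nearest neighbors.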
We then train a spatially regularized deep graph network model to learn a low-dimensional embedding that reflects both the expression similarity and the spatial proximity of cells in ST data.
sf.train(spatial_regularization_strength=0.1, z_dim=50, lr=1e-3, epochs=1000, max_patience=50, min_stop=100, random_seed=42, gpu=0, regularization_acceleration=True, edge_subset_sz=1000000)
Parameters:
- spatial_regularization_strength: the strength of the spatial regularization; larger values produce more spatially coherent spatial domains and spatiotemporal patterns. (default: 0.1)
- z_dim: the target size of the learned embedding. (default: 50)
- lr: the learning rate for optimizing the model. (default: 1e-3)
- epochs: the maximum number of epochs for model training. (default: 1000)
- max_patience: the maximum number of epochs to wait for the loss to decrease. If the loss does not decrease for more epochs than this threshold, training stops, and the parameters that showed the minimal loss are kept as the best model. (default: 50)
- min_stop: the earliest epoch at which training can stop once the loss has not decreased for more than max_patience epochs. (default: 100)
- random_seed: the random seed passed to the random generators of the random, numpy, and torch packages. (default: 42)
- gpu: the index of the NVIDIA GPU to use. If no GPU is available, the model is trained on the CPU, which is slower. (default: 0)
- regularization_acceleration: whether to accelerate the calculation of the regularization loss using an edge subsetting strategy. (default: True)
- edge_subset_sz: the edge subset size for regularization acceleration. (default: 1000000)
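The interaction between max_patience and min_stop can be sketched as a small early-stopping loop over a recorded loss curve. This is an illustration of the rule as described in the parameter list, not SpaceFlow's training code: early_stop is a hypothetical function, and the exact comparisons (strict vs. non-strict) used by the package are an assumption here.

```python
def early_stop(losses, max_patience=50, min_stop=100):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch losses.

    Training stops at the first epoch that is later than `min_stop` and at
    which the loss has failed to improve for more than `max_patience`
    consecutive epochs. If that never happens, it stops at the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    patience = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            # New minimum: remember it and reset the patience counter.
            best_loss, best_epoch, patience = loss, epoch, 0
        else:
            patience += 1
        if patience > max_patience and epoch > min_stop:
            return epoch, best_epoch
    return len(losses) - 1, best_epoch


# The loss improves until epoch 1 and then plateaus.
curve = [1.0, 0.5] + [0.6] * 10
with_min_stop = early_stop(curve, max_patience=3, min_stop=5)
without_min_stop = early_stop(curve, max_patience=3, min_stop=0)
```

In this toy curve the patience budget is exhausted at epoch 5, but with min_stop=5 training is allowed to stop only from epoch 6 onward; in both cases the best model is the one from epoch 1.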
After model training, the learned low-dimensional embedding can be accessed through sf.embedding.
SpaceFlow uses this learned embedding to identify the spatial domains with the Leiden algorithm.
sf.segmentation(domain_label_save_filepath="./domains.tsv", n_neighbors=50, resolution=1.0)
Parameters:
- domain_label_save_filepath: the file path for saving the identified domain labels. (default: "./domains.tsv")
- n_neighbors: the number of nearest neighbors of each cell used to construct the graph for Leiden, with the embedding as input. (default: 50)
- resolution: the resolution of the Leiden clustering; larger values yield more, finer domains. (default: 1.0)
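A quick way to see how the resolution setting affects the result is to count the cells assigned to each domain in the saved label file. The sketch below assumes the file holds one domain label per line, in cell order; that layout is an assumption and should be checked against the file SpaceFlow actually writes. count_domain_sizes is a hypothetical helper, and the demo writes its own small file rather than reading real output.

```python
import collections
import os
import tempfile


def count_domain_sizes(path):
    """Count cells per domain, assuming one label per non-empty line."""
    with open(path) as handle:
        labels = [line.strip() for line in handle if line.strip()]
    return collections.Counter(labels)


# Demo on a small synthetic label file (not real SpaceFlow output).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "domains.tsv")
    with open(demo_path, "w") as handle:
        handle.write("0\n1\n1\n2\n1\n")
    sizes = count_domain_sizes(demo_path)

for domain, n_cells in sorted(sizes.items()):
    print(f"domain {domain}: {n_cells} cells")
```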
We next plot the spatial domains using the identified domain labels and spatial coordinates of cells.
sf.plot_segmentation(segmentation_figure_save_filepath="./domain_segmentation.pdf", colormap="tab20", scatter_sz=1., rsz=4., csz=4., wspace=.4, hspace=.5, left=0.125, right=0.9, bottom=0.1, top=0.9)
The expected output is a scatter plot of the cells colored by spatial domain, saved to the given file path.
Parameters:
- segmentation_figure_save_filepath: optional, type: str, the file path for saving the figure of the spatial domain visualization. (default: "./domain_segmentation.pdf")
- colormap: optional, type: str, the colormap used for the different domains; see the matplotlib documentation for the full list of colormap options.
- scatter_sz: optional, type: float, the marker size in points. (default: 1.0)
- rsz: optional, type: float, the row size of the figure in inches. (default: 4.0)
- csz: optional, type: float, the column size of the figure in inches. (default: 4.0)
- wspace: optional, type: float, the amount of width reserved for space between subplots, expressed as a fraction of the average axis width. (default: 0.4)
- hspace: optional, type: float, the amount of height reserved for space between subplots, expressed as a fraction of the average axis height. (default: 0.4)
- left: optional, type: float, the leftmost position of the subplots of the figure, as a fraction of the figure width. (default: 0.125)
- right: optional, type: float, the rightmost position of the subplots of the figure, as a fraction of the figure width. (default: 0.9)
- bottom: optional, type: float, the bottom position of the subplots of the figure, as a fraction of the figure height. (default: 0.1)
- top: optional, type: float, the top position of the subplots of the figure, as a fraction of the figure height. (default: 0.9)
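The layout parameters share their names and meanings with matplotlib's subplots_adjust, so their effect can be worked out by hand. The sketch below assumes SpaceFlow forwards left, right, and wspace to matplotlib unchanged (an assumption, not confirmed here) and applies matplotlib's grid layout rule: the usable width right - left is split into ncols axes plus ncols - 1 gaps, each gap being wspace times one axis width. column_layout is a hypothetical helper.

```python
def column_layout(ncols, left=0.125, right=0.9, wspace=0.4):
    """Return (axis_width, gap_width) as fractions of the figure width.

    Solves ncols * axis_width + (ncols - 1) * gap_width = right - left
    with gap_width = wspace * axis_width.
    """
    total_width = right - left
    axis_width = total_width / (ncols + wspace * (ncols - 1))
    gap_width = wspace * axis_width
    return axis_width, gap_width


# A single panel fills the whole usable width; two panels share it with one gap.
single_width, _ = column_layout(1)
pair_width, pair_gap = column_layout(2)
```

With the defaults, a single panel spans 0.775 of the figure width. The same formula applies vertically with bottom, top, and hspace.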
We can also visualize the expert annotation for comparison by:
import scanpy as sc
sc.pl.spatial(adata, color="celltype_mapped_refined", spot_size=0.03)
The expected output is a spatial plot of the cells colored by the expert-annotated cell type.
Next, we apply the diffusion pseudotime (DPT) algorithm to the learned spatially-consistent embedding to generate a pseudo-Spatiotemporal Map (pSM). This pSM represents a spatially-coherent pseudotime ordering of cells that encodes biological relationships between cells, such as developmental trajectories and cancer progression.
sf.pseudo_Spatiotemporal_Map(pSM_values_save_filepath="./pSM_values.tsv", n_neighbors=20, resolution=1.0)
Parameters:
- pSM_values_save_filepath: the file path for saving the inferred pSM values.
- n_neighbors: the number of nearest neighbors of each cell used to construct the graph for Leiden, with the embedding as input. (default: 20)
- resolution: the resolution of the Leiden clustering; larger values yield more, finer domains. (default: 1.0)
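Since the pSM is a pseudotime ordering, a typical downstream step is to rank cells by their pSM value, for example to order a heatmap. The sketch below works on a plain list of values. order_by_psm is a hypothetical helper, and the min-max rescaling to [0, 1] is a convenience added here, not something SpaceFlow is stated to do.

```python
def order_by_psm(psm_values):
    """Return (scaled, order) for a sequence of pSM values.

    `scaled` holds the values min-max rescaled to [0, 1]; `order` lists
    cell indices from the earliest to the latest pseudotime.
    """
    lowest, highest = min(psm_values), max(psm_values)
    span = highest - lowest
    # A constant input has no ordering information; map it to 0.0.
    scaled = [(v - lowest) / span if span > 0 else 0.0 for v in psm_values]
    order = sorted(range(len(psm_values)), key=lambda i: psm_values[i])
    return scaled, order


scaled, order = order_by_psm([2.0, 1.0, 3.0])
```

Here cell 1 comes first and cell 2 last, and the scaled values are 0.5, 0.0, and 1.0.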
We next visualize the identified pseudo-Spatiotemporal Map (pSM).
sf.plot_pSM(pSM_figure_save_filepath="./pseudo-Spatiotemporal-Map.pdf", colormap="roma", scatter_sz=1., rsz=4., csz=4., wspace=.4, hspace=.5, left=0.125, right=0.9, bottom=0.1, top=0.9)
The expected output is a scatter plot of the cells colored by their pSM value, saved to the given file path.
Parameters:
- pSM_figure_save_filepath: optional, type: str, the file path for saving the figure of the pSM visualization. (default: "./pseudo-Spatiotemporal-Map.pdf")
- colormap: optional, type: str, the colormap of the pSM (default: 'roma'); see Scientific Colormaps for the full list of colormap options.
- scatter_sz: optional, type: float, the marker size in points. (default: 1.0)
- rsz: optional, type: float, the row size of the figure in inches. (default: 4.0)
- csz: optional, type: float, the column size of the figure in inches. (default: 4.0)
- wspace: optional, type: float, the amount of width reserved for space between subplots, expressed as a fraction of the average axis width. (default: 0.4)
- hspace: optional, type: float, the amount of height reserved for space between subplots, expressed as a fraction of the average axis height. (default: 0.4)
- left: optional, type: float, the leftmost position of the subplots of the figure, as a fraction of the figure width. (default: 0.125)
- right: optional, type: float, the rightmost position of the subplots of the figure, as a fraction of the figure width. (default: 0.9)
- bottom: optional, type: float, the bottom position of the subplots of the figure, as a fraction of the figure height. (default: 0.1)
- top: optional, type: float, the top position of the subplots of the figure, as a fraction of the figure height. (default: 0.9)
Ren, Honglei, et al. "Identifying multicellular spatiotemporal organization of cells with SpaceFlow." Nature Communications 13.1 (2022): 1-14. https://www.nature.com/articles/s41467-022-31739-w
If you have any questions or find any issues, please contact: hongleir@uci.edu.