Graph-guided Neural Tensor Decomposition (GNTD) is a program for reconstructing whole spatial transcriptomes from spatial gene expression profiling data, such as the datasets generated by Visium ST and Stereo-seq. GNTD uses tensor structures and formulations to explicitly model high-order spatial gene expression data with a hierarchical nonlinear decomposition in a three-layer neural network. The decomposition is guided by spatial relations among the capture spots and by gene functional relations (from protein-protein interaction networks), which allows accurate reconstruction from highly sparse spatial profiling data.
The code was developed and tested on a Linux machine (Ubuntu 20.04.5 LTS) with one 20-core Intel i9-10850K CPU, 64 GB of memory and one NVIDIA RTX 3080 12 GB GPU.
GNTD has been tested with the following Python and package versions.
[python 3.8.12]
[numpy 1.21.5]
[scipy 1.7.3]
[pandas 1.2.3]
[scikit-learn 1.1.3]
[pytorch 1.10.2]
[scanpy 1.9.1]
[anndata 0.8.0]
[rpy2 3.5.5] (optional)
The optional R packages (needed only for clustering with mclust) have been tested with the following versions.
[R 4.2.2]
[mclust 6.0.0]
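To compare an existing environment against the versions listed above, a small check like the following may help. It is a minimal sketch that uses only the standard library; the distribution names are assumptions (for example, PyTorch is assumed to be installed under the name "torch"), and a differing version does not necessarily mean that GNTD will fail.

```python
from importlib import metadata

# Versions listed in this README. The distribution names are assumptions,
# e.g. PyTorch is assumed to be installed as "torch".
TESTED_VERSIONS = {
    "numpy": "1.21.5",
    "scipy": "1.7.3",
    "pandas": "1.2.3",
    "scikit-learn": "1.1.3",
    "torch": "1.10.2",
    "scanpy": "1.9.1",
    "anndata": "0.8.0",
}


def matches(installed, wanted):
    """True if the installed version equals the tested one,
    ignoring a local build suffix such as "+cu113"."""
    return installed is not None and installed.split("+")[0] == wanted


def check_versions(tested):
    """Return {package: (installed_version_or_None, tested_version)}."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions(TESTED_VERSIONS).items():
        status = "ok" if matches(installed, wanted) else "differs"
        print(f"{name}: installed={installed}, tested={wanted} ({status})")
```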
Download Visium spatial transcriptomics data from 10x Genomics or spatialLIBD (registration required). We will use a mouse brain dataset as an example.
Under "Visium Demonstration (v1 Chemistry)", choose the "Space Ranger v1.1.0" tab. Click the "Mouse Brain Section (Coronal)" link, fill in the form and check the consent box to access the data.
We only use the filtered feature-barcode matrix data and the spatial coordinates provided in the download file list. They can also be downloaded with the following links: filtered feature-barcode matrix data and spatial coordinates. The data files are usually smaller than 100 MB and take less than one minute to download.
Then unzip the downloaded data and organize folders using the following structure under a home-folder for your experiment:
. <data-folder>
├── ...
├── <tissue-folder>
│ ├── filtered_feature_bc_matrix
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── spatial
│ │ └── tissue_positions_list.csv
└── ...
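Before running GNTD, it can be useful to confirm that the tissue folder matches the layout shown above. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical and is not part of the GNTD package.

```python
from pathlib import Path

# Files expected under <tissue-folder>, taken from the layout above.
REQUIRED_FILES = [
    "filtered_feature_bc_matrix/barcodes.tsv.gz",
    "filtered_feature_bc_matrix/features.tsv.gz",
    "filtered_feature_bc_matrix/matrix.mtx.gz",
    "spatial/tissue_positions_list.csv",
]


def missing_files(tissue_folder):
    """Return the required files (relative paths) that are not present."""
    root = Path(tissue_folder)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

For example, `missing_files("<data-folder>/<tissue-folder>")` returns an empty list when all four files are in place.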
The current PPI networks can be downloaded from BioGRID. This example uses the mouse PPI network from BioGRID version 4.4.209. Download and unzip the BIOGRID-ORGANISM-4.4.209.tab3.zip file and place the "BIOGRID-ORGANISM-<species>-4.4.209.tab3.txt" file in a folder to serve as the PPI data (replace <species> with Mus_musculus to run the example).
. <data-folder>
├── ...
├── BIOGRID-ORGANISM-<species>-4.4.209.tab3.txt
└── ...
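To get a quick look at the downloaded PPI file, the sketch below collects the unique undirected gene pairs from a BioGRID tab3 file. It assumes the standard tab3 column headers "Official Symbol Interactor A" and "Official Symbol Interactor B". The function `load_ppi_edges` is a hypothetical helper for inspection only and does not necessarily reflect how GNTD parses the file internally.

```python
import csv


def load_ppi_edges(path):
    """Collect unique undirected gene-symbol pairs from a BioGRID tab3 file.

    Self-interactions are skipped, and (A, B) is treated the same as (B, A).
    """
    edges = set()
    with open(path, newline="", encoding="utf-8") as handle:
        # tab3 files are tab-delimited; quoting is disabled so that quote
        # characters inside fields are read literally.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            a = row["Official Symbol Interactor A"]
            b = row["Official Symbol Interactor B"]
            if a and b and a != b:
                edges.add(tuple(sorted((a, b))))
    return edges
```

Printing `len(load_ppi_edges(PPI_data_path))` gives a rough count of distinct interactions in the file.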
from GNTD import GNTD
raw_data_path = "<data-folder>/<tissue-folder>" # Path to spatial expression data
PPI_data_path = "<data-folder>/BIOGRID-ORGANISM-<species>-4.4.209.tab3.txt" # Path to PPI data
rank = 128 # tensor rank
l = 0.1 # weight on graph regularization
model = GNTD(raw_data_path, PPI_data_path) # GNTD class initialization
model.preprocess() # Preprocessing, by default this function only prepares spatial
# gene expression for highly variable genes, please set
# use_highly_variable = False to get spatial gene expression
# for all genes
model.impute(rank, l) # Imputation
expr_mat, gene_names = model.get_imputed_expr_mat() # Return a spot by gene imputed expression matrix
# and corresponding gene names, where spots in the
# matrix are overlapped with the tissue section, to
# get the imputed expression matrix for a subset of
# genes, please use the parameter gene_names, e.g.
# model.get_imputed_expr_mat(gene_names)
model.preprocess(n_top_genes=3000) # To obtain better clustering performance, we highly
# recommend running imputation on highly variable genes;
# by default, the top 3000 highly variable genes are selected,
# please see more details about highly variable genes
# selection (scanpy) in the following link:
# https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html
model.impute(rank, l) # Imputation (Runtime: ~2mins)
import os
import numpy as np
from sklearn.decomposition import PCA
# To run clustering with mclust, please make sure you install
# rpy2 in python, and mclust in R
os.environ['R_HOME'] = '<Path-to-R>' # Set R path
import rpy2.robjects as robjects
robjects.r.library("mclust") # Load library mclust
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
r_random_seed = robjects.r['set.seed']
r_random_seed(0)
Mclust = robjects.r['Mclust']
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(figsize=(5, 5))
n_components = 10 # the number of principal components
expr_mat, gene_names = model.get_imputed_expr_mat() # Get a spot by gene imputed expression matrix
# and corresponding gene names
expr_mat_hat = PCA(n_components=n_components).fit_transform(expr_mat) # Run PCA
mclust = Mclust(rpy2.robjects.numpy2ri.numpy2rpy(expr_mat_hat), 25, "EEE") # Run mclust
clustering_labels = mclust[-2] # Extract clustering labels from mclust results
x_coords, y_coords = model.get_sp_coords() # Get spatial coordinates of spots for visualization
# Visualization
ax.scatter(x_coords, y_coords, c=clustering_labels, cmap='tab20c', s=10)
ax.set_ylim(ax.get_ylim()[::-1])
ax.tick_params('both', left = False, labelleft=False, bottom=False, labelbottom=False)
model.preprocess(use_highly_variable=False, use_all_entries=True) # Please set use_highly_variable=False to obtain
# expression profile for all genes
model.impute(rank, l) # Imputation (Runtime: ~10mins)
import matplotlib.pyplot as plt
%matplotlib inline
gene_name = 'LAMP2' # Select gene to visualize
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
# Visualization for raw spatial expression
expr_mat, gene_names = model.get_raw_expr_mat(gene_names=[gene_name]) # Get a spot by gene raw expression matrix
# and corresponding gene names
x_coords, y_coords = model.get_sp_coords() # Get spatial coordinates of spots for visualization
axs[0].scatter(x_coords, y_coords, c=expr_mat[:, 0], cmap='RdYlBu_r', s=10)
axs[0].set_ylim(axs[0].get_ylim()[::-1])
axs[0].tick_params('both', left = False, labelleft=False, bottom=False, labelbottom=False)
axs[0].set_title('Raw', fontsize=16)
axs[0].set_ylabel(gene_names[0], fontsize=16)
# Visualization for imputed spatial expression
expr_mat, gene_names = model.get_imputed_expr_mat(gene_names=[gene_name]) # Get a spot by gene imputed expression matrix
# and corresponding gene names, where spots in the
# matrix are overlapped with the tissue section
x_coords, y_coords = model.get_sp_coords() # Get spatial coordinates of spots for visualization
axs[1].scatter(x_coords, y_coords, c=expr_mat[:, 0], cmap='RdYlBu_r', s=10)
axs[1].set_ylim(axs[1].get_ylim()[::-1])
axs[1].tick_params('both', left = False, labelleft=False, bottom=False, labelbottom=False)
axs[1].set_title('Imputation', fontsize=16)
GNTD: Reconstructing Spatial Transcriptomes with Graph-guided Neural Tensor Decomposition Informed by Spatial and Functional Relations, Tianci Song, Charles Broadbent and Rui Kuang, Nature Communications, 14, Article number: 8276 (2023)