GNTD

Graph-guided Neural Tensor Decomposition (GNTD) is a program for reconstructing whole spatial transcriptomes from spatial gene expression profiling data, such as the datasets generated by Visium ST and Stereo-seq. GNTD uses tensor structures and formulations to explicitly model high-order spatial gene expression data with a hierarchical nonlinear decomposition in a three-layer neural network. The decomposition is enhanced by spatial relations among the capture spots and by gene functional relations (from protein-protein interaction networks), which enables accurate reconstruction from highly sparse spatial profiling data.
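The sketch below gives a rough picture of what "high-order" means here. It lays the spot-by-gene matrix out as a third-order tensor indexed by grid row, grid column and gene, and treats grid positions without a measured spot as missing.

`spots_to_tensor` is a hypothetical helper written for this illustration. It is not part of the GNTD package and not GNTD's own preprocessing. `array_rows` and `array_cols` are assumed to be each spot's integer grid indices, such as the array_row/array_col columns of Space Ranger's tissue_positions_list.csv. On Visium's staggered grid, many cells of this rectangular array never hold a spot; the sketch simply leaves them marked as unobserved.

```python
import numpy as np

def spots_to_tensor(array_rows, array_cols, expr_mat):
    """Arrange a spot-by-gene matrix as a (row, col, gene) tensor.

    Grid positions with no measured spot stay 0 and are flagged False in
    the returned mask, so a decomposition can treat them as missing.
    """
    array_rows = np.asarray(array_rows, dtype=int)
    array_cols = np.asarray(array_cols, dtype=int)
    expr_mat = np.asarray(expr_mat, dtype=float)
    n_rows, n_cols = array_rows.max() + 1, array_cols.max() + 1
    tensor = np.zeros((n_rows, n_cols, expr_mat.shape[1]))
    observed = np.zeros((n_rows, n_cols), dtype=bool)
    tensor[array_rows, array_cols, :] = expr_mat   # one gene fiber per spot
    observed[array_rows, array_cols] = True
    return tensor, observed

# Toy example: 3 spots, 2 genes
tensor, observed = spots_to_tensor([0, 0, 1], [0, 2, 1], np.array([[1, 0], [0, 3], [2, 2]]))
print(tensor.shape)     # (2, 3, 2)
print(observed.sum())   # 3
```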

System and package requirements

The code was developed and tested on a Linux machine (Ubuntu 20.04.5 LTS) with one 20-thread Intel i9-10850K CPU, 64 GB of memory and one NVIDIA RTX 3080 12 GB GPU.

The code was tested with the following Python package versions:

[python 3.8.12]
[numpy 1.21.5]
[scipy 1.7.3]
[pandas 1.2.3]
[scikit-learn 1.1.3]
[pytorch 1.10.2]
[scanpy 1.9.1]
[anndata 0.8.0]
[rpy2 3.5.5] (optional)

The following R setup is optional; it is only needed for the mclust clustering example:

[R 4.2.2]
[mclust 6.0.0]
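You can compare your Python environment against the versions listed above with `importlib.metadata` from the standard library. The sketch below is a hypothetical helper, not part of GNTD.

A few caveats apply:

- It uses pip distribution names, so PyTorch appears as "torch".
- It does not check R or mclust.
- CUDA builds that report versions such as "1.10.2+cu113" are listed as mismatches even though they correspond to the tested release.

```python
from importlib import metadata  # standard library since Python 3.8

# Versions listed above, keyed by pip distribution name.
TESTED_VERSIONS = {
    "numpy": "1.21.5",
    "scipy": "1.7.3",
    "pandas": "1.2.3",
    "scikit-learn": "1.1.3",
    "torch": "1.10.2",
    "scanpy": "1.9.1",
    "anndata": "0.8.0",
    "rpy2": "3.5.5",
}

def compare_versions(tested=TESTED_VERSIONS):
    """Return {package: (installed version or None, tested version)} for mismatches only."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed
        if installed != wanted:
            report[name] = (installed, wanted)
    return report

for name, (installed, wanted) in compare_versions().items():
    print(f"{name}: installed {installed}, tested with {wanted}")
```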

Data preparation

Spatial Transcriptomics Data

Download Visium spatial transcriptomics data from 10x Genomics or spatialLIBD (registration required). This example uses a mouse brain dataset.

Under Visium Demonstration (v1 Chemistry), choose the tab "Space Ranger v1.1.0". Click the "Mouse Brain Section (Coronal)" link, fill in the form and check the consent box to access the data.

Only the filtered feature-barcode matrix and the spatial coordinates from the download file list are used. They can also be downloaded directly from the links for the filtered feature-barcode matrix data and the spatial coordinates. The files are usually smaller than 100 MB and typically take less than a minute to download.

Unzip the downloaded data and organize the folders under a data folder for your experiment, using the following structure:

    . <data-folder>
    ├── ...
    ├── <tissue-folder>
    │   ├── filtered_feature_bc_matrix
    │   │   ├── barcodes.tsv.gz
    │   │   ├── features.tsv.gz
    │   │   └── matrix.mtx.gz
    │   ├── spatial
    │   │   └── tissue_positions_list.csv
    └── ...
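Before running GNTD, it can help to confirm that the tissue folder matches the tree above. The sketch below uses only `pathlib`. `missing_files` and `EXPECTED_FILES` are hypothetical names for this example, and the file list is copied from the tree rather than taken from GNTD's loader.

```python
from pathlib import Path

# Relative paths from the directory tree above.
EXPECTED_FILES = [
    "filtered_feature_bc_matrix/barcodes.tsv.gz",
    "filtered_feature_bc_matrix/features.tsv.gz",
    "filtered_feature_bc_matrix/matrix.mtx.gz",
    "spatial/tissue_positions_list.csv",
]

def missing_files(tissue_folder):
    """Return the expected relative paths that are not present under tissue_folder."""
    root = Path(tissue_folder)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

# Replace the placeholder with your own <data-folder>/<tissue-folder> path.
missing = missing_files("<data-folder>/<tissue-folder>")
if missing:
    print("Missing:", ", ".join(missing))
```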

Protein-protein Interaction Data

PPI networks can be downloaded from BioGRID. This example uses the mouse PPI network from BioGRID version 4.4.209. Download and unzip BIOGRID-ORGANISM-4.4.209.tab3.zip, then place the "BIOGRID-ORGANISM-<species>-4.4.209.tab3.txt" file in the data folder as the PPI data (replace <species> with Mus_musculus to run the example).

    . <data-folder>
    ├── ...
    ├── BIOGRID-ORGANISM-<species>-4.4.209.tab3.txt
    └── ...
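GNTD takes the path to this file directly, so you do not need to parse it yourself. If you want to inspect the network first, the sketch below reads gene-symbol pairs with the standard `csv` module.

The sketch makes these assumptions:

- The TAB 3.0 header contains the columns "Official Symbol Interactor A/B" and "Organism ID Interactor A/B".
- An organism file may also contain cross-species interactions, which the optional `organism_id` filter drops (10090 is the NCBI Taxonomy ID for Mus musculus).

`load_ppi_edges` is a hypothetical helper, and how GNTD itself filters the network may differ.

```python
import csv

def load_ppi_edges(lines, genes=None, organism_id=None):
    """Return sorted, de-duplicated (symbol_a, symbol_b) pairs from BioGRID TAB 3.0 lines.

    Self-interactions are dropped. If `genes` is given, only pairs whose two
    genes are both in it are kept. If `organism_id` is given, only pairs where
    both interactors belong to that organism are kept.
    """
    genes = set(genes) if genes is not None else None
    edges = set()
    # QUOTE_NONE: treat quote characters in free-text columns literally
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if organism_id is not None and not (
            row["Organism ID Interactor A"] == row["Organism ID Interactor B"] == str(organism_id)
        ):
            continue
        a = row["Official Symbol Interactor A"]
        b = row["Official Symbol Interactor B"]
        if a == b:
            continue
        if genes is not None and (a not in genes or b not in genes):
            continue
        edges.add(tuple(sorted((a, b))))  # undirected edge
    return sorted(edges)

# Example usage:
# with open("<data-folder>/BIOGRID-ORGANISM-Mus_musculus-4.4.209.tab3.txt", newline="", encoding="utf-8") as f:
#     edges = load_ppi_edges(f, organism_id=10090)
```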

Running GNTD to get the imputed spatial transcriptomics data

from GNTD import GNTD

raw_data_path = "<data-folder>/<tissue-folder>" # Path to spatial expression data
PPI_data_path = "<data-folder>/BIOGRID-ORGANISM-<species>-4.4.209.tab3.txt" # Path to PPI data

rank = 128 # tensor rank
l = 0.1 # weight on graph regularization

model = GNTD(raw_data_path, PPI_data_path) # GNTD class initialization

model.preprocess() # Preprocessing. By default, this function only prepares spatial
                   # gene expression for highly variable genes; set
                   # use_highly_variable=False to get spatial gene expression
                   # for all genes

model.impute(rank, l) # Imputation

expr_mat, gene_names = model.get_imputed_expr_mat() # Return a spot-by-gene imputed expression matrix
                                                    # and the corresponding gene names; only spots that
                                                    # overlap the tissue section are included. To get the
                                                    # imputed expression matrix for a subset of genes, pass
                                                    # the gene_names parameter, e.g.
                                                    # model.get_imputed_expr_mat(gene_names)
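The sketch below is meant to build intuition for the `rank` and `l` arguments. It is a plain graph-regularized matrix factorization X ≈ U V^T with a Laplacian smoothness penalty, not GNTD's three-layer neural tensor decomposition, so it only illustrates what a graph-regularization weight trades off.

In this sketch, `rank` corresponds to the number of columns of U and V. The penalty tr(U^T L U) grows when spots that are neighbors in the spatial graph (or genes that interact in the PPI graph) get dissimilar factors. `laplacian`, `regularized_loss` and all arguments are hypothetical names used only here.

```python
import numpy as np

def laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency

def regularized_loss(X, mask, U, V, A_spot, A_gene, l):
    """Masked reconstruction error plus l-weighted graph smoothness penalties.

    X: spots x genes, mask: 1 for observed entries and 0 otherwise,
    U: spots x rank, V: genes x rank.
    """
    residual = mask * (X - U @ V.T)                       # only observed entries count
    fit = np.sum(residual ** 2)
    smooth_spots = np.trace(U.T @ laplacian(A_spot) @ U)  # neighboring spots -> similar rows of U
    smooth_genes = np.trace(V.T @ laplacian(A_gene) @ V)  # interacting genes -> similar rows of V
    return fit + l * (smooth_spots + smooth_genes)

# Toy check: on a 3-spot path graph, smoother spot factors get a smaller penalty.
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
smooth = np.array([[1.0], [1.1], [1.2]])
rough = np.array([[1.0], [-1.0], [1.0]])
L_path = laplacian(A_path)
print(np.trace(smooth.T @ L_path @ smooth) < np.trace(rough.T @ L_path @ rough))  # True
```

A larger `l` pushes the factors toward smoothness on the graphs at the expense of fitting the observed entries.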

Clustering the imputed spatial transcriptomics data

model.preprocess(n_top_genes=3000) # For better clustering performance, we strongly
                                   # recommend running imputation on highly variable genes.
                                   # By default, the top 3000 highly variable genes are selected.
                                   # See the scanpy documentation for details on highly
                                   # variable gene selection:
                                   # https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html
                                   
model.impute(rank, l) # Imputation (Runtime: ~2mins)
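The `n_top_genes` argument controls how many genes are kept. The sketch below is a deliberately simplified, variance-based stand-in that shows the idea. It is not scanpy's algorithm: the default scanpy flavor ranks genes by normalized dispersion within bins of mean expression. `top_variable_genes` is a hypothetical helper, and `counts` is assumed to be a raw spot-by-gene count matrix.

```python
import numpy as np

def top_variable_genes(counts, gene_names, n_top_genes=3000, target_sum=1e4):
    """Return the n_top_genes gene names with the largest variance after
    library-size normalization and log1p (simplified, not scanpy's method)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                            # avoid dividing empty spots by zero
    logged = np.log1p(counts / totals * target_sum)      # normalize each spot, then log-transform
    variances = logged.var(axis=0)
    order = np.argsort(variances)[::-1][:n_top_genes]    # highest variance first
    return [gene_names[i] for i in order]
```

For example, `top_variable_genes(counts, gene_names, n_top_genes=3000)` returns up to 3000 gene names. Genes that are constant across spots rank last.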

Code snippet for spot clustering with mclust

import os
import numpy as np
from sklearn.decomposition import PCA

# To run clustering with mclust, please make sure you install 
# rpy2 in python, and mclust in R
os.environ['R_HOME'] = '<Path-to-R>' # Set R path
import rpy2.robjects as robjects
robjects.r.library("mclust") # Load library mclust
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
r_random_seed = robjects.r['set.seed']
r_random_seed(0)
Mclust = robjects.r['Mclust']

import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove the next line when running as a plain Python script
%matplotlib inline

fig, ax = plt.subplots(figsize=(5, 5))

n_components = 10 # the number of principal components
expr_mat, gene_names = model.get_imputed_expr_mat() # Get a spot by gene imputed expression matrix
                                                    # and corresponding gene names
expr_mat_hat = PCA(n_components=n_components).fit_transform(expr_mat) # Run PCA
mclust = Mclust(rpy2.robjects.numpy2ri.numpy2rpy(expr_mat_hat), 25, "EEE") # Run mclust with G=25 clusters and model "EEE"
clustering_labels = np.asarray(mclust[-2]).astype(int) # Extract clustering labels ("classification") from mclust results
x_coords, y_coords = model.get_sp_coords() # Get spatial coordinates of spots for visualization

# Visualization
ax.scatter(x_coords, y_coords, c=clustering_labels, cmap='tab20c', s=10)
ax.set_ylim(ax.get_ylim()[::-1]) # Flip the y-axis so spots are drawn in image orientation
ax.tick_params('both', left = False, labelleft=False, bottom=False, labelbottom=False)
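If R and rpy2 are not available, a rough pure-Python alternative is a Gaussian mixture with one shared full covariance matrix. This is scikit-learn's `covariance_type="tied"`, the analogue of mclust's "EEE" model (equal volume, shape and orientation).

This swaps in a different implementation, so labels will not match mclust exactly. Initialization differs, and scikit-learn labels start at 0 rather than 1. `cluster_spots_gmm` is a hypothetical helper, not part of GNTD.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def cluster_spots_gmm(expr_mat, n_clusters, n_components=10, seed=0):
    """Cluster spots: PCA to n_components dimensions, then a Gaussian mixture
    with a single shared (tied) full covariance, similar to mclust's "EEE".
    Returns integer labels in 0..n_clusters-1."""
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(np.asarray(expr_mat))
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="tied", random_state=seed)
    return gmm.fit_predict(reduced)
```

In the snippet above, `clustering_labels = cluster_spots_gmm(expr_mat, n_clusters=25)` could replace the rpy2/mclust lines. The visualization code stays the same.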

Gene visualization in the imputed spatial transcriptomics data

model.preprocess(use_highly_variable=False, use_all_entries=True) # Set use_highly_variable=False to obtain
                                                                  # expression profiles for all genes

model.impute(rank, l) # Imputation (Runtime: ~10mins)

Code snippet for marker gene visualization

import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove the next line when running as a plain Python script
%matplotlib inline

gene_name = 'LAMP2' # Select gene to visualize

fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))

# Visualization for raw spatial expression 
expr_mat, gene_names = model.get_raw_expr_mat(gene_names=[gene_name]) # Get a spot by gene raw expression matrix
                                                                      # and corresponding gene names
x_coords, y_coords = model.get_sp_coords() # Get spatial coordinates of spots for visualization
axs[0].scatter(x_coords, y_coords, c=expr_mat[:, 0], cmap='RdYlBu_r', s=10)
axs[0].set_ylim(axs[0].get_ylim()[::-1])
axs[0].tick_params('both', left = False, labelleft=False, bottom=False, labelbottom=False)
axs[0].set_title('Raw', fontsize=16)
axs[0].set_ylabel(gene_names[0], fontsize=16)

# Visualization for imputed spatial expression 
expr_mat, gene_names = model.get_imputed_expr_mat(gene_names=[gene_name]) # Get a spot-by-gene imputed expression matrix
                                                                          # and corresponding gene names; only spots that
                                                                          # overlap the tissue section are included
x_coords, y_coords = model.get_sp_coords() # Get spatial coordinates of spots for visualization
axs[1].scatter(x_coords, y_coords, c=expr_mat[:, 0], cmap='RdYlBu_r', s=10)
axs[1].set_ylim(axs[1].get_ylim()[::-1])
axs[1].tick_params('both', left = False, labelleft=False, bottom=False, labelbottom=False)
axs[1].set_title('Imputation', fontsize=16)

Reference

GNTD: Reconstructing Spatial Transcriptomes with Graph-guided Neural Tensor Decomposition Informed by Spatial and Functional Relations, Tianci Song, Charles Broadbent and Rui Kuang, Nature Communications, 14, Article number: 8276 (2023)
