scBiG for representation learning of single-cell gene expression data based on bipartite graph embedding

Overview

scBiG is a graph autoencoder. Its encoder, built from multi-layer graph convolutional networks, extracts high-order representations of cells and genes from the cell-gene bipartite graph. Its decoder, based on the zero-inflated negative binomial (ZINB) model, uses these representations to reconstruct the gene expression matrix. Trained with a model-driven, self-supervised paradigm, scBiG learns low-dimensional representations of both cells and genes that can be used in a range of downstream analyses.
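To make the role of the ZINB decoder more concrete, the following is a minimal, standard-library sketch of the textbook ZINB log-likelihood that such a decoder would typically minimize (as a negative log-likelihood). It is not taken from the scBiG source code: the function names and the parameterization by mean `mu`, dispersion `theta`, and dropout probability `pi` are assumptions made for illustration.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.
    pi in [0, 1) is the probability of an extra (dropout) zero."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either a dropout or a genuine zero from the NB component.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing the same parameters."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)
```

In an autoencoder setting, `mu`, `theta`, and `pi` would be predicted per cell and gene from the learned embeddings rather than shared across counts as in this simplified sketch.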

Installation

Please install scBiG from PyPI with:

pip install scbig

Or clone this repository and use

pip install -e .

in the root of this repository.

For GPU users, please install the GPU version of DGL, which is available from the official website: https://www.dgl.ai/pages/start.html

Quick start

Load the data to be analyzed:

import scanpy as sc
# data is the raw count matrix (cells x genes)
adata = sc.AnnData(data)

Perform data pre-processing with scanpy:

# Basic filtering
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.filter_cells(adata, min_genes=200)

adata.raw = adata.copy()

import numpy as np

# Total-count normalize, calculate the cell size factor, and logarithmize the data
sc.pp.normalize_per_cell(adata)
adata.obs['cs_factor'] = adata.obs.n_counts / np.median(adata.obs.n_counts)
sc.pp.log1p(adata)
# Calculate the gene size factor (per-gene maximum of the log-normalized values)
adata.var['gs_factor'] = np.max(adata.X, axis=0, keepdims=True).reshape(-1)
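As a worked illustration of what the two size factors represent, the sketch below repeats the same arithmetic on a tiny made-up count matrix using only the Python standard library. It assumes scanpy's default behavior for `normalize_per_cell`, in which each cell is rescaled to the median total count; the matrix values and variable names are hypothetical.

```python
import math
from statistics import median

# Toy count matrix: 3 cells (rows) x 4 genes (columns); values are made up.
counts = [
    [4, 0, 6, 0],
    [1, 1, 0, 2],
    [10, 5, 0, 5],
]

# Total counts per cell and their median.
n_counts = [sum(row) for row in counts]  # [10, 4, 20]
med = median(n_counts)                   # 10

# Cell size factor: total count of each cell relative to the median cell.
cs_factor = [n / med for n in n_counts]

# Total-count normalization: rescale every cell to the median total count.
normalized = [[x * med / n for x in row] for row, n in zip(counts, n_counts)]

# Log-transform, then take the per-gene maximum as the gene size factor.
logged = [[math.log1p(x) for x in row] for row in normalized]
gs_factor = [max(col) for col in zip(*logged)]
```

Here `cs_factor` has one entry per cell and `gs_factor` one entry per gene, matching the `adata.obs` and `adata.var` columns created above.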

Run the scBiG method:

from scbig import run_scbig
adata = run_scbig(adata)

The output adata contains the cell embeddings in adata.obsm['feat'] and the gene embeddings in adata.varm['feat']. These embeddings can be used as input to other downstream analyses.
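Because the embeddings are plain numeric vectors, any distance-based analysis can be applied to them. As one hedged example, the sketch below ranks genes by cosine similarity to a query gene. The helper names and the toy three-dimensional embeddings are hypothetical; with real output, one might build the dictionary from `adata.var_names` and `adata.varm['feat']`, assuming the latter is array-like.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(embeddings, query, k=2):
    """Return the k names whose embeddings are closest to `query`."""
    scores = [
        (cosine_similarity(embeddings[query], vec), name)
        for name, vec in embeddings.items()
        if name != query
    ]
    return [name for _, name in sorted(scores, reverse=True)[:k]]


# Hypothetical gene embeddings, for illustration only.
gene_embeddings = {
    "geneA": [1.0, 0.0, 0.0],
    "geneB": [0.9, 0.1, 0.0],
    "geneC": [0.0, 1.0, 0.0],
    "geneD": [-1.0, 0.0, 0.0],
}

neighbors = most_similar(gene_embeddings, "geneA", k=2)
```

The same functions work unchanged on cell embeddings, for example to find the cells most similar to a cell of interest.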

Please refer to tutorial.ipynb for a detailed description of scBiG's usage.

If you pre-process your data with Seurat and then use scBiG for subsequent analysis, see R_tutorial.Rmd for reference.
