This is the official implementation of the DeepVelo method. DeepVelo employs cell-specific kinetic rates and provides more accurate RNA velocity estimates for complex differentiation and lineage decision events in heterogeneous scRNA-seq data. Please check out the paper for more details.
Please note that using the pip version is currently recommended. The currently supported Python versions are 3.7, 3.8, and 3.9.
pip install deepvelo
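Before installing, it can help to confirm that the interpreter falls in the supported range. The following is a minimal sketch, not part of DeepVelo; the helper name `is_supported` is hypothetical and the version set simply mirrors the list above.

```python
import sys

# Supported (major, minor) versions, copied from the note above.
SUPPORTED_VERSIONS = {(3, 7), (3, 8), (3, 9)}


def is_supported(version_info=None):
    """Return True if the given (or the running) interpreter version is supported."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED_VERSIONS


# Report on the interpreter that is running this script.
print("supported interpreter:", is_supported())
```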
The CPU version of dgl is installed by default. For GPU acceleration, please install a dgl GPU version compatible with your CUDA environment.
pip uninstall dgl # remove the cpu version
# replace cu101 with your desired CUDA version and run the following
pip install "dgl-cu101>=0.4.3,<0.7"
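The package name in the command above encodes the CUDA version (10.1 becomes `cu101`). As an illustration only, the sketch below builds the requirement string for another CUDA version by following the same pattern. The helper `dgl_gpu_requirement` is hypothetical, and it does not check whether dgl actually publishes a wheel for the version you pass in.

```python
def dgl_gpu_requirement(cuda_version: str) -> str:
    """Build the pip requirement string for a CUDA build of dgl.

    Follows the naming pattern of the command above (CUDA 10.1 -> dgl-cu101)
    and keeps the same version bounds. Availability of the resulting package
    is not verified here.
    """
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '10.1', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"dgl-cu{major}{minor}>=0.4.3,<0.7"


# Example: print the requirement string for CUDA 10.2.
print(dgl_gpu_requirement("10.2"))
```

The printed string can then be passed, quoted, to `pip install` in place of the `dgl-cu101` requirement.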
We use poetry to manage dependencies.
poetry install
This will install the exact versions in the provided poetry.lock file. If you want to install the latest versions of all dependencies, use the following command.
poetry update
We provide a number of notebooks in the examples folder to help you get started. This folder contains analyses from the paper, as well as a minimal python notebook.
DeepVelo fully integrates with scanpy and scVelo. The basic usage is as follows:
import anndata as ann
import deepvelo as dv
import scvelo as scv
adata = ann.read_h5ad("..") # load your data in AnnData here - modify the path accordingly
# preprocess the data
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_neighbors=30, n_pcs=30)
# run DeepVelo using the default configs
trainer = dv.train(adata, dv.Constants.default_configs)
# This trains the model and predicts the velocity vectors.
# The result is stored in adata.layers['velocity'].
# You can use trainer.model to access the model.
# Plot the velocity results
scv.tl.velocity_graph(adata, n_jobs=4)
scv.pl.velocity_embedding_stream(
    adata,
    basis="umap",
    color="clusters",
    legend_fontsize=9,
    dpi=150,
)
If you cannot fit a large dataset into (GPU) memory using the default configs, please try setting a small inner_batch_size in the configs, which can reduce the memory usage and maintain the same performance.
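One way to change inner_batch_size without editing the defaults in place is to copy the configs and override the key wherever it appears. The sketch below assumes the configs are a nested dictionary. The helper `with_config_value`, the stand-in `example_configs`, its nesting, and the value 100 are all hypothetical and chosen only for illustration; inspect dv.Constants.default_configs to see where the key actually lives.

```python
import copy


def with_config_value(configs, key, value):
    """Return a deep copy of a nested dict with every occurrence of `key` set to `value`.

    Raises KeyError if `key` is not found anywhere, so a misspelled name
    is not silently ignored.
    """
    new_configs = copy.deepcopy(configs)
    found = False

    def visit(node):
        nonlocal found
        if isinstance(node, dict):
            for k in node:
                if k == key:
                    node[k] = value
                    found = True
                else:
                    visit(node[k])
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(new_configs)
    if not found:
        raise KeyError(key)
    return new_configs


# Stand-in for dv.Constants.default_configs; this nesting is hypothetical.
example_configs = {
    "loss": {"args": {"inner_batch_size": None}},
    "trainer": {"epochs": 100},
}
small_batch_configs = with_config_value(example_configs, "inner_batch_size", 100)
```

With DeepVelo installed, the same helper could be applied to dv.Constants.default_configs and the result passed to dv.train(adata, configs) as in the basic usage above. Working on a copy leaves the shared defaults untouched for later runs.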
Currently, training works on the whole graph of cells. We plan to release a more flexible version using graph node sampling in the near future.