This repository contains the code required to reproduce the results in the DRVI paper. Please refer to the DRVI repo for the model, documentation, and help.
All relevant files are available in the drvi_notebooks directory.
data contains code for pre-processing of the analyzed datasets.
general contains the code to run DRVI, DRVI-AP, scVI, and CVAE on different datasets. drvi_runvis.py is a general Python script for running the models, and all the running configs are available at runs.sh.
baseline contains the code to run PCA, ICA, and MOFA on different datasets. linear_baselines_runvis.py is a general Python script for running the models, and all the running configs are available at baseline_runs.sh.
evaluation contains the code for disentanglement and integration benchmarking (Fig. 2 and supplemental figures related to benchmarking).
analysis contains the code for analysis of immune, HLCA, and developmental pancreas datasets, as well as the code required to produce the rest of the figures.
utils contains utility functions used throughout the project. For this reason, install this repository before using the notebooks.
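As a quick sanity check after cloning, the layout described above can be verified with a few lines of Python. This is a minimal sketch: the helper name `missing_subdirs` is hypothetical, and it assumes the six subdirectories sit directly under drvi_notebooks as listed.

```python
from pathlib import Path

# Subdirectories of drvi_notebooks named in this README.
EXPECTED_SUBDIRS = ["data", "general", "baseline", "evaluation", "analysis", "utils"]


def missing_subdirs(repo_root="."):
    """Return the expected subdirectories that are absent under drvi_notebooks."""
    base = Path(repo_root) / "drvi_notebooks"
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs()
    print("Layout looks complete." if not missing else f"Missing: {missing}")
```

Run it from the repository root. An empty result means every directory mentioned here was found.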
Install the dependencies in requirements.txt, then follow the steps below.
Run the following commands to be able to run .py files as notebooks:
jupyter nbextension install jupytext --user --py
jupyter nbextension enable jupytext --user --py
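To illustrate what it means for a .py file to act as a notebook, the sketch below splits a script into cells at `# %%` markers. It assumes the "percent" cell format, which is one of the formats jupytext supports; the scripts in this repository may use a different one. The function `split_percent_cells` is a hypothetical, simplified stand-in for illustration only and is not jupytext's actual parser.

```python
def split_percent_cells(source):
    """Split a percent-format script into (cell_type, text) pairs."""
    cells = []
    kind, lines = "code", []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A marker closes the previous cell (if any) and opens a new one.
            if lines:
                cells.append((kind, "\n".join(lines).strip()))
            kind = "markdown" if "[markdown]" in line else "code"
            lines = []
        else:
            lines.append(line)
    if lines:
        cells.append((kind, "\n".join(lines).strip()))
    return cells


# A tiny made-up script with one markdown cell and one code cell.
example = """# %% [markdown]
# # Load data

# %%
import math
print(math.pi)
"""

for kind, text in split_percent_cells(example):
    print(kind, repr(text))
```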
Install the reproducibility package
git clone https://gitlab.com/moinfar/drvi_reproducibility.git
cd drvi_reproducibility
pip install -e .
Install RAPIDS and the rapids-singlecell package for GPU-accelerated versions of scanpy functions. Read more about RAPIDS installation [here](https://docs.rapids.ai/install).
pip install rapids-singlecell # Already in requirements
pip install \
--extra-index-url=https://pypi.nvidia.com \
"cudf-cu12==23.12.*" "dask-cudf-cu12==23.12.*" "cuml-cu12==23.12.*" \
"cugraph-cu12==23.12.*" "cuspatial-cu12==23.12.*" "cuproj-cu12==23.12.*" \
"cuxfilter-cu12==23.12.*" "cucim-cu12==23.12.*" "pylibraft-cu12==23.12.*" \
"raft-dask-cu12==23.12.*"
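After installation, a short check can confirm that the GPU packages are discoverable from the current Python environment. This is a minimal sketch: the helper `available` is a hypothetical name, and the listed import names are the modules these packages are expected to provide. It only tests that the modules can be located, not that a working GPU or CUDA runtime is present.

```python
from importlib.util import find_spec

# Import names for some of the packages installed above.
GPU_MODULES = ["rapids_singlecell", "cudf", "cuml", "cugraph"]


def available(modules):
    """Map each module name to True if it can be located, without importing it."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = available(GPU_MODULES)
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'missing'}")
    if not all(status.values()):
        print("Some GPU packages are missing; GPU-accelerated functions may be unavailable.")
```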
The datasets used can be obtained as follows. For all datasets, pre-processing code is provided in drvi_notebooks/data.
Download from figshare: https://figshare.com/ndownloader/files/25717328.
Download from cellxgene: https://cellxgene.cziscience.com/collections/6f6d381a-7701-4781-935c-db10d30de293
Download the full data at E15.5 using the scvelo package:
import scvelo
adata = scvelo.datasets.pancreas()
Download the data with finer annotation using the CellRank package:
import cellrank
adata = cellrank.datasets.pancreas(kind="raw")
Then follow the "Update scvelo pancreas data annotations" section in cellrank_pancreas_data_preparation.py.
Download the data using the pertpy package:
import pertpy
adata = pertpy.data.norman_2019()
Download from cellxgene: https://cellxgene.cziscience.com/collections/2f4c738f-e2f3-4553-9db2-0582a38ea4dc
Download the dataset from https://zenodo.org/records/8133569
Cluster annotations (Table S2 of the main paper) can be downloaded from https://ars.els-cdn.com/content/image/1-s2.0-S1534580723005774-mmc3.xlsx
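For the entries above that point at a direct file link (the figshare file and the Table S2 spreadsheet), the download can be scripted. The following is a minimal sketch using only the standard library. The local file names, the `fetch` and `fetch_all` helpers, and the target directory are assumptions made for illustration, and some hosts may reject scripted requests, in which case a browser download is needed.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URLs copied from this README; the local file names are hypothetical.
DOWNLOADS = {
    "figshare_file_25717328": "https://figshare.com/ndownloader/files/25717328",
    "table_s2_cluster_annotations.xlsx": (
        "https://ars.els-cdn.com/content/image/1-s2.0-S1534580723005774-mmc3.xlsx"
    ),
}


def fetch(url, dest):
    """Download url to dest unless dest already exists; return the path."""
    dest = Path(dest)
    if dest.exists():
        # Skip the download so reruns do not fetch the same file again.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(dest))
    return dest


def fetch_all(target_dir):
    """Fetch every entry in DOWNLOADS into target_dir; return the local paths."""
    return [fetch(url, Path(target_dir) / name) for name, url in DOWNLOADS.items()]
```

Calling `fetch_all("raw_data")` would place both files under a raw_data directory (a hypothetical location) and skip any file that is already there.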