Docker

Docker Setup and Usage

This section outlines how to use Docker to build and run this project. We provide three Docker targets to cater to different needs: for running the application (running), for general development (development), and for development with a JupyterLab environment (develop_jupyter).

Building Docker Images

For Running the Application (running): This target creates an image optimized for running the application. It includes only the necessary dependencies and the project's runtime environment.
```
docker build --target running -t scgpt:latest .
```
For Development (development): The development target includes additional tools and dependencies for development purposes, such as testing libraries and development servers. This image is suitable for developers looking to write and test their code.
```
docker build --target development -t scgpt:dev .
```
For JupyterLab Development Environment (develop_jupyter): This target builds an image with JupyterLab installed, providing an interactive development environment accessible via a web browser. It's ideal for data exploration, visualization, and running Jupyter notebooks.
```
docker build --target develop_jupyter -t scgpt:develop_jupyter .
```
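If you want all three images at once, the builds above can be scripted. The following is a minimal sketch, assuming it is run from the repository root next to the Dockerfile. The `build_all` function and the `DRY_RUN` variable are illustrative names, not part of the project.

```shell
#!/usr/bin/env bash
# Build the three Docker targets from this section in one go.
# build_all and DRY_RUN are illustrative names, not part of the project.
set -euo pipefail

build_all() {
  # target:tag pairs copied from the individual build commands above
  local pairs=("running:latest" "development:dev" "develop_jupyter:develop_jupyter")
  local pair target tag
  for pair in "${pairs[@]}"; do
    target="${pair%%:*}"   # text before the colon
    tag="${pair##*:}"      # text after the colon
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of executing it
      echo "docker build --target ${target} -t scgpt:${tag} ."
    else
      docker build --target "${target}" -t "scgpt:${tag}" .
    fi
  done
}

# Preview the commands; call plain `build_all` to run the builds.
DRY_RUN=1 build_all
```

The dry run lets you confirm the target and tag pairs before starting three potentially long builds.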

Running the Docker Containers

Running the Application with Volume Mounting: The following command runs the application and bind-mounts the current directory to /app in the container. Because both sides see the same files, changes you make locally are visible inside the container immediately, and vice versa.
```
docker run -v "$(pwd)":/app scgpt:latest
```
Using an .env File for Environment Variables: Pass environment variables, such as API keys, to your container with an .env file instead of typing them on the command line. Create this file in your project root with the required variables, one VAR=value pair per line, and use the --env-file option to load it into the container.
```
WANDB_API_KEY=your_wandb_api_key_here
```
```
docker run --env-file ./.env -v "$(pwd)":/app scgpt:latest
```
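A missing or incomplete .env file tends to surface only later, inside the container. The following is a small sketch of a pre-flight check, assuming the only required variable is the WANDB_API_KEY shown above. `check_env_file` is a hypothetical helper, not something the project ships.

```shell
#!/usr/bin/env bash
# Check that an env file defines every required variable before it is
# handed to `docker run --env-file`. check_env_file is an illustrative helper.

check_env_file() {
  local file="$1"; shift
  if [ ! -f "$file" ]; then
    echo "missing env file: $file" >&2
    return 1
  fi
  local key status=0
  for key in "$@"; do
    # Require a line of the form KEY=value with a non-empty value
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$file: $key is not set" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: start the container only when the check passes.
#   check_env_file ./.env WANDB_API_KEY && \
#     docker run --env-file ./.env -v "$(pwd)":/app scgpt:latest
```

Pass additional variable names as extra arguments if your setup needs more than one key.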
JupyterLab Development Environment: To access JupyterLab, run the container with port forwarding. The command below maps your machine's port 8888 to the container's port 8888. With volume mounting, you can edit your project files from JupyterLab directly.
```
docker run -p 8888:8888 -v "$(pwd)":/app scgpt:develop_jupyter
```
Access JupyterLab by navigating to http://localhost:8888 in your browser.
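If port 8888 is already taken on your machine, only the host side of the mapping needs to change. The sketch below prints the run command for a chosen host port. `jupyter_run_cmd` is a hypothetical helper used for illustration, and 8889 is an arbitrary example port.

```shell
#!/usr/bin/env bash
# Print the JupyterLab run command for a chosen host port.
# jupyter_run_cmd is an illustrative helper; the container port stays 8888.

jupyter_run_cmd() {
  local host_port="${1:-8888}"
  # -p HOST:CONTAINER publishes container port 8888 on the chosen host port
  printf 'docker run -p %s:8888 -v "$(pwd)":/app scgpt:develop_jupyter\n' "$host_port"
}

jupyter_run_cmd 8889
```

With the host port set to 8889, JupyterLab would be reached at http://localhost:8889 instead.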

This documentation provides instructions for leveraging Docker to streamline the setup and development process for this project, offering flexibility across different development environments.

scGPT

This is the official codebase for scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.

UPDATE: We have released several new pretrained scGPT checkpoints. Please see the Pretrained scGPT Model Zoo section for more details.

[2023.12.31] New tutorials about zero-shot applications are now available! Please find them in the tutorials/zero-shot directory. We also provide a new continual pretrained model checkpoint for cell-embedding-related tasks. Please see the notebook for more details.

[2023.11.07] As requested by many, we have made flash-attention an optional dependency. The pretrained weights can be loaded on PyTorch CPU, GPU, and flash-attn backends using the same load_pretrained function, load_pretrained(target_model, torch.load("path_to_ckpt.pt")). An example usage is also here.

[2023.09.05] We have released a new feature for reference mapping samples to a custom reference dataset or to all the millions of cells collected from CellXGene! With the help of the faiss library, we achieved high time and memory efficiency. The index of over 33 million cells takes less than 1 GB of memory, and the similarity search takes less than 1 second for 10,000 query cells on GPU. Please see the Reference mapping tutorial for more details.

Online apps

scGPT is now available at the following online apps as well, so you can get started simply with your browser!

Run the reference mapping app, cell annotation app and the GRN inference app with cloud GPUs. Thanks to the Superbio.ai team for helping create and host the interactive tools.

Installation

scGPT works with Python >= 3.7.13 and R >= 3.6.1. Please make sure you have the correct versions of Python and R installed before installing scGPT.
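The version requirements above can be checked before installing. The following is a rough sketch, assuming `python3` and `Rscript` are the interpreters on your PATH. `version_ge` is an illustrative helper and not part of scGPT.

```shell
#!/usr/bin/env bash
# Compare installed Python and R versions against the minimums stated above.
# version_ge is an illustrative helper, not part of scGPT.

version_ge() {
  # Succeeds when dotted version $1 >= $2, comparing numeric fields left to right
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for ((i = 0; i < ${#have[@]} || i < ${#want[@]}; i++)); do
    h="${have[i]:-0}"   # missing fields count as 0
    w="${want[i]:-0}"
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0
}

if command -v python3 >/dev/null 2>&1; then
  py="$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')"
  if version_ge "$py" 3.7.13; then echo "Python $py: OK"; else echo "Python $py: need >= 3.7.13"; fi
fi

if command -v Rscript >/dev/null 2>&1; then
  r="$(Rscript -e 'cat(as.character(getRversion()))')"
  if version_ge "$r" 3.6.1; then echo "R $r: OK"; else echo "R $r: need >= 3.6.1"; fi
fi
```

Comparing field by field avoids the pitfall of plain string comparison, where "3.10" would sort before "3.7".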

scGPT is available on PyPI. To install scGPT, run the following command:

```
pip install scgpt "flash-attn<1.0.5"  # optional, recommended
# As of 2023.09, pip install may not run with new versions of the google orbax package, if you encounter related issues, please use the following command instead:
# pip install scgpt "flash-attn<1.0.5" "orbax<0.1.8"
```

[Optional] We recommend using wandb for logging and visualization.

```
pip install wandb
```

For development, we use the Poetry package manager. To install Poetry, follow the instructions here.

```
git clone this-repo-url
cd scGPT
poetry install
```

Note: The flash-attn dependency usually requires a specific GPU and CUDA version. If you encounter any issues, please refer to the flash-attn repository for installation instructions. As of May 2023, we recommend using CUDA 11.7 and flash-attn<1.0.5 due to various issues reported when installing newer versions of flash-attn.
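To see which CUDA toolkit is installed before attempting the flash-attn install, the release number can be read from `nvcc --version`. This is a sketch under the assumption that `nvcc` is on your PATH. `cuda_release` is an illustrative helper, and the toolkit version reported by nvcc may differ from the CUDA build your PyTorch installation uses.

```shell
#!/usr/bin/env bash
# Extract the CUDA toolkit release from `nvcc --version` output.
# cuda_release is an illustrative helper; it reads the nvcc text on stdin.

cuda_release() {
  # nvcc prints a line such as "Cuda compilation tools, release 11.7, V11.7.99"
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

if command -v nvcc >/dev/null 2>&1; then
  found="$(nvcc --version | cuda_release)"
  if [ "$found" = "11.7" ]; then
    echo "CUDA $found matches the recommended 11.7"
  else
    echo "CUDA $found found; this README recommends 11.7 for flash-attn<1.0.5"
  fi
else
  echo "nvcc not found; the CUDA toolkit may not be installed or not on PATH"
fi
```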

Pretrained scGPT Model Zoo

Here is the list of pretrained models. Please find the links for downloading the checkpoint folders. We recommend using the whole-human model for most applications by default. If your fine-tuning dataset shares similar cell type context with the training data of the organ-specific models, these models can usually demonstrate competitive performance as well. A paired vocabulary file mapping gene names to ids is provided in each checkpoint folder. If ENSEMBL ids are needed, please find the conversion at gene_info.csv.

| Model name | Description | Download |
| :--- | :--- | :--- |
| whole-human (recommended) | Pretrained on 33 million normal human cells. | link |
| continual pretrained | For zero-shot cell embedding related tasks. | link |
| brain | Pretrained on 13.2 million brain cells. | link |
| blood | Pretrained on 10.3 million blood and bone marrow cells. | link |
| heart | Pretrained on 1.8 million heart cells. | link |
| lung | Pretrained on 2.1 million lung cells. | link |
| kidney | Pretrained on 814 thousand kidney cells. | link |
| pan-cancer | Pretrained on 5.7 million cells of various cancer types. | link |

Fine-tune scGPT for scRNA-seq integration

Please see our example code in examples/finetune_integration.py. By default, the script assumes the scGPT checkpoint folder is stored in the examples/save directory.
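Before launching a fine-tuning run, it can help to confirm that the checkpoint folder is where the script expects it. The following is only a sketch: `check_checkpoint_dir` is a hypothetical helper, and the folder name and file names in the usage comment are assumptions, so check your downloaded checkpoint folder for the actual names.

```shell
#!/usr/bin/env bash
# Verify that a checkpoint folder holds the files a fine-tuning run would load.
# check_checkpoint_dir is an illustrative helper, not part of scGPT.

check_checkpoint_dir() {
  local dir="$1"; shift
  local name missing=0
  for name in "$@"; do
    # Report every absent file rather than stopping at the first one
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (folder and file names below are ASSUMPTIONS for illustration):
#   check_checkpoint_dir examples/save/scGPT_human args.json best_model.pt vocab.json
```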

To-do-list

Contributing

We greatly welcome contributions to scGPT. Please submit a pull request if you have any ideas or bug fixes. We also welcome any issues you encounter while using scGPT.

Acknowledgements

We sincerely thank the authors of the following open-source projects:

Citing scGPT

```
@article{cui2023scGPT,
  title={scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI},
  author={Cui, Haotian and Wang, Chloe and Maan, Hassaan and Pang, Kuan and Luo, Fengning and Wang, Bo},
  journal={bioRxiv},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```
