This project aims to pretrain a large generative model on single-cell data from 1 million plant cells, and then to further pretrain a cell annotation model and a batch integration model starting from that pretrained model. The resulting models can be used for cell annotation, providing an efficient way to handle large-scale plant single-cell data.
- Data Preparation: Prepare the plant single-cell dataset (1 million cells) used for training.
- Pretraining the Generative Model: Pretrain a generative model on this dataset.
- Pretraining the Cell Annotation Model: Initialize the cell annotation model with the pretrained generative model's weights and further pretrain it on the same 1 million-cell dataset.
- Pretraining the Batch Integration Model: Initialize the batch integration model with the pretrained generative model's weights and further pretrain it on the same 1 million-cell dataset.
- Cell Annotation Application: Use the pretrained cell annotation model for cell annotation. You can choose to apply it directly or fine-tune it before application.
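
The weight reuse described in the steps above can be illustrated with a small, dependency-free sketch. It is not the project's training code: plain dictionaries stand in for model weights, and the function name `init_from_pretrained`, the `encoder.` prefix, and all parameter names are hypothetical. The idea shown is only that parameters shared with the generative model are copied into the downstream model, while task-specific parameters keep their fresh initialization.

```python
from copy import deepcopy


def init_from_pretrained(pretrained, downstream, shared_prefix="encoder."):
    """Copy shared parameters from a pretrained model into a downstream model.

    Both arguments are plain dicts mapping parameter names to values
    (a stand-in for real weight tensors). Only names that start with
    `shared_prefix` and exist in both models are copied.
    Returns the initialized weights and the list of copied names.
    """
    merged = deepcopy(downstream)
    copied = []
    for name, value in pretrained.items():
        if name.startswith(shared_prefix) and name in merged:
            merged[name] = deepcopy(value)
            copied.append(name)
    return merged, copied


# Hypothetical weights: the generative model has a decoder, while the
# annotation model has a classification head instead.
generative = {"encoder.layer0": [0.1, 0.2], "decoder.out": [0.9]}
annotation = {"encoder.layer0": [0.0, 0.0], "cls_head.out": [0.5]}

annotation_init, copied = init_from_pretrained(generative, annotation)
print("copied parameters:", copied)
```

The same call would apply to a batch integration model under these assumptions; only the task-specific parameters would differ.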
An article detailing the methodology and results of this project is currently being drafted and will be published soon. Stay tuned for more information.
Here are some example results generated by the scPlantGPT model:
- Python 3.11+
- Refer to `requirements.txt` for the list of dependencies.
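
The Python 3.11+ requirement can be checked before installing anything. The snippet below is a minimal sketch using only the standard library; the helper name `meets_minimum` and the constant `MIN_VERSION` are illustrative and not part of this repository.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_VERSION = (3, 11)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    `version_info` defaults to the running interpreter's version.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Python version OK")
else:
    print("Python 3.11 or newer is required")
```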
- Clone this repository

      git clone git@github.com:cgshuo/scPlantGPT.git

- Create and activate a virtual environment (optional)

      python -m venv venv
      source venv/bin/activate  # For Windows, use `venv\Scripts\activate`

- Install dependencies

      pip install -r requirements.txt