To use only the model architecture and its corresponding functions, see the repository "ProtWaveVAE Model functions", which includes a tutorial code snippet, or install the package with pip and follow the same tutorial code snippet:
pip install ProtWave-VAE
To reproduce the results of our recent paper, "ProtWaveVAE: Integrating Autoregressive Sampling with Latent-based Inference for Data-driven Protein Design", use this repository and follow the instructions below.
This repository contains the source code and scripts for reproducing the results of the paper. The project is divided into three subfolders, each covering specific tasks discussed in the paper:
- Benchmark_project: Contains scripts and instructions for reproducing fitness and function benchmarking tasks from TAPE and FLIP using ProtWave-VAE.
- Pfam_analysis: Contains scripts and instructions for reproducing protein family latent inference studies, Chorismate mutase semi-supervised learning tasks, and C-terminus diversification with latent conditioning.
- SH3_design_project: Contains scripts and instructions for designing protein sequences that were experimentally tested.
For detailed instructions and steps, navigate to one of the three folders.
Follow these step-by-step instructions to install and set up the project, including downloading and installing any dependencies:
- Create a new virtual environment (e.g., use a conda environment):
conda create --name ProtWaveVAE_env python=3.8
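- Activate the environment:
# on conda versions older than 4.4, use: source activate ProtWaveVAE_env
conda activate ProtWaveVAE_env
Optionally, upgrade pip inside the activated environment:
python -m pip install --upgrade pip
- Install the required packages:
pip install -r requirements.txt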
- Enter the directory for reproducing tasks and follow the task-specific instructions in the given directory's README.md:
# example for entering the directory for reproducing the benchmark tasks
cd Benchmark_project
This project requires the following dependencies:
- PyTorch
- torchvision
- PyTorch-lightning
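After installing, it can help to confirm that the listed packages are visible to the active environment. The following is a minimal sketch, not part of this repository: the helper name `missing_dependencies` is hypothetical, and the import names (`torch`, `torchvision`, `pytorch_lightning`) are the usual ones for these packages but are assumed here rather than taken from requirements.txt.

```python
import importlib.util

# Display name -> import name. The import names are assumptions based on
# how these packages are commonly imported, not on this repository's code.
REQUIRED = {
    "PyTorch": "torch",
    "torchvision": "torchvision",
    "PyTorch-lightning": "pytorch_lightning",
}


def missing_dependencies(required=REQUIRED):
    """Return the display names of packages whose import name cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

If anything is reported missing, re-running pip install -r requirements.txt inside the activated environment is the first thing to try.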
Please note that GPU-accelerated training of the PyTorch models requires an NVIDIA GPU with CUDA support. If your system does not have an NVIDIA GPU, you can still run the code, but training will be significantly slower because all computation falls back to the CPU.
To check if your system has an NVIDIA GPU and if it supports CUDA, you can visit the NVIDIA CUDA GPUs page.
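You can also ask PyTorch directly whether it sees a usable CUDA device. This is a small sketch under the assumption that PyTorch is installed in the active environment; `pick_device` is a hypothetical helper, not a function from this repository, and it falls back to the CPU when PyTorch is missing or no GPU is detected.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment, so there is nothing to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```

A result of cpu on a machine that does have an NVIDIA GPU usually points to a CPU-only PyTorch build or a driver problem rather than to the hardware itself.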