This repository contains a PyTorch implementation of our paper Proximal Exploration for Model-guided Protein Sequence Design published at ICML 2022. Proximal Exploration (PEX) is a variant of directed evolution, which prioritizes the search for low-order mutants. Based this local-search mechanism, a model architecture called Mutation Factorization Network (MuFacNet) is developed to specialize in the local fitness landscape around the wild type.
The dependencies can be set up using the following commands:
conda create -n pex python=3.8 -y
conda activate pex
conda install pytorch=1.10.2 cudatoolkit=11.3 -c pytorch -y
conda install numpy=1.19 pandas=1.3 -y
conda install -c conda-forge tape_proteins=0.5 -y
pip install sequence-models==1.2.0
Clone this repository and download the oracle landscape models by the following commands:
git clone https://github.com/HeliXonProtein/proximal-exploration.git
cd proximal-exploration
bash download_landscape.sh
Run the following commands to reproduce our main results shown in section 5.1. There are eight fitness landscapes to support a diverse evaluation on black-box protein sequence design.
python run.py --alg=pex --net=mufacnet --task=avGFP # Green Fluorescent Proteins
python run.py --alg=pex --net=mufacnet --task=AAV # Adeno-associated Viruses
python run.py --alg=pex --net=mufacnet --task=TEM # TEM-1 β-Lactamase
python run.py --alg=pex --net=mufacnet --task=E4B # Ubiquitination Factor Ube4b
python run.py --alg=pex --net=mufacnet --task=AMIE # Aliphatic Amide Hydrolase
python run.py --alg=pex --net=mufacnet --task=LGK # Levoglucosan Kinase
python run.py --alg=pex --net=mufacnet --task=Pab1 # Poly(A)-binding Protein
python run.py --alg=pex --net=mufacnet --task=UBE2I # SUMO E2 conjugase
In the default configuration, the protein fitness landscape is simulated by a TAPE-based oracle model. By adding the argument --oracle_model=esm1b
, the landscape simulator is switched to an oracle model based on ESM-1b.
Please contact zhizhour[at]helixon.com for any questions related to the source code.