Get representations from AlphaFold's Evoformer. Set up the environment following AlphaFlow.
In an environment with Python 3.9 (for example, mamba create -n [NAME] python=3.9), run:
pip install numpy==1.21.2 pandas==1.5.3
pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install biopython==1.79 dm-tree==0.1.6 modelcif==0.7 ml-collections==0.1.0 scipy==1.7.1 absl-py einops
pip install pytorch_lightning==2.0.4 fair-esm mdtraj wandb
pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@103d037'
- Prepare an input CSV with a name and seqres entry for each row. See splits/demo.csv for examples.
- If running an AlphaFlow model, prepare an MSA directory and place the alignments in .a3m format at the following paths: {alignment_dir}/{name}/a3m/{name}.a3m. If you don't have the MSAs, there are two ways to generate them:
  - Query the ColabFold server with python -m scripts.mmseqs_query --split [PATH] --outdir [DIR].
  - Download UniRef30 and ColabDB according to https://github.com/sokrypton/ColabFold/blob/main/setup_databases.sh and run python -m scripts.mmseqs_search_helper --split [PATH] --db_dir [DIR] --outdir [DIR].
- If running an MD+Templates model, place the template PDB files into a templates directory with filenames matching the names in the input CSV. The PDB files should include only a single chain with no residue gaps.
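The input preparation steps above can be sketched in Python. This is a minimal sketch and not part of the repository: the helper names write_input_csv, expected_msa_path, and missing_msas are hypothetical, and the sequences and directory names in the usage block are placeholders. It assumes only what is stated above, namely a CSV with name and seqres columns and MSAs stored at {alignment_dir}/{name}/a3m/{name}.a3m. Compare the header against splits/demo.csv before relying on it.

```python
import csv
from pathlib import Path


def write_input_csv(entries, csv_path):
    """Write (name, seqres) pairs to a CSV with a name,seqres header row."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "seqres"])  # assumed header; check splits/demo.csv
        writer.writerows(entries)


def expected_msa_path(alignment_dir, name):
    """Build {alignment_dir}/{name}/a3m/{name}.a3m for one CSV entry."""
    return Path(alignment_dir) / name / "a3m" / f"{name}.a3m"


def missing_msas(csv_path, alignment_dir):
    """Return the names from the CSV whose .a3m file is not on disk."""
    with open(csv_path, newline="") as handle:
        names = [row["name"] for row in csv.DictReader(handle)]
    return [n for n in names if not expected_msa_path(alignment_dir, n).is_file()]


if __name__ == "__main__":
    # Placeholder entries and paths, for illustration only.
    write_input_csv([("example_A", "MKTAYIAKQR")], "my_inputs.csv")
    print("Missing MSAs:", missing_msas("my_inputs.csv", "my_alignments"))
```

Any name reported as missing would need an MSA generated by one of the two routes listed above before running prediction.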
The basic command for getting Evoformer representations is:
python3 predict_representation.py --mode alphafold --input_csv [PATH] --msa_dir [DIR] --outpdb [DIR]
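When the command has to be launched from another script, the argument list can be assembled programmatically. The sketch below is an illustration only: build_command and run are hypothetical helpers, the paths in the usage block are placeholders, and the only flags used are the ones shown in the command above.

```python
import subprocess


def build_command(input_csv, msa_dir, out_dir, mode="alphafold"):
    """Assemble the predict_representation.py call as an argument list."""
    return [
        "python3", "predict_representation.py",
        "--mode", mode,
        "--input_csv", str(input_csv),
        "--msa_dir", str(msa_dir),
        "--outpdb", str(out_dir),
    ]


def run(command):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder paths; call run(cmd) from the repository root to execute.
    cmd = build_command("splits/demo.csv", "my_alignments", "my_outputs")
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.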
Download the pretrained AlphaFold weights into the repository root via
wget https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
tar -xvf alphafold_params_2022-12-06.tar params_model_1.npz
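A quick sanity check after extraction can catch a truncated download. The sketch below relies on the fact that a .npz file is a zip archive of .npy arrays, so the standard library can inspect it without loading the weights. The helper name list_npz_arrays is hypothetical, and the sketch assumes params_model_1.npz sits in the current directory, as the tar command above produces.

```python
import zipfile
from pathlib import Path


def list_npz_arrays(npz_path):
    """Return the array names stored in a .npz file after a CRC check."""
    path = Path(npz_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run the wget and tar commands first")
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # None when every member passes its CRC
        if bad_member is not None:
            raise ValueError(f"Corrupt member in {path}: {bad_member}")
        # Each array is stored as '<name>.npy'; strip the suffix for readability.
        return [n[:-4] if n.endswith(".npy") else n for n in archive.namelist()]


if __name__ == "__main__":
    names = list_npz_arrays("params_model_1.npz")
    print(f"{len(names)} arrays found")
```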
This code is based on AlphaFlow [1].
[1] Bowen Jing, Bonnie Berger, and Tommi Jaakkola (2024). AlphaFold Meets Flow Matching for Generating Protein Ensembles.