Introducing a way to induce diversity in the AF2 ensemble by spanning the conformational ensemble and identifying possible states.

AFsample2 is a generative protein structure prediction system based on AF2 that is able to induce significant conformational diversity for a given protein.
See article: AFsample2 predicts multiple conformations and ensembles with AlphaFold2
Related datasets available at Zenodo
All possible combinations of methods that are implemented here
| Model preset | Method | Runs |
|---|---|---|
| Monomer | AFvanilla/AFsample | ✅ |
| Monomer | AFsample2 | ✅ |
| Monomer | SPEACH_AF | ✅ |
| Monomer | MSAsubsampling | ✅ |
| Multimer | AFvanilla/AFsample | ✅ |
| Multimer | AFsample2 | ✅ |
| Multimer | SPEACH_AF | ✅ |
| Multimer | MSAsubsampling | ❌ |
# Docker
docker pull kyogesh/afsample2:v1.1
# Docker usage
docker run --gpus 1 --volume <path-to-databases>:/databases \
--volume <path-to-inputs>:/inputs \
--volume <path-to-outputs>:/outputs \
-it kyogesh/afsample2:v1.1 \
--method afsample2 \
--fasta_paths inputs/example.fasta \
--flagfile /app/alphafold/AF_multitemplate/monomer_full_dbs.flag \
--nstruct 4 \
--msa_rand_fraction 0.20 \
--model_preset=monomer \
--output_dir examples/# Apptainer
apptainer pull docker://kyogesh/afsample2:v1.1
apptainer run --nv \
-B <database_path>:/databases \
-B examples/:/input \
-B AF_multitemplate:/app/alphafold/AF_multitemplate \
afsample2_v1.1.sif \
--method afsample2 \
--fasta_paths /input/P31133/P31133.fasta \
--flagfile /app/alphafold/AF_multitemplate/monomer_full_dbs.flag \
--nstruct 10 \
--model_preset monomer \
--output_dir /input/ \
--use_precomputed_features=True \
--dropout=True- Install Miniconda
- Setup environment
# Clone this repository
git clone https://github.com/iamysk/AFsample2.git
cd AFsample2/
# install dependencies
conda env create -n <env_name> --file=environment.yaml
conda activate <env_name>
python -m pip install -r requirements.txt
- Make sure that all sequence databases are available at
<data_path>. Follow the official AlphaFold guide here to set up databases.
cd scripts
chmod +x download_all_data.sh
./download_all_data.sh <data_path> reduced_dbsFollow the steps to generate a diverse conformational ensemble for a given <fasta_path>.
'''
Inputs:
<method>: Method to run among ['afsample2', 'speachaf', 'af2', 'msasubsampling']
<fasta_paths>: path to .fasta file
<flagfile> : AF2 specific parameter file
<nstruct>: Number of structures to generate
<msa_rand_fraction>: % MSA randomization in random msa_perturbation_mode
<models_to_use>: (Optional) AF2 model to use (model_1, model_2 ...)
# Outputs:
# <output_dir>: Path to output directory
'''
# Example usage (AFsample2)
python AF_multitemplate/run_afsample2.py --method afsample2 \
--fasta_paths examples/P31133/P31133.fasta \
--flagfile AF_multitemplate/monomer_full_dbs.flag \
--nstruct 1 \
--msa_rand_fraction 0.20 \
--model_preset=monomer \
--output_dir examples/
Other useful flags (run <AF_multitemplate/run_afsample2.py --help> for more details)
| flag | Options | Usage |
|---|---|---|
| --use_precomputed_features | Bool | Whether to use precomputed features file (msa_features.pkl). All database paths in flagfile will be ignored. |
| --msa_file | path_to_msa | Single MSA file (e.g., .a3m from mmseqs2). All database paths in flagfile will be ignored. |
| --msa_perturbation_mode | <random, profile> | To choose MSA perturbation mode |
'''
Inputs:
<afout_path>: Path to generated models
<pdb_state1>: Reference PDB of state1
<pdb_state1>: Reference PDB of state1
<ncpu>: number of cores to use
# Outputs:
# final_df_ref1-ref2.csv file saved at results/
'''
# Example usage (If references available)
python src/analyse_models.py --method afsample2 \
--protein 8E6Y \
--afout_path examples/8E6Y/ \
--pdb_state1 examples/8E6Y/referencea/2fs1_A.pdb \
--pdb_state2 examples/8E6Y/referencea/8e6y_A.pdb \
--clustering=False \
--ncpu=16OUTPUT:
___ ______ __ ___
/ | / ____/________ _____ ___ ____ / /__ |__
/ /| | / /_ / ___/ __ `/ __ `__ \/ __ \/ / _ \__/ /
/ ___ |/ __/ (__ ) /_/ / / / / / / /_/ / / __/ __/
/_/ |_/_/ /____/\__,_/_/ /_/ /_/ .___/_/\___/____/
/_/
2025-01-08 14:23:02,328 [INFO] Analyzing models...
2025-01-08 14:23:02,328 [INFO] Reference state1, state2: examples/8E6Y/referencea/2fs1_A.pdb, examples/8E6Y/referencea/8e6y_A.pdb
2025-01-08 14:23:02,329 [INFO] Found 10 models in examples/8E6Y
2025-01-08 14:23:02,708 [INFO] Low confidence (mean plddt<50) residue indices: []
2025-01-08 14:23:02,711 [INFO] Most confident model: examples/8E6Y/unrelaxed_model_1_pred_4_dropout.pdb, Confidence: 86.42021052631578
examples/8E6Y/referencea/2fs1_A.pdb examples/8E6Y/referencea/8e6y_A.pdb
2025-01-08 14:23:02,712 [INFO] Received reference PDBs: examples/8E6Y/referencea/2fs1_A.pdb, examples/8E6Y/referencea/8e6y_A.pdb
TM-align (examples/8E6Y/referencea/2fs1_A.pdb - models): 100%|██| 10/10 [00:00<00:00, 245280.94it/s]
TM-align (examples/8E6Y/referencea/8e6y_A.pdb - models): 100%|██| 10/10 [00:00<00:00, 170500.16it/s]
2025-01-08 14:23:03,116 [INFO] Alignments done. TM-align outputs saved at examples/8E6Y
2025-01-08 14:23:03,126 [INFO] >> State 1: examples/8E6Y/referencea/2fs1_A.pdb
2025-01-08 14:23:03,126 [INFO] >> State 2: examples/8E6Y/referencea/8e6y_A.pdb
2025-01-08 14:23:03,126 [INFO] >> Results CSV saved at results/afsample2/final_df_8E6Y_s1-s2.csv# Example usage (If references NOT available)
python src/analyse_models.py --method afsample2 --protein 8E6Y --afout_path examples/8E6Y/ --clustering=False--ncpu=16
All data and scripts required to generate the plots in the manuscript are provided here. An overview of the directory structure, along with a description of each folder and its contents is provided in the dataset page. Extract as follows.
tar --use-compress-program=unzstd -xvf input_datasets.tar.zst
└── input_datasets
├── oc23
│ ├── fastas
│ ├── filtered_dict.pickle # pdbids and stats for states
│ ├── msas # in .pkl format
│ └── pdbs
└── tp16
├── fastas
├── filtered_dict.pickle
├── msas
└── pdbs
tar --use-compress-program=unzstd -xvf generated_models.tar.zst
└── generated_models
├── oc23
│ ├── afsample2
│ ├── SPEACH_AF
│ ├── ...
└── tp16
├── afsample2
├── SPEACH_AF
├── ...
tar --use-compress-program=unzstd -xvf analysis_results.tar.zst
└── analysis_results
├── oc23
│ ├── afsample2
│ ├── SPEACH_AF
│ ├── ...
└── tp16
├── afsample2
├── SPEACH_AF
├── ...
@article {Kalakoti2024.05.28.596195,
author = {Kalakoti, Yogesh and Wallner, Bj{\"o}rn},
title = {AFsample2: Predicting multiple conformations and ensembles with AlphaFold2},
elocation-id = {2024.05.28.596195},
year = {2024},
doi = {10.1101/2024.05.28.596195},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2024/06/02/2024.05.28.596195},
eprint = {https://www.biorxiv.org/content/early/2024/06/02/2024.05.28.596195.full.pdf},
journal = {bioRxiv}
}
@article{Wallner2023,
title = {AFsample: improving multimer prediction with AlphaFold using massive sampling},
volume = {39},
ISSN = {1367-4811},
url = {http://dx.doi.org/10.1093/bioinformatics/btad573},
DOI = {10.1093/bioinformatics/btad573},
number = {9},
journal = {Bioinformatics},
publisher = {Oxford University Press (OUP)},
author = {Wallner, Bj\"{o}rn},
editor = {Kelso, Janet},
year = {2023},
month = sep
}
APACHE