Question about intuition behind AF2 monomer for validation #70

Open
eswan01 opened this issue Oct 22, 2024 · 1 comment

eswan01 commented Oct 22, 2024

Thanks for sharing this great binder generation pipeline! I have a few questions about the MPNN redesign/re-prediction step:

In your paper, you state: "These [MPNN-]optimized sequences are then re-predicted using the AF2 monomer model, with 3 recycles and 2-template based models in single sequence mode, to ensure robust and unbiased complex assessment." I'm finding that for some target/binder pairs, my trajectories produce good-looking binders, but they fail on monomer validation, specifically on iptm, as they're poorly docked by the monomer model. Could you please elaborate on the intuition behind using the AF2 monomer weights to fold the designed complex?

Supposing that I only use multimer weights in design and validation, do you think one could achieve similar "robustness" by using models [0, 1, 2] for design and [3, 4] for validation?

The complex and binder prediction models are defined and prepared in

BindCraft/bindcraft.py

Lines 194 to 203 in d2d3cd0

complex_prediction_model = mk_afdesign_model(protocol="binder", num_recycles=advanced_settings["num_recycles_validation"], data_dir=advanced_settings["af_params_dir"],
                                             use_multimer=multimer_validation)
complex_prediction_model.prep_inputs(pdb_filename=target_settings["starting_pdb"], chain=target_settings["chains"], binder_len=length, rm_target_seq=advanced_settings["rm_template_seq_predict"],
                                     rm_target_sc=advanced_settings["rm_template_sc_predict"])
# compile binder monomer prediction model
binder_prediction_model = mk_afdesign_model(protocol="hallucination", use_templates=False, initial_guess=False,
                                            use_initial_atom_pos=False, num_recycles=advanced_settings["num_recycles_validation"],
                                            data_dir=advanced_settings["af_params_dir"], use_multimer=multimer_validation)
binder_prediction_model.prep_inputs(length=length)

The complex prediction model is then called in

BindCraft/bindcraft.py

Lines 223 to 228 in d2d3cd0

### Predict mpnn redesigned binder complex using masked templates
mpnn_complex_statistics, pass_af2_filters = masked_binder_predict(complex_prediction_model,
                                                                  mpnn_sequence['seq'], mpnn_design_name,
                                                                  target_settings["starting_pdb"], target_settings["chains"],
                                                                  length, trajectory_pdb, prediction_models, advanced_settings,
                                                                  filters, design_paths, failure_csv)

When I dig into the masked_binder_predict code, it doesn't look to me like the templates are being masked, since by default in the advanced settings, rm_template_seq_predict and rm_template_sc_predict are both false. Could you please elaborate on the intention behind masked_binder_predict (beyond just refolding with different weights) and how it's being used here?
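The observation above can be made concrete with a small check. This is a minimal sketch, assuming `advanced_settings` is a plain dict carrying the two keys passed to `prep_inputs` in the snippet quoted earlier; the helper `template_masking_summary` is hypothetical and not part of BindCraft.

```python
def template_masking_summary(advanced_settings):
    """Report which parts of the target template would be removed at prediction time."""
    # Both keys are the ones forwarded to prep_inputs as rm_target_seq / rm_target_sc.
    # Treating a missing key as False is an assumption of this sketch.
    rm_seq = bool(advanced_settings.get("rm_template_seq_predict", False))
    rm_sc = bool(advanced_settings.get("rm_template_sc_predict", False))
    return {
        "sequence_removed": rm_seq,
        "sidechains_removed": rm_sc,
        "any_masking": rm_seq or rm_sc,
    }


# Defaults as described in this comment: both flags false.
defaults = {"rm_template_seq_predict": False, "rm_template_sc_predict": False}
print(template_masking_summary(defaults))
```

With both flags false, `any_masking` comes out false, which matches the reading that no template masking takes place under the default settings.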

Best,
Erik

@martinpacesa
Owner

Hi there! BindCraft was built to be robust: our goal was a pipeline where every binder has a realistic chance of working in the lab, rather than having to screen hundreds experimentally. That's why some of the steps might seem over the top. There is definitely a trade-off between design accuracy and success, so certain targets might fail and even some good binders might be filtered out. That's why in the end we filter with the monomer model, which has never seen complexes: if that model thinks a complex is likely to form, it's a more confident prediction than one from multimer, which has a propensity to form complexes to begin with. Also, with multimer we design with 5 models, rather than the 2 we would have with monomer, which reduces the chance of making adversarial sequences. I would therefore avoid using multimer for both design and validation, as you might end up with sequences that look good to AF2 but are overfitted to please the model.
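The distinction drawn above, between reusing the same models and reusing the same weight family, can be sketched as follows. This is only an illustration of the reasoning in this reply, not BindCraft code: the `ModelSet` class, the family labels, and the monomer indices `(0, 1)` are placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSet:
    family: str     # "multimer" or "monomer" (labels chosen for this sketch)
    indices: tuple  # which models of that family are used (placeholder values)


def shares_models(design, validation):
    """True if at least one identical model is used for both design and validation."""
    same_family = design.family == validation.family
    return same_family and bool(set(design.indices) & set(validation.indices))


def shares_family(design, validation):
    """True if design and validation draw on the same weight family."""
    return design.family == validation.family


# Setup described in this thread: design with 5 multimer models,
# validate with 2 monomer models.
thread_setup = (ModelSet("multimer", (0, 1, 2, 3, 4)), ModelSet("monomer", (0, 1)))

# Split proposed in the question: multimer [0, 1, 2] for design, [3, 4] for validation.
proposed_split = (ModelSet("multimer", (0, 1, 2)), ModelSet("multimer", (3, 4)))

for name, (design, validation) in [("thread setup", thread_setup),
                                   ("proposed split", proposed_split)]:
    print(name, shares_models(design, validation), shares_family(design, validation))
```

The proposed split avoids reusing any single model, but both halves still come from the multimer family, which is the situation the reply above cautions against.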

Yeah, masked_binder_predict needs to be renamed; we used to do masking but moved away from it.
