diff --git a/.DS_Store b/.DS_Store index 2a39a37f..f36b2d42 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/docs/.DS_Store b/docs/.DS_Store index 5fc28be4..10c844cd 100644 Binary files a/docs/.DS_Store and b/docs/.DS_Store differ diff --git a/docs/Texts/.DS_Store b/docs/Texts/.DS_Store index 3f9bbd10..6296bcab 100644 Binary files a/docs/Texts/.DS_Store and b/docs/Texts/.DS_Store differ diff --git a/docs/Texts/0/01.md b/docs/Texts/0/01.md index 29ec39cd..c89e6b2d 100644 --- a/docs/Texts/0/01.md +++ b/docs/Texts/0/01.md @@ -1,6 +1,6 @@ # The Question! --- -##### Semi-Private | S.Alireza Hashemi +##### Unpublished | S.Alireza Hashemi
After the first-round results were announced, we met with (@Kave_nasr) to tackle the following question: Could we establish a single metric—or a combination of metrics—that reliably indicates binding confidence among the top-ranked candidates? diff --git a/docs/Texts/0/02.md b/docs/Texts/0/02.md new file mode 100644 index 00000000..bd189012 --- /dev/null +++ b/docs/Texts/0/02.md @@ -0,0 +1,81 @@ +# The Quest! +--- +##### Unpublished | S.Alireza Hashemi + + +## Overview +
+One of the things that stands out after the Round 2 results were published is how well the “surgey” tool pairs with the test sequences. This tool shows how ineffective AlphaFold or even ESMFold can be in the case of point mutation predictions. Essentially, the tool mutates one residue at a time to alanine (A) and then predicts the structures of these newly mutated sequences. The results indicate a significant mismatch and highlight the model’s tendency to resist changing the structure. +
+ +[X-Link](https://x.com/sokrypton/status/1812769477228200086) - +[Colab-Link](https://colab.research.google.com/github/sokrypton/ColabBio/blob/main/notebooks/replacement_scan.ipynb) + + + +
+ ![tendency](./comp0.png){ width="1000" } +
+
+Resist to changing the structure, 1 - 25 - 50 and 75 mutation which performs on Elian sequence make it + + + from + • SAGQATLDHVKARADKSKTLEELKELRKEAYEDVWKAYMAVVDETEKKIAEAKATGKGDGKKISEEGANALKTLQNTYGSLADIIDEKAKELRAAEEAAK + to + • AAAAAALAAAAAAAAAAATAEELAAAAAAAYAAAAAAAAAAAAAAAAAAKAAGAAAAAAIAAAAAAAAAAAAAAAAAIAAAAAEAAAAAAAA +by keeping the same structure (on 25 mutations even confidence is high) + + +## Two Key Observations + +I tried out some Round 1 and Round 2 models with this tool, and two things caught my attention: + +1. **Predicting Authenticity and Bias** + Can we predict how genuine these structures are, and how biased they might be? + +2. **Meaningful Correlation** + Do we see a meaningful correlation here? + +
+Answering the second question requires time and effort to evaluate all possibilities. However, in some cases, we observed that structures are more sensitive to mutations. This might suggest that such sequences deviate from the model’s inherent biases. (Perhaps the subset of binders that behave more sensitively in this tool will also prove meaningful.) +
+ +
+ ![tendency](./aleclSequence1.png){ width="1000" } +
Sensevity to mutations on alec1 binder sequence, low confidence at start is questionable.
+
+ + +## Addressing Potential Biases +
+As for the first issue, we have some ideas. Generally, a high-confidence predicted structure is assumed to be correct, and we often rely on confidence metrics to rank it. But the mutation profile suggests there could be biases coming from both the encoders (MPNN) and the structure prediction models (AF2 and ESM), especially in workflows where sequences are refined, partially diffused, or something similar. + +We also tested some mutated sequences on ColabFold-MMSeq and ESM-Fold, comparing their performance with the “surgey” analysis tool (without MSA). The result was striking—25 residues mutated to alanine. +
+ +
+ ![tendency](./comp1.png){ width="1000" } +
x.rustamov-m_18_41 and vs elian.elian3 share 83% sequence similarity. Even though most of the mutations are not in the binding site, it is interesting to see that they still do not bind to the target—despite the predicted structure being very similar. This observation highlights how poorly our models can perform during the refining steps.
+
+ + +
+ ![tendency](./comp2.png){ width="1000" } +
briannaughton.2_plus_1 sequence as a much bigger structure, struture prediction on 20 mutants profile, A) surgey tool B) AF2(Colab-Fold) C) ESMFold(Colab-Fold)
+
+ + +## Potential Solutions +
+A potential solution might come from older, physics-based tools—perhaps not for binding but for structure itself. The idea could be as grand as introducing a completely new tool, or as straightforward as using existing tools to evaluate binders before wet-lab tests. + +For instance, we can look at the correlation between structure and sequence using physics-based methods. One approach is to run Robetta domain-based predictions and compare them to AF2 predictions to see how much bias might be affecting the predicted model. We could also consider implementing a PyRosetta model with the surgey tool to assess structure accuracy. In principle, we can evaluate ddG from the sequence and structure separately to see if they are compatible. +
+ +## Concluding Thoughts +
+Finally, it might be helpful to employ complementary physics-based tools to evaluate the structure on its own. As discussed, the trend of a sequence and its predicted structure should align. There are some well-established tools that predict thermodynamic parameters directly from sequence and structure, showing good efficiency back in the era before ML-based methods. + +Now might be the right time to dust off those tools—or even train and fine-tune current models—by incorporating these classical parameters. +
\ No newline at end of file diff --git a/docs/Texts/0/aleclSequence1.png b/docs/Texts/0/aleclSequence1.png new file mode 100644 index 00000000..2cc0e2e3 Binary files /dev/null and b/docs/Texts/0/aleclSequence1.png differ diff --git a/docs/Texts/0/comp0.png b/docs/Texts/0/comp0.png new file mode 100644 index 00000000..d40b1b58 Binary files /dev/null and b/docs/Texts/0/comp0.png differ diff --git a/docs/Texts/0/comp1.png b/docs/Texts/0/comp1.png new file mode 100644 index 00000000..7301b58d Binary files /dev/null and b/docs/Texts/0/comp1.png differ diff --git a/docs/Texts/0/comp2.png b/docs/Texts/0/comp2.png new file mode 100644 index 00000000..80f11942 Binary files /dev/null and b/docs/Texts/0/comp2.png differ