What is the significance of the numbering system for the recommended mutations in the python bin/recommend.py output? #40
For example, in:
What do the numbers 1, 2, 3, 4, and 6 mean? And why do some of the numbers appear more often than others? Are these recommended mutations meant to be applied one at a time to the protein I want to "evolve", or all at once? I could not find an answer in the paper. My apologies if it is there and I'm not seeing it.
Replies: 1 comment 2 replies
I'd like an answer to this as well. I've looked through the code, and in the amis.py script I found the deep mutational scanning function (deep_mutational_scan).

This function prints the results of scanning mutations across a protein sequence. For each position in the sequence, it predicts how different mutations (amino acid substitutions) at that position could affect the protein, based on the model's predictions. The print statements in this function output the position (pos), the mutation (mt), and a value (val), which likely represents the predicted impact or score of that mutation. Each mutation is therefore printed as pos mt val.

So, maybe this is some sort of prioritization? Since the output is sorted from large to small numbers, I would assume the mutations with higher numbers are "better" in some way. But I'd prefer an answer from the devs. What should we do with these numbers?
This is documented in the README: "the script will output a list of substitutions and the number of recommending language models."
The number indicates the count of language models for which the corresponding mutation has higher LM likelihood than wildtype. We use an ensemble of six language models, which is why the number is out of 6. We use these counts to prioritize mutations that have a consensus across multiple language models, as described in the methods of the paper.
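To make the counting concrete, here is a minimal sketch of that consensus step. It is not the code in this repository: the function name `count_recommending_models`, the nested-dict input format, the toy likelihoods, and the 1-indexed `K2R`-style mutation labels are all assumptions made for illustration, and the toy example uses three models rather than six.

```python
from collections import Counter


def count_recommending_models(wildtype, per_model_probs):
    """Count, for each substitution, how many models prefer it to wildtype.

    Hypothetical helper for illustration only.
    wildtype: the wildtype sequence as a string.
    per_model_probs: one entry per model; each entry is a list with one
        dict per sequence position mapping amino acid -> likelihood.
    """
    counts = Counter()
    for probs in per_model_probs:
        for pos, wt_aa in enumerate(wildtype):
            wt_prob = probs[pos][wt_aa]
            for aa, p in probs[pos].items():
                # A model "recommends" a substitution when it assigns the
                # mutant residue a higher likelihood than the wildtype one.
                if aa != wt_aa and p > wt_prob:
                    counts[f"{wt_aa}{pos + 1}{aa}"] += 1
    # Highest consensus first; ties broken alphabetically by label.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy likelihoods for a two-residue sequence and three made-up models.
toy_models = [
    [{"M": 0.6, "L": 0.3, "K": 0.1}, {"K": 0.2, "R": 0.5, "E": 0.3}],
    [{"M": 0.5, "L": 0.4, "K": 0.1}, {"K": 0.3, "R": 0.6, "E": 0.1}],
    [{"M": 0.3, "L": 0.6, "K": 0.1}, {"K": 0.4, "R": 0.5, "E": 0.1}],
]

recommendations = count_recommending_models("MK", toy_models)
for mutation, n_models in recommendations:
    print(mutation, n_models)
```

In this toy case `K2R` is preferred over wildtype by all three models, so it gets a count of 3, while `K2E` and `M1L` are each preferred by only one model. A number in the script output can be read the same way, as a count out of six.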
Hope that helps!