
AlphaMut

This is the GitHub Repository accompanying the paper: "AlphaMut: a deep reinforcement learning model to suggest helix-disrupting mutations"


Running the Inference Model

To run the trained Helix-in-protein model, use the Colab notebook 3_inference_of_Helix-in-protein_trained_model.ipynb. Instructions, along with an illustrative example, are provided in the notebook.

Training the Model

This repository contains the code for learning to disrupt helices using reinforcement learning. There are two models: one that disrupts isolated helices (Helix-only) and another that disrupts helices within a protein environment (Helix-in-protein).

Information on training is provided in the Jupyter notebooks 1_training_and_validation_only_helix.ipynb and 2_training_and_validation_with_protein.ipynb.
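The full training pipeline lives in the notebooks above. As a rough illustration of the underlying setup (the state is a peptide sequence, an action is a point mutation, and the reward measures loss of helicity), here is a minimal sketch in plain Python. All names here are hypothetical for illustration; they are not the repository's API:

```python
# Illustrative sketch only: the real environment, embeddings, and reward
# (P-SEA secondary-structure assignment on an ESMFold-predicted structure)
# live in the repository's notebooks and utils/. All names are hypothetical.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def apply_mutation(sequence: str, position: int, new_residue: str) -> str:
    """Return the sequence with a single point mutation applied."""
    if new_residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {new_residue}")
    if not 0 <= position < len(sequence):
        raise IndexError("mutation position outside the sequence")
    return sequence[:position] + new_residue + sequence[position + 1:]


def decode_action(action: int, seq_len: int) -> tuple[int, str]:
    """Map a flat action index to (position, residue) -- one common way to
    define a discrete action space of size seq_len * 20."""
    position, residue_idx = divmod(action, len(AMINO_ACIDS))
    if position >= seq_len:
        raise IndexError("action index outside the action space")
    return position, AMINO_ACIDS[residue_idx]


# Example: a poly-alanine helix receives a proline substitution (a classic
# helix breaker) at position 5.
helix = "AAAAAAAAAA"
pos, res = decode_action(5 * 20 + AMINO_ACIDS.index("P"), len(helix))
mutant = apply_mutation(helix, pos, res)
print(mutant)  # AAAAAPAAAA
```

In the actual models, the agent would score such mutants by predicting the mutant structure and rewarding loss of helical content.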

Packages Required

It is advised to install all of the packages below in a conda environment (Python >= 3.8). StableBaselines3 is recommended because it provides standard, ready-to-use implementations of RL algorithms; it also installs Gymnasium, which is needed for the reinforcement learning environment.

The following packages are required:

  • Biotite - used to obtain protein secondary-structure assignments (via P-SEA)1 for computing the reward.
  • Transformers - provides the ESMFold2 model and the ESM embedding model.
  • biopandas - used to read the initial PDB files.
  • StableBaselines3 - provides the RL algorithms.
  • BioVec - used to embed the states. States are protein sequences embedded in a 100-dimensional space using a pretrained model called ProtVec3; alternatively, the state can be obtained through the ESM-2 model, which gives a 320-dimensional embedding. Both are implemented in utils/encoder_decoder.py. The module used for ProtVec is biovec, implemented in this GitHub Repo; please pay attention to this issue. If you use the ESM model, there should be no installation issues, since ESM is implemented in transformers. This package is required only if you plan to train the Helix-only model.
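Given the list above, an environment could be set up roughly as follows. This is a sketch, not a tested recipe: the repository does not pin versions, and the package names are the usual PyPI ones:

```shell
# Create and activate a fresh environment (Python >= 3.8 as advised above)
conda create -n alphamut python=3.10 -y
conda activate alphamut

# Core dependencies; stable-baselines3 installs gymnasium as a dependency
pip install stable-baselines3 transformers biotite biopandas

# Only needed if you plan to train the Helix-only model (ProtVec embeddings);
# check the biovec installation issue linked above first
pip install biovec
```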

References

Footnotes

  1. P-SEA: a new efficient assignment of secondary structure from C-alpha trace of proteins. (PubMed)

  2. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science.

  3. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLOS ONE.
