
Applying HDX-MS restraints to protein-protein complex prediction ranking


tsudalab/HDXRank


HDXRank

HDXRank is an open-source pipeline that applies HDX-MS restraints to rank predicted protein-protein complex models.

Method overview:

Integrating sparse experimental data into protein complex modeling workflows can significantly improve the accuracy and reliability of predicted models. Although hydrogen-deuterium exchange (HDX) data provide valuable insight into protein binding interfaces, there is currently no standard protocol for incorporating this information into complex model selection. Inspired by advances in graph-based deep learning for protein representation, HDXRank uses a graph-based deep learning model as the backbone of a flexible scoring framework that ranks protein-protein complex models by how well they agree with experimental HDX profiles. The result is a robust, HDX-informed model selection protocol with improved prediction accuracy.
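As a rough illustration of "ranking by agreement with experimental HDX profiles", the sketch below scores each candidate model by the root-mean-square error (RMSE) between its predicted per-peptide uptake and the measured values, then sorts models from best to worst match. This is not HDXRank's actual scoring function: the RMSE metric, the `rank_models` helper, and the peptide keys are hypothetical stand-ins, and in HDXRank the per-peptide predictions come from its trained model.

```python
import math


def rmse(pred: dict[str, float], exp: dict[str, float]) -> float:
    """Root-mean-square error over peptides present in both profiles."""
    shared = sorted(pred.keys() & exp.keys())
    if not shared:
        raise ValueError("no overlapping peptides between prediction and experiment")
    return math.sqrt(sum((pred[p] - exp[p]) ** 2 for p in shared) / len(shared))


def rank_models(
    predictions: dict[str, dict[str, float]], experimental: dict[str, float]
) -> list[tuple[str, float]]:
    """Hypothetical ranking: (model_name, score) pairs, lowest RMSE (best match) first."""
    scores = [(name, rmse(pred, experimental)) for name, pred in predictions.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy per-peptide uptake values keyed by "start-end"; not real data.
    experimental = {"1-10": 0.20, "11-20": 0.65, "21-30": 0.40}
    predictions = {
        "model_A": {"1-10": 0.25, "11-20": 0.60, "21-30": 0.45},
        "model_B": {"1-10": 0.50, "11-20": 0.30, "21-30": 0.40},
    }
    for name, score in rank_models(predictions, experimental):
        print(f"{name}: {score:.3f}")
```

Only peptides covered by both the prediction and the experiment contribute, which mirrors the fact that HDX-MS coverage of a protein is usually incomplete.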

Installation:

Clone the repository and use the install.sh script to create a Conda environment with all necessary dependencies:

git clone https://github.com/SuperChrisW/HDXRank.git
cd HDXRank
chmod +x ./install.sh
./install.sh
conda activate HDXRank

Preparation

Getting Started

HDXRank requires three input files:

  1. Protein structure file (.pdb)
  2. MSA profile file (.hhm, HH-suite format)
  3. HDX-MS file (.xlsx)
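Before launching the pipeline it can help to confirm that the three inputs exist and have the expected extensions. The helper below is a hypothetical pre-flight check, not part of HDXRank; the `structure`/`msa`/`hdx` keys are made up for illustration, and it assumes one file of each kind.

```python
from pathlib import Path

# Expected extension for each HDXRank input, as listed above.
REQUIRED_INPUTS = {
    "structure": ".pdb",
    "msa": ".hhm",
    "hdx": ".xlsx",
}


def check_inputs(paths: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every input looks usable."""
    problems = []
    for kind, suffix in REQUIRED_INPUTS.items():
        if kind not in paths:
            problems.append(f"missing {kind} file ({suffix})")
            continue
        path = Path(paths[kind])
        if path.suffix.lower() != suffix:
            problems.append(f"{kind}: expected {suffix}, got {path.name}")
        elif not path.is_file():
            problems.append(f"{kind}: {path} does not exist")
    return problems
```

For example, `check_inputs({"structure": "complex.pdb", "msa": "complex.hhm", "hdx": "hdx_data.xlsx"})` (hypothetical file names) returns an empty list only when all three files are present.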

Additionally, HDXRank uses a settings file (.xml) to control the pipeline. The example folder contains the examples from our paper for ranking docking models and AlphaFold (AF) predictions. To obtain HDX predictions, run HDXRank with any .xml file in one of the subfolders of example, for instance:

python main.py -input ./example/1UGH_docking/BatchTable_1UGH.xml
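To try every bundled example in one go, a small driver can collect the .xml files one level below example and call main.py on each. This is a hypothetical convenience script, not part of HDXRank; it assumes every .xml file in those subfolders is an HDXRank settings file and that it is run from the repository root inside the activated HDXRank environment (so `sys.executable` points at that environment's Python).

```python
import subprocess
import sys
from pathlib import Path


def find_settings(example_dir: str = "./example") -> list[Path]:
    """Collect every .xml settings file exactly one level below the example folder."""
    return sorted(Path(example_dir).glob("*/*.xml"))


def run_all(example_dir: str = "./example", dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) `python main.py -input <xml>` for each settings file."""
    commands = [
        [sys.executable, "main.py", "-input", str(xml)] for xml in find_settings(example_dir)
    ]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first run that exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` (the default) it only prints the commands; pass `dry_run=False` to execute them one after another.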

Workflow:

  1. Protein embedding: HDXRank extracts embeddings from .pdb and .hhm files.
  2. Protein graph construction: Constructs a protein graph from the .pdb file.
  3. Peptide graph splitting: Splits the protein graph into peptide graphs based on the provided HDX-MS .xlsx file.
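Steps 2 and 3 can be pictured with a toy residue graph. The sketch below connects residues whose C-alpha atoms lie within a distance cutoff and then cuts out one induced subgraph per HDX-MS peptide (start to end, inclusive). It is a simplified stand-in, not HDXRank's implementation: the 8 Å default cutoff, the C-alpha-only contacts, and the single-chain numbering are assumptions, and the real pipeline also attaches the embeddings from step 1 as node features.

```python
import math

Coord = tuple[float, float, float]
Graph = dict[int, set[int]]


def contact_graph(ca_coords: dict[int, Coord], cutoff: float = 8.0) -> Graph:
    """Residue graph: nodes are residue numbers; an edge joins two residues
    whose C-alpha atoms lie within `cutoff` angstroms (hypothetical choice)."""
    graph: Graph = {res: set() for res in ca_coords}
    residues = sorted(ca_coords)
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if math.dist(ca_coords[a], ca_coords[b]) <= cutoff:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def peptide_subgraph(graph: Graph, start: int, end: int) -> Graph:
    """Induced subgraph on residues start..end (inclusive): one HDX-MS peptide."""
    nodes = {res for res in graph if start <= res <= end}
    return {res: graph[res] & nodes for res in nodes}


def split_by_peptides(
    graph: Graph, peptides: list[tuple[int, int]]
) -> dict[tuple[int, int], Graph]:
    """One subgraph per (start, end) peptide, e.g. taken from the HDX-MS table."""
    return {(start, end): peptide_subgraph(graph, start, end) for start, end in peptides}


if __name__ == "__main__":
    # Toy straight chain with 3.8 angstrom C-alpha spacing.
    coords = {res: (3.8 * res, 0.0, 0.0) for res in range(1, 8)}
    graph = contact_graph(coords, cutoff=4.0)
    for peptide, sub in split_by_peptides(graph, [(1, 3), (4, 7)]).items():
        print(peptide, {res: sorted(nbrs) for res, nbrs in sorted(sub.items())})
```

Keying the result by (start, end) keeps overlapping peptides, which are common in HDX-MS coverage maps, as separate graphs.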

Execution:

With all input files prepared, run the following command to start the pipeline:

python main.py -input input.xml

Merging data and retraining the model:

The HDXRank model was trained on a curated HDX-MS dataset collected from the public PRIDE and MassIVE databases up to March 2024. New HDX-MS data can be merged with this dataset and used to retrain the model.

To merge newly collected data into the dataset:

  1. Copy the HDX-MS file into dataset/HDX_files/. The table should contain the columns protein, state, start, end, sequence, log_t, and RFU.
  2. Update the record file dataset/250110_HDXRank_dataset.xlsx, listing every protein+state pair and its corresponding structure.
  3. Run HDXRank to generate the embeddings and peptide graphs:
python main.py -input ./dataset/BatchTable_setting.xml
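Before copying a new table into dataset/HDX_files/, a quick consistency check can catch missing columns or mismatched peptide boundaries. The functions below are a hypothetical sketch, not part of HDXRank: they use the column names listed above and assume start/end are inclusive residue numbers, so each sequence should have end - start + 1 residues.

```python
# Column names as listed in this README for files in dataset/HDX_files/.
REQUIRED_COLUMNS = ["protein", "state", "start", "end", "sequence", "log_t", "RFU"]


def missing_columns(columns) -> list[str]:
    """Required columns absent from the table header, in README order."""
    present = set(columns)
    return [col for col in REQUIRED_COLUMNS if col not in present]


def row_problems(rows: list[dict]) -> list[str]:
    """Basic per-row sanity checks. Assumes start/end are inclusive residue
    numbers, so a peptide sequence should have end - start + 1 residues."""
    problems = []
    for i, row in enumerate(rows):
        start, end, seq = int(row["start"]), int(row["end"]), str(row["sequence"])
        if start > end:
            problems.append(f"row {i}: start {start} > end {end}")
        elif len(seq) != end - start + 1:
            problems.append(
                f"row {i}: {len(seq)} residues in sequence, but end - start + 1 = {end - start + 1}"
            )
        try:
            float(row["log_t"])
            float(row["RFU"])
        except (TypeError, ValueError):
            problems.append(f"row {i}: log_t and RFU must be numeric")
    return problems


if __name__ == "__main__":
    # Toy rows; the second one has a deliberate length mismatch.
    rows = [
        {"protein": "P1", "state": "apo", "start": 10, "end": 14, "sequence": "LVKEA", "log_t": 1.0, "RFU": 0.42},
        {"protein": "P1", "state": "apo", "start": 20, "end": 25, "sequence": "GSTA", "log_t": 1.0, "RFU": 0.18},
    ]
    print(missing_columns(rows[0].keys()))
    print(row_problems(rows))  # reports a length mismatch for row 1
```

With pandas and openpyxl installed, `pd.read_excel(path)` returns a DataFrame whose `columns` and `to_dict("records")` output can be passed to these two functions.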

To re-train the HDXRank model:

python HDXRank_train.py -input ./dataset/BatchTable_setting.xml -save ./Model

Citing HDXRank

If you use HDXRank, please cite the following paper: [submitted for publication]
