HDXRank is an open-source pipeline to apply HDX-MS restraints to protein-protein complex prediction ranking.
Integrating sparse experimental data into protein complex modeling workflows can significantly improve the accuracy and reliability of predicted models. Hydrogen-deuterium exchange (HDX) data provide valuable insights into protein binding interfaces, yet there is currently no standard protocol for incorporating this information into complex model selection. Inspired by advances in graph-based deep learning for protein representation, HDXRank uses a graph-based model as the backbone of a flexible scoring framework that ranks protein-protein complex models by their agreement with experimental HDX profiles. It offers a robust, HDX-informed selection protocol with improved prediction accuracy.

Clone the repository and use the install.sh file to create a Conda environment with all necessary dependencies:
git clone https://github.com/SuperChrisW/HDXRank.git
cd HDXRank
chmod +x ./install.sh
./install.sh
conda activate HDXRank
- Obtain the HDX-MS dataset and examples from Zenodo (DOI: 10.5281/zenodo.14625492), unzip the archive, and move it to the HDXRank root directory.
- Install HHblits to generate the MSA file. For now, refer to the AI-HDX documentation (https://github.com/Environmentalpublichealth/AI-HDX/blob/main/Documentations/MSA_embedding.md).
HDXRank requires three input files:
- Protein structure file (.pdb)
- MSA file (.hhm)
- HDX-MS file (.xlsx)
Additionally, HDXRank uses a settings file (.xml) to control the pipeline.
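Before launching a run, it can help to confirm that all four files are present and have the expected extensions. The following is a minimal sketch, not part of HDXRank itself: the function name `check_inputs` and the role names (`structure`, `msa`, `hdx`, `settings`) are hypothetical and chosen only for illustration; the extensions come from the list above.

```python
from pathlib import Path

# Extensions taken from the input list above; the role names are illustrative only.
REQUIRED_EXTENSIONS = {
    "structure": ".pdb",
    "msa": ".hhm",
    "hdx": ".xlsx",
    "settings": ".xml",
}


def check_inputs(paths):
    """Return a list of problems found in a {role: path} mapping (empty if none)."""
    problems = []
    for role, ext in REQUIRED_EXTENSIONS.items():
        if role not in paths:
            problems.append(f"{role}: no path given")
            continue
        path = Path(paths[role])
        if path.suffix.lower() != ext:
            # Wrong extension: report it without checking the file system.
            problems.append(f"{role}: expected a {ext} file, got {path.name}")
        elif not path.is_file():
            problems.append(f"{role}: {path} does not exist")
    return problems
```

An empty return value means every role has an existing file with the expected extension; any other result lists what to fix before running the pipeline.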
We provide examples for ranking docking and AlphaFold (AF) predictions in the example folder, as shown in our paper.
Users can run HDXRank with any .xml file in a subfolder of the example folder to get HDX predictions, for example:
python main.py -input ./example/1UGH_docking/BatchTable_1UGH.xml
- Protein embedding: HDXRank extracts embeddings from the .pdb and .hhm files.
- Protein graph construction: Constructs a protein graph from the .pdb file.
- Peptide graph splitting: Splits the protein graph into peptide graphs based on the provided HDX-MS .xlsx file.
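To make the peptide graph splitting step concrete, here is a simplified sketch of the general idea. It is not HDXRank's actual implementation: it assumes nodes are identified by residue number, that each peptide is an inclusive (start, end) range as in the HDX-MS table, and that a peptide graph is the subgraph induced by the residues in that range. The function name `split_into_peptide_graphs` is hypothetical.

```python
def split_into_peptide_graphs(residue_ids, edges, peptides):
    """Build one induced subgraph per peptide.

    residue_ids: residue numbers that are nodes of the protein graph
    edges: (i, j) residue-number pairs
    peptides: (start, end) ranges, assumed inclusive on both ends
    """
    nodes = set(residue_ids)
    edges = list(edges)  # allow generators; edges are scanned once per peptide
    subgraphs = []
    for start, end in peptides:
        members = sorted(r for r in nodes if start <= r <= end)
        member_set = set(members)
        # Keep only edges whose two endpoints both lie inside the peptide.
        sub_edges = sorted(
            (i, j) for i, j in edges if i in member_set and j in member_set
        )
        subgraphs.append(
            {"range": (start, end), "nodes": members, "edges": sub_edges}
        )
    return subgraphs


# Toy protein graph: a six-residue chain plus one long-range contact (1, 5).
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 5)]
toy_graphs = split_into_peptide_graphs(range(1, 7), toy_edges, [(1, 3), (4, 6)])
```

In this toy case the contact (1, 5) is dropped from both peptide graphs because its endpoints fall in different peptides. A real pipeline may choose to retain such neighboring residues, so treat the induced-subgraph rule as an assumption of this sketch.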
With all input files prepared, run the following command to start the pipeline:
python main.py -input input.xml
The HDXRank model was trained on a curated HDX-MS dataset collected from the public databases PRIDE and MassIVE, up to March 2024. New HDX-MS data can be merged with the current dataset and used to re-train the model.
To merge newly collected data into the dataset:
- Copy the HDX-MS file into dataset/HDX_files/: the table should contain the columns protein, state, start, end, sequence, log_t, RFU.
- Update the record file dataset/250110_HDXRank_dataset.xlsx: record all protein+state pairs and their corresponding structures.
- Run HDXRank to generate embeddings and peptide graphs:
python main.py -input ./dataset/BatchTable_setting.xml
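A new HDX-MS table can be checked against the required column names before it is copied into the dataset. The sketch below is a hypothetical helper, not part of HDXRank: `missing_columns` is an illustrative name, and the check assumes column names must match exactly (case-sensitive, ignoring surrounding whitespace).

```python
# Column names as listed in the merge steps above.
REQUIRED_COLUMNS = ("protein", "state", "start", "end", "sequence", "log_t", "RFU")


def missing_columns(columns):
    """Return the required column names absent from `columns`, in the listed order."""
    present = {str(c).strip() for c in columns}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Example header with one column missing and one unrelated extra column.
example_header = ["protein", "state", "start", "end", "sequence", "RFU", "notes"]
example_missing = missing_columns(example_header)
```

If pandas (with an Excel reader such as openpyxl) is installed, the header of a real file could be passed in as `missing_columns(pandas.read_excel(path).columns)`, where `path` points to the new .xlsx file.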
To re-train the HDXRank model:
python HDXRank_train.py -input ./dataset/BatchTable_setting.xml -save ./Model
If you use HDXRank, please cite the following paper: [submitted for publication]