DProQ is a Gated-Graph Transformer model for end-to-end quality assessment of protein complex structures. DProQ achieves significant speed-ups and better performance than current baseline methods. If you have any questions or suggestions, please contact us at xcbh6@umsystem.edu. We are happy to help!
If you find our work helpful, please cite it:
@article {Chen2022.05.19.492741,
author = {Chen, Xiao and Morehead, Alex and Liu, Jian and Cheng, Jianlin},
title = {DProQ: A Gated-Graph Transformer for Protein Complex Structure Assessment},
elocation-id = {2022.05.19.492741},
year = {2022},
doi = {10.1101/2022.05.19.492741},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/05/20/2022.05.19.492741},
eprint = {https://www.biorxiv.org/content/early/2022/05/20/2022.05.19.492741.full.pdf},
journal = {bioRxiv}
}
We provide our benchmark test sets, HAF2 and DBM55-AF2, for download:
wget https://zenodo.org/record/6569837/files/DproQ_benchmark.tgz
Each dataset contains:
- decoy folder: decoy files
- native folder: native structure files
- label_info.csv: DockQ scores and CAPRI class labels
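Where `wget` is not available, the archive can also be fetched and unpacked from Python. The following is a minimal sketch using only the standard library; the function names, the local file name, and the output folder are illustrative assumptions, and the internal layout of the archive is not verified here.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
BENCHMARK_URL = "https://zenodo.org/record/6569837/files/DproQ_benchmark.tgz"


def download_benchmark(url=BENCHMARK_URL, dest="DproQ_benchmark.tgz"):
    """Download the archive unless a file with that name already exists."""
    dest_path = Path(dest)
    if not dest_path.exists():
        urllib.request.urlretrieve(url, str(dest_path))
    return dest_path


def extract_benchmark(archive, out_dir="benchmark"):
    """Unpack a gzip-compressed tar archive and list the top-level entries."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(out))
    return sorted(p.name for p in out.iterdir())


# Usage (requires network access):
#   archive = download_benchmark()
#   print(extract_benchmark(archive))
```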
- Download this repository
  git clone https://github.com/BioinfoMachineLearning/DProQ.git
- Set up conda environment locally
  cd DProQ
  conda env create --name DProQ -f environment.yml
- Activate conda environment
  conda activate DProQ
The inference.py script accepts the following parameters:
python inference.py
-c --complex_folder Folder containing the raw protein complex structures
-w --work_dir Working directory for all intermediate files and folders; it is created if it does not exist
-r --result_folder Result folder for the two ranking results; it is created if it does not exist
   --threads Number of threads for parallel feature generation and the dataloader, default=10
-s --delete_tmp Set to False to keep work_dir and the intermediate files, or True to delete them, default=False
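For readers wrapping the script, the options can be mirrored with `argparse`. This is a hypothetical sketch and not the actual contents of inference.py: marking the three folders as required, the `str2bool` helper, and the help strings are assumptions. It registers `--threads` by its long name only, since `-r` is already taken by `--result_folder`.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings (hypothetical helper)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser that mirrors the documented options."""
    parser = argparse.ArgumentParser(description="DProQ-style option sketch")
    parser.add_argument("-c", "--complex_folder", required=True,
                        help="folder with raw protein complex structures")
    parser.add_argument("-w", "--work_dir", required=True,
                        help="working directory for intermediate files")
    parser.add_argument("-r", "--result_folder", required=True,
                        help="folder for the ranking results")
    parser.add_argument("--threads", type=int, default=10,
                        help="threads for feature generation and dataloader")
    parser.add_argument("-s", "--delete_tmp", type=str2bool, default=False,
                        help="True deletes work_dir after the run")
    return parser


# Usage:
#   args = build_parser().parse_args(["-c", "in", "-w", "work", "-r", "out"])
#   args.threads     -> 10 (documented default)
#   args.delete_tmp  -> False (documented default)
```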
DProQ requires a GPU. We provide a few protein complexes in the examples folder for testing. The evaluation result, Ranking.csv, is stored in result_folder.
python ./inference.py -c ./examples/6AL0/ -w ./examples/work/ -r ./examples/result
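To evaluate several targets in one go, the command above can be generated per target folder. The sketch below only uses the documented `-c`, `-w`, and `-r` options; the helper names, the per-target work/result layout, and the `skip` list are assumptions for illustration. With `dry_run=True` it only returns the commands without running them.

```python
import subprocess
import sys
from pathlib import Path


def build_command(complex_folder, work_dir, result_folder,
                  script="inference.py"):
    """Assemble one inference call from the documented options."""
    return [sys.executable, script,
            "-c", str(complex_folder),
            "-w", str(work_dir),
            "-r", str(result_folder)]


def run_targets(root, work_root, result_root,
                skip=("work", "result"), dry_run=True):
    """Build (and optionally run) one command per target subfolder."""
    commands = []
    for target in sorted(Path(root).iterdir()):
        # Skip plain files and folders that hold earlier outputs.
        if not target.is_dir() or target.name in skip:
            continue
        cmd = build_command(target,
                            Path(work_root) / target.name,
                            Path(result_root) / target.name)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


# Usage:
#   for cmd in run_targets("./examples", "./examples/work", "./examples/result"):
#       print(" ".join(cmd))
```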
You can build your own dataset for evaluation. The data folder should look like this:
customer_data_folder
├── decoy_1.pdb
├── decoy_2.pdb
├── decoy_3.pdb
├── decoy_4.pdb
└── decoy_5.pdb
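Before launching a run, it can help to confirm that a custom folder matches this layout. The check below is a small sketch that assumes decoys are `.pdb` files placed directly in the folder, as in the tree above; the function name is hypothetical, and whether inference.py accepts other layouts or file types is not covered here.

```python
from pathlib import Path


def list_decoys(folder):
    """Return the .pdb files directly inside `folder`, sorted by name."""
    path = Path(folder)
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a folder")
    decoys = sorted(p for p in path.iterdir()
                    if p.is_file() and p.suffix.lower() == ".pdb")
    if not decoys:
        raise FileNotFoundError(f"no .pdb files found in {path}")
    return decoys


# Usage:
#   decoys = list_decoys("customer_data_folder")
#   print(len(decoys), "decoys found")
```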
The following four tables show that DProQ achieves the best overall hit rate and the lowest mean ranking loss on both the HAF2 and DBM55-AF2 test sets. The best result is highlighted in bold.
Table 1: Hit rate performance on the HAF2 dataset. The BEST column represents each target’s best-possible Top-10 result. The SUMMARY row lists the results when all targets are taken into consideration.
Target | DPROQ | DPROQ_GT | DPROQ_GTE | DPROQ_GTN | GNN_DOVE | BEST |
---|---|---|---|---|---|---|
7AOH | 10/10/10 | 10/10/10 | 10/10/10 | 10/10/10 | 9/9/0 | 10/10/10 |
7D7F | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 5/0/0 |
7AMV | 10/10/10 | 10/10/10 | 10/10/10 | 10/10/10 | 10/10/6 | 10/10/10 |
7OEL | 10/10/0 | 10/9/0 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 |
7O28 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 |
7ALA | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 1/0/0 |
7MRW | 5/4/0 | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 10/10/0 |
7OZN | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 10/2/0 |
7D3Y | 2/0/0 | 5/0/0 | 6/0/0 | 8/0/0 | 0/0/0 | 10/0/0 |
7NKZ | 10/10/2 | 10/10/1 | 10/10/1 | 10/10/4 | 10/9/9 | 10/10/10 |
7LXT | 1/1/0 | 0/0/0 | 0/0/0 | 0/0/0 | 1/0/0 | 10/10/0 |
7KBR | 10/10/10 | 10/10/10 | 10/10/10 | 10/10/10 | 10/10/9 | 10/10/10 |
7O27 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 | 10/4/0 | 10/10/0 |
SUMMARY | 10/9/4 | 8/7/4 | 8/7/4 | 8/7/4 | 8/7/3 | 13/10/4 |
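A per-target entry such as 9/2/0 can be computed from predicted scores and true DockQ values. The sketch below rests on several assumptions that this README does not state: the three numbers count Acceptable-or-better, Medium-or-better, and High-quality decoys among the top 10 ranked decoys; a higher predicted score means a better decoy; and the classes follow the usual DockQ cutoffs of 0.23, 0.49, and 0.80. The input data is a toy example, not a benchmark result.

```python
from typing import Iterable, Tuple

# Usual DockQ cutoffs for CAPRI-style classes (assumed here).
ACCEPTABLE, MEDIUM, HIGH = 0.23, 0.49, 0.80


def hit_rate(decoys: Iterable[Tuple[float, float]], top_k: int = 10) -> str:
    """Format the top-k hit rate as 'acceptable/medium/high'.

    Each decoy is a (predicted_score, true_dockq) pair.
    """
    ranked = sorted(decoys, key=lambda d: d[0], reverse=True)[:top_k]
    acceptable = sum(1 for _, dockq in ranked if dockq >= ACCEPTABLE)
    medium = sum(1 for _, dockq in ranked if dockq >= MEDIUM)
    high = sum(1 for _, dockq in ranked if dockq >= HIGH)
    return f"{acceptable}/{medium}/{high}"


# Toy data: (predicted score, true DockQ).
toy = [(0.9, 0.85), (0.8, 0.60), (0.7, 0.50), (0.6, 0.30),
       (0.5, 0.10), (0.4, 0.25), (0.3, 0.05), (0.2, 0.82)]
print(hit_rate(toy, top_k=5))  # prints "4/3/1"
```

Under the same assumptions, a SUMMARY entry would count the targets that have at least one hit in each class.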
Table 2: Ranking loss performance on the HAF2 dataset. The BEST row represents the mean and standard deviation of the ranking losses for all targets.
Target | DPROQ | DPROQ_GT | DPROQ_GTE | DPROQ_GTN | GNN_DOVE |
---|---|---|---|---|---|
7AOH | 0.066 | 0.026 | 0.026 | 0.058 | 0.928 |
7D7F | 0.471 | 0.471 | 0.47 | 0.471 | 0.003 |
7AMV | 0.01 | 0.021 | 0.017 | 0.019 | 0.342 |
7OEL | 0.062 | 0.063 | 0.135 | 0.135 | 0.21 |
7O28 | 0.029 | 0.021 | 0.027 | 0.034 | 0.244 |
7ALA | 0.232 | 0.226 | 0.226 | 0.226 | 0.226 |
7MRW | 0.085 | 0.603 | 0.555 | 0.555 | 0.598 |
7OZN | 0.409 | 0.409 | 0.49 | 0.281 | 0.457 |
7D3Y | 0.326 | 0.33 | 0.012 | 0.326 | 0.295 |
7NKZ | 0.164 | 0.175 | 0.175 | 0.164 | 0.459 |
7LXT | 0.586 | 0.586 | 0.586 | 0.586 | 0.295 |
7KBR | 0.068 | 0.152 | 0.152 | 0.17 | 0.068 |
7O27 | 0.03 | 0.079 | 0.079 | 0.079 | 0.334 |
BEST | 0.195 ± 0.185 | 0.243 ± 0.206 | 0.227 ± 0.21 | 0.239 ± 0.187 | 0.343 ± 0.228 |
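For reference, a per-target ranking loss can be sketched as follows. This assumes the common definition, which this README does not spell out: the true DockQ of the best decoy in the pool minus the true DockQ of the decoy ranked first by the method, with a higher predicted score meaning a better decoy. The input is toy data.

```python
from typing import Sequence, Tuple


def ranking_loss(decoys: Sequence[Tuple[float, float]]) -> float:
    """Best true DockQ in the pool minus the DockQ of the top-ranked decoy.

    Each decoy is a (predicted_score, true_dockq) pair.
    """
    if not decoys:
        raise ValueError("at least one decoy is required")
    best_dockq = max(dockq for _, dockq in decoys)
    top_ranked_dockq = max(decoys, key=lambda d: d[0])[1]
    return best_dockq - top_ranked_dockq


# Toy data: the top-ranked decoy (score 0.9) is not the best one.
toy_pool = [(0.9, 0.60), (0.8, 0.85), (0.1, 0.20)]
print(round(ranking_loss(toy_pool), 3))  # prints 0.25
```

A loss of 0.0 means the method placed the best decoy first.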
Table 3: Hit rate performance on the DBM55-AF2 dataset. The BEST column represents each target’s best-possible Top-10 result. The SUMMARY row lists the results when all targets are taken into consideration.
Target | DPROQ | DPROQ_GT | DPROQ_GTE | DPROQ_GTN | GNN_DOVE | BEST |
---|---|---|---|---|---|---|
6AL0 | 9/2/0 | 10/0/0 | 10/0/0 | 10/2/0 | 6/0/0 | 10/2/0 |
3SE8 | 8/8/0 | 9/9/0 | 8/8/0 | 8/8/0 | 3/0/0 | 10/10/0 |
5GRJ | 10/10/0 | 9/9/0 | 10/10/0 | 9/9/0 | 3/2/0 | 10/10/0 |
6A77 | 7/7/0 | 7/7/0 | 8/8/0 | 8/8/0 | 0/0/0 | 8/8/0 |
4M5Z | 10/10/1 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/1 |
4ETQ | 1/1/0 | 1/1/0 | 1/1/0 | 1/1/0 | 0/0/0 | 1/1/0 |
5CBA | 10/10/1 | 10/10/0 | 10/10/0 | 10/10/1 | 10/10/3 | 10/10/6 |
5WK3 | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 1/0/0 | 3/0/0 |
5Y9J | 4/0/0 | 6/0/0 | 5/0/0 | 4/0/0 | 0/0/0 | 8/0/0 |
6BOS | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 | 10/10/0 |
5HGG | 8/0/0 | 8/0/0 | 8/0/0 | 8/0/0 | 8/0/0 | 10/0/0 |
6A0Z | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 2/0/0 | 3/0/0 |
3U7Y | 2/2/1 | 2/2/1 | 2/2/1 | 2/1/0 | 2/2/1 | 2/2/1 |
3WD5 | 10/8/0 | 9/8/0 | 9/8/0 | 9/8/0 | 0/0/0 | 10/10/0 |
5KOV | 0/0/0 | 0/0/0 | 0/0/0 | 0/0/0 | 1/0/0 | 2/0/0 |
SUMMARY | 12/10/3 | 12/9/1 | 12/9/1 | 12/10/1 | 10/4/1 | 15/10/3 |
Table 4: Ranking loss performance on the DBM55-AF2 dataset. The BEST row represents the mean and standard deviation of the ranking losses for all targets.
Target | DPROQ | DPROQ_GT | DPROQ_GTE | DPROQ_GTN | GNN_DOVE |
---|---|---|---|---|---|
6AL0 | 0.0 | 0.156 | 0.156 | 0.0 | 0.424 |
3SE8 | 0.079 | 0.041 | 0.041 | 0.079 | 0.735 |
5GRJ | 0.024 | 0.012 | 0.095 | 0.012 | 0.776 |
6A77 | 0.037 | 0.062 | 0.0 | 0.037 | 0.591 |
4M5Z | 0.015 | 0.026 | 0.026 | 0.015 | 0.221 |
4ETQ | 0.0 | 0.76 | 0.0 | 0.748 | 0.759 |
5CBA | 0.052 | 0.038 | 0.052 | 0.058 | 0.019 |
5WK3 | 0.114 | 0.114 | 0.114 | 0.186 | 0.087 |
5Y9J | 0.0 | 0.0 | 0.0 | 0.0 | 0.382 |
6BOS | 0.081 | 0.081 | 0.0 | 0.0 | 0.081 |
5HGG | 0.051 | 0.051 | 0.121 | 0.051 | 0.121 |
6A0Z | 0.207 | 0.207 | 0.207 | 0.207 | 0.062 |
3U7Y | 0.0 | 0.021 | 0.0 | 0.0 | 0.756 |
3WD5 | 0.011 | 0.011 | 0.011 | 0.0 | 0.672 |
5KOV | 0.065 | 0.08 | 0.085 | 0.087 | 0.0 |
BEST | 0.049 ± 0.054 | 0.111 ± 0.182 | 0.061 ± 0.064 | 0.099 ± 0.185 | 0.379 ± 0.298 |
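The last row of Table 4 can be recomputed from the per-target values. The snippet below does this for the DPROQ column using the standard library. Using the population standard deviation is an assumption; it reproduces the reported value for this column.

```python
import statistics

# DPROQ column of Table 4, in row order (6AL0 ... 5KOV).
dproq_losses = [0.0, 0.079, 0.024, 0.037, 0.015, 0.0, 0.052, 0.114,
                0.0, 0.081, 0.051, 0.207, 0.0, 0.011, 0.065]

mean = statistics.mean(dproq_losses)
std = statistics.pstdev(dproq_losses)  # population standard deviation

print(f"{mean:.3f} ± {std:.3f}")  # prints "0.049 ± 0.054"
```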