Issue with Cb atoms of threonines, perhaps other amino-acids? #4

Toverkwark · 2024-02-20T11:21:02Z

PIGT version:
Python version:3.9
Operating System:Debian 10

Description

If I run any prediction and load the resulting PDB file into Maestro instead of PyMOL, the import doesn't recognize a full protein in the file. I think there are two problems that I've been able to identify so far, and both recur independent of what I model:
- The PDB files containing multiple models do not have the individual models properly wrapped in MODEL and ENDMDL records.
- Maestro's failure to recognize a full protein seems to centre on particular amino acids, mostly those with Cb substitutions, such as threonine and isoleucine. At least for the threonines, all Cb atoms are modelled almost on top of the Ca atoms. This probably means Maestro doesn't accept these atoms as bonded and therefore doesn't recognize the whole thing as a contiguous protein.
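
The first problem can be checked without a viewer. The following is a minimal sketch, not part of the repository: `check_model_records` is a hypothetical helper that scans PDB lines and reports MODEL and ENDMDL records that are not paired. It assumes only the standard PDB convention that the record name occupies the first six columns.

```python
def check_model_records(pdb_lines):
    """Report unmatched MODEL/ENDMDL records in an iterable of PDB lines.

    Hypothetical diagnostic helper; returns a list of human-readable problems.
    """
    problems = []
    open_at = None  # line number of the MODEL record that is currently open
    for lineno, line in enumerate(pdb_lines, start=1):
        record = line[:6].strip()  # PDB record name lives in columns 1-6
        if record == "MODEL":
            if open_at is not None:
                problems.append(
                    f"line {lineno}: MODEL while the MODEL from line {open_at} is still open"
                )
            open_at = lineno
        elif record == "ENDMDL":
            if open_at is None:
                problems.append(f"line {lineno}: ENDMDL without a matching MODEL")
            open_at = None
    if open_at is not None:
        problems.append(f"line {open_at}: MODEL never closed by ENDMDL")
    return problems
```

Passing an open file handle for one of the multi-model output files should return an empty list when every model is wrapped correctly.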

What I Did

I simply ran the example command from the README on GitHub, using an existing PDB file as both the input and the template.


abeebyekeen · 2024-02-21T07:04:29Z

Python version: 3.9.18
Linux: Red Hat 4.8.5-44, 3.10.0-1160.88.1.el7.x86_64
Cuda: 11.8

I have also noticed this not just on Maestro but also PyMOL.
PyMOL: PyMOL versions 2.5.5, 2.40, and 1.8.2.0
Maestro: Maestro Version 12.5.139, MMshare Version 5.1.139, Release 2020-3, Platform Linux-x86_64

My observations:

Specific case: I was testing it with the protein-ligand complex PDB ID 4jrg. I separated the ligand from the protein and attempted to re-predict the complex.
When neuralplexer-inference is run with a template structure using the flags --use-template --input-template, the output structures, at first glance, look good in PyMOL when shown as cartoons. But when the structures are shown as sticks, it becomes clear that atoms of Thr, Leu, Ile, and sometimes Arg are not modelled correctly.
Command:

$ neuralplexer-inference --task=batched_structure_sampling \
                                        --input-receptor input/protein.pdb \
                                        --input-ligand input/lig.sdf \
                                        --use-template \
                                        --input-template input/protein.pdb \
                                        --out-path output \
                                        --model-checkpoint ${nplx_data_dir}/complex_structure_prediction.ckpt \
                                        --n-samples 16 \
                                        --chunk-size 4 \
                                        --num-steps=40 \
                                        --cuda \
                                        --sampler=langevin_simulated_annealing

When neuralplexer-inference is run with no template structure, atoms of the residues, especially Thr, are not modelled correctly on PyMOL even when shown as cartoons.
Command:

$ neuralplexer-inference --task=batched_structure_sampling \
                                        --input-receptor input/protein.pdb \
                                        --input-ligand input/lig.sdf \
                                        --out-path output \
                                        --model-checkpoint ${nplx_data_dir}/complex_structure_prediction.ckpt \
                                        --n-samples 16 \
                                        --chunk-size 4 \
                                        --num-steps=40 \
                                        --cuda \
                                        --sampler=langevin_simulated_annealing

In both cases described above, when the structures are visualized in Maestro, the defects in the residues are obvious.
Whether or not a template is used may not really matter. I just wanted to mention it in case it helps.
I ran each case twice and the problem persisted.
I'm attaching the prot_all.pdb output files from the runs with and without a template.

outfiles.zip

danny305 · 2024-02-22T03:09:36Z

I reviewed your output files and it looks like the model collapses the Ca and Cb atoms for these amino acids, which prevents PyMOL from properly visualizing them. Not sure whether this comes from a typo in the code or is the actual output of the model.
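
One way to quantify the collapse is to measure the Ca-Cb distance for every residue in an output file. The sketch below is a hypothetical helper (`find_collapsed_cb` is not part of the repository) that reads the fixed-column ATOM records of a PDB file. The 1.2 Å cutoff is an assumption chosen to sit well below the usual Ca-Cb bond length of roughly 1.5 Å.

```python
import math


def find_collapsed_cb(pdb_lines, min_dist=1.2):
    """Return (chain, residue number, residue name, distance) for residues
    whose CA-CB distance is below min_dist (in angstroms).

    Hypothetical diagnostic; min_dist=1.2 is an assumed threshold.
    """
    flagged = []
    current_key, atoms = None, {}

    def flush():
        # Evaluate the residue collected so far, if it has both atoms.
        if "CA" in atoms and "CB" in atoms:
            d = math.dist(atoms["CA"], atoms["CB"])
            if d < min_dist:
                flagged.append((*current_key, round(d, 3)))

    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: chain (22), residue number + insertion code (23-27),
        # residue name (18-20), atom name (13-16), x/y/z (31-54).
        key = (line[21], line[22:27].strip(), line[17:20].strip())
        if key != current_key:
            flush()
            current_key, atoms = key, {}
        name = line[12:16].strip()
        if name in ("CA", "CB"):
            atoms[name] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
    flush()  # do not forget the last residue
    return flagged
```

Because residues are processed in file order, the helper also works on files whose models are not separated by MODEL/ENDMDL records. Tallying the flagged residue names would show whether the problem is limited to Thr, Leu, Ile and Arg.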

Really interested in what the authors have to say about this.

zrqiao · 2024-02-22T04:57:24Z

Thanks for catching the issue. I can confirm this observation in the attached .pdb files. However, we do not observe this behavior in the prediction results we generated in March 2023 as part of the original study (which are deposited on Zenodo).

This behavior might be introduced by version changes of the dependency libraries or typos within this repo; investigations on potential causes are very much appreciated.

In the meantime, would it suffice if a post-processing step were implemented to correct the Cb atom positions for these residues, based on the ideal backbone geometry of standard AAs? Such hotfixes are also highly welcome.
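
As a rough illustration of what such a post-processing step could look like, the sketch below places Cb from the backbone N, Ca and C coordinates of the same residue. `ideal_cb` is a hypothetical function, not code from this repository. The coefficients are the ones commonly used for virtual Cb placement in other structure-prediction codebases and are an assumption here. Note that this only repositions Cb; side-chain atoms attached to it (for example OG1 and CG2 in Thr) would still need to be checked separately.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as sequences of floats."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def ideal_cb(n, ca, c):
    """Return an idealized CB position from backbone N, CA and C coordinates.

    Hypothetical hotfix sketch. The three coefficients are an assumed,
    commonly used approximation of tetrahedral geometry at CA.
    """
    b = [ca[i] - n[i] for i in range(3)]   # N -> CA
    d = [c[i] - ca[i] for i in range(3)]   # CA -> C
    a = cross(b, d)                        # normal to the N-CA-C plane
    return tuple(
        -0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * d[i] + ca[i]
        for i in range(3)
    )
```

For a residue flagged as collapsed, the Cb coordinates in the ATOM record could be overwritten with the returned position, which lies about 1.5 Å from Ca for a typical backbone.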

Toverkwark · 2024-02-22T11:53:23Z

I'm not really sure how to create a pull request, but the MODEL/ENDMDL wrapping issue is solved simply by uncommenting line 348 in af_common/protein.py.

amorehead · 2024-02-24T03:29:54Z

@Toverkwark, thanks for the heads up. I've made a PR on your behalf at #7.

iungyu-snu · 2024-03-08T11:26:23Z

@abeebyekeen
--use-template
--input-template input/protein.pdb
Even if I add these flags, some amino acids are missing when I look at the prot_all file in PyMOL. What the heck is the problem?

amorehead mentioned this issue Feb 24, 2024

Fix separation of PDB file models in protein.py #7

Merged
