Issue with Cb atoms of threonines, perhaps other amino-acids? #4

Open
Toverkwark opened this issue Feb 20, 2024 · 6 comments

Comments

@Toverkwark

Toverkwark commented Feb 20, 2024

  • PIGT version:
  • Python version: 3.9
  • Operating System: Debian 10

Description

If I run any prediction and load the resulting PDB file into Maestro instead of PyMOL, the import does not recognize a complete protein. So far I have identified two problems that recur regardless of what I model:
- In PDB files containing multiple models, the individual models are not properly wrapped in MODEL and ENDMDL records.
- Maestro's failure to recognize a complete protein seems to center on particular amino acids, mostly those branched at Cb, such as threonine and isoleucine. At least for the threonines, every Cb is modelled almost on top of its Ca. Maestro probably does not accept such atoms as bonded and therefore does not treat the whole structure as one contiguous protein.
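
One way to check the second observation quantitatively is to measure the Ca-Cb distance of every residue in an output file. The following is a minimal sketch, not part of the repository: the function name `find_collapsed_cb` and the 1.0 Å cutoff are assumptions chosen for illustration (a normal Ca-Cb bond is roughly 1.5 Å). It reads fixed-column ATOM records and groups atoms by consecutive residue, so it should also work on files whose models are not wrapped in MODEL/ENDMDL.

```python
import math


def find_collapsed_cb(pdb_text, min_dist=1.0):
    """Return (chain, resseq, resname, distance) for residues whose
    CA-CB distance is below min_dist (in angstroms).

    min_dist=1.0 is an illustrative cutoff, not a value from the repo.
    """
    hits = []
    current_key, atoms = None, {}

    def flush():
        # Evaluate the residue collected so far, if it has both atoms.
        if "CA" in atoms and "CB" in atoms:
            d = math.dist(atoms["CA"], atoms["CB"])
            if d < min_dist:
                hits.append((*current_key, round(d, 3)))

    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed-column fields of a PDB ATOM record.
        name = line[12:16].strip()
        key = (line[21], int(line[22:26]), line[17:20].strip())
        if key != current_key:
            # A new residue starts: check the previous one first.
            flush()
            current_key, atoms = key, {}
        atoms[name] = (
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        )
    flush()  # check the last residue
    return hits
```

Calling `find_collapsed_cb(open("prot_all.pdb").read())` on one of the attached files would list the affected residues; if the observation above holds, threonines should dominate the result.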

What I Did

I simply ran the example command from the README on GitHub, using an existing PDB file as both the input and the template.


@abeebyekeen

abeebyekeen commented Feb 21, 2024

  • Python version: 3.9.18
  • Linux: Red Hat 4.8.5-44, 3.10.0-1160.88.1.el7.x86_64
  • Cuda: 11.8

I have noticed this in PyMOL as well as in Maestro.
PyMOL: PyMOL versions 2.5.5, 2.40, and 1.8.2.0
Maestro: Maestro Version 12.5.139, MMshare Version 5.1.139, Release 2020-3, Platform Linux-x86_64

My observations:

  • Specific case: I was testing it with the protein-ligand complex PDB ID 4jrg. I separated the ligand from the protein and attempted to re-predict the complex.
  • When neuralplexer-inference is run with a template structure using the flags --use-template --input-template, the output structures look good at first glance in PyMOL when shown as cartoons. But when the structures are shown as sticks, it becomes clear that atoms of Thr, Leu, Ile, and sometimes Arg are not modelled correctly.
    Command:
    $ neuralplexer-inference --task=batched_structure_sampling \
        --input-receptor input/protein.pdb \
        --input-ligand input/lig.sdf \
        --use-template \
        --input-template input/protein.pdb \
        --out-path output \
        --model-checkpoint ${nplx_data_dir}/complex_structure_prediction.ckpt \
        --n-samples 16 \
        --chunk-size 4 \
        --num-steps=40 \
        --cuda \
        --sampler=langevin_simulated_annealing
  • When neuralplexer-inference is run without a template structure, atoms of these residues, especially Thr, are visibly mismodelled in PyMOL even when shown as cartoons.
    Command:
    $ neuralplexer-inference --task=batched_structure_sampling \
        --input-receptor input/protein.pdb \
        --input-ligand input/lig.sdf \
        --out-path output \
        --model-checkpoint ${nplx_data_dir}/complex_structure_prediction.ckpt \
        --n-samples 16 \
        --chunk-size 4 \
        --num-steps=40 \
        --cuda \
        --sampler=langevin_simulated_annealing
  • In both cases described above, the defects in the residues are obvious when the structures are visualized in Maestro.
  • Whether or not a template is used may not really matter. I mention it only in case it helps.
  • I ran each case twice and the problem persisted.
  • I'm attaching the prot_all.pdb output files from the runs with and without a template.

outfiles.zip

@danny305

I reviewed your output files, and it looks like the model collapses the Ca and Cb atoms of these amino acids onto each other, which prevents PyMOL from visualizing them properly. I'm not sure whether this comes from a typo in the code or from the actual output of the model.

Really interested in what the authors have to say about this.

@zrqiao
Owner

zrqiao commented Feb 22, 2024

Thanks for catching the issue. I can confirm this observation in the attached .pdb files. However, the prediction results we generated in March 2023 as part of the original study (deposited on Zenodo) do not show this behavior.

This behavior may have been introduced by version changes in the dependency libraries or by typos within this repo; investigation of potential causes would be very much appreciated.

In the meantime, would it suffice to implement a post-processing step that corrects the Cb atom positions for these residues based on the ideal backbone geometry of standard amino acids? Such hotfixes are also highly welcome.
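
As a rough illustration of what such a post-processing step could compute, the sketch below places an idealized Cb from the N, Ca, and C coordinates of a residue. It is not code from this repo: the function name `ideal_cb` is hypothetical, and the three coefficients are the approximate values commonly used in structure-prediction code to build a virtual Cb from the backbone. Note the assumption that the backbone atoms themselves are correct, and that moving Cb alone would not repair atoms further along the side chain (for example OG1 and CG2 of Thr).

```python
def _sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return tuple(a - b for a, b in zip(p, q))


def _cross(u, v):
    """Cross product of two 3D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def ideal_cb(n, ca, c):
    """Approximate ideal CB position from backbone N, CA, C coordinates.

    Uses the commonly quoted virtual-CB coefficients (an assumption here,
    not taken from this repository). Coordinates are in angstroms.
    """
    b = _sub(ca, n)      # N -> CA
    cc = _sub(c, ca)     # CA -> C
    a = _cross(b, cc)    # normal to the N-CA-C plane
    return tuple(
        -0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cc[i] + ca[i]
        for i in range(3)
    )
```

A hotfix along these lines would call `ideal_cb` only for residues whose stored Cb is implausibly close to Ca, then overwrite the Cb coordinates in the corresponding ATOM record. Glycine has no Cb and would need to be skipped.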

@Toverkwark
Author

I'm not really sure how to create a pull request, but the MODEL/ENDMDL wrapping issue is solved simply by uncommenting line 348 in af_common/protein.py.
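
For files that were already written without these records, the wrapping can also be added afterwards. The sketch below is an assumption-laden illustration, not the code at line 348 (which is not shown in this thread): the function name `wrap_models` is hypothetical, and it expects the caller to have already split the output into one text block per model.

```python
def wrap_models(model_blocks):
    """Join per-model PDB text blocks into one multi-model PDB string.

    Each block is wrapped in MODEL / ENDMDL records; any END or ENDMDL
    lines already present inside a block are dropped so that the result
    ends with a single END record.
    """
    out = []
    for serial, block in enumerate(model_blocks, start=1):
        # MODEL record: the model serial number sits in columns 11-14.
        out.append(f"MODEL     {serial:4d}")
        out.extend(
            line
            for line in block.splitlines()
            if line.strip() and not line.startswith("END")
        )
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```

Viewers such as PyMOL and Maestro generally rely on these records to tell the models of a multi-model file apart, which is why unwrapped models can be read as one overlapping structure.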

@amorehead
Collaborator

@Toverkwark, thanks for the heads up. I've made a PR on your behalf at #7.

@iungyu-snu

@abeebyekeen
--use-template
--input-template input/protein.pdb
Even with these flags added, some amino acids are missing when I open the prot_all file in PyMOL. What is the problem?

6 participants