RDKitToolkitWrapper, using rdkit=2024.09.3, writes some charged atoms to MOL/SDF without explicit charge #1988

Open
j-wags opened this issue Dec 16, 2024 · 0 comments

Describe the bug

One very interesting thing I found while debugging this report is that, with the most recent version of RDKit, our RDKitToolkitWrapper no longer writes MOL blocks with formal charges assigned to (some?) positive nitrogens. It's likely related to this change, which I'm still working to understand, so it's not clear whether this is a bug at all or, if it is, which package it's in.

A simple example of this new behavior is:

from openff.toolkit import Molecule
temp = Molecule.from_smiles('[N+1]([H])([H])([H])([H])')
temp.to_file('temp.sdf', file_format='sdf')

Using rdkit 2024.09.3 this yields:

     RDKit          2D

  5  4  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.5000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -1.5000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  0
  1  5  1  0
M  CHG  1   1   1
M  END
$$$$

But using rdkit 2024.09.2 we get:

     RDKit          2D

  5  4  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  4  0  0  0  0  0  0
    1.5000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.5000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -1.5000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  0
  1  5  1  0
M  CHG  1   1   1
M  END
$$$$

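Note the difference (0 vs 4) in the first atom line. Going by the V2000 fixed-width layout, that column looks like the valence field rather than the charge field; the "M  CHG" line is identical in both outputs.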
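To check which column actually changed between the two outputs, the nitrogen atom line can be split into its fixed-width fields. The following is a minimal sketch, not part of the toolkit: the names `ATOM_FIELDS`, `parse_v2000_atom_line` and `differing_fields` are hypothetical helpers, and the column offsets are an assumption based on my reading of the V2000 atom-block layout (symbol, mass difference, charge, stereo parity, hydrogen count, stereo care, valence).

```python
# Assumed 0-based column slices for a V2000 atom line, after the three
# 10-character coordinate fields and one separating space.
ATOM_FIELDS = {
    "symbol": (31, 34),
    "mass_diff": (34, 36),
    "charge_code": (36, 39),
    "stereo_parity": (39, 42),
    "h_count": (42, 45),
    "stereo_care": (45, 48),
    "valence": (48, 51),
}


def parse_v2000_atom_line(line):
    """Split one atom line into named fields (symbol as str, the rest as int)."""
    parsed = {}
    for name, (start, end) in ATOM_FIELDS.items():
        raw = line[start:end].strip()
        parsed[name] = raw if name == "symbol" else int(raw)
    return parsed


def differing_fields(line_a, line_b):
    """Return the names of the fields whose values differ between two atom lines."""
    a = parse_v2000_atom_line(line_a)
    b = parse_v2000_atom_line(line_b)
    return [name for name in ATOM_FIELDS if a[name] != b[name]]


# Nitrogen atom lines copied from the two SDF outputs in this issue.
N_LINE_2024_09_3 = "    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0"
N_LINE_2024_09_2 = "    0.0000    0.0000    0.0000 N   0  0  0  0  0  4  0  0  0  0  0  0"

print(differing_fields(N_LINE_2024_09_3, N_LINE_2024_09_2))
```

Under these assumed offsets, the only field reported as different is `valence`, and `charge_code` is 0 in both lines.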

The new behavior IS self-consistent - that is, our RDKitToolkitWrapper with rdkit 2024.09.3 can read the SDF that it wrote and get the same mol graph as it had originally:

from openff.toolkit import Molecule
temp = Molecule.from_smiles('[N+1]([H])([H])([H])([H])')
temp.to_file('temp.sdf', file_format='sdf')
temp2 = Molecule.from_file('temp.sdf')
print(temp2.to_smiles())

correctly yields [H][N+]([H])([H])[H]
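One plausible reason the round trip is unaffected is that both outputs carry the +1 charge on the "M  CHG" property line, which a reader can use independently of the atom-line columns. That explanation is an assumption on my part, not something confirmed here. As a rough sketch, the line can be decoded like this; `parse_chg_line` is a hypothetical helper, and the field widths (a 3-character count followed by 4-character atom-index and charge pairs) are my reading of the V2000 layout.

```python
def parse_chg_line(line):
    """Decode an 'M  CHG' property line into {1-based atom index: formal charge}."""
    if not line.startswith("M  CHG"):
        raise ValueError("not an 'M  CHG' line")
    count = int(line[6:9])  # number of (atom, charge) pairs on this line
    charges = {}
    for i in range(count):
        start = 9 + 8 * i  # each pair takes 4 + 4 characters
        atom_index = int(line[start:start + 4])
        charge = int(line[start + 4:start + 8])
        charges[atom_index] = charge
    return charges


# The property line that appears in both SDF outputs in this issue.
CHG_LINE = "M  CHG  1   1   1"
print(parse_chg_line(CHG_LINE))  # {1: 1}
```

Here atom 1 (the nitrogen) gets a formal charge of +1 from the property line alone, which is consistent with the SMILES printed after the round trip.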
