Extended SMILES saved from Ketcher might be invalid for RDKit #1865

xuzuodong · 2022-11-23T06:13:57Z

Steps to Reproduce

1) Click the "Open..." button.
2) Click "PASTE FROM CLIPBOARD".
3) Input the SMILES c1cccc(-c2ccc(Nc3cccc4c(=O)[nH]ccc34)nc2)c1
4) Save as Extended SMILES; the result is C1C=C(C2C=NC(Nc3c4c(c(ncc4)=O)ccc3)=CC=2)C=CC=1
5) Go to the RDKit.js official website and, in the online code demo, input and run:

var smiles = "C1C=C(C2C=NC(Nc3c4c(c(ncc4)=O)ccc3)=CC=2)C=CC=1"; // generated by Ketcher in step 4
var mol = RDKitModule.get_mol(smiles);
console.log(mol.is_valid())

Actual behavior

On the RDKit.js website, mol.is_valid() returns false.
Drawing the molecule image with RDKit also fails.

Expected behavior
SMILES generated by Ketcher should all be valid for RDKit.

Ketcher version
2.6.2


AlexanderSavelyev · 2022-12-01T16:37:13Z

Yes, aromatic bonds were not converted correctly for the double-bonded oxygen, and aromaticity is kept at the atom level, which should not be the case (it should be converted to ":" bonds). It is suggested to un-aromatize such structures.

paulsmirnov · 2022-12-16T20:06:20Z

@AlexanderSavelyev - I have a report from users that led me to this issue. Could you confirm that it is the same?

Load N#Cc1cn[nH]c1N; it will be saved as N#Cc1c(N)nnc1. This SMILES string is recognized by Biovia Draw and ChemDraw, but not by RDKit 2020.09.5 (Python):

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('N#Cc1c(N)nnc1')
[19:51:56] Can't kekulize mol.  Unkekulized atoms: 2 3 5 6 7

paulsmirnov · 2023-01-04T14:50:17Z

A colleague helped me with the reasoning in terms of chemistry :)

It seems related to tautomers and aromaticity. The exported SMILES is ambiguous, and RDKit does not make extra assumptions in order to generate a valid structure, while it seems that Biovia Draw and ChemDraw do.
In our example, either nitrogen in the ring could have an H attached. N#Cc1cn[nH]c1N specifies its location, while N#Cc1c(N)nnc1 does not. When Ketcher processes the aromaticity of the ring, the location of the H is lost. RDKit does not restore the H, leading to an invalid structure. Similarly, in this GitHub issue, the original SMILES specifies a tautomeric [nH], but the information is lost when Ketcher processes the aromaticity of the molecule.
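The reasoning above can be illustrated with a toy model. This is a minimal sketch, not RDKit's actual kekulization algorithm: it assumes a single ring in which every aromatic atom written without an explicit H must take part in exactly one ring double bond, while [nH] needs none. The function name ring_can_be_kekulized is hypothetical.

```python
from itertools import combinations


def ring_can_be_kekulized(needs_double):
    """Toy check for one ring: can double bonds be placed so that every atom
    flagged True gets exactly one, using only bonds between flagged neighbours?"""
    n = len(needs_double)
    ring_bonds = [(i, (i + 1) % n) for i in range(n)]
    usable = [(a, b) for a, b in ring_bonds if needs_double[a] and needs_double[b]]
    wanted = {i for i, flag in enumerate(needs_double) if flag}
    if len(wanted) % 2:
        return False  # an odd number of atoms can never be paired up
    for chosen in combinations(usable, len(wanted) // 2):
        covered = [atom for bond in chosen for atom in bond]
        if len(set(covered)) == len(covered) and set(covered) == wanted:
            return True
    return False


# Ring atoms of N#Cc1cn[nH]c1N in ring order: c, c, n, [nH], c
with_nh = [True, True, True, False, True]
# Ring atoms of N#Cc1c(N)nnc1 in ring order: c, c, n, n, c
without_nh = [True, True, True, True, True]

print(ring_can_be_kekulized(with_nh))     # True
print(ring_can_be_kekulized(without_nh))  # False
```

With the [nH] in place, four atoms remain to be paired and a valid assignment exists. Without it, all five ring atoms need a double bond, which matches the five unkekulized atoms (2 3 5 6 7) in the RDKit message.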

paulsmirnov · 2023-01-11T22:25:44Z

BTW, there is no need to use Extended SMILES; plain Daylight SMILES is enough (perhaps it is a good idea to correct the issue title).

With the OP's input:

load: c1cccc(-c2ccc(Nc3cccc4c(=O)[nH]ccc34)nc2)c1
save: c1cc(-c2cnc(Nc3c4c(c(ncc4)=O)ccc3)cc2)ccc1
ketcher warnings: Structure contains query properties of atoms and bonds that are not supported in the SMILES. Query properties will not be reflected in the file saved.
rdkit log: Can't kekulize mol. Unkekulized atoms: 8 9 10 12 13 14 16 17 18
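
To see which atoms the rdkit log refers to, the indices can be mapped back to the saved SMILES. This is a rough sketch that assumes the log uses zero-based indices in the order atoms appear in the input string; the regex covers only bracket atoms and the organic subset, and atom_tokens is a hypothetical helper.

```python
import re

# Bracket atoms first, then two-letter halogens, then single-letter atoms.
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_tokens(smiles):
    """Return atom tokens in order of appearance; bonds, branches and
    ring-closure digits are skipped."""
    return ATOM_RE.findall(smiles)


saved = "c1cc(-c2cnc(Nc3c4c(c(ncc4)=O)ccc3)cc2)ccc1"
unkekulized = [8, 9, 10, 12, 13, 14, 16, 17, 18]  # copied from the rdkit log

atoms = atom_tokens(saved)
flagged = {i: atoms[i] for i in unkekulized}
print(flagged)
print(len(flagged))  # 9
```

Under that assumption, the flagged atoms are the fused bicyclic ring system except the carbonyl carbon (index 11), and they include the ring nitrogen at index 12 that lost its [nH]. Nine atoms cannot be paired into alternating double bonds, which is consistent with the kekulization failure.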

AlexanderSavelyev · 2023-02-01T10:24:13Z

Need to switch to Indigo for SMILES generation.

even1024 · 2023-03-16T07:21:27Z

The bug appears because the KET interchange format doesn't support an explicitly specified implicit hydrogen count, which bracketed SMILES atoms can carry as a virtual hydrogen counter. Typically this is not an issue, but there are special cases where the standard valence model fails to determine the number of suppressed hydrogens. For instance, in the example above the N atom is connected to an aromatic ring, so automatic hydrogen counting is not possible. To avoid the ambiguity, [nH] explicitly sets the number of implicit hydrogens to 1 for the nitrogen atom. To fix the issue on Ketcher's side:

1) Add an implicitHCount field to the atom entity of the KET format JSON schema:

    "ImplicitHCount": {
      "type": "integer",
      "enum": [0, 1, 2, 3, 4, 5]
    },

2) As Ketcher has its own parser/generator for MOL V2000, the corresponding conversion of the virtual hydrogen counter ImplicitHCount to the "chemaxon style" Data S-Group should be implemented. I.e., if a MOL V2000 file has a data group as below:

M STY 1 1 DAT
M SLB 1 1 1
M SAL 1 1 18
M SDT 1 MRV_IMPLICIT_H
M SDD 1 0.0000 0.0000 DA ALL 1 1
M SED 1 IMPL_H1

it should be converted to the atom property implicitHCount, and when generating MOL V2000 the data S-Groups should be added based on the implicitHCount value.

Some info about MRV_IMPLICIT_H data s-group:

http://www.scfbio-iitd.res.in/software/utility/marvin_new/marvin/help/FF/Chemaxon-specific-information-in-MDL-MOL-files_19693843.html

3) In editing mode, when a heteroatom connects to an aromatic ring, it is necessary to add an ImplicitHCount property to this atom to specify the number of hydrogens on it.
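
The conversion described in step 2 could look roughly like the sketch below. It is not Ketcher's implementation: the function name implicit_h_from_sgroups is hypothetical, the lines are split on whitespace (real fixed-width molfiles may need column-based parsing), and the digits after IMPL_H are assumed to be the hydrogen count. The allowed values are taken from the schema enum in step 1.

```python
import re

ALLOWED_H_COUNTS = {0, 1, 2, 3, 4, 5}  # enum from the proposed schema field


def implicit_h_from_sgroups(molfile_lines):
    """Map 1-based atom index -> implicit H count, read from
    MRV_IMPLICIT_H data S-groups (whitespace-split sketch)."""
    atoms, field_names, values = {}, {}, {}
    for line in molfile_lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] != "M":
            continue
        tag, sgroup = parts[1], parts[2]
        if tag == "SAL":    # S-group atom list: group id, atom count, atoms
            count = int(parts[3])
            atoms.setdefault(sgroup, []).extend(int(a) for a in parts[4:4 + count])
        elif tag == "SDT":  # data field name
            field_names[sgroup] = parts[3]
        elif tag == "SED":  # data field value
            values[sgroup] = parts[3]

    result = {}
    for sgroup, name in field_names.items():
        if name != "MRV_IMPLICIT_H":
            continue
        match = re.fullmatch(r"IMPL_H(\d+)", values.get(sgroup, ""))
        if match is None:
            continue
        h_count = int(match.group(1))
        if h_count not in ALLOWED_H_COUNTS:
            raise ValueError(f"implicit H count {h_count} outside schema enum")
        for atom in atoms.get(sgroup, []):
            result[atom] = h_count
    return result


sgroup_lines = [
    "M STY 1 1 DAT",
    "M SLB 1 1 1",
    "M SAL 1 1 18",
    "M SDT 1 MRV_IMPLICIT_H",
    "M SDD 1 0.0000 0.0000 DA ALL 1 1",
    "M SED 1 IMPL_H1",
]
print(implicit_h_from_sgroups(sgroup_lines))  # {18: 1}
```

For the data group quoted in step 2, the sketch yields one implicit hydrogen on atom 18, which is the value that would be stored in that atom's implicitHCount.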


KonstantinEpam23 · 2023-04-20T18:48:16Z

Support for implicit hydrogens in the MOL V2000 format will be implemented separately as part of #2500.

xuzuodong added bug feature request labels Nov 23, 2022

xuzuodong assigned Nitvex Nov 23, 2022

AlexanderSavelyev added this to the Release Candidate 2.9.0-rc.1 milestone Jan 12, 2023

KonstantinEpam23 modified the milestones: Ketcher 2.9.0-rc.1, Ketcher 2.9.0-rc.4 Mar 11, 2023

Nitvex modified the milestones: Ketcher 2.9.0-rc.4, Ketcher 2.10.0-rc.1 Mar 15, 2023

even1024 mentioned this issue Mar 15, 2023

Keep implicit hydrogens information in KET-format epam/Indigo#1064

Closed

KonstantinEpam23 removed the feature request label Mar 15, 2023

This was referenced Apr 6, 2023

update KET json schema to support "explicit implicit" hydrogens #2456

Closed

Migrate to Indigo v1.11.0-rc.1 in-browser module #2458

Closed

KonstantinEpam23 assigned KonstantinEpam23 and unassigned Nitvex Apr 10, 2023

KonstantinEpam23 added a commit that referenced this issue Apr 20, 2023

#1865 Extended SMILES saved from Ketcher might be invalid for RDKit

63eb500

KonstantinEpam23 mentioned this issue Apr 20, 2023

#1865 Extended SMILES saved from Ketcher might be invalid for RDKit #2498

Merged

KonstantinEpam23 added a commit that referenced this issue Apr 20, 2023

#1865 refactor atom variable names

c902dea

KonstantinEpam23 added a commit that referenced this issue Apr 20, 2023

#1865 remove IMPLICIT_V for molfile generation

ab6508b

KonstantinEpam23 added a commit that referenced this issue Apr 20, 2023

#1865 Extended SMILES saved from Ketcher might be invalid for RDKit

4dee6d8

KonstantinEpam23 added a commit that referenced this issue Apr 20, 2023

#1865 fix conflicts

8e872a0

KonstantinEpam23 added a commit that referenced this issue Apr 20, 2023

#1865 remove IMPLICIT_V for molfile generation

623482e

KonstantinEpam23 closed this as completed in #2498 Apr 20, 2023

KonstantinEpam23 mentioned this issue Apr 25, 2023

Settings: 'Terminal and Hetero' is not selected as default and 'on' option is not working #2245

Closed
