This testing was done with commit 246e171 (current master head), Python 3.8, and RDKit 2020.03.4 (conda-forge).
Below is a reaction from USPTO, PatentNumber US03956392. The current code may produce an inconsistent and incorrect template for it.
ReactionSmiles is
[C:1]([CH:4]1[CH:9]([CH3:10])[CH:8]=[CH:7][CH2:6][C:5]1([CH3:12])[CH3:11])(=[O:3])[CH3:2]>C(O)C.[Pd]>[C:1]([C@H:4]1[C@@H:9]([CH3:10])[CH2:8][CH2:7][CH2:6][C:5]1([CH3:11])[CH3:12])(=[O:3])[CH3:2]
The current code gives the forward template
[C:1]-[CH;D2;+0:2]=[CH;D2;+0:3]-[CH;D3;+0:4](-[C;D1;H3:5])-[C:6]-[C:7](-[C;D1;H3:8])=[O;D1;H0:9]>>[C:1]-[CH2;D2;+0:2]-[CH2;D2;+0:3]-[C@H;D3;+0:4](-[C;D1;H3:5])-[C:6]-[C:7](-[C;D1;H3:8])=[O;D1;H0:9]
I believe both atoms 4 and 9 (in the original reaction) should be included, so the expected forward template should be:
[C;D1;H3:1]-[C;H1;D3;+0:2]1-[C;H1;D2;+0:3]=[C;H1;D2;+0:4]-[C:5]-[C:6]-[C;H1;D3;+0:7]-1-[C:8](-[C;D1;H3:9])=[O;D1;H0:10]>>[C;D1;H3:1]-[C@;H1;D3;+0:2]1-[C;H2;D2;+0:3]-[C;H2;D2;+0:4]-[C:5]-[C:6]-[C@;H1;D3;+0:7]-1-[C:8](-[C;D1;H3:9])=[O;D1;H0:10]
The reason for the current behavior is as follows. The strategy that looks for chiral centers adjacent to reaction centers depends on the order of atoms when the distance is greater than one bond. In this case, for example, the atoms with mapping numbers 7 and 8 are reaction centers in the original reaction, and the atoms with mapping numbers 4 and 9 are related chiral centers. If tetra_atoms is [(4, ...), (9, ...)], atom 4 is discarded because atom 9 has not been seen yet (this is the actual situation); if tetra_atoms is instead [(9, ...), (4, ...)], atom 4 is included because atom 9 was included before it. This makes the output SMARTS depend on the order of atoms in the RDKit data structure and causes inconsistent behavior.
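The order dependence described above can be reproduced without RDKit. The following is only a simplified model of the behavior as I understand it, not the actual extraction code: `NEIGHBORS` is a hand-written adjacency list keyed by the atom-map numbers of the reactant, and `single_pass_expand` is a hypothetical helper that visits the chiral atoms once, in the given order.

```python
# Hand-written bond graph of the reactant, keyed by atom-map number
# (an assumption made for illustration; the real code works on RDKit atoms).
NEIGHBORS = {
    1: [2, 3, 4], 2: [1], 3: [1],
    4: [1, 5, 9], 5: [4, 6, 11, 12], 6: [5, 7],
    7: [6, 8], 8: [7, 9], 9: [4, 8, 10],
    10: [9], 11: [5], 12: [5],
}


def single_pass_expand(reaction_centers, chiral_atoms_in_order):
    """Visit each chiral atom once; include it only if a neighbor is already included."""
    included = set(reaction_centers)
    for atom in chiral_atoms_in_order:
        if any(nbr in included for nbr in NEIGHBORS[atom]):
            included.add(atom)
    return included


# Atom 4 is only reachable through atom 9, so the visiting order matters.
print(sorted(single_pass_expand({7, 8}, [4, 9])))  # [7, 8, 9]
print(sorted(single_pass_expand({7, 8}, [9, 4])))  # [4, 7, 8, 9]
```

With the order [4, 9], atom 4 is checked before atom 9 has been added, so it is dropped; with [9, 4] both are kept.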
The fix would be to use BFS/DFS to search the neighbors of the reaction centers and the neighbors of all related chiral centers. One quick and dirty workaround is simply to add
for i in range(len(tetra_atoms)):
before line 174. It effectively turns the search into a BFS, though with worse time complexity.
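To make the two options concrete, here is a minimal sketch on a toy graph, not a patch against the real code. `NEIGHBORS`, `repeated_pass_expand`, and `bfs_expand` are hypothetical names; the adjacency list is written by hand from the atom-map numbers of the reactant. The first function mimics the outer-loop workaround, the second does a proper BFS from the reaction centers through chiral atoms only.

```python
from collections import deque

# Hand-written bond graph of the reactant, keyed by atom-map number
# (an assumption made for illustration; the real code works on RDKit atoms).
NEIGHBORS = {
    1: [2, 3, 4], 2: [1], 3: [1],
    4: [1, 5, 9], 5: [4, 6, 11, 12], 6: [5, 7],
    7: [6, 8], 8: [7, 9], 9: [4, 8, 10],
    10: [9], 11: [5], 12: [5],
}


def repeated_pass_expand(reaction_centers, chiral_atoms_in_order):
    """Workaround: repeat the single pass len(chiral_atoms) times (quadratic)."""
    included = set(reaction_centers)
    for _ in range(len(chiral_atoms_in_order)):
        for atom in chiral_atoms_in_order:
            if atom not in included and any(n in included for n in NEIGHBORS[atom]):
                included.add(atom)
    return included


def bfs_expand(reaction_centers, chiral_atoms):
    """BFS from the reaction centers, stepping only onto chiral atoms."""
    chiral = set(chiral_atoms)
    included = set(reaction_centers)
    queue = deque(sorted(included))
    while queue:
        atom = queue.popleft()
        for nbr in NEIGHBORS[atom]:
            if nbr in chiral and nbr not in included:
                included.add(nbr)
                queue.append(nbr)
    return included


# Both approaches give the same answer regardless of the input order.
for order in ([4, 9], [9, 4]):
    print(sorted(repeated_pass_expand({7, 8}, order)), sorted(bfs_expand({7, 8}, order)))
```

The BFS touches each atom at most once, while the repeated pass rescans the whole chiral-atom list on every iteration, which is where the worse time complexity comes from.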
This is an interesting case. This is not a reaction that sets any stereocenters, but the recorded product includes them because it shows the relative stereochemistry: the syn isomer was isolated at 90% and the trans at 10%. This ratio is inherited from the starting material, though, since they followed the prep (K. S. Ayyar Chem. Comm. 1973, 161). I will have to look more closely at the extraction and try to find cases where distal stereocenters are set from the reactive center.