Chiral centers more than one bond away from reaction centers #20

Open
ljn917 opened this issue Jul 30, 2020 · 1 comment

@ljn917
Contributor

ljn917 commented Jul 30, 2020

Tested with commit 246e171 (current master head), Python 3.8, and RDKit 2020.03.4 (conda-forge).

Below is a reaction from USPTO, PatentNumber US03956392. The current code may produce an inconsistent and incorrect template for it.
ReactionSmiles is [C:1]([CH:4]1[CH:9]([CH3:10])[CH:8]=[CH:7][CH2:6][C:5]1([CH3:12])[CH3:11])(=[O:3])[CH3:2]>C(O)C.[Pd]>[C:1]([C@H:4]1[C@@H:9]([CH3:10])[CH2:8][CH2:7][CH2:6][C:5]1([CH3:11])[CH3:12])(=[O:3])[CH3:2]

The current code gives the forward template [C:1]-[CH;D2;+0:2]=[CH;D2;+0:3]-[CH;D3;+0:4](-[C;D1;H3:5])-[C:6]-[C:7](-[C;D1;H3:8])=[O;D1;H0:9]>>[C:1]-[CH2;D2;+0:2]-[CH2;D2;+0:3]-[C@H;D3;+0:4](-[C;D1;H3:5])-[C:6]-[C:7](-[C;D1;H3:8])=[O;D1;H0:9]

I believe both atoms 4 and 9 (in the original reaction) should be included, so the expected forward template is: [C;D1;H3:1]-[C;H1;D3;+0:2]1-[C;H1;D2;+0:3]=[C;H1;D2;+0:4]-[C:5]-[C:6]-[C;H1;D3;+0:7]-1-[C:8](-[C;D1;H3:9])=[O;D1;H0:10]>>[C;D1;H3:1]-[C@;H1;D3;+0:2]1-[C;H2;D2;+0:3]-[C;H2;D2;+0:4]-[C:5]-[C:6]-[C@;H1;D3;+0:7]-1-[C:8](-[C;D1;H3:9])=[O;D1;H0:10]

The reason for the current behavior is as follows. The strategy for finding chiral centers adjacent to reaction centers depends on the order of atoms when the distance is greater than one bond. In this case, for example, the atoms with mapping numbers 7 and 8 are reaction centers in the original reaction, and the atoms with mapping numbers 4 and 9 are the related chiral centers. If tetra_atoms is [(4, ...), (9, ...)], atom 4 is discarded because atom 9 has not been seen yet (this is the actual situation); if tetra_atoms is instead [(9, ...), (4, ...)], atom 4 is included because atom 9 was included before it. This makes the output SMARTS depend on the order of atoms in the RDKit data structure and causes inconsistent behavior.
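
The order dependence can be reproduced with a small, simplified model. This is only a sketch, not the project's actual code: NEIGHBORS is a hand-written adjacency table keyed by the atom-map numbers of the reactant above, and single_pass is a hypothetical stand-in for a loop that visits tetra_atoms once and keeps an atom only if one of its neighbors has already been kept.

```python
# Hypothetical adjacency table for the reactant, keyed by atom-map number.
# Written by hand from the reaction SMILES; not produced by RDKit here.
NEIGHBORS = {
    1: {2, 3, 4}, 2: {1}, 3: {1},
    4: {1, 5, 9}, 5: {4, 6, 11, 12}, 6: {5, 7},
    7: {6, 8}, 8: {7, 9}, 9: {4, 8, 10},
    10: {9}, 11: {5}, 12: {5},
}
REACTION_CENTERS = {7, 8}


def single_pass(tetra_order, centers=REACTION_CENTERS):
    """Visit each candidate chiral atom once, in the given order.

    An atom is kept only if it touches a reaction center or a chiral
    atom that was kept earlier in the same pass.
    """
    included = set(centers)
    for atom in tetra_order:
        if NEIGHBORS[atom] & included:
            included.add(atom)
    return included - set(centers)


print(single_pass([4, 9]))  # atom 4 is visited before atom 9 is known
print(single_pass([9, 4]))  # atom 9 is kept first, so atom 4 follows
```

With the order [4, 9] only atom 9 survives, while [9, 4] keeps both atoms, which mirrors the inconsistency described above.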

The fix would be to use BFS/DFS to search the neighbors of the reaction centers and the neighbors of all related chiral centers. A quick and dirty workaround is to simply add for i in range(len(tetra_atoms)): before line 174. This effectively turns the search into a BFS, though with worse time complexity.
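
Both ideas can be sketched on the same simplified model. The names below (NEIGHBORS, chiral_closure_bfs, repeated_passes) are hypothetical and do not come from the repository; the adjacency table is written by hand from the atom-map numbers of the reactant, and the code only assumes that tetra_atoms can be reduced to a list of atom-map numbers.

```python
from collections import deque

# Hypothetical adjacency table for the reactant, keyed by atom-map number.
NEIGHBORS = {
    1: {2, 3, 4}, 2: {1}, 3: {1},
    4: {1, 5, 9}, 5: {4, 6, 11, 12}, 6: {5, 7},
    7: {6, 8}, 8: {7, 9}, 9: {4, 8, 10},
    10: {9}, 11: {5}, 12: {5},
}


def chiral_closure_bfs(neighbors, centers, tetra_atoms):
    """BFS from the reaction centers, expanding only through chiral atoms.

    Returns the related chiral atoms in discovery order, independent of
    the order in which tetra_atoms is listed.
    """
    tetra = set(tetra_atoms)
    seen = set(centers)
    queue = deque(sorted(centers))
    found = []
    while queue:
        atom = queue.popleft()
        for nbr in sorted(neighbors[atom]):
            if nbr in tetra and nbr not in seen:
                seen.add(nbr)
                found.append(nbr)
                queue.append(nbr)
    return found


def repeated_passes(neighbors, centers, tetra_atoms):
    """Workaround: repeat the single pass len(tetra_atoms) times.

    Every pass that changes anything adds at least one atom, so this many
    passes always reach the same closure, at quadratic cost.
    """
    included = set(centers)
    for _ in range(len(tetra_atoms)):
        for atom in tetra_atoms:
            if neighbors[atom] & included:
                included.add(atom)
    return included - set(centers)


print(chiral_closure_bfs(NEIGHBORS, {7, 8}, [4, 9]))  # [9, 4]
print(chiral_closure_bfs(NEIGHBORS, {7, 8}, [9, 4]))  # [9, 4]
print(repeated_passes(NEIGHBORS, {7, 8}, [4, 9]))     # {9, 4} as a set
```

The BFS visits each atom at most once, whereas the repeated-pass variant rescans the whole list on every iteration; for the handful of stereocenters in a typical molecule the difference is unlikely to matter much.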

@thomasstruble
Contributor

This is an interesting case. This reaction does not set any stereocenters, but the recorded product includes them because it shows the relative stereochemistry: the syn isomer was isolated at 90% and the trans at 10%. That ratio, however, is inherited from the starting material, since they followed the prep (K. S. Ayyar Chem. Comm. 1973, 161). I will have to look more closely at the extraction and try to find cases where distal stereocenters are set from the reactive center.
