-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rdkit_molecule_from_smile needs an option to change the random seed for molecule embedding #445
Comments
Hi @victorl25, I have come up with an implementation of
The SMILES that you provided seems to be not valid rather than generating a molecule that fails the embedding. Would you be able to review and provide such example? |
Thanks for fixing! I just noticed that the smiles string was mangled by GitHub or somewhere else. The correct smiles string is 'CC(C)Cn1cnc2c(Nc3nc(NCc4ccc(CN5CCCC5=O)cc4)nc(NC@@HCc4ccc(Cl)cc4Cl)n3)nc3ccccc3c21'. |
Oops, it got mangled again, see it in the attached file. |
I'll implement that as a test. Thanks |
@victorl25 let me know if the library works now |
@ghiander it does work, I appreciate you putting it out there for a broader use! However, something strange is going on with AllChem.EmbedMolecule so that you know. In a multi-threaded scenario it can fail several times (I've seen up to 5) before succeeding. The same smiles string would get embedded just fine without the multi-threading or in a debug mode, go figure. Here is what I had to do:
|
Hey @victorl25, thanks for sharing this trick. I'd be interested to understand why this only happens when multithreading though. Would it be the case to open an issue in the RDKit Github? I am fairly sure it's not due to Jazzy as RDKit is used to preprocess input SMILES. |
I experimented a bit more and found that setting maxIterations to a high number like 100,000 is sufficient to avoid these embedding failures. I get iteration counts in 20-30K range for some random seeds and the same molecule. So RDKit already has a solution. |
Great. If you set maxIterations to 0, there will be no limit which is the default in RDKit. I'm closing this issue for now then. |
Setting maxIterations to 0 doesn't work, the default limit seems to be 1110. |
Hi @victorl25, what do you mean by that? The current default is zero: https://github.com/AstraZeneca/jazzy/blob/master/src/jazzy/core.py#L62 |
Try running the attached script. For me it fails with maxIterations=0 and succeeds with maxIterations > 15000. INITIAL_COORDS gives you the number of iterations it took to succeed or give up. |
Hi Victor, thanks for clarifying that. Then it's convenient to have the possibility to control maxIterations. I would have thought that the max was higher than 1110. |
rdkit_molecule_from_smile needs to vary the random seed for molecule embedding
Describe the bug
rdkit_molecule_from_smile occasionally fails to embed a molecule, however, the issue can be avoided by rerunning AllChem.EmbedMolecule with a different random seed. E.g. consider smiles 'CC(C)Cn1cnc2c(Nc3nc(NCc4ccc(CN5CCCC5=O)cc4)nc(NC@@HCc4ccc(Cl)cc4Cl)n3)nc3ccccc3c21' - embedding fails with the hardcoded random seed 11 but works with pretty much any other random seed.
To Reproduce
smiles = 'CC(C)Cn1cnc2c(Nc3nc(NCc4ccc(CN5CCCC5=O)cc4)nc(NC@@HCc4ccc(Cl)cc4Cl)n3)nc3ccccc3c21'
rdkit_mol = rdkit_molecule_from_smiles(smiles, minimisation_method="MMFF94")
Error is reported at the kallisto_molecule_from_rdkit_molecule step:
Jazzy ERROR: [18:22:55] The RDKit embedding has failed for the molecule: ...
Trying the following code extracted from rdkit_molecule_from_smiles generates an error with randomSeed=11 but works with randomSeed=12:
from rdkit import Chem
from rdkit.Chem import AllChem
smiles = 'CC(C)Cn1cnc2c(Nc3nc(NCc4ccc(CN5CCCC5=O)cc4)nc(NC@@HCc4ccc(Cl)cc4Cl)n3)nc3ccccc3c21'
m = Chem.MolFromSmiles(smiles)
mh = Chem.AddHs(m)
emb_code = AllChem.EmbedMolecule(mh, randomSeed=11)
if emb_code == -1:
print(f"The RDKit embedding has failed for the molecule: {smiles}")
else:
print("Success!")
Expected behavior
Please add a cycle to try embedding a couple of times with a different random seed as a simple workaround.
I suppose this is more of an rdkit issue, but it would be much harder to fix it there.
The text was updated successfully, but these errors were encountered: