-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RDKitToolkitWrapper
conformer generation fails on some large molecules
#822
Comments
Apologies for a not-very-useful error message. I can reproduce this; the failures can come from failing conformer generation, which happens before trying to assign charges. In: # Modified script1.py
...
pdb = PDBFile("structure.pdb")
ligand = Molecule.from_smiles("c1cc(Cl)ccc1C2=N[C@@H](c(nnc3C)n3c(c24)sc(C)c4C)CC(=O)NCCOCCOCCOCCOCCOCCOCCOCCOCCOCCNC(=O)COc5cccc(c56)C(=O)N(C6=O)[C@@H](C7=O)CCC(=O)N7")
ligand.generate_conformers(n_conformers=1) Out:
The root of the confusing error is that Chem.AllChem.EmbedMultipleConfs(rdmol, numConfs=1, pruneRmsThresh=1, randomSeed=1)
rdmol.GetNumConformers() # returns 0 This may be a bug or shortcoming of RDKit, I'm not sure. I'm not so familiar with conformer generation so I can't help much more than that. Aside from being huge, the ligand doesn't seem to have an invalid topology. If you have access to the OpenEye toolkits, it doesn't seem to have an issue with this molecule ( However - your molecule already has a "conformer" on it, since the PDB file has reasonable-looking positions, which can be used in the AM1-BCC call, bypassing the above issue. # modified script2.py
rdmol = Chem.MolFromPDBFile("structure.pdb", removeHs=False)
molecule = Molecule.from_rdkit(rdmol)
molecule.assign_partial_charges(partial_charge_method="formal_charge", use_conformers=molecule.conformers) # Dummy placeholder values only!
topology = molecule.to_topology()
system = force_field.create_openmm_system(topology=topology, charge_from_molecules=[molecule]) Since the ligand is so big, AM1 takes a long time (I killed it after several minutes on my poor dual-core machine). If there's another way to get partial charges and stick them into i.e. an SDF file, that would also work. The partial charges there are all zeros; just dummy values to demonstrate that actual values can be dropped in. |
Thanks @mattwthompson ! As I wrote, the scripts work for many other molecules created in the similar way so I am confused what's special about this one? Why RDKit conformer generation is called at all? I would understand that someone provides SMILES only so a conformer needs to be generated, but both examples in OpenFF docs are based on PDB+SMILES (when off topology) or PDB only (when processed by RDKit). Could you also explain why calling explicitly |
Hi @mieczyslaw,
We do this to improve the reproducibility of the charges that get assigned. One of OpenFF's core philosphies is that straightforward usage of our tools should yield reproducible, publication-ready outputs. Something we've seen a lot is papers saying "We charged the ligand using AM1-BCC", which doesn't actually tell the full story, since AM1 is conformer-dependent. So, our AM1BCC charging will always try to generate its own conformer using a backend toolkit, so that the same chemical graph/connection table will get the same partial charges. We recognize that it's a bit counterintuitive, and that the exact output is still dependent on factors like low-level system factors and specific RDKit/AmberTools versions. But it should avoid situations where, for example, someone seeds simulations from different conformers of the same molecule, and each simulation ends up with different physical parameters. |
Thanks @j-wags. It seems I found a solution you may want to use in case the exception is generated (or at least to show the user a hint) - I regenerated SMILES with RDKit using the code:
So I got:
Then with |
Update: this must be related to the size of the molecule. I have another example:
This time even processing SMILES via RDKit doesn't help. |
@mattwthompson your modified script with calling
|
I reported in RDKit issues tracker: rdkit/rdkit#3764 |
As suggested by Greg/RDKit, the solution is to add
Also, when only single conformer is needed, there is another function:
It seems to properly generate conformer as I tested with I believe that the current code can be kept, but if no conformers are generated, it can try again with |
Hi @mieczyslaw - Thanks for doing so much legwork on this. I'm really sorry this fell off our radar. I hope we can be better organized in the future. I just saw openmm/openmm#3550 which has the same root cause, and it kicked my memory to come looking for this.
Yeah, I think this is the right solution. I'm attaching this to a release milestone so it doesn't get lost again. Again, I really apologize that we lost track of this, and I'm very grateful for your help here. |
RDKitToolkitWrapper
conformer generation fails on some large molecules
Describe the bug
When I try to parametrize attached structure (using two versions of code as in OpenFF manual/github examples), openforcefield.utils.toolkits.ChargeMethodUnavailableError is generated.
The structure's topology is a bit off as it comes from warheads/linker assembly. The structure contains the correct CONECT
records (repaired in Maestro) and SMILES is also generated by Maestro. The scripts work fine for some other molecules.
To Reproduce
Try the following codes:
script1.py
script2.py
The structure in PDB (as TXT as GitHub doesn't like PDB extension):
structure.pdb.txt
Output
Computing environment (please complete the following information):
conda list
Additional context
The goal after the system generation step is to use parmed (as in docs) to save prmtop file.
The text was updated successfully, but these errors were encountered: