Support for small molecules & Protein Ligands #155
Comments
Hey Arian, I will be working on it :D
I agree that both RDKit and OpenBabel could be optional dependencies, as they are much easier to install these days with conda, and building small-molecule support on these two packages should work well.
Yep, I think that of the two, RDKit is the better choice since it is now pip-installable. OpenBabel could be trickier (I've had some issues getting the Python bindings working properly in the past).
* add mol support
* add rdkit as optional dependency
* lint atomic edges
* lint distance
* lint graphs
* lint atom type
* lint tests
* additional node features from rdkit
* additional vocabularies to atoms
* add more RDKit constants
* refactor config matching utils to reduce duplication
* add molecule graph config to yaml parser
* lint edge funcs
* add bond-based edge feature funcs
* add additional Rdkit node features
* add feature __init__
* add molecule __init__
* lint graphs
* fix rdkit constants
* add molecule module to docs
* fix rdkit constants
* fix rdkit type hint
* refactor config utils to prevent circular import
* add names to graphs
* add global mol descriptors
* add add_hs flag
* fix feature funcs
* add descriptor vocab
* update changelog
* add molecule tutorial
* add molecule tutorial to docs
* update path to config parser utils
* add additional molecular graph tests
* add plotting functions for molecules
* update mol tutorial with 3d plots
* add molecule visualisation to docs
* fix molecule notebooks in docs
* update docs with refactored config
* update node naming to include element symbol
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
This issue serves as the starting point for developing features to support small molecule graphs.
Ligands from Mol2 files
Ligands from SDF files
I'm not aware of any lightweight parsers for SDF files. We could use the parsers in OpenBabel/RDKit as optional dependencies, or lift the parsers from these libraries if they're sufficiently compact (and licensing is sufficiently permissive).
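To gauge how compact a lifted parser could be, here is a minimal pure-Python sketch of a reader for V2000 SDF records: the counts line, atom block, bond block and the "> <KEY>" data items after "M  END". It is not derived from the OpenBabel or RDKit parsers, and it deliberately ignores V3000 files, charge/isotope columns and "M  CHG"-style property lines, so treat it as an illustration of scope rather than a drop-in parser. The names parse_sdf and parse_molblock are hypothetical.

```python
def parse_molblock(lines):
    """Parse one V2000 MOL block (a list of lines) into atoms, bonds and SDF data items."""
    name = lines[0].strip()
    counts = lines[3]  # line 4 is the fixed-width counts line
    if "V3000" in counts:
        raise ValueError("V3000 records are not handled by this sketch")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Columns 1-30 hold x, y, z (10 chars each); the element symbol sits in columns 32-34.
        atoms.append({
            "element": line[31:34].strip(),
            "coords": (float(line[0:10]), float(line[10:20]), float(line[20:30])),
        })

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Atom indices are 1-based in the file, so convert them to 0-based.
        # The third field is the bond type (1 single, 2 double, 3 triple, 4 aromatic).
        bonds.append((int(line[0:3]) - 1, int(line[3:6]) - 1, int(line[6:9])))

    # SDF data items follow "M  END": a "> <KEY>" header, value line(s), then a blank line.
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), len(lines))
    properties, key = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">") and "<" in line:
            start = line.index("<") + 1
            key = line[start:line.index(">", start)]
            properties[key] = ""
        elif key is not None and line.strip():
            properties[key] = (properties[key] + "\n" + line).strip()
        else:
            key = None
    return {"name": name, "atoms": atoms, "bonds": bonds, "properties": properties}


def parse_sdf(text):
    """Split an SDF string on "$$$$" delimiters and parse each record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(parse_molblock(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # final record without a trailing "$$$$"
        records.append(parse_molblock(current))
    return records
```

Roughly 50 lines covers the common V2000 case, which suggests the licensing question matters more than size; the missing pieces (V3000, charges, stereo flags) are where a dependency starts to pay off.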
Ligands from SMILES
Similar to the issues described for SDF files.
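If RDKit is adopted as an optional dependency, SMILES parsing can be delegated to it behind an import guard so the core package still imports without it. A minimal sketch, assuming RDKit is installed via pip install rdkit; smiles_to_graph is a hypothetical helper that returns plain node and edge containers, not an existing Graphein object.

```python
# Treat RDKit as optional: import it if available and fail with a helpful message otherwise.
try:
    from rdkit import Chem
except ImportError:  # RDKit not installed
    Chem = None


def smiles_to_graph(smiles: str, add_hs: bool = False) -> dict:
    """Parse a SMILES string with RDKit and return atom nodes and bond edges."""
    if Chem is None:
        raise ImportError("RDKit is required for SMILES parsing: pip install rdkit")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None (and logs an error) for invalid SMILES
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    if add_hs:
        mol = Chem.AddHs(mol)  # make implicit hydrogens explicit atoms
    nodes = {atom.GetIdx(): atom.GetSymbol() for atom in mol.GetAtoms()}
    edges = [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), str(bond.GetBondType()))
        for bond in mol.GetBonds()
    ]
    return {"nodes": nodes, "edges": edges}
```

For ethanol, smiles_to_graph("CCO") gives three heavy-atom nodes joined by two SINGLE bonds; with add_hs=True the six hydrogens become nodes as well.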
Ligands from PDB files
This should be straightforward to extract from the current PDB file parsing: ligands can be taken from the HETATM dataframe, which is kept separate from the main polypeptide dataframe. The tricky part is distinguishing cofactors and crystallographic additives from 'true' ligands of interest. We could get around this by allowing users to remove typical species of these types; we maintain a dictionary of them here
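A rough sketch of pulling ligands out of HETATM records while dropping common non-ligand heterogens. It reads the fixed-column PDB layout directly with the standard library rather than going through the existing HETATM dataframe, and EXCLUDED_HETEROGENS is a hypothetical, illustrative list, not the dictionary the project maintains.

```python
from collections import defaultdict

# Hypothetical exclusion list of common solvents, ions and crystallisation additives.
EXCLUDED_HETEROGENS = {"HOH", "WAT", "SO4", "PO4", "GOL", "EDO", "PEG", "ACT", "DMS", "CL", "NA"}


def extract_ligands(pdb_lines, exclude=EXCLUDED_HETEROGENS):
    """Group HETATM atoms into ligands keyed by (chain, residue name, residue number)."""
    ligands = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()  # columns 18-20
        if resname in exclude:
            continue
        key = (line[21], resname, int(line[22:26]))  # chain ID (col 22), resSeq (cols 23-26)
        ligands[key].append({
            "atom_name": line[12:16].strip(),  # columns 13-16
            "element": line[76:78].strip(),  # columns 77-78
            "coords": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return dict(ligands)
```

Keying on (chain, residue name, residue number) keeps two copies of the same ligand bound at different sites as separate entries.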
Inferring connectivity
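Distance-based edges (a Euclidean threshold or k-nearest neighbours) and a common covalent-radius heuristic for guessing bonds when no CONECT records are available can each be sketched in a few lines. This is a minimal, assumption-laden illustration: the radii are approximate single-bond covalent radii in Å for a small subset of elements, the 0.45 Å tolerance is an assumed value, and the heuristic ignores bond orders and will misfire on strained or poorly resolved geometries.

```python
import math

# Approximate single-bond covalent radii in Å (illustrative subset; unknown elements raise KeyError).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "F": 0.57, "P": 1.07, "S": 1.05, "Cl": 1.02}


def distance_edges(coords, threshold):
    """Edges between all atom pairs closer than a Euclidean threshold (Å)."""
    n = len(coords)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) < threshold
    ]


def knn_edges(coords, k):
    """Undirected edges from each atom to its k nearest neighbours."""
    edges = set()
    for i, ci in enumerate(coords):
        neighbours = sorted((math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        for _, j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def covalent_edges(elements, coords, tolerance=0.45):
    """Heuristic bonds: atoms i and j are bonded if d(i, j) < r_i + r_j + tolerance."""
    n = len(coords)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]] + tolerance
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return edges
```

These are O(n²) loops, which is fine for ligand-sized inputs; for whole complexes a spatial index (e.g. a KD-tree) would be the usual replacement.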
Distance-based connectivity (Euclidean threshold, KNN, etc.) should be straightforward and can likely leverage the existing code used for protein graphs. The more complicated scenario is inferring covalent bonds from PDB files. PDB files may provide CONECT records specifying ligand structure, but I am not sure they are guaranteed to be present.
Features
Some idea of the sorts of features users would like to use would be quite helpful. I think having RDKit as an optional dependency and allowing users to interact with it would be the most straightforward solution here.
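As one example of what exposing RDKit could look like, the sketch below computes a handful of per-atom features and one global descriptor (molecular weight). The feature selection and the molecule_features name are assumptions for illustration; the calls themselves are standard RDKit Atom methods and Descriptors.MolWt.

```python
# RDKit is optional; the helper raises a clear error if it is missing.
try:
    from rdkit import Chem
    from rdkit.Chem import Descriptors
except ImportError:
    Chem = None


def molecule_features(smiles: str) -> dict:
    """Per-atom features plus a global descriptor for a molecule given as SMILES."""
    if Chem is None:
        raise ImportError("RDKit is required for molecular features: pip install rdkit")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    nodes = [
        {
            "symbol": atom.GetSymbol(),
            "atomic_num": atom.GetAtomicNum(),
            "degree": atom.GetDegree(),  # number of explicit neighbours
            "formal_charge": atom.GetFormalCharge(),
            "num_hs": atom.GetTotalNumHs(),  # implicit + explicit hydrogens
            "hybridization": str(atom.GetHybridization()),  # e.g. "SP2", "SP3"
            "is_aromatic": atom.GetIsAromatic(),
            "in_ring": atom.IsInRing(),
        }
        for atom in mol.GetAtoms()
    ]
    return {"nodes": nodes, "mol_wt": Descriptors.MolWt(mol)}
```

Returning plain dictionaries keeps the choice of graph library open; the same values could be attached as node attributes on whatever graph type the package settles on.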
Keen to hear thoughts :)