Support for small molecules & Protein Ligands #155

Closed
a-r-j opened this issue Apr 9, 2022 · 3 comments · Fixed by #157
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@a-r-j
Owner

a-r-j commented Apr 9, 2022

This issue serves as the starting point for developing features to support small molecule graphs.

Ligands from Mol2 files:

Ligands from SDF files

I'm not aware of any lightweight parsers for SDF files. We could use the parsers in OpenBabel/RDKit as optional dependencies, or lift the parsers from these libraries if they're sufficiently compact (and licensing is sufficiently permissive).
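A minimal sketch of the optional-dependency route, assuming RDKit is only imported when SDF support is actually requested. The helper names `rdkit_available` and `read_sdf` are hypothetical and only for illustration; `Chem.SDMolSupplier` is RDKit's own SDF reader.

```python
import importlib.util
from typing import List


def rdkit_available() -> bool:
    """Return True if the optional RDKit dependency can be imported."""
    return importlib.util.find_spec("rdkit") is not None


def read_sdf(path: str) -> List[object]:
    """Read every parsable molecule from an SDF file (hypothetical helper).

    RDKit is imported lazily so that the rest of the package keeps working
    for users who have not installed it.
    """
    if not rdkit_available():
        raise ImportError(
            "Reading SDF files requires the optional dependency RDKit "
            "(e.g. `pip install rdkit`)."
        )
    from rdkit import Chem  # lazy import of the optional dependency

    # SDMolSupplier yields None for records it cannot parse; skip those.
    return [mol for mol in Chem.SDMolSupplier(path) if mol is not None]
```

Users without RDKit would only see the `ImportError` when they call the SDF reader, not when they import the package.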

Ligands from SMILES

Similar to the issues described for SDF files.

Ligands from PDB Files

This should be straightforward given the current PDB file parsing: we can extract them from the HETATM df, which is kept separate from the main polypeptide dataframe. The tricky part is distinguishing cofactors/crystallographic adjuvants from 'true' ligands of interest. We could circumvent this by allowing users to remove typical species of these types - we maintain a dictionary of them here
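As a rough illustration of the filtering idea, here is a sketch that reads HETATM records straight from PDB lines using the fixed-column layout of the PDB format and drops residues in a user-supplied exclusion set. The function name `extract_ligand_atoms` and the contents of `DEFAULT_EXCLUDED` are assumptions made for this example; they are not the existing parser or the dictionary mentioned above.

```python
from typing import Dict, FrozenSet, Iterable, List

# Illustrative exclusion set only (water, sulfate, glycerol, ethylene glycol);
# NOT the dictionary maintained in the project.
DEFAULT_EXCLUDED: FrozenSet[str] = frozenset({"HOH", "SO4", "GOL", "EDO"})


def extract_ligand_atoms(
    pdb_lines: Iterable[str],
    excluded: FrozenSet[str] = DEFAULT_EXCLUDED,
) -> List[Dict[str, object]]:
    """Collect HETATM atoms whose residue name is not in `excluded`."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue  # skip ATOM, CONECT, REMARK, ... records
        res_name = line[17:20].strip()  # columns 18-20: residue name
        if res_name in excluded:
            continue  # user asked to drop this species
        atoms.append(
            {
                "serial": int(line[6:11]),        # columns 7-11
                "atom_name": line[12:16].strip(),  # columns 13-16
                "res_name": res_name,
                "chain_id": line[21].strip(),      # column 22
                "res_seq": int(line[22:26]),       # columns 23-26
                "coords": (
                    float(line[30:38]),            # x, columns 31-38
                    float(line[38:46]),            # y, columns 39-46
                    float(line[46:54]),            # z, columns 47-54
                ),
                "element": line[76:78].strip(),    # columns 77-78
            }
        )
    return atoms
```

Passing `excluded=frozenset()` keeps every HETATM record, which leaves the decision about what counts as a 'true' ligand entirely to the user.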

Inferring connectivity

Distance-based connectivity (Euclidean threshold, KNN, etc.) should be straightforward and can likely leverage the existing code used for protein graphs. The more complicated scenario is inferring the covalent bonding from PDBs. PDB files may provide CONECT records specifying ligand structure, but I am not sure this is guaranteed.
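One way this could look, as a sketch: read bonds from the explicit connectivity records when the file has them, and fall back to a Euclidean distance threshold when it does not. The function names and the 2.0 Å default are assumptions for illustration, not the existing protein-graph code, and a plain distance cutoff only approximates covalent bonding.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]
Coord = Tuple[float, float, float]


def conect_edges(pdb_lines: Iterable[str]) -> Set[Edge]:
    """Parse CONECT records into undirected edges between atom serials."""
    edges: Set[Edge] = set()
    for line in pdb_lines:
        if not line.startswith("CONECT"):
            continue
        # Serial numbers sit in 5-character fields: columns 7-11, 12-16, ...
        fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
        serials = [int(f) for f in fields if f]
        if len(serials) < 2:
            continue
        src = serials[0]
        for dst in serials[1:]:
            edges.add((min(src, dst), max(src, dst)))  # store each bond once
    return edges


def distance_edges(coords: Dict[int, Coord], threshold: float = 2.0) -> Set[Edge]:
    """Connect every pair of atoms closer than `threshold` (assumed cutoff)."""
    edges: Set[Edge] = set()
    for (a, pos_a), (b, pos_b) in combinations(sorted(coords.items()), 2):
        if math.dist(pos_a, pos_b) <= threshold:
            edges.add((a, b))
    return edges


def ligand_edges(
    pdb_lines: Iterable[str], coords: Dict[int, Coord], threshold: float = 2.0
) -> Set[Edge]:
    """Prefer explicit CONECT bonds; otherwise fall back to distances."""
    edges = conect_edges(pdb_lines)
    return edges if edges else distance_edges(coords, threshold)
```

The fallback is all-or-nothing here; a file with partial connectivity records would need a merge strategy instead.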

Features

It would be helpful to get some idea of the sorts of features users would like to use. I think having RDKit as an optional dependency and allowing users to interact with it directly would be the most straightforward solution here.
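To make the discussion concrete, here is a sketch of per-atom features pulled from RDKit, again treating it as an optional dependency. The function name `atom_features` and the choice of features are only a starting point for discussion, not a proposed API; the RDKit calls themselves (`Chem.MolFromSmiles`, `GetSymbol`, `GetDegree`, `GetFormalCharge`, `GetIsAromatic`) are standard.

```python
import importlib.util
from typing import Dict, List


def atom_features(smiles: str) -> List[Dict[str, object]]:
    """Return a small feature dict per atom of the molecule given as SMILES."""
    if importlib.util.find_spec("rdkit") is None:
        raise ImportError("Atom features require the optional dependency RDKit.")
    from rdkit import Chem  # lazy import of the optional dependency

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return [
        {
            "symbol": atom.GetSymbol(),
            "degree": atom.GetDegree(),
            "formal_charge": atom.GetFormalCharge(),
            "is_aromatic": atom.GetIsAromatic(),
        }
        for atom in mol.GetAtoms()
    ]
```

Each dict could be attached to a graph node as attributes, with users free to add their own RDKit calls alongside.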

Keen to hear thoughts :)

@a-r-j a-r-j added enhancement New feature or request help wanted Extra attention is needed labels Apr 9, 2022
@yuanqidu
Contributor

Hey Arian, I will be working on it :D

@yuanqidu
Contributor

I agree that both RDKit and OpenBabel could be optional dependencies, as they are much easier to install these days with conda, and building the small molecule support on these two packages should be great.

@a-r-j
Owner Author

a-r-j commented Apr 11, 2022

Yep, I think RDKit is the better choice of the two since it is now pip installable. OpenBabel could be trickier (I’ve had some issues getting the Python bindings working properly in the past).

a-r-j added a commit that referenced this issue Apr 19, 2022
* add mol support

* add rdkit as optional dependency

* lint atomic edges

* lint distance

* lint graphs

* lint atom type

* lint tests

* additional node features from rdkit

* additional vocabularies to atoms

* add more RDKit constants

* refactor config matching utils to reduce duplication

* add molecule graph config to yaml parser

* lint edge funcs

* add bond-based edge feature funcs

* add additional Rdkit node features

* add feature __init__

* add molecule __init__

* lint graphs

* fix rdkit constants

* add molecule module to docs

* fix rdkit constants

* fix rdkit type hint

* refactor config utils to prevent circular import

* add names to graphs

* add global mol descriptors

* add add_hs flag

* fix feature funcs

* add descriptor vocab

* update changelog

* add molecule tutorial

* add molecule tutorial to docs

* update path to config parser utils

* add additional molecular graph tests

* add plotting functions for molecules

* update mol tutorial with 3d plots

* add molecule visualisation to docs

* fix molecule notebooks in docs

* update docs with refactored config

* update node naming to including element symbol

Co-authored-by: Arian Jamasb <arjamasb@gmail.com>