add mol support #157

Merged
a-r-j merged 43 commits into a-r-j:master on Apr 19, 2022

yuanqidu
Contributor

PRs

Fixes #155. Add support for small molecules.
Support reading molecules from SMILES strings, .mol2, .sdf, .pdb files
Support one-hot node features
Support fully-connected, knn, bond, distance threshold edges
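
As a rough illustration of the coordinate-based edge types listed above, here is a minimal sketch using only the standard library. The function names and toy coordinates are assumptions made for this example and are not the functions added in this PR; bond edges are left out because they come from the molecule's bond table rather than from coordinates.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Set, Tuple

Coord = Sequence[float]
Edge = Tuple[int, int]


def fully_connected_edges(n_atoms: int) -> Set[Edge]:
    """Connect every unordered pair of atoms."""
    return set(combinations(range(n_atoms), 2))


def distance_threshold_edges(coords: List[Coord], threshold: float) -> Set[Edge]:
    """Connect atom pairs whose Euclidean distance is at most `threshold`."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if dist(coords[i], coords[j]) <= threshold
    }


def knn_edges(coords: List[Coord], k: int) -> Set[Edge]:
    """Connect each atom to its k nearest neighbours (stored as undirected pairs)."""
    edges: Set[Edge] = set()
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: dist(ci, coords[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy coordinates for four atoms on a line; purely illustrative.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (6.0, 0.0, 0.0)]

all_pairs = fully_connected_edges(len(coords))
close_pairs = distance_threshold_edges(coords, threshold=2.0)
nearest_pairs = knn_edges(coords, k=1)
```

Note that kNN edges are not symmetric by construction: the last atom picks the third as its nearest neighbour even though the reverse does not hold, so storing them as undirected pairs is a design choice of this sketch.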

Test

See tests/molecule/test_data and tests/molecule/test_graphs.py

The tests cover the basic functions for reading both long and short molecules from each of the input formats listed above.

Owner

a-r-j left a comment

Wow! This looks really great! I've got some code for featurising molecular graphs that I've used in other projects, which I can try to add in.

I'll also try to add a notebook-based tutorial to show off the features. Thanks, Yuanqi!


import numpy as np

BASE_ATOMS: List[str] = [
Owner

We may want to have some different sets of allowable atoms. I can check some previous projects for the sets we allowed there.

Contributor Author

I also checked this. OGB basically uses atomic numbers 1-120 and treats everything else as "other". Feel free to change!
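
The "allowable set plus a catch-all" idea discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions: `ALLOWED_ATOMS` and `one_hot_atom` are hypothetical names, and the element list is a placeholder rather than the `BASE_ATOMS` list from the PR.

```python
from typing import List, Sequence

# Hypothetical allowable set; the actual BASE_ATOMS contents are not shown in this thread.
ALLOWED_ATOMS: Sequence[str] = ("C", "H", "O", "N", "F")


def one_hot_atom(symbol: str, allowed: Sequence[str] = ALLOWED_ATOMS) -> List[int]:
    """One-hot encode an element symbol, reserving the last slot for unlisted elements."""
    vec = [0] * (len(allowed) + 1)
    # Elements outside the allowable set all map to the final "other" position.
    index = allowed.index(symbol) if symbol in allowed else len(allowed)
    vec[index] = 1
    return vec


oxygen = one_hot_atom("O")
selenium = one_hot_atom("Se")  # not in the allowable set
```

Passing a different `allowed` sequence is one way to support several allowable sets without changing the encoder itself.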

graphein/molecule/config.py (outdated, resolved)
if G.has_edge(n1, n2):
    G.edges[n1, n2]["kind"].add("bond")
else:
    G.add_edge(n1, n2, kind={"bond"})
Owner

I think we should have an option to distinguish bond order (single/double/triple etc.).

Contributor Author

Ah, you are right. Should we make it an edge feature or mark it here?

Owner

I think we can do it as the kind, e.g. kind={"single", "bond"} or kind={"single_bond"}. I think the first one may be easier to work with (and is what I did for atomic protein graphs).
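
A minimal sketch of the first option, kind={"single", "bond"}, using networkx. The helper name `add_bond`, the node labels, and the plain-string bond orders are assumptions for illustration only; this is not the code that was merged.

```python
import networkx as nx


def add_bond(G: nx.Graph, n1: str, n2: str, order: str) -> None:
    """Add a bond edge, recording both the generic "bond" kind and its order."""
    kinds = {"bond", order}
    if G.has_edge(n1, n2):
        # Merge with any kinds already on the edge (e.g. from a distance-based edge).
        G.edges[n1, n2]["kind"].update(kinds)
    else:
        G.add_edge(n1, n2, kind=kinds)


G = nx.Graph()
add_bond(G, "C:0", "O:1", "double")  # hypothetical node labels
add_bond(G, "C:0", "H:2", "single")

# Keeping "bond" and the order as separate set members allows both queries below.
all_bonds = [(u, v) for u, v, d in G.edges(data=True) if "bond" in d["kind"]]
double_bonds = [(u, v) for u, v, d in G.edges(data=True) if "double" in d["kind"]]
```

With the alternative kind={"single_bond"}, selecting every bond regardless of order would need a check against each order-specific label, which is one reason the two-member set may be easier to work with.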

graphein/molecule/edges/atomic.py (outdated, resolved)
graphein/molecule/graphs.py (outdated, resolved)
tests/molecule/test_graphs.py (outdated, resolved)

config = MoleculeGraphConfig()

def test_generate_graph_sdf():
Owner

We need some more checks on the graphs. We can follow a similar model to the protein graph tests.
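
One possible shape for such checks, sketched against a hand-built networkx graph so that it runs without any molecule files. The helper `check_molecule_graph` and the toy graph are hypothetical, and the protein graph tests mentioned above may check different properties.

```python
import networkx as nx


def check_molecule_graph(G: nx.Graph, n_atoms: int, n_bonds: int) -> None:
    """Structural sanity checks for a molecule graph (hypothetical helper)."""
    # One node per atom.
    assert G.number_of_nodes() == n_atoms
    # Count only edges tagged as bonds; other edge kinds may also be present.
    bond_edges = [(u, v) for u, v, d in G.edges(data=True) if "bond" in d["kind"]]
    assert len(bond_edges) == n_bonds
    # Every edge should carry a non-empty set of kinds.
    assert all(isinstance(d["kind"], set) and d["kind"] for _, _, d in G.edges(data=True))
    # A single-fragment molecule should give a connected graph.
    assert nx.is_connected(G)


# Toy graph: the three heavy atoms of ethanol (C-C-O), built by hand.
G = nx.Graph()
G.add_edge("C:0", "C:1", kind={"bond"})
G.add_edge("C:1", "O:2", kind={"bond"})

check_molecule_graph(G, n_atoms=3, n_bonds=2)
```

In a real test the graph would come from the construction function under test, and the expected atom and bond counts from the known molecule.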

graphein/molecule/edges/distance.py (outdated, resolved)
graphein/molecule/edges/distance.py (outdated, resolved)
graphein/molecule/edges/atomic.py (resolved)

codecov-commenter commented Apr 15, 2022

Codecov Report

Merging #157 (7a0d1a5) into master (8123f42) will increase coverage by 8.98%.
The diff coverage is 58.00%.

@@            Coverage Diff             @@
##           master     #157      +/-   ##
==========================================
+ Coverage   40.27%   49.25%   +8.98%     
==========================================
  Files          48       74      +26     
  Lines        2811     4172    +1361     
==========================================
+ Hits         1132     2055     +923     
- Misses       1679     2117     +438     
Impacted Files                                    Coverage Δ
graphein/grn/parse_trrust.py                      37.77% <ø> (ø)
graphein/ml/diffusion.py                          0.00% <0.00%> (ø)
graphein/ppi/edges.py                             100.00% <ø> (ø)
graphein/ppi/graph_metadata.py                    0.00% <ø> (ø)
graphein/ppi/graphs.py                            54.34% <ø> (ø)
graphein/ppi/parse_biogrid.py                     75.00% <ø> (ø)
graphein/ppi/visualisation.py                     0.00% <0.00%> (ø)
graphein/protein/analysis.py                      0.00% <0.00%> (ø)
graphein/protein/features/sequence/sequence.py    71.42% <0.00%> (+2.67%) ⬆️
graphein/protein/features/sequence/utils.py       28.00% <0.00%> (+3.00%) ⬆️
... and 59 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3ceb9c8...7a0d1a5.

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No coverage information
No duplication information

@a-r-j a-r-j merged commit 8ccde9a into a-r-j:master Apr 19, 2022

Successfully merging this pull request may close these issues.

Support for small molecules & Protein Ligands
3 participants