-
-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add mol support #157
add mol support #157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! This looks really great! I've got some code for featurising molecular graphs I've used in other projects I can try to add in.
I'll also try to add a notebook-based tutorial to show off the features. Thanks, Yuanqi!
|
||
import numpy as np | ||
|
||
BASE_ATOMS: List[str] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to have some different sets of allowable atoms. I can check some previous projects for the sets we allowed there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also checked this, OGB basically used atomic numbers from 1-120 and leave others as others. Feel free to change!
graphein/molecule/edges/atomic.py
Outdated
if G.has_edge(n1, n2): | ||
G.edges[n1, n2]["kind"].add("bond") | ||
else: | ||
G.add_edge(n1, n2, kind={"bond"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have an option to distinguish bond order (single/double/triple etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh you are right, do we make it as edge feature or mark here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do it as the kind. E.g kind={"single", "bond"}
or kind={"single_bond"}
. I think the first one may be easier to work with (and is what I did for atomic protein graphs)
|
||
config = MoleculeGraphConfig() | ||
|
||
def test_generate_graph_sdf(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some more checks on the graphs. We can follow a similar model to the protein graph tests.
Codecov Report
@@ Coverage Diff @@
## master #157 +/- ##
==========================================
+ Coverage 40.27% 49.25% +8.98%
==========================================
Files 48 74 +26
Lines 2811 4172 +1361
==========================================
+ Hits 1132 2055 +923
- Misses 1679 2117 +438
Continue to review full report at Codecov.
|
Kudos, SonarCloud Quality Gate passed! |
PRs
Fixes #155. Add support for small molecules.
Support reading molecules from SMILES strings, .mol2, .sdf, .pdb files
Support one-hot node features
Support fully-connected, knn, bond, distance threshold edges
Test
See tests/molecule/test_data and tests/molecule/test_graphs.py
test basic functions to read either long or short molecules in the aforementioned ways