Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements in the context of knowledge graph resources #207

Merged
merged 27 commits into from
Mar 12, 2024

Conversation

abearab
Copy link
Contributor

@abearab abearab commented Aug 4, 2023

Suggesting a new data function: Knowledge Graph Mastery, #23 (comment)

Changes:

  • Add knowledge_graph module
  • Introduce KnowledgeGraph python class
  • Include build_KG function to enable creating graph from scratch
  • Add to_KG method to current primekg in resource module to get PrimeKG as KnowledgeGraph object

Features:

abearab added a commit to GilbertLabUCSF/Decitabine-treatment that referenced this pull request Sep 7, 2023
@abearab abearab changed the title Add a function to the primekg module Adding more functionalities to the primekg module Sep 7, 2023
@abearab abearab changed the title Adding more functionalities to the primekg module Improvements in the context of knowledge graph resources Sep 30, 2023
abearab added a commit to GilbertLabUCSF/Decitabine-treatment that referenced this pull request Oct 15, 2023
@kexinhuang12345
Copy link
Collaborator

Hi @abearab thanks for the PR! I think this looks like a lightweight approach to accommodate all the future KGs that could be added to the TDC. I have two comments:

  1. The KG class should be probably defined under the tdc/utils folder. Since it is a data function.
  2. Is it possible to modify PrimeKG to inherit from the KG class you defined? This way all the changes made in the KG class can also be applied to PrimeKG.

Otherwise, looks great - happy to discuss more about these two comments if you have any question!

@abearab
Copy link
Contributor Author

abearab commented Oct 30, 2023

Thanks @kexinhuang12345

The KG class should be probably defined under the tdc/utils folder. Since it is a data function.

done – 5eee5a6

Is it possible to modify PrimeKG to inherit from the KG class you defined? This way all the changes made in the KG class can also be applied to PrimeKG.

Yeah, I agree. I used a method option for the current PrimeKG class for converting it to KnowledgeGraph class. I just removed that so the .KG will keep the data for PrimeKG and inherits from the KG class by default. Thanks for the suggestion.
see 354cd50

@amva13
Copy link
Member

amva13 commented Mar 5, 2024

hi. can you run the new conda-test GH Action tests? They should be run if you create a new pull request. Also looks like your branch has conflicts. You should resolve those. Closing this for now.

@amva13 amva13 closed this Mar 5, 2024
@abearab
Copy link
Contributor Author

abearab commented Mar 5, 2024

hey @amva13 – thanks for your response.

I got this error, any thoughts?

OK (skipped=1)
/home/runner/work/TDC/TDC
All base tests passed
--- tdc/resource/primekg.py	(original)
+++ tdc/resource/primekg.py	(reformatted)
@@ -12,6 +12,7 @@
 from ..utils.knowledge_graph import KnowledgeGraph
 
 warnings.filterwarnings("ignore")
+
 
 class PrimeKG:
     """PrimeKG data loader class to load the knowledge graph with additional support functions.
@@ -30,9 +31,9 @@
 
         G = nx.Graph()
         for i in self.KG.df.relation.unique():
-            G.add_edges_from(
-                self.KG.df[self.df.relation == i][["x_id", "y_id"]].values, relation=i
-            )
+            G.add_edges_from(self.KG.df[self.df.relation == i][["x_id",
+                                                                "y_id"]].values,
+                             relation=i)
         return G
 
     def get_features(self, feature_type):
@@ -43,7 +44,5 @@
 
     def get_node_list(self, node_type):
         df = self.KG.df
-        return np.unique(
-            df[(df.x_type == node_type)].x_id.unique().tolist()
-            + df[(df.y_type == node_type)].y_id.unique().tolist()
-        )
+        return np.unique(df[(df.x_type == node_type)].x_id.unique().tolist() +
+                         df[(df.y_type == node_type)].y_id.unique().tolist())
--- tdc/utils/knowledge_graph.py	(original)
+++ tdc/utils/knowledge_graph.py	(reformatted)
@@ -5,13 +5,13 @@
 from copy import copy
 
 kg_columns = [
-    'relation', 'display_relation',
-    'x_id', 'x_type', 'x_name', 'x_source',
+    'relation', 'display_relation', 'x_id', 'x_type', 'x_name', 'x_source',
     'y_id', 'y_type', 'y_name', 'y_source'
 ]
 
 
 class KnowledgeGraph:
+
     def __init__(self, df=None):
         if df is not None:
             self.df = df
@@ -29,26 +29,32 @@
     def get_nodes_by_source(self, source):
         # extract x nodes
         x_df = self.df.query(
-            f"x_source == '{source}' | y_source == '{source}'"
-        )[[col for col in self.df.columns if col.startswith("x_")]]
+            f"x_source == '{source}' | y_source == '{source}'")[[
+                col for col in self.df.columns if col.startswith("x_")
+            ]]
 
         for col in x_df.columns:
             x_df = x_df.rename(columns={col: col[2:]})
 
         # extract y nodes
         y_df = self.df.query(
-            f"x_source == '{source}' | y_source == '{source}'"
-        )[[col for col in self.df.columns if col.startswith("y_")]]
+            f"x_source == '{source}' | y_source == '{source}'")[[
+                col for col in self.df.columns if col.startswith("y_")
+            ]]
         for col in y_df.columns:
             y_df = y_df.rename(columns={col: col[2:]})
         # merge x and y nodes and keep only unique nodes
-        out = pd.concat([x_df, y_df], axis=0).query(f'source == "{source}"').drop_duplicates().reset_index(drop=True)
+        out = pd.concat([
+            x_df, y_df
+        ], axis=0).query(f'source == "{source}"').drop_duplicates().reset_index(
+            drop=True)
 
         return out
 
 
-def build_KG(indices, relation, display_relation, x_id, x_type, x_name, x_source, y_id, y_type, y_name, y_source):
-    df = pd.DataFrame('',columns=kg_columns,index=indices)
+def build_KG(indices, relation, display_relation, x_id, x_type, x_name,
+             x_source, y_id, y_type, y_name, y_source):
+    df = pd.DataFrame('', columns=kg_columns, index=indices)
 
     df.relation = relation
     df.display_relation = display_relation
Error: Process completed with exit code 1.

@amva13
Copy link
Member

amva13 commented Mar 5, 2024

Yes this is an error from the linter. It's from your code's formatting. In theory, you should be able to fix by simply replacing the contents of your file with the output from
yapf --style=google tdc/utils/knowledge_graph.py
for example.
here's more info on the linter if helpful
https://github.com/google/yapf

@amva13
Copy link
Member

amva13 commented Mar 5, 2024

your tdc/resource/primekg.py file is also failing so make sure to do the same there.

@amva13
Copy link
Member

amva13 commented Mar 5, 2024

reopening

@amva13 amva13 reopened this Mar 5, 2024
@abearab
Copy link
Contributor Author

abearab commented Mar 5, 2024

Yes this is an error from the linter. It's from your code's formatting. In theory, you should be able to fix by simply replacing the contents of your file with the output from yapf --style=google tdc/utils/knowledge_graph.py for example. here's more info on the linter if helpful https://github.com/google/yapf

Cool, I fixed it.

@amva13
Copy link
Member

amva13 commented Mar 5, 2024

running tests

@amva13 amva13 closed this Mar 5, 2024
@amva13 amva13 reopened this Mar 5, 2024
@amva13
Copy link
Member

amva13 commented Mar 5, 2024

looks good @abearab will complete review this week.

@amva13
Copy link
Member

amva13 commented Mar 7, 2024

Hey @abearab . From the looks of it you should be able to separate your edits to PrimeKG into a different class. Can you go ahead and do that? It can inherit from existing PrimeKG if useful. As there are no dependencies on PrimeKG in the rest of the repo, you and anyone wishing to leverage the code you implemented in KnowledgeGraph should be able to use your class directly. As there is not yet extensive testing on PrimeKG, I'd prefer we proceed in this manner. I can approve when this is done.

@abearab
Copy link
Contributor Author

abearab commented Mar 7, 2024

Hi @amva13 – we have a discussion with @kexinhuang12345 here #207 (comment) which motivated me to modify PrimeKG class. This is what Kexin suggested:

Is it possible to modify PrimeKG to inherit from the KG class you defined? This way all the changes made in the KG class can also be applied to PrimeKG.

I'm not sure if I understand your concern, can you clarify please? I'll be more than happy to add a few more commits if needed.

@kexinhuang12345
Copy link
Collaborator

Yes, i think it would be great to create a separate KG class and PrimeKG can inherit from it. The reason is that in the future if there is other KG, then we can inherit from that class and reuse it instead of reimplementing the KG loading/processing/analysis shared functions. Let me know if that makes sense!

@amva13
Copy link
Member

amva13 commented Mar 7, 2024

@abearab your implementation does not currently have PrimeKG inheriting from KnowledgeGraph/KG (i.e. would look like PrimeKG(KG) for example)

Regardless - please do not change the code in PrimeKG. Please create a separate class, say, PrimeKGDev. There is no need for you to inherit from KG. You can use the exact same code you are using for PrimeKG and copy that over to PrimeKGDev.

PrimeKGDev should offer exactly the same interface as PrimeKG (this should not be difficult if you just copy over the code).

For conflict resolution, I am currently the authority on TDC, but you may ask @marinkaz for clarification if needed.

@kexinhuang12345 please feel free to implement testing on PrimeKG and we can consolidate the two classes, say by changing PrimeKG to inherit directly from PrimeKGDev, accordingly.

Thanks!

@abearab
Copy link
Contributor Author

abearab commented Mar 8, 2024

@ abearab your implementation does not currently have PrimeKG inheriting from KnowledgeGraph/KG (i.e. would look like PrimeKG(KG) for example)

That's true, now I fixed it. Thanks for the suggestion.

Regardless - please do not change the code in PrimeKG. Please create a separate class, say, PrimeKGDev. There is no need for you to inherit from KG. You can use the exact same code you are using for PrimeKG and copy that over to PrimeKGDev.

PrimeKGDev should offer exactly the same interface as PrimeKG (this should not be difficult if you just copy over the code).

I'm not sure if I understood your points for having both PrimeKGDev and PrimeKG. Can you clarify, please? Or if it's easier for you, maybe you can add that yourself? Let me know what you think.

For conflict resolution, I am currently the authority on TDC, but you may ask @marinkaz for clarification if needed.

Sounds great, I don't think any clarification is needed in this regard. I just referred to our prior conversations with @kexinhuang12345 to make sure we are all on the same page. Looking forward to seeing all upcoming features in TDC!

Copy link
Member

@amva13 amva13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved dev class. TODO switchover and tests

tdc/resource/primekg.py Show resolved Hide resolved
@amva13
Copy link
Member

amva13 commented Mar 12, 2024

@abearab feel free to merge this at your convenience. lgtm

@amva13 amva13 merged commit 2ccd732 into mims-harvard:main Mar 12, 2024
3 checks passed
abearab added a commit to abearab/TDC that referenced this pull request Apr 29, 2024
@abearab abearab mentioned this pull request Apr 29, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants