Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop < T541-FL] Import and export strategies #215

Merged
merged 25 commits into from
Mar 9, 2023

Conversation

as51340
Copy link
Contributor

@as51340 as51340 commented Jan 5, 2023

Description

Import and export to DGL, PyG and Nx.
Docs PR: memgraph/docs#693

Pull request type

Please delete options that are not relevant.

  • Feature

Related issues

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

######################################

Reviewer checklist (the reviewer checks this part)

  • Core feature implementation
  • Tests
  • Code documentation
  • Documentation on memgraph/docs

######################################

@as51340 as51340 marked this pull request as ready for review January 16, 2023 12:27
@as51340 as51340 self-assigned this Jan 16, 2023
@as51340 as51340 added the status: ready PR is ready for review label Jan 16, 2023
@as51340
Copy link
Contributor Author

as51340 commented Jan 16, 2023

There are some things that will be added soon:

  1. on some places Memgraph info for connecting is needed
  2. processing logic currently ignores isolated nodes. For networkx this must be added but for dgl and pyg not so sure.
  3. a possibility to import/export any graph projection: I wouldn't go into this until we see someone uses is because it will require a lot of changes for questionable value but I am open for discussion 😄

Copy link
Contributor

@antoniofilipovic antoniofilipovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot of work done here, good job! I checked mostly export, import and translators, didn't yet check tests. Will do that in next run

gqlalchemy/transformations/export/graph_transporter.py Outdated Show resolved Hide resolved
graph = transporter.export()
"""

def __init__(self, graph_type: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use graph_type as enum? This is easier to follow on different places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the method has to receive string since it is called directly by the user but I added comparison with the enum so I think it is good now.

@@ -35,17 +35,10 @@
)
from gqlalchemy.vendors.database_client import DatabaseClient
from gqlalchemy.graph_algorithms.query_modules import QueryModule
from gqlalchemy.memgraph_constants import MG_HOST, MG_PORT, MG_USERNAME, MG_PASSWORD, MG_ENCRYPTED, MG_CLIENT_NAME, MG_LAZY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use here same techinque I mentioned in export folder in graph_transporter.py

from gqlalchemy.transformations.translators.dgl_translator import DGLTranslator
from gqlalchemy.transformations.translators.nx_translator import NxTranslator
from gqlalchemy.transformations.translators.pyg_translator import PyGTranslator
from gqlalchemy.memgraph_constants import MG_HOST, MG_PORT, MG_USERNAME, MG_PASSWORD, MG_ENCRYPTED, MG_CLIENT_NAME, MG_LAZY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same here for constants as in graph_transporeter.py

elif self.graph_type == "nx":
self.translator = NxTranslator(default_node_label, default_edge_type, host, port, username, password, encrypted, client_name, lazy)
else:
raise ValueError("Unknown import option. Currently supported are DGL, PyG and Networkx.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Currently supported options are: DGL,..."

Comment on lines 134 to 136
if features is not None:
graph.edges[edge_triplet].data[feature_name] = features

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for edge_triplet, features_dict ... for feature_name, features ... translated_features = ... //it makes sense here to rename variable if features is None: //this way you are not indenting lines so much continue graph.edges[edge_triplet].data[feature_name] = translated_features

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Comment on lines 69 to 80
for source_node_id, dest_node_id, eid in zip(source_nodes, dest_nodes, eids):
# Handle properties
source_node_properties, dest_node_properties, edge_properties = {}, {}, {}
# Copy source node properties
source_node_properties = dict(map(lambda pair: (pair[0], to_cypher_value(pair[1][source_node_id])), node_src_label_properties.items()))
source_node_properties[DGL_ID] = int(source_node_id)
# Copy destination node properties
dest_node_properties = dict(map(lambda pair: (pair[0], to_cypher_value(pair[1][dest_node_id])), node_dest_label_properties.items()))
dest_node_properties[DGL_ID] = int(dest_node_id)
# Copy edge features
edge_properties = dict(map(lambda pair: (pair[0], to_cypher_value(pair[1][eid])), etype_properties.items()))
edge_properties[DGL_ID] = int(eid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, you are going over source_nodes, dest_nodes and eids. And for all those pairs, node_src_label_properties which I persume represents properties keys, is same for all source nodes? Why is that?

Couldn't it happen that some of nodes in source nodes have extra property? Or this is not stored at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes so node_src_label_properties and node_dest_label_properties are used in DGL for storing properties. They save for each node type an array, where each element of the array contains data about one node. DGL stores only numeric properties and I have imposed the restriction that only properties which are on all nodes, can be translated to DGL and PyG because both frameworks don't allow different situation.

Comment on lines 165 to 171
self,
graph: nx.Graph,
host: str = "127.0.0.1",
port: int = 7687,
username: str = "",
password: str = "",
encrypted: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is host, port, username, password mentioned here again? Because we have it in init, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this method was from old times so I forgot to change this.

Comment on lines 206 to 211
args=(
process_queries,
host,
port,
username,
password,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why are we here sending again host, port, username and password, don't we have it in init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes same as above, it is old method.

Comment on lines 44 to 47
if iter_node_label == node_label:
for property_name, property_values in iter_node_properties.items():
if property_name != NUM_NODES:
node_properties[property_name] = property_values[node_id]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can reduce indenting here:

if iter_node_label != node_label: continue for property_name, property_values in iter_node_properties.items(): if property_name == NUM_NODES: continue node_properties[property_name] = property_values[node_id]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@katarinasupe
Copy link
Contributor

Just a general docs comment. I can see that you changed the location of the transformations. This can affect the current GQLA docs and it should be checked where and fix it. For example, it will probably change something here: https://memgraph.com/docs/gqlalchemy/how-to-guides/networkx. Besides that, please make sure to document at least something regarding the new features included in the next release. GQLA docs need to be regenerated and fixed, but by adding new content we at least don't lose knowledge and can help users.

@as51340
Copy link
Contributor Author

as51340 commented Jan 27, 2023

Just a general docs comment. I can see that you changed the location of the transformations. This can affect the current GQLA docs and it should be checked where and fix it. For example, it will probably change something here: https://memgraph.com/docs/gqlalchemy/how-to-guides/networkx. Besides that, please make sure to document at least something regarding the new features included in the next release. GQLA docs need to be regenerated and fixed, but by adding new content we at least don't lose knowledge and can help users.

Yes, yes I have to start writing docs and editing existing transformation docs. Thanks for the ping!

Copy link
Contributor

@Josipmrden Josipmrden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, overall good job

@@ -0,0 +1,68 @@
# Copyright (c) 2016-2022 Memgraph Ltd. [https://memgraph.com]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2023


class GraphTransporter(Transporter):
"""Here is a possible example for using this module:
>>> transporter = GraphTransported("dgl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GraphTransporter typo

Comment on lines 32 to 33
default_node_label="NODE",
default_edge_type="RELATIONSHIP",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put these 2 as well under constants.py

) -> None:
"""Initializes GraphTransporter. It is used for converting Memgraph graph to the specific graph type offered by some Python package (PyG, DGL, NX...)
Here is a possible example for using this module:
>>> transporter = GraphTransported("dgl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo GraphTransporter

@@ -0,0 +1,13 @@
# Copyright (c) 2016-2022 Memgraph Ltd. [https://memgraph.com]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just don't forget to change everywhere 2023. I'm not sure if this can be done automatically but for now we can do it manually.

Comment on lines 81 to 85
"""Produce cypher queries for data saved as part of thePyG graph. The method handles both homogeneous and heterogeneous graph.
The method converts 1D as well as multidimensional features. If there are some isolated nodes inside the graph, they won't get transferred. Nodes and edges
created in Memgraph DB will, for the consistency reasons, have property `pyg_id` set to the id they have as part of the PyG graph. Note that this method doesn't insert anything inside the database,
it just creates cypher queries. To insert queries the following code can be used:
>>> memegraph = Memgraph()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watch for line length

Comment on lines 49 to 50
default_node_label="NODE",
default_edge_type="RELATIONSHIP",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract


for row in rel_results:
row_values = row.values()
# print(f"Row values: {row_values}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete print

@@ -32,7 +32,7 @@ exclude = '''
'''

[tool.poetry.dependencies]
python = "^3.7"
python = "^3.8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this on purpose

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katarinasupe do we have any implications on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is on purpose because of poe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Josipmrden I think all should be fine.

@vpavicic
Copy link

vpavicic commented Mar 9, 2023

@as51340 - Data from Memgraph can now be imported from and exported to NetworkX, DGL and PyG graph formats.

@brunos252 brunos252 changed the base branch from main to develop March 9, 2023 13:54
@brunos252 brunos252 merged commit f530227 into develop Mar 9, 2023
@brunos252 brunos252 deleted the T541-FL-import-export-strategies branch March 9, 2023 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready PR is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants