[develop < T541-FL] Import and export strategies #215

as51340 · 2023-01-05T12:58:03Z

Description

Import and export to DGL, PyG and Nx.
Docs PR: memgraph/docs#693

Pull request type

Please delete options that are not relevant.

Feature

Related issues

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

######################################

Reviewer checklist (the reviewer checks this part)

Core feature implementation
Tests
Code documentation
Documentation on memgraph/docs

######################################

* bump to 2.5 * try 2.2 * change ubuntu to 20.04 * revert to MG 2.3.0 * bump to mg 2.5.0 and update qm signatures

as51340 · 2023-01-16T13:26:46Z

A few things will be added soon:

In some places, Memgraph connection info is still needed.
The processing logic currently ignores isolated nodes. This must be added for NetworkX; I am not so sure about DGL and PyG.
The possibility to import/export any graph projection: I wouldn't go into this until we see that someone uses it, because it would require a lot of changes for questionable value, but I am open to discussion 😄

antoniofilipovic

A lot of work done here, good job! I mostly checked export, import and translators; I haven't checked the tests yet. I will do that in the next round.

antoniofilipovic · 2023-01-18T10:11:57Z

gqlalchemy/transformations/export/graph_transporter.py

+    graph = transporter.export()
+    """
+
+    def __init__(self, graph_type: str,


Maybe use graph_type as an enum? That is easier to follow in different places.

The method has to receive a string since it is called directly by the user, but I added a comparison with the enum, so I think it is good now.
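A minimal sketch of the approach discussed in this thread: the public entry point keeps accepting a string, and the string is converted to an enum once, so the rest of the code compares enum members. The names GraphType and parse_graph_type are hypothetical and are not taken from the PR.

```python
from enum import Enum


class GraphType(Enum):
    """Hypothetical enum of supported graph formats (names are assumptions)."""

    DGL = "dgl"
    PYG = "pyg"
    NX = "nx"


def parse_graph_type(graph_type: str) -> GraphType:
    """Convert a user-supplied string to a GraphType, ignoring case and spaces."""
    try:
        return GraphType(graph_type.strip().lower())
    except ValueError:
        supported = ", ".join(member.name for member in GraphType)
        raise ValueError(
            f"Unknown graph type {graph_type!r}. "
            f"Currently supported options are: {supported}."
        ) from None
```

With this shape, a branch such as `if graph_type is GraphType.NX` replaces repeated string literals, and an unsupported value fails in one place with one message.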

antoniofilipovic · 2023-01-18T10:13:47Z

gqlalchemy/vendors/memgraph.py

@@ -35,17 +35,10 @@
 )
 from gqlalchemy.vendors.database_client import DatabaseClient
 from gqlalchemy.graph_algorithms.query_modules import QueryModule
+from gqlalchemy.memgraph_constants import MG_HOST, MG_PORT, MG_USERNAME, MG_PASSWORD, MG_ENCRYPTED, MG_CLIENT_NAME, MG_LAZY


You can use the same technique here that I mentioned for graph_transporter.py in the export folder.

antoniofilipovic · 2023-01-18T10:14:32Z

gqlalchemy/transformations/importing/graph_importer.py

+from gqlalchemy.transformations.translators.dgl_translator import DGLTranslator
+from gqlalchemy.transformations.translators.nx_translator import NxTranslator
+from gqlalchemy.transformations.translators.pyg_translator import PyGTranslator
+from gqlalchemy.memgraph_constants import MG_HOST, MG_PORT, MG_USERNAME, MG_PASSWORD, MG_ENCRYPTED, MG_CLIENT_NAME, MG_LAZY


And the same here for the constants as in graph_transporter.py.

antoniofilipovic · 2023-01-18T10:15:15Z

gqlalchemy/transformations/importing/graph_importer.py

+        elif self.graph_type == "nx":
+            self.translator = NxTranslator(default_node_label, default_edge_type, host, port, username, password, encrypted, client_name, lazy)
+        else:
+            raise ValueError("Unknown import option. Currently supported are DGL, PyG and Networkx.")


"Currently supported options are: DGL,..."

antoniofilipovic · 2023-01-18T10:24:31Z

gqlalchemy/transformations/translators/dgl_translator.py

+                if features is not None:
+                    graph.edges[edge_triplet].data[feature_name] = features
+


for edge_triplet, features_dict ...
    for feature_name, features ...
        translated_features = ...  # it makes sense to rename the variable here
        if features is None:  # this way you are not indenting lines so much
            continue
        graph.edges[edge_triplet].data[feature_name] = translated_features

antoniofilipovic · 2023-01-18T10:33:55Z

gqlalchemy/transformations/translators/dgl_translator.py

+            for source_node_id, dest_node_id, eid in zip(source_nodes, dest_nodes, eids):
+                # Handle properties
+                source_node_properties, dest_node_properties, edge_properties = {}, {}, {}
+                # Copy source node properties
+                source_node_properties = dict(map(lambda pair: (pair[0], to_cypher_value(pair[1][source_node_id])), node_src_label_properties.items()))
+                source_node_properties[DGL_ID] = int(source_node_id)
+                # Copy destination node properties
+                dest_node_properties = dict(map(lambda pair: (pair[0], to_cypher_value(pair[1][dest_node_id])), node_dest_label_properties.items()))
+                dest_node_properties[DGL_ID] = int(dest_node_id)
+                # Copy edge features
+                edge_properties = dict(map(lambda pair: (pair[0], to_cypher_value(pair[1][eid])), etype_properties.items()))
+                edge_properties[DGL_ID] = int(eid)


If I understand this correctly, you are iterating over source_nodes, dest_nodes and eids. For all of those triples, node_src_label_properties, which I presume represents the property keys, is the same for all source nodes? Why is that?

Couldn't it happen that some of the source nodes have an extra property? Or is that not stored at all?

Yes, node_src_label_properties and node_dest_label_properties are what DGL uses for storing properties. For each node type they hold an array in which each element contains the data for one node. DGL stores only numeric properties, and I have imposed the restriction that only properties present on all nodes can be translated to DGL and PyG, because neither framework allows anything else.
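The restriction described in this reply can be illustrated with plain Python, without DGL or PyG. This is only a sketch under simplifying assumptions: nodes are dictionaries of properties, only scalar numbers count as numeric, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Set


def common_numeric_properties(nodes: List[Dict[str, Any]]) -> Set[str]:
    """Return names of properties that are present and numeric on every node."""
    if not nodes:
        return set()
    # Keep only the keys shared by all nodes.
    common = set(nodes[0])
    for props in nodes[1:]:
        common &= set(props)

    def is_number(value: Any) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    return {
        name
        for name in common
        if all(is_number(props[name]) for props in nodes)
    }


def to_columns(nodes: List[Dict[str, Any]], names: Set[str]) -> Dict[str, list]:
    """Build one list per property, indexed by node position."""
    return {name: [props[name] for props in nodes] for name in sorted(names)}
```

A property that exists on only some nodes, or that holds a string, is dropped before the per-property columns are built, which is why every column can be indexed by node id.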

antoniofilipovic · 2023-01-18T10:49:15Z

gqlalchemy/transformations/translators/nx_translator.py

+        self,
+        graph: nx.Graph,
+        host: str = "127.0.0.1",
+        port: int = 7687,
+        username: str = "",
+        password: str = "",
+        encrypted: bool = False,


Why are host, port, username and password mentioned here again? We already have them in __init__, right?

Yes, this method is from the old days, so I forgot to change it.

antoniofilipovic · 2023-01-18T10:49:42Z

gqlalchemy/transformations/translators/nx_translator.py

+                    args=(
+                        process_queries,
+                        host,
+                        port,
+                        username,
+                        password,


And why are we sending host, port, username and password again here? Don't we have them in __init__?

Yes, same as above, it is an old method.
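One way to read the two comments above: connection settings are accepted once in the constructor and reused by every method, including the one that builds arguments for worker processes. The sketch below uses hypothetical names (ConnectionInfo, SketchTranslator, worker_args) and does not reflect the PR's actual classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConnectionInfo:
    """Hypothetical holder for Memgraph connection settings."""

    host: str = "127.0.0.1"
    port: int = 7687
    username: str = ""
    password: str = ""
    encrypted: bool = False


class SketchTranslator:
    """Illustrative stand-in for a translator; not the PR's actual class."""

    def __init__(self, connection: Optional[ConnectionInfo] = None) -> None:
        # Store the settings once; methods read them from self.
        self.connection = connection or ConnectionInfo()

    def worker_args(self, queries: List[str]) -> Tuple:
        """Arguments for a worker process, taken from the stored settings."""
        c = self.connection
        return (queries, c.host, c.port, c.username, c.password, c.encrypted)
```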

antoniofilipovic · 2023-01-18T10:53:39Z

gqlalchemy/transformations/translators/pyg_translator.py

+            if iter_node_label == node_label:
+                for property_name, property_values in iter_node_properties.items():
+                    if property_name != NUM_NODES:
+                        node_properties[property_name] = property_values[node_id]


I think you can reduce indenting here:

if iter_node_label != node_label:
    continue
for property_name, property_values in iter_node_properties.items():
    if property_name == NUM_NODES:
        continue
    node_properties[property_name] = property_values[node_id]
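To show that the early-continue form suggested here behaves the same as the nested form from the diff, here is a self-contained sketch on plain dictionaries. The value of NUM_NODES and the shape of the stores argument are assumptions made for illustration.

```python
NUM_NODES = "num_nodes"  # assumed value of the constant, for illustration only


def collect_nested(stores, node_label, node_id):
    """Nested version, mirroring the structure in the diff."""
    props = {}
    for label, label_props in stores.items():
        if label == node_label:
            for name, values in label_props.items():
                if name != NUM_NODES:
                    props[name] = values[node_id]
    return props


def collect_flat(stores, node_label, node_id):
    """Guard-clause version: skip early instead of nesting."""
    props = {}
    for label, label_props in stores.items():
        if label != node_label:
            continue
        for name, values in label_props.items():
            if name == NUM_NODES:
                continue
            props[name] = values[node_id]
    return props
```

Both functions visit the same entries in the same order; the second only inverts each condition so the main assignment sits two levels shallower.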

katarinasupe · 2023-01-27T12:16:26Z

Just a general docs comment. I can see that you changed the location of the transformations. This can affect the current GQLA docs, so it should be checked where they are affected and fixed. For example, it will probably change something here: https://memgraph.com/docs/gqlalchemy/how-to-guides/networkx. Besides that, please make sure to document at least something about the new features included in the next release. The GQLA docs need to be regenerated and fixed, but by adding new content we at least don't lose knowledge and can help users.

as51340 · 2023-01-27T13:17:10Z

Yes, yes I have to start writing docs and editing existing transformation docs. Thanks for the ping!

Josipmrden

Some comments, overall good job

Josipmrden · 2023-02-14T09:41:05Z

gqlalchemy/transformations/export/graph_transporter.py

@@ -0,0 +1,68 @@
+# Copyright (c) 2016-2022 Memgraph Ltd. [https://memgraph.com]


Josipmrden · 2023-02-14T09:41:53Z

gqlalchemy/transformations/export/graph_transporter.py

+
+class GraphTransporter(Transporter):
+    """Here is a possible example for using this module:
+    >>> transporter = GraphTransported("dgl")


GraphTransporter typo

Josipmrden · 2023-02-14T09:42:32Z

gqlalchemy/transformations/export/graph_transporter.py

+        default_node_label="NODE",
+        default_edge_type="RELATIONSHIP",


put these 2 as well under constants.py

Josipmrden · 2023-02-14T09:42:48Z

gqlalchemy/transformations/export/graph_transporter.py

+    ) -> None:
+        """Initializes GraphTransporter. It is used for converting Memgraph graph to the specific graph type offered by some Python package (PyG, DGL, NX...)
+        Here is a possible example for using this module:
+        >>> transporter = GraphTransported("dgl")


typo GraphTransporter

Josipmrden · 2023-02-14T09:44:21Z

gqlalchemy/transformations/translators/__init__.py

@@ -0,0 +1,13 @@
+# Copyright (c) 2016-2022 Memgraph Ltd. [https://memgraph.com]


Just don't forget to change the year to 2023 everywhere. I'm not sure if this can be done automatically, but for now we can do it manually.
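On the question of automation, a regular expression substitution is one possible approach. The sketch below rewrites only the end year of a header shaped like the one in the diff; the function name is hypothetical and the pattern is an assumption based on that single header line.

```python
import re

# Matches "# Copyright (c) 2016-YYYY Memgraph Ltd." and captures the parts
# around the end year so only the year is replaced.
HEADER = re.compile(r"(# Copyright \(c\) 2016-)\d{4}( Memgraph Ltd\.)")


def bump_copyright_year(source: str, year: int) -> str:
    """Return source with the end year of the copyright header set to year."""
    return HEADER.sub(lambda m: f"{m.group(1)}{year}{m.group(2)}", source)
```

Applied to the text of each source file, this leaves files without a matching header unchanged, so it could be run over a whole package before reviewing the resulting diff by hand.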

Josipmrden · 2023-02-14T10:23:27Z

gqlalchemy/transformations/translators/pyg_translator.py

+        """Produce cypher queries for data saved as part of thePyG graph. The method handles both homogeneous and heterogeneous graph.
+        The method converts 1D as well as multidimensional features. If there are some isolated nodes inside the graph, they won't get transferred. Nodes and edges
+         created in Memgraph DB will, for the consistency reasons, have property `pyg_id` set to the id they have as part of the PyG graph. Note that this method doesn't insert anything inside the database,
+         it just creates cypher queries. To insert queries the following code can be used:
+         >>> memegraph = Memgraph()


watch for line length

Josipmrden · 2023-02-14T10:24:47Z

gqlalchemy/transformations/translators/translator.py

+        default_node_label="NODE",
+        default_edge_type="RELATIONSHIP",


Josipmrden · 2023-02-14T10:26:10Z

gqlalchemy/transformations/translators/translator.py

+
+        for row in rel_results:
+            row_values = row.values()
+            # print(f"Row values: {row_values}")


delete print

Josipmrden · 2023-02-14T10:28:06Z

pyproject.toml

@@ -32,7 +32,7 @@ exclude = '''
 '''

 [tool.poetry.dependencies]
-python = "^3.7"
+python = "^3.8"


Is this on purpose?

@katarinasupe does this have any implications for us?

Yes, it is on purpose, because of poe.

@Josipmrden I think all should be fine.

vpavicic · 2023-03-09T08:18:58Z

@as51340 - Data from Memgraph can now be imported from and exported to NetworkX, DGL and PyG graph formats.

BorisTasevski and others added 11 commits December 2, 2022 10:09

Added Support for numpy ndarrays and scalars

96bed59

Update CODEOWNERS

370e379

Update CODEOWNERS

ebf0861

[main < ] change Ubuntu and MG version in workflow (#214)

2754b7a

* bump to 2.5 * try 2.2 * change ubuntu to 20.04 * revert to MG 2.3.0 * bump to mg 2.5.0 and update qm signatures

DGL basic exporter

ae0ad49

Added export strategies

c3dde54

Added import strategies

fcd347e

Started adding DGL tests, fixed few bugs

b574fe3

DGL translator done with tests added :rocket

7533fa1

Added PyG tests, NX functional and test skelet

26adae1

Added tests and poetry dependency management

b18eb9d

as51340 requested review from katarinasupe, Josipmrden and brunos252 January 16, 2023 12:27

as51340 marked this pull request as ready for review January 16, 2023 12:27

Added graph_importers and exporters

45c8a37

as51340 self-assigned this Jan 16, 2023

as51340 added the status: ready PR is ready for review label Jan 16, 2023

Added Memgraph connection info + functional programming

e086755

antoniofilipovic suggested changes Jan 18, 2023

as51340 added 6 commits January 26, 2023 11:27

Fix PR comments

01a1600

Added new test functionalities, isolated nodes support for nx

bd14970

Torch packages pipeline fix #1

97f1b74

Install pyg fix

47aecea

Fix installation order and test loaders

d8465ea

Debug feather test

06c3fe1

as51340 requested a review from antoniofilipovic January 26, 2023 14:06

as51340 added 2 commits January 26, 2023 15:56

Old path fix

5415e32

Added test files for data loaders

9bba415

Win build

6662c86

Josipmrden requested changes Feb 14, 2023

as51340 added 2 commits February 14, 2023 16:09

PR comments

8bd2916

PR fixes for constants

8d15f59

gitbuda added this to the v1.4.0 milestone Mar 3, 2023

antepusic mentioned this pull request Mar 6, 2023

[master < E056-MAGE] Mock graph APIs memgraph/memgraph#757

Merged

5 tasks

as51340 mentioned this pull request Mar 8, 2023

[master < T0136-GA] Import, export gql docs memgraph/docs#693

Merged

7 tasks

Josipmrden approved these changes Mar 9, 2023

antoniofilipovic approved these changes Mar 9, 2023

brunos252 changed the base branch from main to develop March 9, 2023 13:54

merge develop

26d49f9

brunos252 merged commit f530227 into develop Mar 9, 2023

brunos252 deleted the T541-FL-import-export-strategies branch March 9, 2023 14:21
