-
Notifications
You must be signed in to change notification settings - Fork 13
Migration tutorial from RDBMS
using MAGE
modules
#770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
aab63dd
Memgraph 2.17 docs
kgolubic 648f4b4
Docs
kgolubic 992fb22
Merge branch 'main' into memgraph-2-17
kgolubic 6842baa
Merge branch 'main' into memgraph-2-17
kgolubic d4be62f
Merge branch 'main' into memgraph-2-17
kgolubic 7d2cbc3
Merge branch 'main' into memgraph-2-17
kgolubic 16c8157
Zoned datetime data type (#723)
antepusic 6c4a5d4
Revert "Zoned datetime data type (#723)" (#763)
kgolubic f5b98e1
Add migration tutorial from rdbms to memgraph
Josipmrden c989f1a
Add fixes
Josipmrden c4283e1
Fix from PR review
Josipmrden 8841fb2
Merge branch 'memgraph-2-18' into migrating-from-rdbms-using-mage
kgolubic File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,284 @@ | ||
--- | ||
title: Migrate from RDBMS to Memgraph directly using MAGE modules | ||
description: Easily transition from RDBMS to Memgraph using MAGE modules. Our detailed documentation makes the migration process straightforward for competent graph computing. | ||
--- | ||
|
||
# Migrate from RDBMS to Memgraph using MAGE modules | ||
|
||
This tutorial will help you import your data from a PostgreSQL database into Memgraph | ||
directly using MAGE query modules. | ||
|
||
This migration tutorial makes migration from an external source to Memgraph possible in one less step | ||
than described in [migrating from a RDBMS to Memgraph using CSV files](/data-migration/migrate-from-rdbms). | ||
Make sure you read both tutorials to see what fits your needs. | ||
|
||
In two of our blog posts, we've covered the [differences between relational and | ||
graph database](https://memgraph.com/blog/graph-database-vs-relational-database) | ||
and outlined the [benefits of graph | ||
databases](https://memgraph.com/blog/the-benefits-of-using-a-graph-database-instead-of-sql). | ||
|
||
## Prerequisites | ||
|
||
To follow along, you will need: | ||
|
||
- **Memgraph MAGE** which includes **Memgraph** and **MAGE**, an advanced graph | ||
algorithms and modules library. To install the Memgraph MAGE and set it up, | ||
please follow the [getting started guide](/getting-started). | ||
- (optional) A running relational database either with your own schema and data | ||
or the data we provided to populate the tables. Here we will be using PostgreSQL, but | ||
you can also choose whatever external source, if it is supported inside | ||
[MAGE available migration algorithms](/advanced-algorithms/available-algorithms/migrate). | ||
|
||
## Data model | ||
|
||
Let's first get acquainted with the model. In our Postgres tables, we will be having `Musicians` | ||
and `Bands`, with their respective properties. | ||
|
||
The `Musicians` table will have an id, name and instrument they play: | ||
```sql | ||
CREATE TABLE Musicians ( | ||
musician_id SERIAL PRIMARY KEY, | ||
name TEXT NOT NULL, | ||
instrument TEXT | ||
); | ||
|
||
INSERT INTO Musicians (name, instrument) VALUES | ||
('John Doe', 'Guitar'), | ||
('Jane Smith', 'Vocals'), | ||
('Alice Johnson', 'Bass'), | ||
('Bob Lee', 'Drums'), | ||
('Charlie Brown', 'Keyboard'), | ||
('Eva Green', 'Guitar'); | ||
``` | ||
|
||
Bands will have the band id, and the name of the band: | ||
```sql | ||
CREATE TABLE Bands ( | ||
band_id SERIAL PRIMARY KEY, | ||
band_name TEXT NOT NULL | ||
); | ||
|
||
INSERT INTO Bands (band_name) VALUES | ||
('The Rockers'), | ||
('The Jazz Masters'), | ||
('The Classical Ensemble'), | ||
('Pop Sensations'); | ||
``` | ||
|
||
And they will be connected with a join table: | ||
```sql | ||
CREATE TABLE BandMemberships ( | ||
musician_id INTEGER, | ||
band_id INTEGER, | ||
PRIMARY KEY (musician_id, band_id), | ||
FOREIGN KEY (musician_id) REFERENCES Musicians (musician_id) ON DELETE CASCADE, | ||
FOREIGN KEY (band_id) REFERENCES Bands (band_id) ON DELETE CASCADE | ||
); | ||
|
||
INSERT INTO BandMemberships (musician_id, band_id) VALUES | ||
(1, 1), -- John Doe is a guitarist in "The Rockers" | ||
(2, 1), -- Jane Smith is a vocalist in "The Rockers" | ||
(3, 1), -- Alice Johnson is a bassist in "The Rockers" | ||
(4, 1), -- Bob Lee is a drummer in "The Rockers" | ||
(2, 2), -- Jane Smith is also a vocalist in "The Jazz Masters" | ||
(5, 2), -- Charlie Brown plays keyboard in "The Jazz Masters" | ||
(6, 4), -- Eva Green plays guitar in "Pop Sensations" | ||
(2, 4); -- Jane Smith also sings in "Pop Sensations" | ||
``` | ||
|
||
hal-eisen-MG marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Now we have populated all of the example data for the tutorial. | ||
|
||
The exercise is to migrate `Bands` and `Musicians` rows as nodes, and the mapping table as | ||
relationships. The steps below are provided so you can quickly glance at them at any time: | ||
|
||
1. Create necessary indices. Indices will increase performance when creating relationships between the nodes. | ||
2. Switch to in-memory analytical mode (to eliminate memory overhead) | ||
3. Import nodes from corresponding tables | ||
4. Import relationships from corresponding tables | ||
5. Switch back to in-memory transactional mode | ||
|
||
### 1. Create necessary indices | ||
In SQL databases, indices are mostly created after the import has finished, as they can slow down | ||
importing times. However, in Memgraph it's necesssary to create indices beforehand. The reason for this | ||
is the relationship import, which needs to find 2 nodes (start node and end node) as its source and | ||
destination. For node import, if you're doing `CREATE` commands, it's irrelevant. However, if you're | ||
executing `MERGE` commands, it will still scan over all the nodes, so use corresponding indices in order | ||
to make the scanning faster. | ||
|
||
We will import only the necessary indices: | ||
```cypher | ||
CREATE INDEX ON :Musician(id); | ||
CREATE INDEX ON :Band(id); | ||
``` | ||
|
||
### 2. Switch to in-memory analytical mode | ||
Memgraph currently offers 2 in-memory modes, transactional and analytical. Transactional is often used in | ||
production deployments. It has all the [ACID guarantees](https://en.wikipedia.org/wiki/ACID) | ||
and it's safe to use for transactional workloads. | ||
However, when executing import for the first time and testing, it is beneficial to use in-memory analytical | ||
mode. | ||
|
||
In-memory analytical mode offers less memory overhead than transactional mode. In-memory analytical mode | ||
is highly parallelizable when importing nodes and relationships in the way we will see below. However, it does | ||
not come with write-ahead logging, and it is therefore unable to make ACID guarantees. More information | ||
about analytical mode can be found on our [page about storage modes](/fundamentals/storage-memory-usage). | ||
|
||
In-memory storage mode is set with the following command: | ||
```cypher | ||
STORAGE MODE IN_MEMORY_ANALYTICAL; | ||
``` | ||
|
||
In the next two sections, we will provide you with different ways of migrating from an external data source. | ||
One will be migrating the whole table from the RDBMS to Memgraph, and the other will be by issuing a query. | ||
|
||
### 3. Import nodes from corresponding tables | ||
The [migrate module](/advanced-algorithms/available-algorithms/migrate) in MAGE has an easy API how to migrate rows from a relational database to Memgraph. We | ||
can inspect the table with the following query: | ||
|
||
```cypher | ||
CALL migrate.postgresql('Musicians' , { | ||
user:'postgres', | ||
password:'xxxxxx', | ||
host:'localhost', | ||
database:'postgresdb' | ||
}) | ||
YIELD row | ||
RETURN row.musician_id as id, row.name as name, row.instrument as instrument; | ||
``` | ||
|
||
On the picture we can see that the module has correctly extracted the columns from the table. | ||
 | ||
|
||
The `YIELD` keyword will create the `row` objects, which are essentially a map which | ||
has the column name as the key, and the column value as the value. We can iterate over the row objects with | ||
the Cypher language to populate the Memgraph database. | ||
|
||
```cypher | ||
CALL migrate.postgresql('Musicians' , { | ||
user:'postgres', | ||
password:'xxxxxx', | ||
host:'localhost', | ||
database:'postgresdb' | ||
}) | ||
YIELD row | ||
CREATE (n:Musician {id: row.musician_id, name: row.name, instrument: row.instrument}); | ||
``` | ||
|
||
Similarly, we can inspect the returned rows from the `Bands` table: | ||
```cypher | ||
CALL migrate.postgresql('Bands' , { | ||
user:'postgres', | ||
password:'xxxxxx', | ||
host:'localhost', | ||
database:'postgresdb' | ||
}) | ||
YIELD row | ||
RETURN row.band_id AS band_id, row.band_name AS name; | ||
``` | ||
|
||
 | ||
|
||
When we're happy with the returned columns, we can create the relationships: | ||
```cypher | ||
CALL migrate.postgresql('Bands' , { | ||
user:'postgres', | ||
password:'xxxxxx', | ||
host:'localhost', | ||
database:'postgresdb' | ||
}) | ||
YIELD row | ||
CREATE (n:Band {id: row.band_id, name: row.band_name}); | ||
``` | ||
|
||
After we have migrated all the nodes, by issuing a simple query to match all the nodes, we get the following | ||
disconnected graph. | ||
```cypher | ||
MATCH (n) RETURN n; | ||
``` | ||
|
||
 | ||
|
||
### 4. Import relationships from corresponding tables | ||
We can not do any graph use cases without connections between the nodes, so in this section, we will update the relationships. | ||
However, in the first argument we will provide a query that selects columns from the table. | ||
|
||
```cypher | ||
CALL migrate.postgresql('SELECT * FROM BandMemberships' , { | ||
user:'postgres', | ||
password:'xxxxxx', | ||
host:'localhost', | ||
database:'postgresdb' | ||
}) | ||
YIELD row | ||
RETURN row.band_id AS band_id, row.musician_id AS musician_id; | ||
``` | ||
|
||
 | ||
|
||
This query is useful as sometimes you don't need all the columns to be migrated to Memgraph, you only need a subset | ||
of them. Although you can omit creation of properties when the row is received, this approach consumes less memory on | ||
the row level. We can create the relationships by matching the source and destination node we migrated previously. | ||
The indices will play a big role, as the graph can be connected effectively and with optimal performance. | ||
|
||
```cypher | ||
CALL migrate.postgresql('SELECT * FROM BandMemberships' , { | ||
user:'postgres', | ||
password:'xxxxxx', | ||
host:'localhost', | ||
database:'postgresdb' | ||
}) | ||
YIELD row | ||
MATCH (m:Musician {id: row.musician_id}), (b:Band {id: row.band_id}) | ||
CREATE (m)-[:PERFORMS_IN]->(b); | ||
``` | ||
|
||
After migrating the relationships, we can see how our graph looks like: | ||
|
||
```cypher | ||
MATCH (m)-[p]->(b) RETURN m, p, b; | ||
``` | ||
|
||
 | ||
|
||
### 5. Switch back to in-memory transactional mode | ||
If we want to resume the transactional guarantees, we can do so by switching back from | ||
analytical to transactional mode with the query: | ||
```cypher | ||
STORAGE MODE IN_MEMORY_TRANSACTIONAL; | ||
``` | ||
|
||
|
||
## Benefits of direct migration from RDBMS to Memgraph | ||
We have shown how Memgraph can, with the easy integration of MAGE modules, migrate from an external | ||
source to its graph storage in a matter of few queries. That offers a significant improvement in the | ||
ease of use than the CSV import we have provided previously. Users are encouraged to see for themselves | ||
which method suits them better. | ||
|
||
## Frequently asked questions | ||
|
||
### What are some of the best practices to lower memory when migrating data? | ||
We have already mentioned the in-memory analytical mode that has less memory overhead since it | ||
strips Memgraph of some ACID guarantees. | ||
|
||
We will look at some other techniques which we use to lower memory: | ||
1. Omitting unnecessary columns when migrating | ||
2. Appropriate data type conversion | ||
|
||
#### 1. Ommitting unnecessary columns when migrating | ||
Most users will want to migrate full rows to Memgraph, which can sometimes be inefficient. | ||
The columns in Postgres are meant to be stored on | ||
disk, and therefore users can add new columns which can be big in size. This mainly is a concern | ||
for unnecessary date strings and urls. | ||
|
||
This can be optimized by processing the minimal set of necessary columns. Some of the columns which would | ||
be a good choice to migrate to Memgraph are those which will aid in graph matching, filtering, and traversing | ||
of the graph. Try to avoid list and map properties if there is no valid use case to keep them in | ||
Memgraph, as they take more size other data types. | ||
|
||
#### 2. Appropriate data type conversion | ||
When receiving rows from RDBMS to Memgraph, be sure that the appropriate types are utilized. That namely | ||
is concerning converting strings to integers (`ToInteger`), floats (`ToFloat`), | ||
booleans (`ToBoolean`), or date-time objects (`date`, `localdatetime`, and `duration` constructors). For functions | ||
that do the conversions between strings and other types, please check the | ||
pages for [functions](/querying/functions), as well | ||
as [temporal types](/fundamentals/data-types#temporal-types). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file added
BIN
+92.4 KB
...migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-bands-row.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+351 KB
...ion/migrate-from-rdbms-directly/migrate-from-rdbms-directly-connected-graph.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+107 KB
...ation/migrate-from-rdbms-directly/migrate-from-rdbms-directly-musicians-row.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+189 KB
...migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-raw-nodes.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+85.1 KB
...n/migrate-from-rdbms-directly/migrate-from-rdbms-directly-relationships-row.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.