Migration tutorial from RDBMS using MAGE modules #770

Merged: 12 commits, Jul 3, 2024
3 changes: 2 additions & 1 deletion pages/data-migration/_meta.json
@@ -4,7 +4,8 @@
"json": "JSON",
"cypherl": "CYPHERL",
"migrate-from-neo4j": "Migrate from Neo4j",
"migrate-from-rdbms": "Migrate from RDBMS",
"migrate-from-rdbms": "Migrate from RDBMS using CSV files",
"migrate-from-rdbms-directly": "Migrate from RDBMS using MAGE modules",
"migrate-memgraph-platform": "Migrate from Memgraph Platform to Memgraph MAGE",
"export-data": "Export data"
}
284 changes: 284 additions & 0 deletions pages/data-migration/migrate-from-rdbms-directly.md
@@ -0,0 +1,284 @@
---
title: Migrate from RDBMS to Memgraph directly using MAGE modules
description: Easily transition from an RDBMS to Memgraph using MAGE modules. Our detailed documentation makes the migration process straightforward.
---

# Migrate from RDBMS to Memgraph using MAGE modules

This tutorial will help you import your data from a PostgreSQL database into Memgraph
directly using MAGE query modules.

This approach makes migration from an external source to Memgraph possible in one step fewer
than described in [migrating from an RDBMS to Memgraph using CSV files](/data-migration/migrate-from-rdbms).
Make sure you read both tutorials to see which one fits your needs.

In two of our blog posts, we've covered the [differences between relational and
graph databases](https://memgraph.com/blog/graph-database-vs-relational-database)
and outlined the [benefits of graph
databases](https://memgraph.com/blog/the-benefits-of-using-a-graph-database-instead-of-sql).

## Prerequisites

To follow along, you will need:

- **Memgraph MAGE**, which includes **Memgraph** and **MAGE**, an advanced library of graph
  algorithms and modules. To install and set up Memgraph MAGE, follow the
  [getting started guide](/getting-started).
- (optional) A running relational database, populated either with your own schema and data
  or with the example data provided below. We will be using PostgreSQL, but you can choose
  any external source supported by the
  [MAGE migration modules](/advanced-algorithms/available-algorithms/migrate).

## Data model

Let's first get acquainted with the data model. Our Postgres database will contain `Musicians`
and `Bands` tables, each with its respective properties.

The `Musicians` table has an ID, a name, and the instrument the musician plays:
```sql
CREATE TABLE Musicians (
    musician_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    instrument TEXT
);

INSERT INTO Musicians (name, instrument) VALUES
('John Doe', 'Guitar'),
('Jane Smith', 'Vocals'),
('Alice Johnson', 'Bass'),
('Bob Lee', 'Drums'),
('Charlie Brown', 'Keyboard'),
('Eva Green', 'Guitar');
```

The `Bands` table has a band ID and the name of the band:
```sql
CREATE TABLE Bands (
    band_id SERIAL PRIMARY KEY,
    band_name TEXT NOT NULL
);

INSERT INTO Bands (band_name) VALUES
('The Rockers'),
('The Jazz Masters'),
('The Classical Ensemble'),
('Pop Sensations');
```

And they will be connected with a join table:
```sql
CREATE TABLE BandMemberships (
    musician_id INTEGER,
    band_id INTEGER,
    PRIMARY KEY (musician_id, band_id),
    FOREIGN KEY (musician_id) REFERENCES Musicians (musician_id) ON DELETE CASCADE,
    FOREIGN KEY (band_id) REFERENCES Bands (band_id) ON DELETE CASCADE
);

INSERT INTO BandMemberships (musician_id, band_id) VALUES
(1, 1), -- John Doe is a guitarist in "The Rockers"
(2, 1), -- Jane Smith is a vocalist in "The Rockers"
(3, 1), -- Alice Johnson is a bassist in "The Rockers"
(4, 1), -- Bob Lee is a drummer in "The Rockers"
(2, 2), -- Jane Smith is also a vocalist in "The Jazz Masters"
(5, 2), -- Charlie Brown plays keyboard in "The Jazz Masters"
(6, 4), -- Eva Green plays guitar in "Pop Sensations"
(2, 4); -- Jane Smith also sings in "Pop Sensations"
```

Now we have populated all of the example data for the tutorial.

The exercise is to migrate the `Musicians` and `Bands` rows as nodes, and the join table as
relationships. The steps below are provided so you can quickly glance at them at any time:

1. Create the necessary indices. Indices increase performance when creating relationships between the nodes.
2. Switch to in-memory analytical mode (to reduce memory overhead)
3. Import nodes from corresponding tables
4. Import relationships from corresponding tables
5. Switch back to in-memory transactional mode

### 1. Create necessary indices
In SQL databases, indices are usually created after the import has finished, as they can slow down
import times. In Memgraph, however, it's necessary to create indices beforehand. The reason is the
relationship import, which needs to look up two nodes (the start node and the end node) for every
relationship it creates. For node import, indices are irrelevant if you're using `CREATE` statements.
If you're using `MERGE` statements, however, Memgraph has to scan existing nodes, so create the
corresponding indices to make those lookups fast.

We will create only the necessary indices:
```cypher
CREATE INDEX ON :Musician(id);
CREATE INDEX ON :Band(id);
```
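
If you want to double-check that the indices exist before starting the import, Memgraph can list them. This is only a quick sanity check and can be skipped:

```cypher
// Lists the created indices together with the number of indexed nodes
SHOW INDEX INFO;
```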

### 2. Switch to in-memory analytical mode
Memgraph currently offers two in-memory storage modes: transactional and analytical. The transactional
mode is commonly used in production deployments since it provides full [ACID guarantees](https://en.wikipedia.org/wiki/ACID)
and is safe for transactional workloads.
However, when running the import for the first time and testing, it is beneficial to use the in-memory
analytical mode.

The in-memory analytical mode has less memory overhead than the transactional mode, and importing
nodes and relationships in the way shown below is highly parallelizable. However, it does not use
write-ahead logging and therefore cannot provide ACID guarantees. More information about the
analytical mode can be found on the [storage modes page](/fundamentals/storage-memory-usage).

The in-memory analytical storage mode is set with the following command:
```cypher
STORAGE MODE IN_MEMORY_ANALYTICAL;
```
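
To confirm the switch, you can inspect the storage information; the output should report the currently active storage mode among other statistics:

```cypher
// Shows storage statistics, including the currently active storage mode
SHOW STORAGE INFO;
```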

The next two sections show two different ways of migrating from an external data source:
migrating a whole table from the RDBMS to Memgraph, and migrating the result of a query.

### 3. Import nodes from corresponding tables
The [migrate module](/advanced-algorithms/available-algorithms/migrate) in MAGE provides an easy API for migrating rows from a relational database to Memgraph. We
can inspect the table with the following query:

```cypher
CALL migrate.postgresql('Musicians', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
RETURN row.musician_id AS id, row.name AS name, row.instrument AS instrument;
```

In the image below, we can see that the module has correctly extracted the columns from the table.
![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-musicians-row.png)

The `YIELD` keyword produces the `row` objects. Each `row` is essentially a map with the column
names as keys and the column values as values. We can iterate over the row objects with Cypher to
populate the Memgraph database.

```cypher
CALL migrate.postgresql('Musicians', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
CREATE (n:Musician {id: row.musician_id, name: row.name, instrument: row.instrument});
```
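
If you need to re-run the node import without creating duplicate nodes, a `MERGE`-based variant of the same query can be used. This is only a sketch, and it relies on the `:Musician(id)` index created earlier to keep the node lookups fast:

```cypher
CALL migrate.postgresql('Musicians', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
// MERGE matches an existing node by id (via the :Musician(id) index) or creates a new one
MERGE (n:Musician {id: row.musician_id})
SET n.name = row.name, n.instrument = row.instrument;
```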

Similarly, we can inspect the returned rows from the `Bands` table:
```cypher
CALL migrate.postgresql('Bands', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
RETURN row.band_id AS band_id, row.band_name AS name;
```

![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-bands-row.png)

When we're happy with the returned columns, we can create the `Band` nodes:
```cypher
CALL migrate.postgresql('Bands', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
CREATE (n:Band {id: row.band_id, name: row.band_name});
```

After we have migrated all the nodes, a simple query matching every node returns the following
disconnected graph.
```cypher
MATCH (n) RETURN n;
```

![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-raw-nodes.png)

### 4. Import relationships from corresponding tables
There are no graph use cases without connections between the nodes, so in this section we will create the relationships.
This time, instead of a table name, the first argument is a query that selects columns from the table.

```cypher
CALL migrate.postgresql('SELECT * FROM BandMemberships', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
RETURN row.band_id AS band_id, row.musician_id AS musician_id;
```

![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-relationships-row.png)

Passing a query is useful because sometimes you don't need all the columns migrated to Memgraph, only a subset
of them. While you could also skip creating properties after a row is received, selecting fewer columns consumes
less memory at the row level. We can now create the relationships by matching the source and destination nodes
we migrated previously. The indices play a big role here, as they allow the graph to be connected efficiently
and with optimal performance.

```cypher
CALL migrate.postgresql('SELECT * FROM BandMemberships', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
MATCH (m:Musician {id: row.musician_id}), (b:Band {id: row.band_id})
CREATE (m)-[:PERFORMS_IN]->(b);
```
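
As a quick sanity check, counting the created relationships should return 8, one for each row inserted into the `BandMemberships` table above:

```cypher
// Every BandMemberships row should have produced exactly one relationship
MATCH ()-[r:PERFORMS_IN]->() RETURN count(r) AS memberships;
```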

After migrating the relationships, we can see what our graph looks like:

```cypher
MATCH (m)-[p]->(b) RETURN m, p, b;
```

![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly/connected-graph.png)

### 5. Switch back to in-memory transactional mode
If we want to restore the transactional guarantees, we can switch back from
analytical to transactional mode with the following query:
```cypher
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```


## Benefits of direct migration from RDBMS to Memgraph
We have shown how Memgraph, with the easy integration of MAGE modules, can migrate data from an external
source into its graph storage in just a few queries. That offers a significant improvement in ease of use
over the previously described CSV import. Users are encouraged to see for themselves
which method suits them better.

## Frequently asked questions

### What are some of the best practices to lower memory when migrating data?
We have already mentioned the in-memory analytical mode, which has less memory overhead because it
gives up some of Memgraph's ACID guarantees.

Let's look at some other techniques for lowering memory usage:
1. Omitting unnecessary columns when migrating
2. Appropriate data type conversion

#### 1. Omitting unnecessary columns when migrating
Most users will want to migrate full rows to Memgraph, which can sometimes be inefficient.
Columns in Postgres are stored on disk, so users can freely add new columns with large values.
This is mainly a concern for unnecessary date strings and URLs.

This can be optimized by migrating only the minimal set of necessary columns. Good candidates to
migrate to Memgraph are the columns that aid in matching, filtering, and traversing the graph.
Avoid list and map properties if there is no valid use case for keeping them in Memgraph, as they
take up more space than other data types.
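
For example, instead of migrating every column of the `Musicians` table, you can select only the columns you need in the query passed as the first argument. This is only a sketch based on the example schema above:

```cypher
CALL migrate.postgresql('SELECT musician_id, name FROM Musicians', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
// Only the id and name reach Memgraph; the instrument column is never migrated
CREATE (n:Musician {id: row.musician_id, name: row.name});
```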

#### 2. Appropriate data type conversion
When receiving rows from the RDBMS in Memgraph, make sure the appropriate types are used. This mainly
means converting strings to integers (`ToInteger`), floats (`ToFloat`), booleans (`ToBoolean`), or
date-time objects (the `date`, `localdatetime`, and `duration` constructors). For the functions that
convert between strings and other types, check the pages on [functions](/querying/functions) and
[temporal types](/fundamentals/data-types#temporal-types).
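
As a sketch, suppose the `Bands` table also had a hypothetical `founded_year` column stored as text. Converting it during migration stores the property as an integer instead of a string:

```cypher
CALL migrate.postgresql('SELECT band_id, band_name, founded_year FROM Bands', {
    user: 'postgres',
    password: 'xxxxxx',
    host: 'localhost',
    database: 'postgresdb'
})
YIELD row
// founded_year is a hypothetical text column; toInteger stores it as an integer property
CREATE (n:Band {id: row.band_id, name: row.band_name, founded: toInteger(row.founded_year)});
```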
2 changes: 1 addition & 1 deletion pages/data-migration/migrate-from-rdbms.mdx
@@ -7,7 +7,7 @@ import { Callout } from 'nextra/components'
import { Steps } from 'nextra/components'
import { Tabs, Tab } from 'nextra/components'

# Migrate from RDBMS to Memgraph
# Migrate from RDBMS to Memgraph using CSV files

This tutorial will help you import your data from a MySQL database into Memgraph
using CSV files.
(Five binary image files added under pages/data-migration/migrate-from-rdbms-directly/ could not be displayed in the diff view.)