Migration tutorial from `RDBMS` using `MAGE` modules #770
@Josipmrden is this PR ready for review? I'm asking since it is missing the |
# Migrate from RDBMS to Memgraph using MAGE modules | ||
|
||
This tutorial will help you import your data from a PostgreSQL database into Memgraph | ||
directly, using MAGE query modules. |
Remove comma
The exercise is to migrate `Bands` and `Musicians` rows as nodes, and the mapping table as | ||
relationships. The steps below are provided so you can quickly glance at them at any time: | ||
|
||
1. Create necessary indices (important for relationship import) |
Say something about speed or throughput
2. Switch to in-memory analytical mode (to eliminate memory overhead) | ||
3. Import nodes from corresponding tables | ||
4. Import relationships from corresponding tables | ||
5. Switch back to in-memory analytical mode |
Uh, we switched to analytical in step 2. Do you mean switch back to transactional?
|
||
### 1. Create necessary indices | ||
In SQL databases, indices are mostly created after the import has finished, as they can slow down | ||
importing times. However, in Memgraph it's necessary to create indices beforehand. Reason for this |
s/Reason/The reason
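For illustration, the indices this step calls for might look like the sketch below. The `Musician` label and its `id` property come from the `CREATE` query quoted later in this review; the `Band` label and its `id` property are assumptions based on the `Bands` table name.

```cypher
// Label-property indices used by the MATCH lookups during relationship import.
CREATE INDEX ON :Musician(id);
// Assumed: Band nodes are keyed by an `id` property as well.
CREATE INDEX ON :Band(id);
```

Without such indices, each `MATCH` on `id` during relationship import would have to scan nodes instead of doing an index lookup, which is where the speed concern raised above comes from.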
|
||
### 2. Switch to in-memory analytical mode | ||
Memgraph currently offers 2 in-memory modes, transactional and analytical. Transactional is often used in | ||
production deployments. It has all the ACID guarantees and it's safe to use for transactional workloads. |
Link from text ACID to https://en.wikipedia.org/wiki/ACID
However, when executing import for the first time and testing, it is beneficial to use in-memory analytical | ||
mode. | ||
|
||
In-memory analytical mode offers less memory overhead than transactional mode. Also, in-memory analytical mode |
Remove "also"
|
||
In-memory analytical mode offers less memory overhead than transactional mode. Also, in-memory analytical mode | ||
is highly parallelizable when importing nodes and relationships in the way we will see below. However, it does | ||
not come with write-ahead logging, and it is therefore stripped of the ACID guarantees. More information |
s/stripped of/unable to make
CREATE (n:Musician {id: row.musician_id, name: row.name, instrument: row.instrument}); | ||
``` | ||
|
||
We can see that the YIELD clause gives us multiple `row` objects, which are essentially a map which |
Please use the word "iterate" somewhere in here, so the reader understands that all of the rows will be processed one by one.
|
||
We can see that the YIELD clause gives us multiple `row` objects, which are essentially a map which | ||
has the column name as the key, and the column value as the value. | ||
If we just issue the queries to return us the rows, so we can inspect the values, we can unwrap the maps |
Add a blank space above this line to start a new paragraph which separates the two ideas of migrating versus inspecting.
We are inconsistent about the order. Below, we inspect and then migrate. Here we migrate and then inspect. This is confusing.
![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-raw-nodes.png) | ||
|
||
### 4. Import relationships from corresponding tables | ||
We can not do any graph use cases without connections, so in this section, we will update the relationships. |
s/without connections/without connections between the nodes
2. Appropriate data type conversion | ||
|
||
#### 1. Omitting unnecessary columns when migrating | ||
Usually, users will migrate the whole data to Memgraph. The columns in Postgres are meant to be stored on |
Most users will want to migrate full rows to Memgraph, which can sometimes be inefficient.
|
||
#### 1. Omitting unnecessary columns when migrating | ||
Usually, users will migrate the whole data to Memgraph. The columns in Postgres are meant to be stored on | ||
disk, and therefore users can carelessly add new columns which can be big in size. This mainly is a concern |
This can be optimized by processing the minimal set of necessary columns.
disk, and therefore users can carelessly add new columns which can be big in size. This mainly is a concern | ||
for unnecessary date strings, and urls. | ||
|
||
For graph use cases, you would want to keep the number of columns |
Avoid the 2nd person pronoun of "you" as it is too informal.
#### 2. Appropriate data type conversion | ||
When receiving rows from RDBMS to Memgraph, be sure that the appropriate types are utilized. That namely | ||
is concerning converting strings to integers (`ToInteger`), floats (`ToFloat`), | ||
booleans (`ToBoolean`), or date-time objects (`date()`, `localdatetime()`, and others.). For functions |
Remove extra period (.)
All the parentheses look awkward. Consider making a table.
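As a rough sketch of the conversions discussed in this thread, the query below casts values while creating nodes. The table name, column names, and connection settings are all hypothetical, and it assumes the values arrive from the RDBMS as strings.

```cypher
// Hypothetical table, columns, and connection settings, for illustration only.
CALL migrate.postgresql('bands', {user: 'postgres', password: 'password', host: 'localhost', database: 'music_db'})
YIELD row
CREATE (b:Band {
  id: toInteger(row.band_id),     // string -> integer
  rating: toFloat(row.rating),    // string -> float
  active: toBoolean(row.active),  // string -> boolean
  founded: date(row.founded)      // ISO 8601 date string -> date
});
```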
not come with write-ahead logging, and it is therefore unable to make ACID guarantees. More information | ||
about analytical mode can be found on our [page about storage modes](/fundamentals/storage-memory-usage). | ||
|
||
In-memory storage mode is enabled with the following command: |
s/enabled/set
``` | ||
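For reference, the command this part of the tutorial refers to is presumably the storage mode switch sketched below; the follow-up check is an optional addition that is not part of the reviewed text.

```cypher
// Switch to analytical mode before the bulk import.
STORAGE MODE IN_MEMORY_ANALYTICAL;
// Optional check: the output is expected to include the active storage mode.
SHOW STORAGE INFO;
```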
|
||
In the next two sections, we will provide you with different ways of migrating from an external data source. | ||
One will be migrating the whole table from the RDBMS to Memgraph, and other will be by issuing a query. |
s/and other/and the other
|
||
### 3. Import nodes from corresponding tables | ||
The [migrate module](/advanced-algorithms/available-algorithms/migrate) in MAGE has an easy API how to migrate rows from a relational database to Memgraph. We | ||
can migrate and inspect the table with the following query: |
Remove "migrate and"
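A minimal sketch of such an inspection query is shown below. It assumes the `migrate.postgresql` procedure from the MAGE migrate module; the table name and all connection settings are placeholders.

```cypher
// Hypothetical table name and connection settings; replace with real values.
CALL migrate.postgresql('musicians', {user: 'postgres', password: 'password', host: 'localhost', database: 'music_db'})
YIELD row
RETURN row
LIMIT 10;
```

Returning a limited number of rows first makes it easier to check the column names before writing the `CREATE` part of the import.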
|
||
The `YIELD` keyword will create the `row` objects, which are essentially a map which | ||
has the column name as the key, and the column value as the value. We can iterate over the row objects with | ||
the Cypher language to continue populating the database. |
s/continue populating the database/populate the Memgraph database
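Iterating over the yielded rows to populate Memgraph could look roughly like this. The `CREATE` clause is taken from the query quoted earlier in this review; the table name and connection settings are placeholders.

```cypher
// Each yielded `row` is a map of column name -> column value.
// The CREATE clause runs once per row, so the whole table is processed row by row.
CALL migrate.postgresql('musicians', {user: 'postgres', password: 'password', host: 'localhost', database: 'music_db'})
YIELD row
CREATE (n:Musician {id: row.musician_id, name: row.name, instrument: row.instrument});
```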
|
||
This query is useful as sometimes you don't need all the columns to be migrated to Memgraph, you only need a fraction | ||
of them. Although you can omit creation of properties when the row is received, this approach consumes less memory on | ||
the row level. We can create the relationships by matching the source and destination node we migrated previously. Now, |
Remove "Now,"
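A sketch of the relationship import described in this paragraph follows. It passes an SQL query instead of a table name so that only the needed columns are fetched. The mapping table name, its column names, the `Band` label, and the `PLAYS_IN` relationship type are all hypothetical.

```cypher
// Fetch only the two key columns from the (hypothetical) mapping table.
CALL migrate.postgresql('SELECT musician_id, band_id FROM musicians_bands', {user: 'postgres', password: 'password', host: 'localhost', database: 'music_db'})
YIELD row
// Match the previously imported source and destination nodes by id.
MATCH (m:Musician {id: row.musician_id})
MATCH (b:Band {id: row.band_id})
CREATE (m)-[:PLAYS_IN]->(b);
```

Both `MATCH` clauses look nodes up by `id`, which is why the indices from step 1 need to exist before this query runs.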
|
||
![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-relationships-row.png) | ||
|
||
This query is useful as sometimes you don't need all the columns to be migrated to Memgraph, you only need a fraction |
s/fraction/subset
After migrating the relationships, we can see how our graph looks like: | ||
|
||
```cypher | ||
MATCH (n)-[r]->(m) RETURN n, r, m; |
Avoid using generic variables such as "n" and "m".
Prefer "m" for musician and "b" for band, as above.
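Applied to the query above, the suggestion could look like the following sketch. The `Band` label is an assumption based on the `Bands` table name.

```cypher
// m = musician, b = band, r = the relationship between them.
MATCH (m:Musician)-[r]->(b:Band) RETURN m, r, b;
```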
![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly/connected-graph.png) | ||
|
||
### 5. Switch back to in-memory transactional mode | ||
If we want to give back the transactional guarantees, we can do so by switching back from |
s/give back/resume
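For reference, switching back is presumably done with the counterpart of the command used in step 2:

```cypher
// Resume transactional (ACID) guarantees once the import is finished.
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```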
disk, and therefore users can add new columns which can be big in size. This mainly is a concern | ||
for unnecessary date strings and urls. | ||
|
||
This can be optimized by processing the minimal set of necessary columns. |
Say more to help the user decide which columns are necessary. Perhaps something about thinking about which graph traversal algorithms are going to be used, and the inputs they need.
@kgolubic please resolve this merge in whichever way you like. |
LGTM
Everything looks Ok. I'll fix the merge conflict.
* Docs improvements (#799) * Add NuRaft log file flag (#816) * Add health checks docs (#833) * Update HA examples * Add NuRaft log file env (#848) * Add leader/follower role (#857) Co-authored-by: Antonio Filipovic <61245998+antoniofilipovic@users.noreply.github.com> * Add coordinator hostname HA configuration option (#860) * First leader * Update pages/clustering/high-availability.mdx Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com> --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com> * Add monitoring at runtime docs (#806) * database-less connections update (#853) * Update HA docs * Add durability for coordinator (#859) * add durability docs * Update high-availability.mdx --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> * Edge type property index (#855) * Add hops limit docs (#881) * add docs for hops limit * revert accidentally changed part of the docs * fix grammar mistakes * implement suggestions * Peak memory observation in the system (#788) * Peak memory observation in the system * Update storage info where needed * Update best practics * Add information from PR review --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> * removed flags (#854) * Update envs for HA (#878) * fix envs * add 3 missing envs * Update configuration.mdx * Update configuration.mdx --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> * Add callout for important force reset notice (#877) * add callout for instance to be alive on force reset * Update high-availability.mdx * Enum datatype (v1) (#852) * Add information around enums * Add ALTER to the list of clauses * Add ALTER to the list of Memgraph's Cypher extension * Fix incorrect ALTER command * Update pages/querying/clauses/create.mdx * Update pages/querying/clauses/alter.mdx * Update 
pages/querying/clauses/alter.mdx * Update graph-modeling.md * Address PR comments --------- Co-authored-by: kgolubic <kgolubic@gmail.com> Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> * SSO Core - OIDC + OAuth2.0 docs (#876) * wip oidc docs * oidc sso docs * PR changes and update * update package info * add startup info * Update v2.18 experimental feature statuses (#871) * Update v2.18 experimental feature statuses * Join the sentances into paragraphs * Update language and formatting --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> Co-authored-by: kgolubic <kgolubic@gmail.com> * Add postgresql migration docs (#867) * Add postgresql migration docs * Fix from PR review * Single-sign on: general and SAML documentation (#884) * Update the existing auth flags for v2.18 * Add a usage note for streams and triggers created by SSO users * Update general (other than the SAML modules) auth functionalities * Reorganize auth documentation to accommodate SSO * Add SAML SSO docs (partial) * Complete SAML docs * Apply review suggestions * Add remaining review suggestions * Move OIDC docs next to SAML (since both are SSO) * Fix trigger/stream × SSO usage note placement * Remove duplicate entry and fix section tag * Move section to the right place * Apply suggestions * Update direct download paths (#885) * Migration tutorial from `RDBMS` using `MAGE` modules (#770) * Memgraph 2.17 docs * Docs * Zoned datetime data type (#723) * Document how zoned datetime is supported * Fix formatting * Update the Differences in Cypher implementation page * Update min/max functions * Revert "Zoned datetime data type (#723)" (#763) This reverts commit 16c8157. 
* Add migration tutorial from rdbms to memgraph * Add fixes * Fix from PR review --------- Co-authored-by: kgolubic <kgolubic@gmail.com> Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> Co-authored-by: Ante Pušić <ante.pusic@memgraph.io> * new: Add query sharing docs and update ENV variables for Lab (#865) * new: Add a complete list of ENV variables * new: Add query-sharing feature page * Fix styling issues * Update query-sharing.mdx * new: Add more details for the query sharing * Add images * Insert images * Insert images * Fix paths --------- Co-authored-by: kgolubic <kgolubic@gmail.com> Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> * Added description how each user can change its own password (#819) * Added description how each user can change its own password * Update users.mdx * Update users.mdx --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> * Add SSO docs (#797) * Add docs for OAuth setup (WIP) * Add docs for Okta OAuth * Replace references to OAuth with OIDC * Update single-sign-on.mdx * new: Add SAML for Entra and Okta --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> Co-authored-by: Toni Lastre <toni.lastre@memgraph.io> * Create release notes for Memgraph 2.18 (#795) * Update release notes * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update RN * Update PR * Update RN * new: Add Lab 2.15 release notes (#866) * new: Add Lab 2.15 release notes * Update pages/release-notes.mdx --------- Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com> * Update RN * Update RN --------- Co-authored-by: Toni 
<toni.lastre@memgraph.io> * Remove support for Amazon Linux 2, CentOS 7 and RedHat 8 * Linked MAGE docs * Update release notes * Add callout for Breaking changes * Add PR link * Update Release notes * Update pages/database-management/authentication-and-authorization/auth-system-integrations.mdx Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com> * Fix indentation --------- Co-authored-by: Andi <andi8647@gmail.com> Co-authored-by: Antonio Filipovic <61245998+antoniofilipovic@users.noreply.github.com> Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com> Co-authored-by: Josipmrden <josip.mrden@memgraph.io> Co-authored-by: andrejtonev <29177572+andrejtonev@users.noreply.github.com> Co-authored-by: Aidar Samerkhanov <aidar.samerkhanov@memgraph.io> Co-authored-by: David Ivekovic <david.ivekovic@memgraph.io> Co-authored-by: Gareth Andrew Lloyd <gareth@ignition-web.co.uk> Co-authored-by: Ivan Milinović <44698587+imilinovic@users.noreply.github.com> Co-authored-by: Marko Budiselić <marko.budiselic@memgraph.com> Co-authored-by: Ante Pušić <ante.pusic@memgraph.io> Co-authored-by: Toni <toni.lastre@memgraph.io> Co-authored-by: tonijurjevic96 <168409767+tonijurjevic96@users.noreply.github.com> Co-authored-by: David <davidlozic@gmail.com>
Description
A new tutorial has been added that migrates data from Postgres to Memgraph (the approach applies to basically any RDBMS).
Pull request type
Please check what kind of PR this is:
Related PRs and issues
PR this doc page is related to:
(especially necessary if the PR is related to a release)
Closes:
(paste the link to the issue it closes)
Checklist: