Migration tutorial from `RDBMS` using `MAGE` modules #770

Merged

kgolubic merged 12 commits into memgraph-2-18 from migrating-from-rdbms-using-mage

Jul 3, 2024

Contributor

Josipmrden commented May 15, 2024

Description

A new tutorial has been added that covers migrating from Postgres to Memgraph (the same approach applies to basically any RDBMS).

Pull request type

Please check what kind of PR this is:

Fix or improvement of an existing page
New documentation page, release related

Related PRs and issues

PR this doc page is related to:
(especially necessary if the PR is related to a release)

Closes:
(paste the link to the issue it closes)

Checklist:

Check all content with Grammarly
Perform a self-review of my code
Make corresponding changes to the rest of the documentation (consult with the DX team)
The build passes locally
My changes generate no new warnings or errors
Add a corresponding label
If release-related, add a product and version label
If release-related, add release note on product PR

kgolubic and others added 9 commits

April 11, 2024 09:17

Memgraph 2.17 docs
aab63dd

Docs
648f4b4

Merge branch 'main' into memgraph-2-17
992fb22

Merge branch 'main' into memgraph-2-17
6842baa

Merge branch 'main' into memgraph-2-17
d4be62f

Merge branch 'main' into memgraph-2-17
7d2cbc3

Zoned datetime data type (#723)
16c8157

* Document how zoned datetime is supported
* Fix formatting
* Update the Differences in Cypher implementation page
* Update min/max functions

Revert "Zoned datetime data type (#723)" (#763)
6c4a5d4

This reverts commit 16c8157.

Add migration tutorial from rdbms to memgraph
f5b98e1

Josipmrden added the priority: medium (missing info) label

Josipmrden self-assigned this

Josipmrden requested a review from kgolubic as a code owner

May 15, 2024 12:20

vercel bot commented May 15, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| documentation | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 3, 2024 7:17am |

vercel bot deployed to Preview

May 15, 2024 12:23

Contributor

kgolubic commented May 29, 2024

@Josipmrden is this PR ready for review? I'm asking since it is missing the Ready label and you have asked for my review.

kgolubic requested a review from hal-eisen-MG

June 10, 2024 08:20

Josipmrden added the status: ready label

kgolubic mentioned this pull request

Add postgresql migration memgraph/mage#464

Merged

19 tasks

kgolubic changed the base branch from main to memgraph-2-18

June 17, 2024 08:10

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+# Migrate from RDBMS to Memgraph using MAGE modules
+This tutorial will help you import your data from a PostgreSQL database into Memgraph
+directly, using MAGE query modules.

Contributor

hal-eisen-MG Jun 17, 2024

Remove comma

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md (resolved)

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+The exercise is to migrate `Bands` and `Musicians` rows as nodes, and the mapping table as
+relationships. The steps below are provided so you can quickly glance at them at any time:
+1. Create necessary indices (important for relationship import)

Contributor

hal-eisen-MG Jun 17, 2024

Say something about speed or throughput

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+2. Switch to in-memory analytical mode (to eliminate memory overhead)
+3. Import nodes from corresponding tables
+4. Import relationships from corresponding tables
+5. Switch back to in-memory analytical mode

Contributor

hal-eisen-MG Jun 17, 2024

Uh, we switched to analytical in step 2. Do you mean switch back to transactional?
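
For reference, the five steps with the corrected final step can be sketched in Cypher as follows. This is a minimal sketch, not the tutorial's text: the `Band` label and the `id` properties on both labels are assumptions modeled on the tutorial's `Musician` example, and the import queries for steps 3 and 4 are left out.

```cypher
// 1. Create indices on the properties used to match nodes during relationship import.
//    (The Band label and both id properties are assumed for illustration.)
CREATE INDEX ON :Musician(id);
CREATE INDEX ON :Band(id);

// 2. Switch to in-memory analytical mode for the import.
STORAGE MODE IN_MEMORY_ANALYTICAL;

// 3. and 4. Import nodes, then relationships (queries omitted here).

// 5. Switch back to in-memory transactional mode.
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```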

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+### 1. Create necessary indices
+In SQL databases, indices are mostly created after the import has finished, as they can slow down
+importing times. However, in Memgraph it's necesssary to create indices beforehand. Reason for this

Contributor

hal-eisen-MG Jun 17, 2024

s/Reason/The reason

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+### 2. Switch to in-memory analytical mode
+Memgraph currently offers 2 in-memory modes, transactional and analytical. Transactional is often used in
+production deployments. It has all the ACID guarantees and it's safe to use for transactional workloads.

Contributor

hal-eisen-MG Jun 17, 2024

Link from text ACID to https://en.wikipedia.org/wiki/ACID

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+However, when executing import for the first time and testing, it is beneficial to use in-memory analytical
+mode.
+In-memory analytical mode offers less memory overhead than transactional mode. Also, in-memory analytical mode

Contributor

hal-eisen-MG Jun 17, 2024

Remove "also"

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+In-memory analytical mode offers less memory overhead than transactional mode. Also, in-memory analytical mode
+is highly parallelizable when importing nodes and relationships in the way we will see below. However, it does
+not come with write-ahead logging, and it is therefore stripped of the ACID guarantees. More information

Contributor

hal-eisen-MG Jun 17, 2024

s/stripped of/unable to make

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+CREATE (n:Musician {id: row.musician_id, name: row.name, instrument: row.instrument});
+```
+We can see that the YIELD clause gives us multiple `row` objects, which are essentially a map which

Contributor

hal-eisen-MG Jun 17, 2024

Please use the word "iterate" somewhere in here, so the reader understands that all of the rows will be processed one by one.
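
To illustrate the iteration being asked for: each `row` yielded by the procedure is processed one by one, so the `CREATE` clause runs once per row. A minimal sketch, assuming the `migrate.postgresql` procedure from the MAGE `migrate` module; the table name and the connection values in the config map are placeholders, not values taken from the tutorial.

```cypher
// Placeholder table name and connection settings; replace with real values.
CALL migrate.postgresql('musicians', {user: 'memgraph',
                                      password: 'password',
                                      host: 'localhost',
                                      database: 'demo_db'})
YIELD row
// The clause below is executed once for every yielded row.
CREATE (m:Musician {id: row.musician_id, name: row.name, instrument: row.instrument});
```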

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+We can see that the YIELD clause gives us multiple `row` objects, which are essentially a map which
+has the column name as the key, and the column value as the value.
+If we just issue the queries to return us the rows, so we can inspect the values, we can unwrap the maps

Contributor

hal-eisen-MG Jun 17, 2024

Add a blank space above this line to start a new paragraph which separates the two ideas of migrating versus inspecting.

Contributor

hal-eisen-MG Jun 17, 2024

We are inconsistent about the order. Below, we inspect and then migrate. Here we migrate and then inspect. This is confusing.

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-raw-nodes.png)
+### 4. Import relationships from corresponding tables
+We can not do any graph use cases without connections, so in this section, we will update the relationships.

Contributor

hal-eisen-MG Jun 17, 2024

s/without connections/without connections between the nodes

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+2. Appropriate data type conversion
+#### 1. Ommitting unnecessary columns when migrating
+Usually, users will migrate the whole data to Memgraph. The columns in Postgres are meant to be stored on

Contributor

hal-eisen-MG Jun 17, 2024

Most users will want to migrate full rows to Memgraph, which can sometimes be inefficient.

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+#### 1. Ommitting unnecessary columns when migrating
+Usually, users will migrate the whole data to Memgraph. The columns in Postgres are meant to be stored on
+disk, and therefore users can carelessly add new columns which can be big in size. This mainly is a concern

Contributor

hal-eisen-MG Jun 17, 2024

This can be optimized by processing the minimal set of necessary columns.

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+disk, and therefore users can carelessly add new columns which can be big in size. This mainly is a concern
+for unnecessary date strings, and urls.
+For graph use cases, you would want to keep the number of columns

Contributor

hal-eisen-MG Jun 17, 2024

Avoid the 2nd person pronoun of "you" as it is too informal.

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+#### 2. Appropriate data type conversion
+When receiving rows from RDBMS to Memgraph, be sure that the appropriate types are utilized. That namely
+is concerning converting strings to integers (`ToInteger`), floats (`ToFloat`),
+booleans (`ToBoolean`), or date-time objects (`date()`, `localdatetime()`, and others.). For functions

Contributor

hal-eisen-MG Jun 17, 2024

Remove the extra period (.).
All the parentheses look awkward. Consider making a table.
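
As an illustration of the conversions listed in the hunk above, here is a minimal Cypher sketch. The map stands in for a single migrated `row`; its keys and string values are hypothetical, and the functions are written in their usual camelCase form.

```cypher
// Hypothetical row whose values all arrived as strings; the keys are illustrative only.
WITH {id: "42", rating: "4.5", active: "true", founded: "1994-03-01"} AS row
RETURN toInteger(row.id)     AS id,       // string -> integer
       toFloat(row.rating)   AS rating,   // string -> float
       toBoolean(row.active) AS active,   // string -> boolean
       date(row.founded)     AS founded;  // ISO date string -> date
```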

kgolubic reviewed

pages/data-migration/_meta.json (resolved)

kgolubic added the future release label


Add fixes
c989f1a

Josipmrden requested review from kgolubic and hal-eisen-MG

June 24, 2024 15:18

vercel bot deployed to Preview

June 24, 2024 15:22

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+not come with write-ahead logging, and it is therefore unable to make ACID guarantees. More information
+about analytical mode can be found on our [page about storage modes](/fundamentals/storage-memory-usage).
+In-memory storage mode is enabled with the following command:

Contributor

hal-eisen-MG Jun 25, 2024

s/enabled/set

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+```
+In the next two sections, we will provide you with different ways of migrating from an external data source.
+One will be migrating the whole table from the RDBMS to Memgraph, and other will be by issuing a query.

Contributor

hal-eisen-MG Jun 25, 2024

s/and other/and the other

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+### 3. Import nodes from corresponding tables
+The [migrate module](/advanced-algorithms/available-algorithms/migrate) in MAGE has an easy API how to migrate rows from a relational database to Memgraph. We
+can migrate and inspect the table with the following query:

Contributor

hal-eisen-MG Jun 25, 2024

Remove "migrate and"

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+The `YIELD` keyword will create the `row` objects, which are essentially a map which
+has the column name as the key, and the column value as the value. We can iterate over the row objects with
+the Cypher language to continue populating the database.

Contributor

hal-eisen-MG Jun 25, 2024

s/continue populating the database/populate the Memgraph database

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+This query is useful as sometimes you don't need all the columns to be migrated to Memgraph, you only need a fraction
+of them. Although you can omit creation of properties when the row is received, this approach consumes less memory on
+the row level. We can create the relationships by matching the source and destination node we migrated previously. Now,

Contributor

hal-eisen-MG Jun 25, 2024

Remove "Now,"

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated


![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-relationships-row.png)

This query is useful as sometimes you don't need all the columns to be migrated to Memgraph, you only need a fraction

Contributor

hal-eisen-MG Jun 25, 2024

s/fraction/subset

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+After migrating the relationships, we can see how our graph looks like:
+```cypher
+MATCH (n)-[r]->(m) RETURN n, r, m;

Contributor

hal-eisen-MG Jun 25, 2024

Avoid using generic variables such as "n" and "m".
Prefer "m" for musician and "b" for band, as above.

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly/connected-graph.png)
+### 5. Switch back to in-memory transactional mode
+If we want to give back the transactional guarantees, we can do so by switching back from

Contributor

hal-eisen-MG Jun 25, 2024

s/give back/resume

hal-eisen-MG reviewed

pages/data-migration/migrate-from-rdbms-directly.md Outdated

+disk, and therefore users can add new columns which can be big in size. This mainly is a concern
+for unnecessary date strings and urls.
+This can be optimized by processing the minimal set of necessary columns.

Contributor

hal-eisen-MG Jun 25, 2024

Say more to help the user decide which columns are necessary. Perhaps something about thinking about which graph traversal algorithms are going to be used, and the inputs they need.

Josipmrden requested a review from hal-eisen-MG

July 2, 2024 10:28


Fix from PR review
c4283e1

Contributor Author

Josipmrden commented Jul 2, 2024

@kgolubic please resolve this merge in whichever way you like.

vercel bot deployed to Preview

July 2, 2024 10:50

hal-eisen-MG approved these changes

Contributor

hal-eisen-MG left a comment

LGTM

kgolubic approved these changes

Contributor

kgolubic left a comment

Everything looks OK. I'll fix the merge conflict.


Merge branch 'memgraph-2-18' into migrating-from-rdbms-using-mage
8841fb2

vercel bot deployed to Preview

July 3, 2024 07:17

kgolubic merged commit f0edde3 into memgraph-2-18

2 checks passed

kgolubic deleted the migrating-from-rdbms-using-mage branch

July 3, 2024 07:22

kgolubic added a commit that referenced this pull request


Memgraph 2.18 documentation (#804)
70eaeed

* Docs improvements (#799)

* Add NuRaft log file flag (#816)

* Add health checks docs (#833)

* Update HA examples

* Add NuRaft log file env (#848)

* Add leader/follower role (#857)

Co-authored-by: Antonio Filipovic <61245998+antoniofilipovic@users.noreply.github.com>

* Add coordinator hostname HA configuration option (#860)

* First leader

* Update pages/clustering/high-availability.mdx

Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>

* Add monitoring at runtime docs (#806)

* database-less connections update (#853)

* Update HA docs

* Add durability for coordinator (#859)

* add durability docs

* Update high-availability.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Edge type property index (#855)

* Add hops limit docs (#881)

* add docs for hops limit

* revert accidentally changed part of the docs

* fix grammar mistakes

* implement suggestions

* Peak memory observation in the system (#788)

* Peak memory observation in the system

* Update storage info where needed

* Update best practics

* Add information from PR review

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* removed flags (#854)

* Update envs for HA (#878)

* fix envs

* add 3 missing envs

* Update configuration.mdx

* Update configuration.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Add callout for important force reset  notice (#877)

* add callout for instance to be alive on force reset

* Update high-availability.mdx

* Enum datatype (v1) (#852)

* Add information around enums

* Add ALTER to the list of clauses

* Add ALTER to the list of Memgraph's Cypher extension

* Fix incorrect ALTER command

* Update pages/querying/clauses/create.mdx

* Update pages/querying/clauses/alter.mdx

* Update pages/querying/clauses/alter.mdx

* Update graph-modeling.md

* Address PR comments

---------

Co-authored-by: kgolubic <kgolubic@gmail.com>
Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* SSO Core - OIDC + OAuth2.0 docs (#876)

* wip oidc docs

* oidc sso docs

* PR changes and update

* update package info

* add startup info

* Update v2.18 experimental feature statuses (#871)

* Update v2.18 experimental feature statuses

* Join the sentances into paragraphs

* Update language and formatting

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: kgolubic <kgolubic@gmail.com>

* Add postgresql migration docs (#867)

* Add postgresql migration docs

* Fix from PR review

* Single-sign on: general and SAML documentation (#884)

* Update the existing auth flags for v2.18

* Add a usage note for streams and triggers created by SSO users

* Update general (other than the SAML modules) auth functionalities

* Reorganize auth documentation to accommodate SSO

* Add SAML SSO docs (partial)

* Complete SAML docs

* Apply review suggestions

* Add remaining review suggestions

* Move OIDC docs next to SAML (since both are SSO)

* Fix trigger/stream × SSO usage note placement

* Remove duplicate entry and fix section tag

* Move section to the right place

* Apply suggestions

* Update direct download paths (#885)

* Migration tutorial from `RDBMS` using `MAGE` modules (#770)

* Memgraph 2.17 docs

* Docs

* Zoned datetime data type (#723)

* Document how zoned datetime is supported

* Fix formatting

* Update the Differences in Cypher implementation page

* Update min/max functions

* Revert "Zoned datetime data type (#723)" (#763)

This reverts commit 16c8157.

* Add migration tutorial from rdbms to memgraph

* Add fixes

* Fix from PR review

---------

Co-authored-by: kgolubic <kgolubic@gmail.com>
Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: Ante Pušić <ante.pusic@memgraph.io>

* new: Add query sharing docs and update ENV variables for Lab (#865)

* new: Add a complete list of ENV variables

* new: Add query-sharing feature page

* Fix styling issues

* Update query-sharing.mdx

* new: Add more details for the query sharing

* Add images

* Insert images

* Insert images

* Fix paths

---------

Co-authored-by: kgolubic <kgolubic@gmail.com>
Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Added description how each user can change its own password (#819)

* Added description how each user can change its own password

* Update users.mdx

* Update users.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Add SSO docs (#797)

* Add docs for OAuth setup (WIP)

* Add docs for Okta OAuth

* Replace references to OAuth with OIDC

* Update single-sign-on.mdx

* new: Add SAML for Entra and Okta

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: Toni Lastre <toni.lastre@memgraph.io>

* Create release notes for Memgraph 2.18 (#795)

* Update release notes

* Update RN

* Update PR

* Update RN

* new: Add Lab 2.15 release notes (#866)

* new: Add Lab 2.15 release notes

* Update pages/release-notes.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Update RN

* Update RN

---------

Co-authored-by: Toni <toni.lastre@memgraph.io>

* Remove support for Amazon Linux 2, CentOS 7 and RedHat 8

* Linked MAGE docs

* Update release notes

* Add callout for Breaking changes

* Add PR link

* Update Release notes

* Update pages/database-management/authentication-and-authorization/auth-system-integrations.mdx

Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>

* Fix indentation

---------

Co-authored-by: Andi <andi8647@gmail.com>
Co-authored-by: Antonio Filipovic <61245998+antoniofilipovic@users.noreply.github.com>
Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>
Co-authored-by: Josipmrden <josip.mrden@memgraph.io>
Co-authored-by: andrejtonev <29177572+andrejtonev@users.noreply.github.com>
Co-authored-by: Aidar Samerkhanov <aidar.samerkhanov@memgraph.io>
Co-authored-by: David Ivekovic <david.ivekovic@memgraph.io>
Co-authored-by: Gareth Andrew Lloyd <gareth@ignition-web.co.uk>
Co-authored-by: Ivan Milinović <44698587+imilinovic@users.noreply.github.com>
Co-authored-by: Marko Budiselić <marko.budiselic@memgraph.com>
Co-authored-by: Ante Pušić <ante.pusic@memgraph.io>
Co-authored-by: Toni <toni.lastre@memgraph.io>
Co-authored-by: tonijurjevic96 <168409767+tonijurjevic96@users.noreply.github.com>
Co-authored-by: David <davidlozic@gmail.com>

Labels

future release, priority: medium (missing info), status: ready