Migration tutorial from RDBMS using MAGE modules #770

Merged
kgolubic merged 12 commits into memgraph-2-18 from migrating-from-rdbms-using-mage
Jul 3, 2024

Conversation

Josipmrden
Contributor

Description

A new tutorial has been added that migrates data from Postgres to Memgraph (and, more generally, from any RDBMS to Memgraph).

Pull request type

Please check what kind of PR this is:

  • Fix or improvement of an existing page
  • New documentation page, release related

Related PRs and issues

PR this doc page is related to:
(especially necessary if the PR is related to a release)

Closes:
(paste the link to the issue it closes)

Checklist:

  • Check all content with Grammarly
  • Perform a self-review of my code
  • Make corresponding changes to the rest of the documentation (consult with the DX team)
  • The build passes locally
  • My changes generate no new warnings or errors
  • Add a corresponding label
  • If release-related, add a product and version label
  • If release-related, add release note on product PR

@Josipmrden Josipmrden added the priority: medium (missing info) An additional information can be helpful or interesting, but the absence is not disruptive label May 15, 2024
@Josipmrden Josipmrden self-assigned this May 15, 2024
@Josipmrden Josipmrden requested a review from kgolubic as a code owner May 15, 2024 12:20

vercel bot commented May 15, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| documentation | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 3, 2024 7:17am |

@kgolubic
Contributor

@Josipmrden is this PR ready for review? I'm asking since it is missing the Ready label and you have asked for my review.

@kgolubic kgolubic requested a review from hal-eisen-MG June 10, 2024 08:20
@Josipmrden Josipmrden added the status: ready PR is ready for review label Jun 14, 2024
@kgolubic kgolubic mentioned this pull request Jun 14, 2024
19 tasks
@kgolubic kgolubic changed the base branch from main to memgraph-2-18 June 17, 2024 08:10
# Migrate from RDBMS to Memgraph using MAGE modules

This tutorial will help you import your data from a PostgreSQL database into Memgraph
directly, using MAGE query modules.
Contributor

Remove comma

The exercise is to migrate `Bands` and `Musicians` rows as nodes, and the mapping table as
relationships. The steps below are provided so you can quickly glance at them at any time:

1. Create necessary indices (important for relationship import)
Contributor

Say something about speed or throughput

2. Switch to in-memory analytical mode (to eliminate memory overhead)
3. Import nodes from corresponding tables
4. Import relationships from corresponding tables
5. Switch back to in-memory analytical mode
Contributor

Uh, we switched to analytical in step 2. Do you mean switch back to transactional?


### 1. Create necessary indices
In SQL databases, indices are mostly created after the import has finished, as they can slow down
importing times. However, in Memgraph it's necessary to create indices beforehand. Reason for this
Contributor

s/Reason/The reason
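For reference, a minimal sketch of this step in Memgraph. It assumes nodes are keyed by an `id` property, as in the `CREATE (n:Musician {id: row.musician_id, ...})` query from the tutorial, and that bands use a `Band` label (an assumption based on the `Bands` table). Creating these indices first lets the relationship import look nodes up by `id` instead of scanning every node with that label:

```cypher
// Label-property indices used by the MATCH lookups during the relationship import.
// The :Band label is an assumption; adjust it to the labels actually created.
CREATE INDEX ON :Musician(id);
CREATE INDEX ON :Band(id);
```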


### 2. Switch to in-memory analytical mode
Memgraph currently offers 2 in-memory modes, transactional and analytical. Transactional is often used in
production deployments. It has all the ACID guarantees and it's safe to use for transactional workloads.
Contributor

Link from text ACID to https://en.wikipedia.org/wiki/ACID

However, when executing import for the first time and testing, it is beneficial to use in-memory analytical
mode.

In-memory analytical mode offers less memory overhead than transactional mode. Also, in-memory analytical mode
Contributor

Remove "also"


In-memory analytical mode offers less memory overhead than transactional mode. Also, in-memory analytical mode
is highly parallelizable when importing nodes and relationships in the way we will see below. However, it does
not come with write-ahead logging, and it is therefore stripped of the ACID guarantees. More information
Contributor

s/stripped of/unable to make

CREATE (n:Musician {id: row.musician_id, name: row.name, instrument: row.instrument});
```

We can see that the YIELD clause gives us multiple `row` objects, which are essentially a map which
Contributor

Please use the word "iterate" somewhere in here, so the reader understands that all of the rows will be processed one by one.


We can see that the YIELD clause gives us multiple `row` objects, which are essentially a map which
has the column name as the key, and the column value as the value.
If we just issue the queries to return us the rows, so we can inspect the values, we can unwrap the maps
Contributor

Add a blank space above this line to start a new paragraph which separates the two ideas of migrating versus inspecting.

Contributor

We are inconsistent about the order. Below, we inspect and then migrate. Here we migrate and then inspect. This is confusing.

![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-raw-nodes.png)

### 4. Import relationships from corresponding tables
We cannot do any graph use cases without connections, so in this section, we will import the relationships.
Contributor

s/without connections/without connections between the nodes

2. Appropriate data type conversion

#### 1. Omitting unnecessary columns when migrating
Usually, users will migrate the whole data to Memgraph. The columns in Postgres are meant to be stored on
Contributor

Most users will want to migrate full rows to Memgraph, which can sometimes be inefficient.


#### 1. Omitting unnecessary columns when migrating
Usually, users will migrate the whole data to Memgraph. The columns in Postgres are meant to be stored on
disk, and therefore users can carelessly add new columns which can be big in size. This mainly is a concern
Contributor

This can be optimized by processing the minimal set of necessary columns.

disk, and therefore users can carelessly add new columns which can be big in size. This mainly is a concern
for unnecessary date strings, and urls.

For graph use cases, you would want to keep the number of columns
Contributor

Avoid the 2nd person pronoun of "you" as it is too informal.

#### 2. Appropriate data type conversion
When receiving rows from RDBMS to Memgraph, be sure that the appropriate types are utilized. That namely
is concerning converting strings to integers (`ToInteger`), floats (`ToFloat`),
booleans (`ToBoolean`), or date-time objects (`date()`, `localdatetime()`, and others.). For functions
Contributor

Remove extra period (.)
All the parenthesis look awkward. Consider making a table.
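A minimal sketch of converting types while creating nodes. The `bands` table and its `band_id`, `name`, `active`, and `formed_on` columns are hypothetical; they are assumed to arrive as strings, with `formed_on` in ISO 8601 `YYYY-MM-DD` form. The connection values are placeholders:

```cypher
// Hypothetical table, columns, and credentials; only the conversion functions matter here.
CALL migrate.postgresql('bands', {user: 'postgres', password: 'password', host: 'localhost', database: 'music'})
YIELD row
CREATE (b:Band {
  id: toInteger(row.band_id),       // '42' becomes the integer 42
  name: row.name,                   // kept as a string
  active: toBoolean(row.active),    // 'true' becomes the boolean true
  formed_on: date(row.formed_on)    // '1994-05-17' becomes a Date value
});
```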

@kgolubic kgolubic added the future release For one of the next versions label Jun 24, 2024
not come with write-ahead logging, and it is therefore unable to make ACID guarantees. More information
about analytical mode can be found on our [page about storage modes](/fundamentals/storage-memory-usage).
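For reference, Memgraph switches storage modes with the `STORAGE MODE` query. A minimal sketch of this step, run before the import starts:

```cypher
// Lower memory overhead during the import, at the cost of write-ahead logging and ACID guarantees.
STORAGE MODE IN_MEMORY_ANALYTICAL;
```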

In-memory storage mode is enabled with the following command:
Contributor

s/enabled/set

```

In the next two sections, we will provide you with different ways of migrating from an external data source.
One will be migrating the whole table from the RDBMS to Memgraph, and other will be by issuing a query.
Contributor

s/and other/and the other


### 3. Import nodes from corresponding tables
The [migrate module](/advanced-algorithms/available-algorithms/migrate) in MAGE provides an easy-to-use API for migrating rows from a relational database to Memgraph. We
can migrate and inspect the table with the following query:
Contributor

Remove "migrate and"
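A minimal sketch of inspecting a table before importing it. The `musicians` table name, the connection values, and the `LIMIT` are placeholders, and the connection keys are assumed to match the migrate module's PostgreSQL examples:

```cypher
// Each yielded row is a map from column name to column value; nothing is written to Memgraph.
CALL migrate.postgresql('musicians', {user: 'postgres', password: 'password', host: 'localhost', database: 'music'})
YIELD row
RETURN row
LIMIT 10;
```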


The `YIELD` keyword will create the `row` objects, which are essentially a map which
has the column name as the key, and the column value as the value. We can iterate over the row objects with
the Cypher language to continue populating the database.
Contributor

s/continue populating the database/populate the Memgraph database
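A sketch of populating nodes from the yielded rows, completing the `Musician` query quoted earlier in this thread. The table name and connection values are placeholders. The query iterates over the rows one by one and creates one node per row:

```cypher
CALL migrate.postgresql('musicians', {user: 'postgres', password: 'password', host: 'localhost', database: 'music'})
YIELD row
// Runs once per yielded row, mapping column values to node properties.
CREATE (m:Musician {id: row.musician_id, name: row.name, instrument: row.instrument});
```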


This query is useful as sometimes you don't need all the columns to be migrated to Memgraph, you only need a fraction
of them. Although you can omit creation of properties when the row is received, this approach consumes less memory on
the row level. We can create the relationships by matching the source and destination node we migrated previously. Now,
Contributor

Remove "Now,"
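A minimal sketch of the relationship step. It assumes a hypothetical `musicians_bands` mapping table with `musician_id` and `band_id` columns, `Band` nodes keyed by `id`, a hypothetical `PLAYS_IN` relationship type, and placeholder connection values. Passing a SQL query instead of a table name means only the two key columns are transferred:

```cypher
CALL migrate.postgresql('SELECT musician_id, band_id FROM musicians_bands', {user: 'postgres', password: 'password', host: 'localhost', database: 'music'})
YIELD row
// Look up the previously migrated endpoints; these lookups use the :Musician(id)
// and :Band(id) indices if they were created before the import.
MATCH (m:Musician {id: row.musician_id})
MATCH (b:Band {id: row.band_id})
CREATE (m)-[:PLAYS_IN]->(b);
```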


![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly-relationships-row.png)

This query is useful as sometimes you don't need all the columns to be migrated to Memgraph, you only need a fraction
Contributor

s/fraction/subset

After migrating the relationships, we can see what our graph looks like:

```cypher
MATCH (n)-[r]->(m) RETURN n, r, m;
```
Contributor

Avoid using generic variables such as "n" and "m".
Prefer "m" for musician and "b" for band, as above.
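A variant of the visualization query that follows this naming suggestion, assuming relationships point from `Musician` nodes to `Band` nodes:

```cypher
// Musicians, their outgoing relationships, and the bands those relationships point to.
MATCH (m:Musician)-[r]->(b:Band)
RETURN m, r, b;
```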

![](/pages/data-migration/migrate-from-rdbms-directly/migrate-from-rdbms-directly/connected-graph.png)

### 5. Switch back to in-memory transactional mode
If we want to give back the transactional guarantees, we can do so by switching back from
Contributor

s/give back/resume
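For reference, a minimal sketch of this step, run once the imported data has been checked:

```cypher
// Return to the mode that provides ACID guarantees for regular transactional workloads.
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```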

disk, and therefore users can add new columns that can be large. This is mainly a concern
for unnecessary date strings and URLs.

This can be optimized by processing the minimal set of necessary columns.
Contributor

Say more to help the user decide which columns are necessary. Perhaps something about thinking about which graph traversal algorithms are going to be used, and the inputs they need.
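A minimal sketch of importing only the columns a graph model needs. It assumes a hypothetical `bands` table where later queries only use `band_id` and `name`; any other columns, such as long text fields or URLs, are never fetched from PostgreSQL. Table, column, and connection values are placeholders:

```cypher
// Select the minimal column set in SQL so unused columns are never transferred.
CALL migrate.postgresql('SELECT band_id, name FROM bands', {user: 'postgres', password: 'password', host: 'localhost', database: 'music'})
YIELD row
CREATE (b:Band {id: row.band_id, name: row.name});
```

One way to pick the column set is to keep the identifiers needed to build relationships plus the properties that the planned queries and algorithms filter on, traverse by, or return.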

@Josipmrden Josipmrden requested a review from hal-eisen-MG July 2, 2024 10:28
@Josipmrden
Contributor Author

@kgolubic please resolve this merge in whichever way you like.

Contributor

@hal-eisen-MG hal-eisen-MG left a comment

LGTM

Contributor

@kgolubic kgolubic left a comment

Everything looks Ok. I'll fix the merge conflict.

@kgolubic kgolubic merged commit f0edde3 into memgraph-2-18 Jul 3, 2024
2 checks passed
@kgolubic kgolubic deleted the migrating-from-rdbms-using-mage branch July 3, 2024 07:22
kgolubic added a commit that referenced this pull request Jul 3, 2024
* Docs improvements (#799)

* Add NuRaft log file flag (#816)

* Add health checks docs (#833)

* Update HA examples

* Add NuRaft log file env (#848)

* Add leader/follower role (#857)

Co-authored-by: Antonio Filipovic <61245998+antoniofilipovic@users.noreply.github.com>

* Add coordinator hostname HA configuration option (#860)

* First leader

* Update pages/clustering/high-availability.mdx

Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>

* Add monitoring at runtime docs (#806)

* database-less connections update (#853)

* Update HA docs

* Add durability for coordinator (#859)

* add durability docs

* Update high-availability.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Edge type property index (#855)

* Add hops limit docs (#881)

* add docs for hops limit

* revert accidentally changed part of the docs

* fix grammar mistakes

* implement suggestions

* Peak memory observation in the system (#788)

* Peak memory observation in the system

* Update storage info where needed

* Update best practics

* Add information from PR review

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* removed flags (#854)

* Update envs for HA (#878)

* fix envs

* add 3 missing envs

* Update configuration.mdx

* Update configuration.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Add callout for important force reset  notice (#877)

* add callout for instance to be alive on force reset

* Update high-availability.mdx

* Enum datatype (v1) (#852)

* Add information around enums

* Add ALTER to the list of clauses

* Add ALTER to the list of Memgraph's Cypher extension

* Fix incorrect ALTER command

* Update pages/querying/clauses/create.mdx

* Update pages/querying/clauses/alter.mdx

* Update pages/querying/clauses/alter.mdx

* Update graph-modeling.md

* Address PR comments

---------

Co-authored-by: kgolubic <kgolubic@gmail.com>
Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* SSO Core - OIDC + OAuth2.0 docs (#876)

* wip oidc docs

* oidc sso docs

* PR changes and update

* update package info

* add startup info

* Update v2.18 experimental feature statuses (#871)

* Update v2.18 experimental feature statuses

* Join the sentances into paragraphs

* Update language and formatting

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: kgolubic <kgolubic@gmail.com>

* Add postgresql migration docs (#867)

* Add postgresql migration docs

* Fix from PR review

* Single-sign on: general and SAML documentation (#884)

* Update the existing auth flags for v2.18

* Add a usage note for streams and triggers created by SSO users

* Update general (other than the SAML modules) auth functionalities

* Reorganize auth documentation to accommodate SSO

* Add SAML SSO docs (partial)

* Complete SAML docs

* Apply review suggestions

* Add remaining review suggestions

* Move OIDC docs next to SAML (since both are SSO)

* Fix trigger/stream × SSO usage note placement

* Remove duplicate entry and fix section tag

* Move section to the right place

* Apply suggestions

* Update direct download paths (#885)

* Migration tutorial from `RDBMS` using `MAGE` modules (#770)

* Memgraph 2.17 docs

* Docs

* Zoned datetime data type (#723)

* Document how zoned datetime is supported

* Fix formatting

* Update the Differences in Cypher implementation page

* Update min/max functions

* Revert "Zoned datetime data type (#723)" (#763)

This reverts commit 16c8157.

* Add migration tutorial from rdbms to memgraph

* Add fixes

* Fix from PR review

---------

Co-authored-by: kgolubic <kgolubic@gmail.com>
Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: Ante Pušić <ante.pusic@memgraph.io>

* new: Add query sharing docs and update ENV variables for Lab (#865)

* new: Add a complete list of ENV variables

* new: Add query-sharing feature page

* Fix styling issues

* Update query-sharing.mdx

* new: Add more details for the query sharing

* Add images

* Insert images

* Insert images

* Fix paths

---------

Co-authored-by: kgolubic <kgolubic@gmail.com>
Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Added description how each user can change its own password (#819)

* Added description how each user can change its own password

* Update users.mdx

* Update users.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Add SSO docs (#797)

* Add docs for OAuth setup (WIP)

* Add docs for Okta OAuth

* Replace references to OAuth with OIDC

* Update single-sign-on.mdx

* new: Add SAML for Entra and Okta

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>
Co-authored-by: Toni Lastre <toni.lastre@memgraph.io>

* Create release notes for Memgraph 2.18 (#795)

* Update release notes

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update RN

* Update PR

* Update RN

* new: Add Lab 2.15 release notes (#866)

* new: Add Lab 2.15 release notes

* Update pages/release-notes.mdx

---------

Co-authored-by: Kruno Golubic <46486712+kgolubic@users.noreply.github.com>

* Update RN

* Update RN

---------

Co-authored-by: Toni <toni.lastre@memgraph.io>

* Remove support for Amazon Linux 2, CentOS 7 and RedHat 8

* Linked MAGE docs

* Update release notes

* Add callout for Breaking changes

* Add PR link

* Update Release notes

* Update pages/database-management/authentication-and-authorization/auth-system-integrations.mdx

Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>

* Fix indentation

---------

Co-authored-by: Andi <andi8647@gmail.com>
Co-authored-by: Antonio Filipovic <61245998+antoniofilipovic@users.noreply.github.com>
Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>
Co-authored-by: Josipmrden <josip.mrden@memgraph.io>
Co-authored-by: andrejtonev <29177572+andrejtonev@users.noreply.github.com>
Co-authored-by: Aidar Samerkhanov <aidar.samerkhanov@memgraph.io>
Co-authored-by: David Ivekovic <david.ivekovic@memgraph.io>
Co-authored-by: Gareth Andrew Lloyd <gareth@ignition-web.co.uk>
Co-authored-by: Ivan Milinović <44698587+imilinovic@users.noreply.github.com>
Co-authored-by: Marko Budiselić <marko.budiselic@memgraph.com>
Co-authored-by: Ante Pušić <ante.pusic@memgraph.io>
Co-authored-by: Toni <toni.lastre@memgraph.io>
Co-authored-by: tonijurjevic96 <168409767+tonijurjevic96@users.noreply.github.com>
Co-authored-by: David <davidlozic@gmail.com>
Labels
future release For one of the next versions priority: medium (missing info) An additional information can be helpful or interesting, but the absence is not disruptive status: ready PR is ready for review

4 participants