Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(db): Setup neo4j data migrations #2828

Closed
wants to merge 16 commits into from
Closed

feat(db): Setup neo4j data migrations #2828

wants to merge 16 commits into from

Conversation

Tirokk
Copy link
Member

@Tirokk Tirokk commented Oct 5, 2020

roschaefer Authored by roschaefer
Jan 19, 2020
Merged Jan 23, 2020


🍰 Pullrequest

I use #2381 as a use case to add consistent and reliable data migrations to our backend.

Issues

Todo

  • Configure migrate in docker image (production build stage)
  • Implement Neo4J data store for migrate
  • Refactor existing code for data tasks (seed, cleaning) into one folder db/
  • Run migration on production
  • Finish feat(deployment): Add helm charts for deploy #1613 so we can run yarn run prod:migrate init on post-install and yarn run prod:migrate up on post-upgrade

I guess we have to do the latter manually (during a downtime)

roschaefer and others added 16 commits January 20, 2020 10:58
Implement a migration to merge duplicate user accounts with reactive
programming. Those duplicate user accounts existed, because around 40
users have decided to register again while we experienced a bug
related to normalized emails in our database.
* Add unique index for `Migration`s
* Fix proper use of `next` callback. First argument is potential error.
* Update migration template
- having duplicate Location nodes in the production database blocks us
  from adding a unique constraint, so that Locations are not created
which have the same id.
- it's more human readable and consistent with all other nodes in
  production
- create command should be run with --date-format to be more human
  readable, and --template-file to use our template instead of migrate's
default
- rename migrations
- rename createdAt to migratedAt to remove ambiguity
- do not merge relationships for Location nodes as we don't want to
  create duplicate relationships
- use singular locationId as it's iterating one at a time
We have to figure out if `mergeRels: true` is actually avoiding
duplicate relationships 🤔.

Before:
(l1)-[:IS_IN]->(l2)
(l1)-[:IS_IN]->(l3)

After:
(l1)-[:IS_IN]->(new)
(l1)-[:IS_IN]->(new)
@Tirokk
Copy link
Member Author

Tirokk commented Oct 7, 2020

cypress[bot] Authored by cypress[bot]
Jan 20, 2020




Test summary

53 0 0 0


Run details

Project Human-Connection
Status Passed
Commit ffb3982d32
Started Jan 23, 2020 8:21 PM
Ended Jan 23, 2020 8:34 PM
Duration 12:49 💡
OS Linux Ubuntu Linux - 16.04
Browser Electron 78

View run in Cypress Dashboard ➡️


This comment has been generated by cypress-bot as a result of this project's GitHub integration settings. You can manage this integration in this project's settings in the Cypress Dashboard

{% tabs %}
{% tab title="Docker" %}
```bash
docker-compose exec backend yarn run db:migrate init
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mattwr18 Authored by mattwr18
Jan 20, 2020


Outdated (history rewrite) - original diff


@@ -53,6 +53,27 @@ can issue GraphQL requests or access GraphQL Playground in the browser.
 
 ![GraphQL Playground](../.gitbook/assets/graphql-playground.png)
 
+### Database Indices and Constraints
+
+Database indices and constraints need to be created when the database and the
+backend is running:
+
+{% tabs %}
+{% tab title="Docker" %}
+```bash
+docker-compose exec backend yarn run db:migrate init

I'm getting this error when running this command:

Error: ERROR_TRANSACTION_FAILED
    at new TransactionError (/nitro-backend/node_modules/neode/build/TransactionError.js:21:126)
    at /nitro-backend/node_modules/neode/build/index.js:456:33
    at processTicksAndRejections (internal/process/task_queues.js:94:5)
    at Store.init (/nitro-backend/src/db/migrate/store.js:18:7) {
  errors: [
    {
      query: 'CREATE CONSTRAINT ON (model:Location) ASSERT model.id IS UNIQUE',
      params: {},
      error: [Neo4jError]
    },
    {
      query: 'CREATE CONSTRAINT ON (model:Donations) ASSERT model.id IS UNIQUE',
      params: {},
      error: [Neo4jError]
    },
    {
      query: 'CREATE CONSTRAINT ON (model:Report) ASSERT model.id IS UNIQUE',
      params: {},
      error: [Neo4jError]
    }
  ]
}

do I need to delete all indexes on the database before running this command? or does this error message mean something more to you?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mattwr18 Authored by mattwr18
Jan 21, 2020


looking into this further @roschaefer ... this has nothing, it seems, to do with Docker vs. local setup, but production vs seed data.
It appears there is an error thrown when setting up the unique constraints on Location and then the rest of the 4 👆 don't get run. I ran them individually in the neo4j browser and got this error for Location, the other 4 were successful.

Unable to create CONSTRAINT ON ( location:Location ) ASSERT location.id IS UNIQUE:
Both Node(48981) and Node(186812) have the label `Location` and property `id` = 'place.4463223470618060'

Looking this Location up by it's id results in 4 identical Locations in production data, that look like

{
  "lng": 48.06028,
  "nameES": "Villingen-Schwenningen",
  "nameFR": "Villingen-Schwenningen",
  "nameIT": "Villingen-Schwenningen",
  "nameEN": "Villingen-Schwenningen",
  "type": "place",
  "namePT": "Villingen-Schwenningen",
  "nameDE": "Villingen-Schwenningen",
  "nameNL": "Villingen-Schwenningen",
  "name": "Villingen-Schwenningen",
  "namePL": "Villingen-Schwenningen",
  "id": "place.4463223470618060",
  "lat": 8.45861
}

I guess we don't really need to run init in production, right? should we delete these duplicate records anyways?
Looking at our db schema in production confirms that we do no have any CONSTRAINT on Location`, so I guess we want to set that up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roschaefer Authored by roschaefer
Jan 21, 2020


@mattwr18 yes, we want to set that up. I used neode to setup unique constraints and if that fails on production data, then we had one constraint missing on production. Maybe we can run a migration before we setup the constraints. Sounds odd but is the best solution that comes to my mind right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mattwr18 Authored by mattwr18
Jan 22, 2020


I've added a migration that runs first... but we would need to run yarn db:migrate up before running yarn db:migrate init... otherwise the latter will fail.
Also, I changed the timestamp of the merge_duplicate_location_nodes so that it run before the merge_duplicate_user_nodes, but actually that's not important.
We have talked about adding the migrations we have already run on production data to this folder so that we have a comprehensive list of every migration we have run... should those timestamps be altered as well to appear at the top of the list and be run first?
Also, when I added the unique constraints on Migration node, I realized it failed as it was creating multiple nodes for the same migration.
I've updated it to MERGE by title, then ON CREATE SET the properties and changed the timestamp to createdAt so that we have a record of when the migration was run in production, and it's human readable.

```bash
# in folder backend/
# make sure your database is running on http://localhost:7474/browser/
yarn run db:migrate init
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mattwr18 Authored by mattwr18
Jan 20, 2020


Outdated (history rewrite) - original diff


@@ -53,6 +53,27 @@ can issue GraphQL requests or access GraphQL Playground in the browser.
 
 ![GraphQL Playground](../.gitbook/assets/graphql-playground.png)
 
+### Database Indices and Constraints
+
+Database indices and constraints need to be created when the database and the
+backend is running:
+
+{% tabs %}
+{% tab title="Docker" %}
+```bash
+docker-compose exec backend yarn run db:migrate init
+```
+{% endtab %}
+
+{% tab title="Without Docker" %}
+```bash
+# in folder backend/
+# make sure your database is running on http://localhost:7474/browser/
+yarn run db:migrate init

when I ran this locally, without Docker... it seems to have run successfully, based on this output

yarn db:migrate init
yarn run v1.19.0
warning ../../package.json: No license field
$ yarn run __migrate --store ./src/db/migrate/store.js init
warning ../../package.json: No license field
$ migrate --compiler 'js:@babel/register' --migrations-dir ./src/db/migrations --store ./src/db/migrate/store.js init
  migrations dir : /home/mattwr18/code/Human-Connection/backend/src/db/migrations
Successfully created database indices and constraints!
  init : undefined
Done in 3.64s.


class Store {
async init(next) {
const neode = getNeode()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roschaefer Authored by roschaefer
Jan 23, 2020


Outdated (history rewrite) - original diff


@@ -35,7 +35,7 @@ class Store {
     const session = driver.session()
     const readTxResultPromise = session.readTransaction(async txc => {
       const result = await txc.run(
-        'MATCH (migration:Migration) RETURN migration {.*} ORDER BY migration.timestamp DESC',
+        'MATCH (migration:Migration) RETURN migration {.*} ORDER BY migration.createdAt DESC',

One could argue we should save two attributes: createdAt and migratedAt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mattwr18 Authored by mattwr18
Jan 23, 2020


maybe, but I guess those will be the same cause the Migration is only created in the database when it is run for the first time... after that, since we use merge, it is not updated.
I thought about calling it migratedAt, and wouldn't be opposed to it, but thought that createdAt could also be ok...

title: { type: 'string', primary: true, token: true },
description: { type: 'string' },
migratedAt: { type: 'string', isoDate: true, default: () => new Date().toISOString() },
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roschaefer Authored by roschaefer
Jan 23, 2020


Outdated (history rewrite) - original diff


@@ -1,5 +1,5 @@
 export default {
   title: { type: 'string', primary: true, token: true },
   description: { type: 'string' },
-  timestamp: { type: 'number', unique: true },
+  createdAt: { type: 'string', isoDate: true, default: () => new Date().toISOString() },

Yeah, so this is certainly migratedAt.

@Mogge Mogge closed this Oct 8, 2020
@ulfgebhardt ulfgebhardt deleted the pr2828head branch January 7, 2021 04:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants