Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add README for db-sync component #13045

Merged
merged 1 commit into from
Sep 16, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions components/ee/db-sync/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# db-sync

This component runs only in Gitpod SaaS and is responsible for bidirectional synchronisation of data between two or more mysql databases.

### Commands

`db-sync` has two subcommands: `db-sync run` and `db-sync export` .

- `db-sync run`: Runs the sync process repeatedly
- `db-sync export`: Exports a SQL file containing the sync operations

### Config

`db-sync` uses a config file (passed as the `--config` argument to `db-sync run`). There is an [example config](https://github.com/gitpod-io/gitpod/blob/main/components/ee/db-sync/example-config.json) in the repository and a production config file looks like this:

```yaml
{
"disableTransactions": false,
"replicationLogDir": "/var/log/db-sync",
"roundRobin": true,
"syncPeriod": 10000,
"tableSet": "gitpod",
"targets": [
{
"database": "gitpod",
"host": "localhost",
"name": "...",
"password": "...",
"port": 3300,
"user": "gitpod"
},
{
"database": "gitpod",
"host": "localhost",
"name": "...",
"password": "...",
"port": 3301,
"user": "gitpod"
}
]
}
```

### Modes

`db-sync` runs in “round robin mode” in production (`”roundRobin”: true` in the above config).

Round robin mode:
- Connects to each target database in the config and creates a “replicator” for each.
- Calls `synchronize` on each replicator in turn, to sync each database with every other one.
- The synchronisation loop (running each replicator) happens every `syncPeriod` milliseconds (so every 10s in the above config).

`db-sync` attempts a graceful shutdown by handing `SIGINT` and waiting for any currently running sync to complete before terminating. In Kubernetes this should mean that when `db-sync` is terminated it has 30 seconds (the default termination grace period) in which to complete synchronisation before being forcibly terminated with a `SIGTERM`.

### Synchronisation

- The list of tables that `db-sync` considers for synchronisation is defined by the `GitpodTableDescriptionProvider` in [tables.ts](https://github.com/gitpod-io/gitpod/blob/243ee21379f90f3c70f79f0317723006270493bf/components/gitpod-db/src/tables.ts#L47-L309).
- Each table to be synchronised must specify a `timeColumn` of type `timestamp`
- Tables can specify some columns as ignored by `db-sync` (via `[ignoreColumns](https://github.com/gitpod-io/gitpod/blob/243ee21379f90f3c70f79f0317723006270493bf/components/gitpod-db/src/tables.ts#L21)`)

All rows in considered tables with a `timeColumn` later than the start time of the current sync are inserted into the target database(s).

If a table specifies a `deletionColumn` all rows marked as deleted in the source are hard-deleted from the target database(s) *as well as the source database*.

### gitpod_replication table

`db-sync` uses the `gitpod_replication` db table. This table holds a single row, continuously updated, containing the last time `db-sync` performed a sync. This timestamp is used as the start_date for the next sync. This start_date can also be ignored to force a full sync.

### Logging

`db-sync` writes logs to `config.replicationLogDir` where each timestamped logged file contains the SQL run on each database during that synchronisation.