Document dual-write and backfill migration strategy
JamesGuthrie committed Aug 29, 2023
1 parent da234c0 commit a7d38fc
Showing 3 changed files with 288 additions and 6 deletions.
244 changes: 244 additions & 0 deletions use-timescale/migration/dual-write-and-backfill.md
@@ -0,0 +1,244 @@
---
title: Low-downtime migrations with dual-write and backfill
excerpt: Migrate a hypertable or entire database with low downtime
products: [cloud, self_hosted]
keywords: [backups, restore]
tags: [recovery, logical backup, pg_dump, pg_restore]
---

# Dual-write and backfill

Dual-write and backfill is a migration strategy to move a large amount of
time-series data (100 GB to 4 TB+) with low downtime (on the order of minutes).

Roughly, it consists of three steps:

1. Clone schema and relational data from source to target
1. Dual-write to source and target
1. Backfill time-series data

Dual-write and backfill can be used for any source database type, as long as it
can provide data in CSV format. It can be used to move data from a PostgreSQL
source, or from TimescaleDB to TimescaleDB. If the source and target databases
are PostgreSQL, they can be of different versions, as long as the target
version is greater than the source version. If both source and target use
TimescaleDB, the version of TimescaleDB must be the same.

Dual-write and backfill works well when:
1. The bulk of the (on-disk) data is in time-series tables.
1. Writes by the application do not reference historical time-series data.
1. There is no requirement for transactional consistency (that is, it is possible
to filter the time-series data by time and retain data integrity).
1. No `UPDATE` or `DELETE` queries will be run on time-series data in the
source database during the migration process (or, if they are, it happens in
a controlled manner, such that the affected rows can either be ignored or
re-backfilled).
1. Either the relational (non-time-series) data is small enough to be copied
from source to target in an acceptable amount of time for this to be done
with downtime, or the relational data can be copied asynchronously while the
application continues to run (that is, it changes relatively infrequently).

## Migration process

In detail, the migration process consists of the following steps:
1. Set up a second database
1. Modify the application to write to a secondary database
1. Migrate schema and relational data from source to target
1. Start the application in dual-write mode
1. Determine the consistency time `T`
1. Backfill time-series data from source to target
1. Enable retention and compression policies
1. Validate that all data is present in target database
1. Validate that target database can handle production load
1. Switch application to treat target database as primary (potentially continuing to write into source database, as a backup)

### 1. Set up a second database

[Create a database service in Timescale][create-service].

[create-service]: /use-timescale/:currentVersion:/services/create-a-service/

### 2. Modify the application to write to a secondary database

How exactly to do this depends on the language that your application is
written in, and on how your ingestion and application logic work. In the
simplest case, you execute two inserts in parallel. In the general case, you
need to consider how to handle a failed write to either the old or new
database, and what mechanism you can build to recover from such a failure.
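
As a minimal sketch, assuming a hypothetical hypertable called `metrics` and
the `$SOURCE` and `$TARGET` connection strings used elsewhere in this guide,
a dual write can be as simple as issuing the same `INSERT` against both
databases:

```
# The table name and values below are purely illustrative.
INSERT_SQL="INSERT INTO metrics (time, device_id, value) VALUES (now(), 1, 23.5);"

# Write to the source first, then to the target. A real application must
# decide what to do when either write fails: retry, queue the row for later,
# or record the failure so that the affected range can be re-backfilled.
psql -X -d "$SOURCE" -v ON_ERROR_STOP=1 -c "$INSERT_SQL"
psql -X -d "$TARGET" -v ON_ERROR_STOP=1 -c "$INSERT_SQL"
```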

You may also want to execute the same read queries on the old and new database,
to evaluate the correctness and performance of the results that the queries
return. Bear in mind that the new database will not contain all data for a
certain amount of time, so you should expect the results to differ for some
period (potentially a number of days).
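
For example, assuming the same hypothetical `metrics` table, you can run an
identical query against both databases and compare the output:

```
QUERY="SELECT device_id, count(*) FROM metrics WHERE time > now() - INTERVAL '1 hour' GROUP BY device_id ORDER BY device_id;"

# Run the same query against both databases and diff the results. Expect
# differences until the backfill has completed.
psql -X -d "$SOURCE" --csv -c "$QUERY" > source.csv
psql -X -d "$TARGET" --csv -c "$QUERY" > target.csv
diff source.csv target.csv
```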

### 3. Set up schema and migrate relational data to new database

How exactly you perform this depends on whether you're migrating from plain
PostgreSQL, TimescaleDB, or some other database.

#### From TimescaleDB

Dump the database roles from the source database:

```
pg_dumpall -d "$SOURCE" \
--quote-all-identifiers \
--roles-only \
--file=roles.sql
```

Fix up the dumped roles:

```
sed -i -E \
-e '/CREATE ROLE "postgres";/d' \
-e '/ALTER ROLE "postgres"/d' \
-e 's/(NO)*SUPERUSER//g' \
-e 's/(NO)*REPLICATION//g' \
-e 's/(NO)*BYPASSRLS//g' \
-e 's/GRANTED BY "[^"]*"//g' \
roles.sql
```

Dump all plain tables and the TimescaleDB catalog from the source database:

```
pg_dump -d "$SOURCE" \
--format=plain \
--quote-all-identifiers \
--no-tablespaces \
--no-owner \
--no-privileges \
--exclude-table-data='_timescaledb_internal.*' \
--file=dump.sql
```

a. `--no-tablespaces` is required because Timescale does not support
tablespaces other than the default. This is a limitation.
b. `--no-owner` is required because tsdbadmin is not a superuser and cannot
assign ownership in all cases. This flag means that everything will be
owned by the tsdbadmin user in the target regardless of ownership in the
source. This is a limitation.
c. `--no-privileges` is required because tsdbadmin is not a superuser and
cannot assign privileges in all cases. This flag means that privileges
assigned to other users will need to be reassigned in the target
database as a manual clean-up task. This is a limitation.
d. `--exclude-table-data='_timescaledb_internal.*'` will dump the structure
of the hypertable chunks, but not the data. This will create empty
chunks on the target ready for the backfill process.

1. If the source database has the timescaledb extension installed in a schema
other than "public" it will cause issues on Timescale. Edit the dump file to
remove any references to the non-public schema. We need the extension in the
"public" schema on Timescale. This is a limitation.
1. If any background jobs are owned by the "postgres" user, they need to be
owned by "tsdbadmin" on the target database. Edit the dump file accordingly.

Load the roles and schema into the target database, and disable all background
jobs. Background jobs are disabled so that they don't interfere with the
backfill: for example, a data retention policy could drop backfilled chunks,
and a compression policy could compress chunks which still need to be
backfilled. The jobs are re-enabled once the backfill is complete.

```
psql -X -d "$TARGET" \
-v ON_ERROR_STOP=1 \
--echo-errors \
-f roles.sql \
-c 'select public.timescaledb_pre_restore();' \
-f dump.sql \
-f - <<'EOF'
begin;
select public.timescaledb_post_restore();
-- disable all background jobs
select public.alter_job(id::integer, scheduled=>false)
from _timescaledb_config.bgw_job
where id >= 1000
;
commit;
EOF
```

#### From plain PostgreSQL

TODO

#### From some other database

TODO

### 4. Start application in dual-write mode

With the target database set up, your application can now be started in
dual-write mode.

### 5. Determine the consistency time `T`

After dual-writes have been executing for a while, the target hypertable will
contain data in three time ranges: missing writes, late-arriving data, and the
"consistency" range

#### Missing writes

If the application is made up of multiple writers, and these writers did not
all simultaneously start writing into the target hypertable, there is a period
of time in which not all writes have made it into the target hypertable. This
period starts when the first writer begins dual-writing, and ends when the last
writer begins dual-writing.

#### Late-arriving data

Some applications have late-arriving data: measurements which have a timestamp
in the past, but which weren't written yet (for example, from devices which had
intermittent connectivity issues). The window of late-arriving data is between
the present moment and the maximum lateness.

#### Consistency range

The consistency range is the range in which there are no missing writes, and in
which all data has arrived, that is, between the end of the missing writes range
and the beginning of the late-arriving data range.

The length of these ranges is defined by the properties of the application;
there is no one-size-fits-all way to determine them.
The consistency time `T` is an arbitrarily chosen time in the consistency range.

### 6. Backfill data from source to target

If your source database is using TimescaleDB, we recommend using our backfill
tool `timescaledb-backfill`.
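
The following is an illustrative invocation. The exact subcommands and flags
may differ between versions of `timescaledb-backfill`, so check the tool's
documentation before running it. It assumes that the consistency time `T`
determined in the previous step is `2023-08-01T00:00:00+00:00`:

```
# Stage the chunks to be copied (everything before the consistency time T),
# then copy the staged chunks from the source to the target.
timescaledb-backfill stage --source "$SOURCE" --target "$TARGET" --until '2023-08-01T00:00:00+00:00'
timescaledb-backfill copy --source "$SOURCE" --target "$TARGET"
```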

If your source database is not using TimescaleDB, we recommend dumping the data
from your source database on a per-table basis into CSV format, and restoring
those CSVs into the target database using the `timescaledb-parallel-copy` tool.
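
As a sketch, assuming a hypothetical `metrics` table and the same illustrative
consistency time `T`, the data can be exported with `psql` and loaded with
`timescaledb-parallel-copy`. The flags shown are typical; check the tool's
documentation for your version:

```
# Export all rows older than the consistency time T from the source table.
psql -X -d "$SOURCE" \
  -c "\copy (SELECT * FROM metrics WHERE time < '2023-08-01T00:00:00+00:00') TO 'metrics.csv' WITH (FORMAT CSV)"

# Load the CSV into the target table using multiple parallel workers.
timescaledb-parallel-copy \
  --connection "$TARGET" \
  --table metrics \
  --file metrics.csv \
  --workers 4 \
  --copy-options "CSV"
```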

### 7. Enable retention and compression policies

Re-enable all retention and compression policies.
If the backfill process took long enough that a significant amount of
retention or compression work is now outstanding, it may be preferable to run
the jobs manually, in order to control the pacing of the work, before
re-enabling the policies.
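
Mirroring the statement used to disable the jobs after the restore, a sketch
for re-enabling them is:

```
psql -X -d "$TARGET" -v ON_ERROR_STOP=1 --echo-errors -f - <<'EOF'
begin;
-- re-enable all background jobs that were disabled before the backfill
select public.alter_job(id::integer, scheduled=>true)
from _timescaledb_config.bgw_job
where id >= 1000
;
commit;
-- to run an individual job manually instead, for example job 1000:
-- call run_job(1000);
EOF
```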

### 8. Validate that all data is present in target database

One possible approach to validating this is to compare row counts on a
chunk-by-chunk basis. One way to do so is to run `select count(*) ...`, which
is exact but potentially costly. Another is to run `ANALYZE` on both the
source and target chunk and then look at the `reltuples` column of the
`pg_class` table for each chunk's row; this is not exact, but is less costly.
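
As a sketch, for a given chunk (the chunk name below is hypothetical, use
`show_chunks` to list the chunks of a hypertable), run the following on both
the source and the target and compare the results:

```
-- list the chunks of a hypothetical "metrics" hypertable
SELECT show_chunks('metrics');

-- exact, but potentially costly
SELECT count(*) FROM _timescaledb_internal._hyper_1_1_chunk;

-- approximate, but cheaper: refresh statistics, then read the estimate
ANALYZE _timescaledb_internal._hyper_1_1_chunk;
SELECT relname, reltuples
FROM pg_class
WHERE oid = '_timescaledb_internal._hyper_1_1_chunk'::regclass;
```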

### 9. Validate that target database can handle production load

Assuming dual-writes have been in place, the target database should be holding
up to production write traffic. Now would be the right time to determine if the
new database can serve all production traffic (both reads _and_ writes). How
exactly this is done is application-specific and up to you to determine.

### 10. Switch production workload to new database

Once you've validated that all the data is present, and that the new database
can handle the production workload, the final step is to switch to the new
database as your primary. You may want to continue writing to the old database
for a period, until you are certain that the new database is holding up to all
production traffic.
44 changes: 38 additions & 6 deletions use-timescale/migration/index.md
@@ -8,12 +8,44 @@ tags: [ingest, migrate, RDS]

# Migrate your data to Timescale

You can migrate data from another database into Timescale using the PostgreSQL
`pg_dump` and `pg_restore` commands. You can also use these tools to migrate
your data from Managed Service for TimescaleDB, from a self-hosted Timescale
instance, or from another PostgreSQL database, including Amazon RDS.
There are a number of different ways to migrate your data to Timescale. Which
option you choose depends on a few different factors, the most important of
which are:

If you want to import data from another format, such as a `.csv` file, into a
new Timescale service, see the [data ingest section][data-ingest].
- How much downtime can you afford (minutes, or hours?)
- How much data are you migrating (megabytes, or terabytes?)
- Where will you be migrating your data from (Postgres, TimescaleDB, Influx, or MySQL?)

If you are using Postgres or TimescaleDB and can afford to take your application
offline for a few hours, the simplest option is to migrate data from another
database into Timescale using PostgreSQL's `pg_dump` and `pg_restore` commands.
You can also use these tools to migrate your data from Managed Service for
TimescaleDB, from a self-hosted TimescaleDB instance, or from another
PostgreSQL database, including Amazon RDS. Consult our guide on [migrating with
pg_dump and pg_restore][pg-dump-restore].

If you are looking for a low-downtime alternative (downtime on the order of
minutes), are willing to modify your ingestion code, and the bulk of your data
is stored in time-series tables, you can use the [dual-write and backfill][dual-write]
strategy for a low-downtime migration. This strategy also works if your source
database is not PostgreSQL-based.

If you're using PostgreSQL, you may also have heard of logical replication
being the recommended strategy for migrations with low downtime. Currently,
TimescaleDB doesn't work with logical replication, so this is not a viable
option, but we are actively working on making this possible.

Check warning on line 36 in use-timescale/migration/index.md

View workflow job for this annotation

GitHub Actions / prose

[vale] reported by reviewdog 🐶 [Google.We] Try to avoid using first-person plural like 'we'. Raw Output: {"message": "[Google.We] Try to avoid using first-person plural like 'we'.", "location": {"path": "use-timescale/migration/index.md", "range": {"start": {"line": 36, "column": 13}}}, "severity": "WARNING"}

If you're looking for a zero-downtime migration method, please let us know when
you find it. We're looking for it too!

If you're migrating from a source other than PostgreSQL, and don't want to
use the dual-write and backfill approach, then the easiest way to move your
data to Timescale is to export the data from your existing database as a
`.csv` file, and import it with [timescaledb-parallel-copy][parallel-copy].

For other ingestion methods, see the [data ingest section][data-ingest].

[data-ingest]: /use-timescale/:currentVersion:/ingest-data/
[dual-write]: /use-timescale/:currentVersion:/migration/dual-write-and-backfill/
[pg-dump-restore]: /use-timescale/:currentVersion:/migration/pg-dump-and-restore/
[parallel-copy]: /use-timescale/:currentVersion:/ingest-data/import-csv/
6 changes: 6 additions & 0 deletions use-timescale/page-index/page-index.js
@@ -356,6 +356,12 @@ module.exports = [
excerpt:
"Migrate a hypertable or entire database with native PostgreSQL commands",
},
{
title: "Dual-write and backfill",
href: "dual-write-and-backfill",
excerpt:
"Migrate a large database with low downtime",
},
],
},
{
