# molt


MOLT (Migrate Off Legacy Things) is CockroachDB's suite of tools to assist migrations. This repo contains the open-source MOLT tooling.

Certain packages can be re-used by external tools and are subject to the Apache License.

## Build

All commands require `molt` to be built. Example:

```shell
# Build molt for the local machine (output goes to artifacts/molt).
go build -o artifacts/molt .

# Cross-compile, e.g. for Linux amd64.
GOOS=linux GOARCH=amd64 go build -v -o artifacts/molt .
```

## Verification

`molt verify` does the following:

* Verifies that the tables in the two data sources are the same.
* Verifies that the table column definitions in the two data sources are the same.
* Verifies that the rows of each table contain the same values in both data sources.

It currently supports PostgreSQL and MySQL comparisons with CockroachDB. It takes two connection strings as arguments, `--source` and `--target`; `--source` is assumed to be the source of truth.

```shell
# Compare a PostgreSQL and a CockroachDB instance.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable'

# Compare a MySQL and a CockroachDB instance.
molt verify \
  --source 'jdbc:mysql://root@tcp(localhost:3306)/defaultdb' \
  --target 'postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable'
```

See `molt verify --help` for all available parameters.

### Filters

To verify specific tables or schemas, use `--table-filter` or `--schema-filter`.
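For example (the filter values below are illustrative; check `molt verify --help` for the exact matching semantics):

```shell
# Verify only the matching schema and table.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable' \
  --schema-filter 'public' \
  --table-filter 'orders'
```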

### Continuous verification

If you want all tables to be verified in a loop, use `--continuous`.
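A minimal sketch, reusing the connection strings from the examples above:

```shell
# Re-verify all tables in a loop until interrupted.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable' \
  --continuous
```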

### Live verification

If you expect data to change while verification runs, use `--live`. This makes the verifier re-check rows before marking them as problematic.
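For example:

```shell
# Rows that appear inconsistent are re-checked before being reported,
# tolerating writes that land mid-verification.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable' \
  --live
```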

### Limitations

* MySQL `ENUM` and `SET` types are not supported.
* Only one MySQL database can be compared at a time, and only against a whole CockroachDB schema (assumed to be `public`).
* Geospatial types cannot yet be compared.
* Schema changes between commands are not handled well.

## Data movement

```mermaid
flowchart LR
    LegacyDB[legacy database<br/>i.e. PG, MySQL]
    S3[(Amazon S3)]
    GCP[(Google Cloud<br/>Bucket)]
    Local[(Local File Server)]
    CRDB[CockroachDB]

    LegacyDB -- CSV Dump --> S3
    LegacyDB -- CSV Dump --> GCP
    LegacyDB -- CSV Dump --> Local
    S3 -- IMPORT INTO<br/>or COPY FROM --> CRDB
    GCP -- IMPORT INTO<br/>or COPY FROM --> CRDB
    Local -- "IMPORT INTO (exposes HTTP server)"<br/>or COPY FROM --> CRDB

    LegacyDB -- COPY FROM --> CRDB
```

`molt datamove` can migrate data from your PostgreSQL or MySQL tables to CockroachDB without taking those tables offline. It takes `--source` and `--target` as arguments (see the `molt verify` documentation above for examples).

It outputs a `cdc_cursor`, which can be fed to CDC programs (e.g. cdc-sink, AWS DMS) to migrate live data without taking your database offline.
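As an illustration only: assuming a PostgreSQL source whose `cdc_cursor` corresponds to an LSN, a downstream consumer could resume logical decoding from that position. The slot name and LSN below are placeholders, and the logical replication slot must already exist:

```shell
# Hypothetical follow-up: stream source changes starting at the reported cursor.
pg_recvlogical \
  --dbname 'postgres://postgres@localhost:5432/replicationload' \
  --slot 'molt_slot' \
  --startpos '0/17BB660' \
  --start \
  --file -
```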

It currently supports the following:

* Pulling a table, uploading CSVs to S3/GCP/the local machine (`--local-path-listen-addr` must be set), and running `IMPORT INTO` on CockroachDB for you.
* Pulling a table, uploading CSVs to S3/GCP/the local machine, and running `COPY FROM` on CockroachDB from those CSVs.
* Pulling a table and running `COPY FROM` directly into the CockroachDB table without an intermediate store.

By default, data is imported using `IMPORT INTO`. Use `--live` if you need the target data to be queryable during loading; this uses `COPY FROM` instead.
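Roughly, the two modes correspond to the following kinds of CockroachDB statements. The table, columns, and file path below are illustrative, not what molt emits verbatim:

```sql
-- Default: bulk load; the target table is offline while the import runs.
IMPORT INTO good_table (id, payload)
    CSV DATA ('s3://otan-test-bucket/good_table/part_00000001.csv');

-- With --live: transactional load; the table stays queryable throughout.
COPY good_table (id, payload) FROM STDIN WITH CSV;
```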

Destination tables can be truncated automatically before loading by running with `--truncate`.

For now, schemas must be identical on both sides. This is not verified upfront, so tables with mismatching columns may only be partially migrated.

### Example invocations

S3 usage:

```shell
# Ensure access tokens are appropriately set in the environment.
export AWS_REGION='us-east-1'
export AWS_SECRET_ACCESS_KEY='key'
export AWS_ACCESS_KEY_ID='id'

# Ensure the S3 bucket is created and accessible from CRDB.
# --truncate automatically truncates destination tables before importing.
# --cleanup cleans up any created S3 files afterwards.
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --s3-bucket 'otan-test-bucket' \
  --truncate \
  --cleanup
```

GCP usage:

```shell
# Ensure credentials are loaded using `gcloud init`.
# Ensure the GCP bucket is created and accessible from CRDB.
# --cleanup cleans up any created GCP files afterwards.
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --gcp-bucket 'otan-test-bucket' \
  --cleanup
```

Using a direct `COPY FROM` without storing intermediate files:

```shell
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --direct-copy
```

Storing CSVs locally before running `COPY FROM`:

```shell
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --local-path /tmp/basic \
  --live
```

Storing CSVs locally and running a file server:

```shell
# Set --local-path-crdb-access-addr if the automatic IP detection is incorrect.
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --local-path /tmp/basic \
  --local-path-listen-addr '0.0.0.0:9005'
```

## Local Setup

### Running Tests

* Ensure a local Postgres instance is set up and can be connected to with `postgres://postgres:postgres@localhost:5432/testdb` (this can be overridden with the `POSTGRES_URL` env var):

  ```sql
  CREATE USER postgres WITH PASSWORD 'postgres' SUPERUSER;
  CREATE DATABASE testdb;
  ```

* Ensure a local, insecure CockroachDB instance is set up (this can be overridden with the `COCKROACH_URL` env var): `cockroach demo --insecure --empty`.
* Ensure a local MySQL instance is set up with username `root` and an empty password, with a `defaultdb` database created (this can be overridden with the `MYSQL_URL` env var):

  ```sql
  CREATE DATABASE defaultdb;
  ```

* Run the tests: `go test ./...`.
  * Data-driven tests can be run with `-rewrite`, e.g. `go test ./verification -rewrite`.