Getting Started
This section is a getting-started guide for configuring Recce to reconcile actual databases.
Before proceeding with the steps below, try Recce with the provided example scenario to see how it works.
Recce is currently only published as a container image to a private GitHub Container Registry (GHCR) repo. Pulling officially built Docker images locally requires some additional setup to authenticate with the GHCR.
- Generate a personal access token in your account with packages:read permission.
- Use Configure SSO to authorize the token for SSO access via the organisation.
- Log in with something like the command below (see here for details):
  echo "ghp_REST_OF_TOKEN" | docker login https://ghcr.io -u my-github-username --password-stdin
- Pull Recce's Docker image from GHCR:
  docker pull ghcr.io/thoughtworks-sea/recce-server
You should be able to run Recce with the docker image locally, using this repository only for setting up a DB for Recce, and an example scenario.
# Run in one shell - starts a DB for Recce, and an example scenario
./batect run-deps

# Run in another shell - runs Recce
docker run -p 8080:8080 \
  -v $(pwd)/examples/scenario/petshop-mysql:/config \
  -e MICRONAUT_CONFIG_FILES=/config/application-petshop-mysql.yml \
  -e DATABASE_HOST=host.docker.internal \
  -e R2DBC_DATASOURCES_SOURCE_URL=r2dbc:pool:mysql://host.docker.internal:8000/db \
  -e R2DBC_DATASOURCES_TARGET_URL=r2dbc:pool:mysql://host.docker.internal:8001/db \
  ghcr.io/thoughtworks-sea/recce-server:latest
Recce is configured by adding datasources and datasets that you wish to reconcile. As a Micronaut application, much of Recce's configuration is open for hacking and can be expressed in multiple ways.
This guide takes the recommended approach: creating an additional configuration file and loading it into Recce through MICRONAUT_CONFIG_FILES.
Create a new YAML file inside the project, e.g. my-dataset-configs/config1.yml
mkdir -p my-dataset-configs
touch my-dataset-configs/config1.yml
Inside the newly created YAML file, configure the username and password for authentication:
auth:
  username: admin
  password: admin
This configures the credentials used in basic authentication to protect the API endpoints. In this case, the username and password are both set to admin.
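With HTTP basic authentication, a client sends the credentials base64-encoded in an Authorization header, which is what curl's -u option does. The sketch below prints the header for the admin/admin pair configured above. basic_auth_header is a hypothetical helper name, and the sketch assumes the base64 utility is installed.

```shell
#!/bin/sh
# Hypothetical helper: build the HTTP Basic Authorization header
# for a username/password pair (the same header `curl -u` sends).
basic_auth_header() {
  # $1 = username, $2 = password
  # tr strips the line wrapping some base64 implementations add to long input.
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

basic_auth_header admin admin
# prints: Authorization: Basic YWRtaW46YWRtaW4=
```

Base64 is an encoding, not encryption, so these credentials are only as private as the connection they travel over.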
Add all databases involved in reconciliation under the r2dbc.datasources block of your configuration file. Multiple datasources can be configured for connection. For more details, visit the section on configuring datasources.
r2dbc:
  datasources:
    my-source-db: # Name your datasource anything you want, other than "default"
      # R2DBC URL for your database: r2dbc:pool:DB_TYPE://DB_HOST:DB_PORT/DB_NAME
      url: r2dbc:pool:mysql://source-db:3306/db
      username: user
      password: password
    my-target-db:
      url: r2dbc:pool:mysql://target-db:3306/db
      username: user
      password: password
In this case, two MySQL datasources named my-source-db and my-target-db are added.
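The URL pattern noted in the configuration comment, r2dbc:pool:DB_TYPE://DB_HOST:DB_PORT/DB_NAME, can be captured in a small helper when URLs are built from separate settings. This is a minimal sketch; r2dbc_url is a hypothetical function name and is not part of Recce or any R2DBC tooling.

```shell
#!/bin/sh
# Hypothetical helper: assemble a pooled R2DBC URL following the pattern
# r2dbc:pool:DB_TYPE://DB_HOST:DB_PORT/DB_NAME
r2dbc_url() {
  db_type=$1
  db_host=$2
  db_port=$3
  db_name=$4
  printf 'r2dbc:pool:%s://%s:%s/%s\n' "$db_type" "$db_host" "$db_port" "$db_name"
}

r2dbc_url mysql source-db 3306 db
# prints: r2dbc:pool:mysql://source-db:3306/db
```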
Add the datasets for reconciliation under the reconciliation.datasets block of your configuration file.
Each dataset has a source and a target for reconciliation, and Recce runs each SQL query on the database referenced in its datasourceRef.
For more details, visit the section on configuring datasets.
reconciliation:
  datasets:
    pets: # Name your datasets however you would like
      source:
        # Reference to a datasource defined in `r2dbc.datasources`
        datasourceRef: my-source-db
        # Any SQL query to evaluate against the source DB
        query: >
          SELECT pet.id AS MigrationKey, category, name, status
          FROM pet
      target:
        # Reference to a datasource defined in `r2dbc.datasources`
        datasourceRef: my-target-db
        # Any SQL query to evaluate against the target DB
        query: >
          SELECT pet.id AS MigrationKey, category.name AS category, pet.name, status
          FROM pet INNER JOIN category ON pet.category_id = category.id
      # Optional scheduling of regular or one-off reconciliations
      schedule:
        # Must adhere to format https://docs.micronaut.io/latest/api/io/micronaut/scheduling/cron/CronExpression.html
        # or https://crontab.guru/ (without seconds)
        cronExpression: 0 0 * * *
In the configuration above, one dataset named pets is configured, reconciling the source database my-source-db against the target database my-target-db.
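The fragments shown so far can be assembled into a single file. The sketch below writes them to my-dataset-configs/config1.yml with a here-document. The two-space nesting is an assumption inferred from the dotted names used in this guide (r2dbc.datasources, reconciliation.datasets), and CONFIG_DIR is a hypothetical override variable, not a Recce setting.

```shell
#!/bin/sh
# Assemble the auth, datasource, and dataset fragments from this guide into one file.
# CONFIG_DIR is a hypothetical override; it defaults to the directory used in this guide.
CONFIG_DIR=${CONFIG_DIR:-my-dataset-configs}
mkdir -p "$CONFIG_DIR"

# The quoted 'EOF' delimiter keeps the shell from expanding anything inside the YAML.
cat > "$CONFIG_DIR/config1.yml" <<'EOF'
auth:
  username: admin
  password: admin
r2dbc:
  datasources:
    my-source-db:
      url: r2dbc:pool:mysql://source-db:3306/db
      username: user
      password: password
    my-target-db:
      url: r2dbc:pool:mysql://target-db:3306/db
      username: user
      password: password
reconciliation:
  datasets:
    pets:
      source:
        datasourceRef: my-source-db
        query: >
          SELECT pet.id AS MigrationKey, category, name, status
          FROM pet
      target:
        datasourceRef: my-target-db
        query: >
          SELECT pet.id AS MigrationKey, category.name AS category, pet.name, status
          FROM pet INNER JOIN category ON pet.category_id = category.id
      schedule:
        cronExpression: 0 0 * * *
EOF

echo "Wrote $CONFIG_DIR/config1.yml"
```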
The query configuration under datasets has some constraints on how it should be written. For more details, visit the section on writing dataset queries.
- Recce needs to know which column represents a unique identifier for the row that should be consistent between source and target and implies these rows represent the same entity. To do this, designate a column by naming it as MigrationKey (case insensitive):
  SELECT natural_id AS MigrationKey, some, other, columns FROM my_table
- Currently Recce ignores names of columns other than the MigrationKey column. That means that the order of columns is critical and must match between your two queries. If the column in position 3 represents datum X in the source dataset, then the column in position 3 in the target dataset should also represent the same datum.
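To see why column order matters, consider a toy comparison that pairs rows by their first field (standing in for MigrationKey) and compares the remaining fields by position only. This is an illustration of the constraint, not Recce's actual comparison logic; compare_rows and the sample rows are hypothetical.

```shell
#!/bin/sh
# Illustrative only: pair rows by the first field (the MigrationKey) and
# compare the remaining fields purely by position, ignoring column names.
source_rows='1,dog,Rex,available
2,cat,Tom,sold'
# Row 2 holds the same values, but with columns 2 and 3 swapped.
target_rows='1,dog,Rex,available
2,Tom,cat,sold'

compare_rows() {
  # $1 = source rows, $2 = target rows, one comma-separated row per line
  { printf '%s\n' "$1"; printf '%s\n' '---'; printf '%s\n' "$2"; } | awk -F, '
    $0 == "---" { in_target = 1; next }   # separator between the two row sets
    !in_target  { src[$1] = $0; next }    # remember each source row by key
    {
      if (!($1 in src))        print $1 ": target only"
      else if (src[$1] == $0)  print $1 ": match"
      else                     print $1 ": mismatch"
      seen[$1] = 1
    }
    END { for (k in src) if (!(k in seen)) print k ": source only" }
  '
}

compare_rows "$source_rows" "$target_rows"
```

Row 2 is reported as a mismatch even though both sides contain the same values, because the values sit in different column positions.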
Pass the location of the configuration file, my-dataset-configs/config1.yml, to Recce through the environment variable MICRONAUT_CONFIG_FILES:
docker run -p 8080:8080 \
-v $(pwd)/my-dataset-configs:/config \
-e MICRONAUT_CONFIG_FILES=/config/config1.yml \
ghcr.io/thoughtworks-sea/recce-server:latest
- Explore and trigger runs via Recce's APIs, accessible via an interactive UI at http://localhost:8080/rapidoc.
- Some non-exhaustive examples are included below, but fuller documentation is available via the UI.
  - Synchronously trigger a run, waiting for it to complete, via the UI or:
    curl -X POST http://localhost:8080/runs -H 'Content-Type: application/json' -d '{ "datasetId": "pets" }' -u "username:password"
  - Retrieve details of an individual run by ID for a dataset via the UI, or:
    curl 'http://localhost:8080/runs/35' -u "username:password"
  - Retrieve details of recent runs for a dataset via the UI, or:
    curl 'http://localhost:8080/runs?datasetId=categories' -u "username:password"
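The curl calls above can be wrapped in small functions so that the base URL and credentials are set in one place. This is a sketch under assumptions: RECCE_URL, RECCE_AUTH, run_payload, trigger_run, and recent_runs are hypothetical names, and the defaults assume the local instance and the admin/admin credentials configured earlier in this guide.

```shell
#!/bin/sh
# Hypothetical wrappers around the curl calls listed above.
RECCE_URL=${RECCE_URL:-http://localhost:8080}
RECCE_AUTH=${RECCE_AUTH:-admin:admin}

# Build the JSON request body for a given dataset name.
run_payload() {
  printf '{ "datasetId": "%s" }' "$1"
}

# Trigger a run for dataset $1 and wait for the response.
trigger_run() {
  curl -sS -X POST "$RECCE_URL/runs" \
    -H 'Content-Type: application/json' \
    -d "$(run_payload "$1")" \
    -u "$RECCE_AUTH"
}

# List recent runs for dataset $1.
recent_runs() {
  curl -sS "$RECCE_URL/runs?datasetId=$1" -u "$RECCE_AUTH"
}

# Usage (requires a running Recce instance):
#   trigger_run pets
#   recent_runs pets
```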