
Getting Started

jiawen-tw edited this page Apr 6, 2022 · 3 revisions


This section is a getting-started guide for configuring Recce with actual databases for reconciliation.

Before proceeding with the steps below, it can help to try out Recce with the provided example scenario to understand how Recce works.

Step 1: Getting Recce’s Docker image

Recce is currently only published as a container image to a private GitHub Container Registry (GHCR) repo. Pulling officially built Docker images locally requires some additional setup to authenticate with the GHCR.

  1. Generate a personal access token in your account with packages:read permission.

  2. Use Configure SSO to authorize the token for SSO access for the organisation.

  3. Log in with a command like the one below (see GitHub's container registry documentation for details):

    echo "ghp_REST_OF_TOKEN" | docker login https://ghcr.io -u my-github-username --password-stdin
  4. Pull Recce's Docker image from GHCR:

    docker pull ghcr.io/thoughtworks-sea/recce-server
  5. You should now be able to run Recce locally with the Docker image, using this repository only to set up a database for Recce and an example scenario:

    # Run in one shell - starts a DB for Recce, and an example scenario
    ./batect run-deps
      
    # Run in another shell - runs Recce
    docker run -p 8080:8080 \
    -v $(pwd)/examples/scenario/petshop-mysql:/config \
    -e MICRONAUT_CONFIG_FILES=/config/application-petshop-mysql.yml \
    -e DATABASE_HOST=host.docker.internal \
    -e R2DBC_DATASOURCES_SOURCE_URL=r2dbc:pool:mysql://host.docker.internal:8000/db \
    -e R2DBC_DATASOURCES_TARGET_URL=r2dbc:pool:mysql://host.docker.internal:8001/db \
    ghcr.io/thoughtworks-sea/recce-server:latest

Step 2: Configuring Recce

Recce is configured by adding datasources and datasets that you wish to reconcile. As a Micronaut application, much of Recce's configuration is open for hacking and can be expressed in multiple ways.

This guide takes the recommended approach: creating additional configuration files and loading them into Recce through MICRONAUT_CONFIG_FILES.

Create a new YAML file inside the project, e.g. my-dataset-configs/config1.yml:

mkdir -p my-dataset-configs
touch my-dataset-configs/config1.yml

Step 2a: Configuring authentication

Inside the newly created YAML file, configure the username and password for authentication:

auth:
  username: admin
  password: admin

This configures the credentials used in basic authentication to protect the API endpoints. In this case, the username and password are both set to admin.
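Since these credentials protect the API via HTTP basic authentication, clients send an Authorization: Basic header containing base64(username:password). A quick way to preview that header value, assuming the admin/admin credentials above:

```shell
# Encode the configured credentials as a basic-auth token
printf 'admin:admin' | base64
# prints YWRtaW46YWRtaW4=
```

In practice you rarely need to do this by hand; curl's -u flag, used in the examples later in this guide, performs the encoding for you.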

Step 2b: Configuring data sources

Add all databases involved in reconciliation under the r2dbc.datasources block of your configuration file. Multiple data sources can be configured for connection. For more details, visit the section on configuring datasources.

r2dbc:
  datasources:
    my-source-db: # Name your datasource anything you want, other than "default"
      url: r2dbc:pool:mysql://source-db:3306/db # R2DBC URL for your database r2dbc:pool:DB_TYPE://DB_HOST:DB_PORT/DB_NAME
      username: user
      password: password
    my-target-db:
      url: r2dbc:pool:mysql://target-db:3306/db
      username: user
      password: password

In this case, two MySQL databases named my-source-db and my-target-db are added.
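As the docker run example in Step 1 shows (R2DBC_DATASOURCES_SOURCE_URL), these properties can also be supplied as environment variables: Micronaut maps a property path to an environment variable name by upper-casing it and replacing dots and hyphens with underscores. A sketch of that mapping:

```shell
# Property path as written in the YAML configuration
prop="r2dbc.datasources.my-source-db.url"

# Micronaut's environment-variable equivalent: upper-case, '.' and '-' become '_'
echo "$prop" | tr 'a-z.-' 'A-Z__'
# prints R2DBC_DATASOURCES_MY_SOURCE_DB_URL
```

This is handy for injecting credentials at deploy time rather than committing them to the configuration file.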

Step 2c: Configuring datasets

Add the various datasets for reconciliation under the reconciliation.datasets block of your configuration file. Each dataset has a source and target for reconciliation, where Recce will run the SQL query against the database referenced in datasourceRef.

For more details, visit the section on configuring datasets.

reconciliation:
  datasets:
    pets: # Name your datasets however you would like
      source:
        # Reference to a datasource defined in `r2dbc.datasources`
        datasourceRef: my-source-db
        # Any SQL query to evaluate against the source DB
        query: >
          SELECT pet.id AS MigrationKey, category, name, status
          FROM pet
      target:
        # Reference to a datasource defined in `r2dbc.datasources`  
        datasourceRef: my-target-db
        # Any SQL query to evaluate against the target DB
        query: >
          SELECT pet.id AS MigrationKey, category.name AS category, pet.name, status
          FROM pet INNER JOIN category ON pet.category_id = category.id
      # Optional scheduling of regular or one-off reconciliations
      schedule:
        # Must adhere to format https://docs.micronaut.io/latest/api/io/micronaut/scheduling/cron/CronExpression.html
        # or https://crontab.guru/ (without seconds)
        cronExpression: 0 0 * * *

In the configuration above, one dataset named pets is configured, reconciling the source database my-source-db against the target database my-target-db.
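Putting Steps 2a through 2c together, the complete my-dataset-configs/config1.yml looks like this:

```yaml
auth:
  username: admin
  password: admin

r2dbc:
  datasources:
    my-source-db:
      url: r2dbc:pool:mysql://source-db:3306/db
      username: user
      password: password
    my-target-db:
      url: r2dbc:pool:mysql://target-db:3306/db
      username: user
      password: password

reconciliation:
  datasets:
    pets:
      source:
        datasourceRef: my-source-db
        query: >
          SELECT pet.id AS MigrationKey, category, name, status
          FROM pet
      target:
        datasourceRef: my-target-db
        query: >
          SELECT pet.id AS MigrationKey, category.name AS category, pet.name, status
          FROM pet INNER JOIN category ON pet.category_id = category.id
      schedule:
        cronExpression: 0 0 * * *
```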

Step 2d: Configuring dataset queries

The query configuration under datasets has some constraints on how it should be written. For more details, visit the section on writing dataset queries.

  1. Recce needs to know which column uniquely identifies each row; this identifier should be consistent between source and target, implying that the rows represent the same entity. To do this, designate a column by aliasing it as MigrationKey (case insensitive).

    SELECT natural_id AS MigrationKey, some, other, columns
    FROM my_table
  2. Currently Recce ignores names of columns other than the MigrationKey column. That means that the order of columns is critical and must match between your two queries. If the column in position 3 represents datum X in the source dataset, then the column in position 3 in the target dataset should also represent the same datum.
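For example, with a hypothetical source table users and target table accounts (names invented for illustration), the following pair of queries is valid because the columns line up by position, even though the target uses different column names:

```sql
-- Source query: position 2 holds an email, position 3 a creation timestamp
SELECT id AS MigrationKey, email, created_at
FROM users;

-- Target query: different column names, but the same datum at each position
SELECT account_id AS MigrationKey, email_address, creation_time
FROM accounts;
```

Swapping email_address and creation_time in the target query would cause rows to be reported as mismatched, since comparison is positional.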

Step 3: Run Recce with configuration

Pass the configuration file's location my-dataset-configs/config1.yml to Recce through the environment variable MICRONAUT_CONFIG_FILES:

docker run -p 8080:8080 \
  -v $(pwd)/my-dataset-configs:/config \
  -e MICRONAUT_CONFIG_FILES=/config/config1.yml \
  ghcr.io/thoughtworks-sea/recce-server:latest

Step 4: Explore the APIs

  • Explore and trigger runs via Recce's APIs, accessible via interactive UI at http://localhost:8080/rapidoc.

  • Some non-exhaustive examples are included below, but fuller documentation is available via the UI.

    • Synchronously trigger a run, waiting for it to complete, via the UI or

      curl -X POST http://localhost:8080/runs -H 'Content-Type: application/json' -d '{ "datasetId": "pets" }' -u "username:password"
    • Retrieve details of an individual run by ID for a dataset via UI, or

      curl 'http://localhost:8080/runs/35' -u "username:password"
    • Retrieve details of recent runs for a dataset via UI, or

      curl 'http://localhost:8080/runs?datasetId=categories' -u "username:password"
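Responses from these endpoints are JSON; piping them through a formatter such as python3 -m json.tool (or jq) makes them easier to scan. A sketch, using an illustrative body in place of a real curl response:

```shell
# In real use, pipe the curl output instead of this sample body
echo '{"datasetId": "pets", "id": 35}' | python3 -m json.tool
# pretty-prints the JSON with one field per line
```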