Simple data extractor/shuffler
Runs a SQL query against a database and saves the results in an Azure storage container.
It can export the results in one of the following formats:
- csv
- json
- jsonlines
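For example, two result rows with a single id column would be rendered roughly as follows in each format (illustrative only; actual quoting and field order depend on the query):

```
# jsonlines: one JSON object per line
{"id": 1}
{"id": 2}

# json: a single JSON array
[{"id": 1}, {"id": 2}]

# csv: header row followed by data rows
id
1
2
```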
It obtains its configuration from the following environment variables:
- ETL_DB_URL: JDBC connection URL (e.g. "jdbc:postgresql://localhost:5432")
- ETL_DB_USER_FILE: file containing the DB username, relative to "/mnt/secrets" (e.g. "data-extractor/aat-ccd-user")
- ETL_DB_PASSWORD_FILE: file containing the DB password, relative to "/mnt/secrets" (e.g. "data-extractor/aat-ccd-pwd")
- ETL_SQL: SQL statement to execute (e.g. "SELECT ID FROM parent WHERE ID = 1")
- ETL_MSI_CLIENT_ID: client ID of the pod identity used to read secrets from Key Vault and to write to blob storage
- ETL_ACCOUNT: Azure storage account where output should be saved (e.g. "devstoreaccount1")
- ETL_CONTAINER: Azure storage container where output should be saved (e.g. "testcontainer")
- ETL_FILE_TYPE: output file type. One of: jsonlines, csv, json (default "jsonlines")
- ETL_FILE_PREFIX: prefix for the output file (e.g. "test01")
ETL_DB_USER_FILE and ETL_DB_PASSWORD_FILE are useful when the username and password are retrieved from Azure Key Vault and exposed as FlexVolumes. Alternatively, the same credentials can be passed directly as environment variables (ETL_DB_USER and ETL_DB_PASSWORD).
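To make that precedence concrete, here is a minimal sketch of how such a lookup could work; the class and method names are hypothetical, not the project's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: resolve a credential either from a file under
// /mnt/secrets (e.g. ETL_DB_USER_FILE) or directly from an environment
// variable (e.g. ETL_DB_USER). Not the project's actual code.
final class CredentialResolver {

    private static final Path SECRETS_ROOT = Path.of("/mnt/secrets");

    static String resolve(String fileVar, String plainVar) throws IOException {
        String relativePath = System.getenv(fileVar);
        if (relativePath != null) {
            // e.g. /mnt/secrets/data-extractor/aat-ccd-user
            return Files.readString(SECRETS_ROOT.resolve(relativePath)).trim();
        }
        String plain = System.getenv(plainVar);
        if (plain == null) {
            throw new IllegalStateException(fileVar + " or " + plainVar + " must be set");
        }
        return plain;
    }
}
```

With this, CredentialResolver.resolve("ETL_DB_USER_FILE", "ETL_DB_USER") would yield the username from whichever source is configured.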
The output file contains all the records returned by the query.
File naming follows this convention: <prefix>-<datetime>.<type>
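A minimal sketch of that convention, assuming a "yyyyMMdd-HHmm" timestamp pattern (the tool's actual datetime format may differ):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class OutputNaming {
    // Builds a name following <prefix>-<datetime>.<type>, e.g.
    // "test01-20210310-1200.jsonlines". The timestamp pattern is an
    // assumption for illustration, not the tool's documented format.
    static String outputFileName(String prefix, String fileType) {
        String datetime = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmm"));
        return prefix + "-" + datetime + "." + fileType;
    }
}
```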
The easiest way to run a job is with the included Helm chart, which is based on
chart-job. This can be done by running the following command:

```bash
helm install hmcts/data-extractor-job --name data-extractor-job-001 --namespace mi -f job-values.yaml --wait
```

where job-values.yaml is:
```yaml
job:
  image: hmcts.azurecr.io/hmcts/data-extractor-job:prod-f888e665
  aadIdentityName: mi
  keyVaults:
    "data-extractor":
      resourceGroup: data-extractor
      secrets:
        - aat-ccdro-user
        - aat-ccdro-password
  labels:
    app.kubernetes.io/instance: data-extractor-job-001
    app.kubernetes.io/name: data-extractor-job
  environment:
    ETL_DB_URL: jdbc:postgresql://ccd-data-store-api-postgres-db-aat.postgres.database.azure.com:5432/ccd_data_store
    ETL_DB_USER_FILE: data-extractor/aat-ccdro-user
    ETL_DB_PASSWORD_FILE: data-extractor/aat-ccdro-password
    ETL_SQL: >
      SELECT id, created_date, event_id, summary, description, user_id, case_data_id,
      case_type_id, case_type_version, state_id, user_first_name, user_last_name,
      event_name, state_name, security_classification
      FROM case_event
      WHERE created_date >= (current_date-1 + time '00:00')
      AND created_date < (current_date + time '00:00')
      ORDER BY created_date ASC;
    ETL_ACCOUNT: midatastg
    ETL_CONTAINER: probate
    ETL_FILE_TYPE: jsonlines
    ETL_FILE_PREFIX: test01
    ETL_MSI_CLIENT_ID: 1461ff03-675c-423c-95a4-fb50d31254ff
global:
  job:
    kind: Job
  subscriptionId: "1c4f0704-a29e-403d-b719-b90c34ef14c9"
  tenantId: "531ff96d-0ae9-462a-8d2d-bec7c0b42082"
  environment: aat
```
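For context, the storage-related variables above (ETL_ACCOUNT, ETL_CONTAINER, ETL_MSI_CLIENT_ID) would come together roughly as in this sketch, written against the azure-identity and azure-storage-blob SDKs; the file name and endpoint are assumptions, not the project's actual code:

```java
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

// Illustrative sketch: upload the generated file to
// <ETL_ACCOUNT>/<ETL_CONTAINER> using the pod identity (ETL_MSI_CLIENT_ID).
final class BlobUploadSketch {
    public static void main(String[] args) {
        String account = System.getenv("ETL_ACCOUNT");      // e.g. "midatastg"
        String container = System.getenv("ETL_CONTAINER");  // e.g. "probate"
        String clientId = System.getenv("ETL_MSI_CLIENT_ID");

        BlobClient blob = new BlobServiceClientBuilder()
                .endpoint("https://" + account + ".blob.core.windows.net")
                .credential(new DefaultAzureCredentialBuilder()
                        .managedIdentityClientId(clientId)
                        .build())
                .buildClient()
                .getBlobContainerClient(container)
                .getBlobClient("test01-20210310-1200.jsonlines");

        blob.uploadFromFile("/tmp/test01-20210310-1200.jsonlines");
    }
}
```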
For an example of how to run this using Flux, please see the flux GitHub repo.