This repository contains a Docker image that contains all the scripts and programs needed to download data from a source and upload them into a LakeFS instance, and a Helm chart for setting up a CronJob to execute this Docker image at a regular schedule on Sterling. For simplicity's sake, there is only a single Dockerfile (that contains all the scripts and all the programs needed) and only a single Helm chart (with different values.yaml files to choose which ingest you want to set up), but in the future this may need to be reorganized.
- Copy values-secret.yaml.txt to
charts/dug-data-ingest/values-secret.yaml
and add the authentication details for the LakeFS server. - Helm install the chart in
charts/dug-data-ingest
with three values files:values.yaml
, with the default settings.values-secret.yaml
, with the authentication details.values/*.yaml
corresponding to the Dug Data Ingest you want to set up, such asvalues/bdc-ingest.yaml
for the BDC ingest.
- Create a new directory for your ingest in the scripts directory, e.g.
scripts/babel
. - Add the scripts needed to run an ingest in this directory. We usually have an
ingest.sh
script to run an ingest, but you can call it whatever you like. - Add any Python requirements to requirements.txt.
- Add any Alpine requirements to Dockerfile.
- Add a values file in the
charts/dug-data-ingest/values
directory, e.g.babel-ingest.yaml
. At a minimum this should provide a name for the ingest job and provide the name of the script to be executed bybash
.- You may need to modify the
charts/dug-data-ingest/templates
files; if so, please make sure you don't break the other ingests!
- You may need to modify the
- Add an
on:push
trigger to.github/workflows/release-docker-to-renci-containers.yaml
to generate a container named after your branch, then use this tag to test your new CronJob, then remove them once you're done.