Metadata storage for bigflow jobs/workflows
There are several use cases for a simple document/key-value storage:

1. Save (append) information about executed workflows/jobs: ID, run time, docker hash, execution time, cost estimate, result, etc. Essentially structured logs, which may be used to inspect execution history and do some (manual) cost estimation; see the record sketch after this list.

2. Query running workflows/jobs and their status (history and/or currently running workflows):

   bigflow history -w workflow_id

   Such a CLI API might be a first step towards an "airflow-free" solution (i.e. the ability to replace Airflow with a custom cron-like service).

3. Communicate between tasks/workflows. In some rare cases one workflow might want to check the status of another. A workflow might also check whether another instance is currently running. This is especially important for dev-like environments, where workflows are executed locally (via bigflow run).

4. Persist some information between tasks/jobs, like 'last-processed-id' (for incremental processing) or last time-per-batch (to auto-adjust batch size).
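For use case 1, the stored record could be a flat structure with exactly the fields listed above. A minimal sketch; field names and types here are assumptions, not a settled schema:

```python
import datetime
from dataclasses import dataclass, asdict

@dataclass
class JobRunRecord:
    """One structured-log entry per executed workflow/job (hypothetical schema)."""
    run_id: str                    # unique ID of this execution
    workflow_id: str               # which workflow/job was executed
    started_at: datetime.datetime  # run time
    docker_image_hash: str         # docker hash of the deployed image
    execution_time_s: float        # wall-clock duration
    cost_estimate_usd: float       # rough cost estimate
    result: str                    # e.g. "SUCCESS" / "FAILED"

# Serialising to a plain dict keeps the record backend-agnostic:
# anything that stores JSON-like documents can hold it.
record = asdict(JobRunRecord(
    run_id="2021-01-01T10:00:00-my_workflow",
    workflow_id="my_workflow",
    started_at=datetime.datetime.utcnow(),
    docker_image_hash="sha256:abc123",
    execution_time_s=132.5,
    cost_estimate_usd=0.42,
    result="SUCCESS",
))
```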
Database: anything works for 1; BigQuery / any SQL-like DB covers 1/2/3/4.
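If BigQuery were picked as the backend, appending and querying such records is straightforward with the official google-cloud-bigquery client. A rough sketch, where the project/dataset/table names are made up for illustration:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.bigflow_metadata.job_runs"  # hypothetical dataset/table

# One-time setup: columns mirroring the record fields from use case 1.
client.query(f"""
    CREATE TABLE IF NOT EXISTS `{table_id}` (
        run_id STRING, workflow_id STRING, started_at TIMESTAMP,
        docker_image_hash STRING, execution_time_s FLOAT64,
        cost_estimate_usd FLOAT64, result STRING
    )
""").result()

# Use case 1: append one row per executed workflow/job.
errors = client.insert_rows_json(table_id, [{
    "run_id": "2021-01-01T10:00:00-my_workflow",
    "workflow_id": "my_workflow",
    "started_at": "2021-01-01T10:00:00Z",
    "docker_image_hash": "sha256:abc123",
    "execution_time_s": 132.5,
    "cost_estimate_usd": 0.42,
    "result": "SUCCESS",
}])
assert not errors, errors

# Use case 2: `bigflow history -w my_workflow` could reduce to a query like this.
rows = client.query(
    f"SELECT * FROM `{table_id}` WHERE workflow_id = 'my_workflow' "
    "ORDER BY started_at DESC"
).result()
for row in rows:
    print(row["run_id"], row["result"])
```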
Client-visible API: TBD.
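Since the issue marks the API as TBD, the following is purely a strawman; none of these names exist in bigflow today. A thin key-value/document facade over whichever backend is chosen would be enough to cover use cases 3 and 4:

```python
# Strawman only: every name here is hypothetical; the issue leaves the API TBD.

class MetadataStore:
    """Hypothetical client-visible facade over the chosen backend."""

    def append_run(self, record: dict) -> None: ...          # use case 1
    def history(self, workflow_id: str) -> list[dict]: ...   # use case 2
    def is_running(self, workflow_id: str) -> bool: ...      # use case 3
    def get(self, key: str, default=None): ...               # use case 4
    def set(self, key: str, value) -> None: ...              # use case 4

# Use case 4 in practice: incremental processing between runs.
def process_batch(store: MetadataStore) -> None:
    last_id = store.get("last-processed-id", default=0)
    new_last_id = last_id  # ... process rows with id > last_id ...
    store.set("last-processed-id", new_last_id)

# Use case 3: refuse to start if another instance is already running
# (relevant for local `bigflow run` in dev-like environments).
def guard(store: MetadataStore, workflow_id: str) -> None:
    if store.is_running(workflow_id):
        raise RuntimeError(f"{workflow_id} is already running")
```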