`prod-airflow` is designed to help get you started running Airflow in production.
- Based on the official Python 3.7 slim image (`python:3.7-slim`), using the official Postgres image as the backend and Redis as the queue.
- Follows the Airflow 1.10.3 release from the Python Package Index.

This repository was originally forked from Puckel's docker-airflow repository.
- Unit / integration testing with Docker.
- Includes a smoke test that checks the basics of all your DAGs.
- Easily add more tests of your own.
- Pre-made Airflow DAGs and charts to monitor Airflow performance and uptime.
- Easy debugging of a production-like environment using `docker-compose`.
- Basic authentication setup for running in production.
- Makefile for easy docker commands.
- Install Docker
- Install Docker Compose
Pull the image from the Docker repository.
```bash
docker pull rkells/prod-airflow
```
Pull a specific version. The image version uses the format `<airflow version>-<prod-airflow version>`.

```bash
docker pull r-kells/prod-airflow:1.10.3-0.0.1
```
- ENV_FILE
- EXECUTOR
ENV_FILE: Environment Variable Handling
We use `.env` files to manage Docker environment variables. This is configurable by setting the environment variable `ENV_FILE`.
The default file is `dev.env`; `prod.env` is also included.

```bash
make <command> ENV_FILE=prod.env
```
EXECUTOR: Executor Type
The default executor type is the LocalExecutor for `make test` and `make debug`. To use the CeleryExecutor:

```bash
make <command> EXECUTOR=Celery
```
```bash
make build
```
To optionally install extra Airflow packages, please modify the Dockerfile.
The Dockerfile mounts your `/test`, `/dags`, and `/plugins` directories to `$AIRFLOW_HOME`.
This helps run your tests in an environment similar to production.
By default, we use `docker-compose-LocalExecutor.yml` to start the webserver and scheduler in the same container, and Postgres in another.
Therefore you can easily have tests that interact with the database.
```bash
make test
```
To use the CeleryExecutor:

```bash
make test EXECUTOR=Celery
```
Included tests
- `test_dag_smoke.py` (see the sketch after this list)
  - Tests that all DAGs compile.
  - Verifies the DagBag loads in under 2 seconds.
  - Verifies all DAGs have `email` and `owner` set.
- `sensors/test_catchup_s3_key_sensor.py`
  - An example custom plugin unit test.
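As an illustration of the smoke-test idea above, here is a minimal sketch. The class and assertions are hypothetical rather than copied from `test_dag_smoke.py`, and it assumes your DAGs set `email` and `owner` via `default_args`:

```python
import time
import unittest

from airflow.models import DagBag


class TestDagSmoke(unittest.TestCase):

    def setUp(self):
        # DagBag() loads every DAG found under the configured dags folder.
        start = time.time()
        self.dagbag = DagBag()
        self.load_seconds = time.time() - start

    def test_dags_compile(self):
        # Any import or syntax error shows up in import_errors.
        self.assertEqual(self.dagbag.import_errors, {})

    def test_dagbag_load_time(self):
        self.assertLess(self.load_seconds, 2, "DagBag took too long to load")

    def test_email_and_owner_set(self):
        # Assumes email/owner are passed through each DAG's default_args.
        for dag_id, dag in self.dagbag.dags.items():
            self.assertTrue(dag.default_args.get("email"), "%s has no email" % dag_id)
            self.assertTrue(dag.default_args.get("owner"), "%s has no owner" % dag_id)
```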
Coverage
`make test` runs unit tests with coverage, then prints the results.
Similar to testing, we run airflow with docker-compose to replicate a production environment.
```bash
make debug

# inspect logs
docker logs -f <containerId>

# jump into the running container
docker exec -it <containerId> bash
```
To debug the CeleryExecutor:

```bash
make debug EXECUTOR=Celery
```
- Airflow: localhost:8080
- Flower: localhost:5555
By default, prod-airflow runs Airflow with the SequentialExecutor. This can be modified by configuring the executor in an `.env` file.
Keep in mind that if you change to `EXECUTOR=Local` or `EXECUTOR=Celery`, `entrypoint.sh` will expect a database connection to be available.
To start the container in detached mode:

```bash
make run
```
To run an arbitrary airflow command on the image:

```bash
make cmd SERVICE="airflow list_dags"
```
- Airflow: localhost:8080
- Flower: localhost:5555
The `init_airflow.py` DAG automatically sets up Airflow charts for monitoring.
Airflow Charts: localhost:8080/admin/chart/
The Canary DAG
The canary DAG runs every 5 minutes.
It should have a connection check with a simple SQL query (e.g. "SELECT 1") for all the critical data sources. By default it's connected to the default Postgres setup.
The "canary" DAG helps answer the following questions:
- Do all critical connections work?
- How long does it take the Airflow scheduler to schedule the task (scheduled execution_time minus current_time)?
- How long does the task run?
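A minimal sketch of such a canary DAG, assuming one `PostgresOperator` per critical connection; the dag id, connection id, and alert email below are placeholders rather than the DAG shipped in this repo:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator

default_args = {
    "owner": "airflow",
    "email": ["alerts@example.com"],  # placeholder alert address
    "email_on_failure": True,
    "retries": 0,
}

dag = DAG(
    dag_id="canary",
    default_args=default_args,
    schedule_interval=timedelta(minutes=5),
    start_date=datetime(2019, 1, 1),
    catchup=False,
)

# One cheap connectivity check per critical data source.
check_postgres = PostgresOperator(
    task_id="check_postgres",
    postgres_conn_id="postgres_default",  # swap in each critical connection id
    sql="SELECT 1",
    dag=dag,
)
```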
Add Airflow environment variables to `.env` files and reference them with Docker.
See the Airflow documentation for more details.
For encrypted connection passwords (with the Local or Celery executor), you must have the same fernet_key across containers. By default, prod-airflow generates the fernet_key at startup; you have to set an environment variable in the docker-compose file (e.g. docker-compose-LocalExecutor.yml) to use the same key across containers. To generate a fernet_key:

```bash
docker run rkells/prod-airflow python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
```
The config `prod.env` enables basic password authentication for Airflow. Even if you are behind other security walls, this authentication is useful because it lets you filter DAGs by owner.
See the documentation for setup and other details.
If you want to use the Ad Hoc Query feature, make sure you've configured connections. By default, the `init_airflow.py` DAG will set up a connection to Postgres.
To add other connections, go to Admin -> Connections, click Edit, and set the values (equivalent to the values in airflow.cfg / docker-compose*.yml):
- Host : postgres
- Schema : airflow
- Login : airflow
- Password : airflow
The `init_airflow.py` DAG runs once and is intended to help configure Airflow, either to bootstrap a new installation or to set up for testing.
As currently configured, it:
- Creates a connection to Postgres called `my_postgres`, usable from the Ad Hoc Query UI.
- Creates a pool `mypool` with 10 slots.
- Creates the monitoring charts.
You are encouraged to extend this DAG for reproducible setup.
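A minimal sketch of that bootstrap pattern, writing directly to the metadata database via the ORM session (the actual `init_airflow.py` may do this differently):

```python
from airflow import settings
from airflow.models import Connection, Pool

session = settings.Session()

# Connection usable from the Ad Hoc Query UI; values mirror the defaults above.
if not session.query(Connection).filter(Connection.conn_id == "my_postgres").first():
    session.add(Connection(
        conn_id="my_postgres",
        conn_type="postgres",
        host="postgres",
        schema="airflow",
        login="airflow",
        password="airflow",
    ))

# A named pool with 10 slots, as described above.
if not session.query(Pool).filter(Pool.pool == "mypool").first():
    session.add(Pool(pool="mypool", slots=10))

session.commit()
```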
Documentation on plugins can be found here
An example plugin can be found here, along with its unit tests.
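For orientation, a bare-bones Airflow 1.10-style plugin that registers a custom sensor looks roughly like this (the class names are hypothetical, not the example plugin in this repo):

```python
from airflow.plugins_manager import AirflowPlugin
from airflow.sensors.base_sensor_operator import BaseSensorOperator


class MyExampleSensor(BaseSensorOperator):
    """Sensor whose poke() returns True once the condition it waits for is met."""

    def poke(self, context):
        # Replace with a real check, e.g. looking for a key in S3.
        return True


class MyExamplePlugin(AirflowPlugin):
    name = "my_example_plugin"
    sensors = [MyExampleSensor]
```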
- Create a file `requirements.txt` with the desired Python modules.
- The `entrypoint.sh` script will execute the `pip install` command (with the `--user` option).

Alternatively, build your image with your desired packages.