`prod-airflow` is designed to help get you started running Airflow in production.
- Based on the official Python 3.7 slim image (`python:3.7-slim`), using the official Postgres image as the backend and Redis as the queue.
- Follows the Airflow 1.10.3 release from the Python Package Index.

This repository was originally forked from Puckel's docker-airflow repository.
- Unit / integration testing with Docker.
- Includes a smoke test that checks the basics of all your DAGs.
- Easily add more tests of your own.
- Pre-made Airflow DAGs and charts to monitor Airflow performance and uptime.
- Easy debugging of a production-like environment using `docker-compose`.
- Basic authentication setup for running in production.
- Makefile for easy docker commands.
- Install Docker
- Install Docker Compose
Pull the image from the Docker repository.
```bash
docker pull rkells/prod-airflow
```
Pull a specific version. The image version uses the format `<airflow version>-<prod-airflow version>`.

```bash
docker pull r-kells/prod-airflow:1.10.3-0.0.1
```
- ENV_FILE
- EXECUTOR
ENV_FILE: Environment Variable Handling
We use `.env` files to manage Docker environment variables. This is configurable by setting the environment variable `ENV_FILE`.
The default file is `dev.env`; `prod.env` is also included.

```bash
make <command> ENV_FILE=prod.env
```
EXECUTOR: Executor Type
The default executor type is the LocalExecutor for `make test` and `make debug`. To use the CeleryExecutor:

```bash
make <command> EXECUTOR=Celery
```
```bash
make build
```
To optionally install extra Airflow packages, please modify the Dockerfile.
The Dockerfile mounts your `/test`, `/dags`, and `/plugins` directories to `$AIRFLOW_HOME`.
This helps run your tests in an environment similar to production.
By default, we use `docker-compose-LocalExecutor.yml` to start the webserver and scheduler in the same container, and Postgres in another.
Therefore you can easily have tests that interact with the database.
```bash
make test
```
To use the CeleryExecutor:

```bash
make test EXECUTOR=Celery
```
Included tests
- `test_dag_smoke.py` (see the sketch after this list)
  - Tests that all DAGs compile.
  - Verifies the DagBag loads in under 2 seconds.
  - Verifies all DAGs have `email` and `owner` set.
- `sensors/test_catchup_s3_key_sensor.py`
  - An example custom plugin unit test.
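As an illustration of the smoke-test idea above, here is a minimal sketch. The class and assertions are hypothetical rather than copied from `test_dag_smoke.py`, and it assumes your DAGs set `email` and `owner` via `default_args`:

```python
import time
import unittest

from airflow.models import DagBag


class TestDagSmoke(unittest.TestCase):

    def setUp(self):
        # DagBag() loads every DAG found under the configured dags folder.
        start = time.time()
        self.dagbag = DagBag()
        self.load_seconds = time.time() - start

    def test_dags_compile(self):
        # Any import or syntax error shows up in import_errors.
        self.assertEqual(self.dagbag.import_errors, {})

    def test_dagbag_load_time(self):
        self.assertLess(self.load_seconds, 2, "DagBag took too long to load")

    def test_email_and_owner_set(self):
        # Assumes email/owner are passed through each DAG's default_args.
        for dag_id, dag in self.dagbag.dags.items():
            self.assertTrue(dag.default_args.get("email"), "%s has no email" % dag_id)
            self.assertTrue(dag.default_args.get("owner"), "%s has no owner" % dag_id)
```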
Coverage
`make test` runs unit tests with coverage, then prints the results.
Similar to testing, we run airflow with docker-compose to replicate a production environment.
```bash
make debug

# inspect logs
docker logs -f <containerId>

# jump into the running container
docker exec -it <containerId> bash
```
To debug the CeleryExecutor:

```bash
make debug EXECUTOR=Celery
```
- Airflow: localhost:8080
- Flower: localhost:5555
By default, prod-airflow runs Airflow with the SequentialExecutor. This can be modified by configuring the executor in an `.env` file.
Keep in mind that if you change to `EXECUTOR=Local` or `EXECUTOR=Celery`, `entrypoint.sh` will expect a database connection to be available.
To start the container in detached mode:

```bash
make run
```
To run an arbitrary airflow command on the image:

```bash
make cmd SERVICE="airflow list_dags"
```
- Airflow: localhost:8080
- Flower: localhost:5555
The `init_airflow.py` DAG automatically sets up Airflow charts for monitoring.
Airflow Charts: localhost:8080/admin/chart/
The Canary DAG
The canary DAG runs every 5 minutes.
It should have a connection check with a simple SQL query (e.g. "SELECT 1") for all the critical data sources. By default it's connected to the default Postgres setup.
The "canary" DAG helps answer the following questions:
- Do all critical connections work?
- How long does it take the Airflow scheduler to schedule the task (scheduled execution_time minus current_time)?
- How long does the task run?
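A minimal sketch of such a canary DAG, assuming one `PostgresOperator` per critical connection; the dag id, connection id, and alert email below are placeholders rather than the DAG shipped in this repo:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator

default_args = {
    "owner": "airflow",
    "email": ["alerts@example.com"],  # placeholder alert address
    "email_on_failure": True,
    "retries": 0,
}

dag = DAG(
    dag_id="canary",
    default_args=default_args,
    schedule_interval=timedelta(minutes=5),
    start_date=datetime(2019, 1, 1),
    catchup=False,
)

# One cheap connectivity check per critical data source.
check_postgres = PostgresOperator(
    task_id="check_postgres",
    postgres_conn_id="postgres_default",  # swap in each critical connection id
    sql="SELECT 1",
    dag=dag,
)
```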
Add Airflow environment variables to `.env` files and reference them with Docker.
See the Airflow documentation for more details.
For encrypted connection passwords (with the Local or Celery executor), you must have the same fernet_key across containers. By default, prod-airflow generates the fernet_key at startup; you have to set an environment variable in the docker-compose file (e.g. docker-compose-LocalExecutor.yml) to use the same key across containers. To generate a fernet_key:

```bash
docker run rkells/prod-airflow python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
```
The config `prod.env` enables basic password authentication for Airflow. Even if you are behind other security walls, this authentication is useful because it lets you filter DAGs by owner.
See the documentation for setup and other details.
If you want to use the Ad Hoc Query feature, make sure you've configured connections. By default, the `init_airflow.py` DAG will set up a connection to Postgres.
To add other connections, go to Admin -> Connections, click Edit, and set the values (equivalent to the values in airflow.cfg / docker-compose*.yml):
- Host : postgres
- Schema : airflow
- Login : airflow
- Password : airflow
The `init_airflow.py` DAG runs once and is intended to help configure Airflow, either to bootstrap a new installation or to set up for testing.
As currently configured, it:
- Creates a connection to Postgres called `my_postgres`, usable from the Ad Hoc Query UI.
- Creates a pool `mypool` with 10 slots.
- Creates the monitoring charts.
You are encouraged to extend this DAG for reproducible setup.
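A minimal sketch of that bootstrap pattern, writing directly to the metadata database via the ORM session (the actual `init_airflow.py` may do this differently):

```python
from airflow import settings
from airflow.models import Connection, Pool

session = settings.Session()

# Connection usable from the Ad Hoc Query UI; values mirror the defaults above.
if not session.query(Connection).filter(Connection.conn_id == "my_postgres").first():
    session.add(Connection(
        conn_id="my_postgres",
        conn_type="postgres",
        host="postgres",
        schema="airflow",
        login="airflow",
        password="airflow",
    ))

# A named pool with 10 slots, as described above.
if not session.query(Pool).filter(Pool.pool == "mypool").first():
    session.add(Pool(pool="mypool", slots=10))

session.commit()
```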
Documentation on plugins can be found here
An example plugin can be found here, along with its unit tests.
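For orientation, a bare-bones Airflow 1.10-style plugin that registers a custom sensor looks roughly like this (the class names are hypothetical, not the example plugin in this repo):

```python
from airflow.plugins_manager import AirflowPlugin
from airflow.sensors.base_sensor_operator import BaseSensorOperator


class MyExampleSensor(BaseSensorOperator):
    """Sensor whose poke() returns True once the condition it waits for is met."""

    def poke(self, context):
        # Replace with a real check, e.g. looking for a key in S3.
        return True


class MyExamplePlugin(AirflowPlugin):
    name = "my_example_plugin"
    sensors = [MyExampleSensor]
```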
- Create a file `requirements.txt` with the desired Python modules.
- The `entrypoint.sh` script will execute the `pip install` command (with the `--user` option).

Alternatively, build your image with your desired packages.