Add Production-ready docker compose for the production image #8605

Closed
potiuk opened this issue Apr 28, 2020 · 43 comments
Labels
area:production-image Production image improvements and fixes kind:feature Feature Requests
Comments

@potiuk
Member

potiuk commented Apr 28, 2020

Description

For the production image we are already working on a Helm chart, but we might also want to add a production-ready docker-compose setup that will be able to run an Airflow installation.

Use case / motivation

For local tests and small deployments, having such a docker-compose environment would be really nice.

We seem to have reached consensus that we need several docker-compose "sets" of files (one possible layout is sketched after this list):

  • Local Executor
  • Celery Executor
  • Kubernetes Executor (do we even need a Kubernetes Executor in a Compose file? I guess not...)
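
One way to lay such sets out (just a sketch, not a decision) is a shared base file plus a thin per-executor override file that Compose merges when several -f flags are passed, e.g. docker-compose -f docker-compose.yml -f docker-compose.celery.yml up. The file name docker-compose.celery.yml and the base services it assumes (webserver, scheduler and the metadata database) are hypothetical:

# docker-compose.celery.yml - hypothetical override merged on top of a base file
# that already defines the webserver, scheduler and metadata database services.
version: '3'
services:
  webserver:
    # merged into the base webserver; the base scheduler would need the same two variables
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
  redis:
    image: redis:latest
  worker:
    image: apache/airflow:1.10.10
    command: worker
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
    depends_on:
      - redis
  flower:
    image: apache/airflow:1.10.10
    command: flower
    ports:
      - 5555:5555
    depends_on:
      - redis

This keeps the Local Executor variant as small as possible, with Celery-specific services added only by the override file.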

They should come in variants, and it should be possible to specify a number of parameters:

  • Database (Postgres/MySQL)
  • Redis vs. RabbitMQ (should we choose just one?)
  • Ports
  • Volumes (persistent / not)
  • Airflow Images
  • Fernet Key
  • RBAC

Depending on the setup, those Docker Compose files should do proper DB initialisation.
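
As a sketch of how those parameters could be surfaced (the variable names below are purely illustrative, nothing agreed on), Compose variable substitution with an optional .env file next to the compose file would cover most of the single-value ones:

# Hypothetical fragment - every ${VAR:-default} comes from the shell or an .env file.
services:
  postgres:
    image: postgres:latest
    ports:
      - "${POSTGRES_HOST_PORT:-5432}:5432"
    volumes:
      - ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data
  airflow:
    image: ${AIRFLOW_IMAGE:-apache/airflow:1.10.10}
    ports:
      - "${WEBSERVER_HOST_PORT:-8080}:8080"
    environment:
      - AIRFLOW__CORE__FERNET_KEY=${FERNET_KEY:?FERNET_KEY must be set}
      - AIRFLOW__WEBSERVER__RBAC=${RBAC:-True}

Choices like Postgres vs. MySQL or Redis vs. RabbitMQ probably still need separate files (or overrides), since they change which services exist, not just single values.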


Example Docker Compose (from https://apache-airflow.slack.com/archives/CQAMHKWSJ/p1587748008106000 and #8548) that we might use as a base. This is just an example, so this issue will not implement all of it; we will likely split those docker-compose files into separate postgres/sqlite/mysql variants, similarly to what we do in the CI scripts. That is why I wanted to keep this as a separate issue - user creation will be dealt with in #8606.

version: '3'
services:
  postgres:
    image: postgres:latest
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=airflow
      - POSTGRES_PORT=5432
    ports:
      - 5432:5432
  redis:
    image: redis:latest
    ports:
      - 6379:6379
  flower:
    image: apache/airflow:1.10.10
    volumes:
      - ./airflow-data/dags:/opt/airflow/dags
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__WEBSERVER__RBAC=True
    command: flower
    ports:
      - 5555:5555
  airflow:
    image: apache/airflow:1.10.10
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__WEBSERVER__RBAC=True
    command: webserver
    ports:
      - 8080:8080
    volumes:
      - ./airflow-data/dags:/opt/airflow/dags
      - ./airflow-data/logs:/opt/airflow/logs
      - ./airflow-data/plugins:/opt/airflow/plugins
  airflow-scheduler:
    image: apache/airflow:1.10.10
    container_name: airflow_scheduler_cont
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__WEBSERVER__RBAC=True
    command: scheduler
    volumes:
      - ./airflow-data/dags:/opt/airflow/dags
      - ./airflow-data/logs:/opt/airflow/logs
      - ./airflow-data/plugins:/opt/airflow/plugins
  airflow-worker1:
    image: apache/airflow:1.10.10
    container_name: airflow_worker1_cont
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__WEBSERVER__RBAC=True
    command: worker
    volumes:
      - ./airflow-data/dags:/opt/airflow/dags
      - ./airflow-data/logs:/opt/airflow/logs
      - ./airflow-data/plugins:/opt/airflow/plugins
  airflow-worker2:
    image: apache/airflow:1.10.10
    container_name: airflow_worker2_cont
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__WEBSERVER__RBAC=True
    command: worker
    volumes:
      - ./airflow-data/dags:/opt/airflow/dags
      - ./airflow-data/logs:/opt/airflow/logs
      - ./airflow-data/plugins:/opt/airflow/plugins
  airflow-worker3:
    image: apache/airflow:1.10.10
    container_name: airflow_worker3_cont
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__WEBSERVER__RBAC=True
    command: worker
    volumes:
      - ./airflow-data/dags:/opt/airflow/dags
      - ./airflow-data/logs:/opt/airflow/logs
      - ./airflow-data/plugins:/opt/airflow/plugins

Another example from https://apache-airflow.slack.com/archives/CQAMHKWSJ/p1587679356095400:

version: '3.7'
networks:
  airflow:
    name: airflow
    attachable: true
volumes:
  logs:
x-database-env: 
  &database-env
  POSTGRES_USER: airflow
  POSTGRES_DB: airflow
  POSTGRES_PASSWORD: airflow
x-airflow-env: 
  &airflow-env
  AIRFLOW__CORE__EXECUTOR: CeleryExecutor
  AIRFLOW__WEBSERVER__RBAC: 'True'
  AIRFLOW__CORE__CHECK_SLAS: 'False'
  AIRFLOW__CORE__STORE_SERIALIZED_DAGS: 'False'
  AIRFLOW__CORE__PARALLELISM: 50
  AIRFLOW__CORE__LOAD_EXAMPLES: 'False'
  AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS: 'False'
  AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC: 10
  
services:
  postgres:
    image: postgres:11.5
    environment:
      <<: *database-env
      PGDATA: /var/lib/postgresql/data/pgdata
    ports:
      - 5432:5432
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./database/data:/var/lib/postgresql/data/pgdata
      - ./database/logs:/var/lib/postgresql/data/log
    command: >
     postgres
       -c listen_addresses=*
       -c logging_collector=on
       -c log_destination=stderr
       -c max_connections=200
    networks:
      - airflow
  redis:
    image: redis:5.0.5
    environment:
      REDIS_HOST: redis
      REDIS_PORT: 6379
    ports:
      - 6379:6379
    networks:
      - airflow
  webserver:
    image: airflow:1.10.10
    user: airflow
    ports:
      - 8090:8080
    volumes:
      - ./dags:/opt/airflow/dags
      - logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      <<: [*database-env, *airflow-env]
      ADMIN_PASSWORD: airflow
    depends_on:
      - postgres
      - redis
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    networks:
      - airflow
  flower:
    image: airflow:1.10.10
    user: airflow
    ports:
      - 5555:5555
    depends_on:
      - redis
    volumes:
      - logs:/opt/airflow/logs
    command: flower
    networks:
      - airflow
  scheduler:
    image: airflow:1.10.10
    volumes:
      - ./dags:/opt/airflow/dags
      - logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      <<: *database-env
    command: scheduler
    networks:
      - airflow
  worker:
    image: airflow:1.10.10
    user: airflow
    volumes:
      - ./dags:/opt/airflow/dags
      - logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      <<: *database-env
    command: worker
    depends_on:
      - scheduler

Related issues
The initial user creation #8606, #8548
Quick start documentation planned in #8542

@potiuk potiuk added kind:feature Feature Requests area:production-image Production image improvements and fixes labels Apr 28, 2020
@potiuk potiuk changed the title from "Add Production-ready docker compose for The production image" to "Add Production-ready docker compose for the production image" Apr 28, 2020
@potiuk
Member Author

potiuk commented Apr 28, 2020

This one duplicates #8548 a bit - but I want to leave it for a while as I wanted to split it into smaller functional pieces.

@kaxil
Member

kaxil commented Apr 28, 2020

It would be nice to have this in the "Quick Start Guide when using Docker Image" too. WDYT?

@potiuk
Member Author

potiuk commented Apr 28, 2020

Absolutely. It's already planned in #8542 :)

@potiuk
Member Author

potiuk commented Apr 28, 2020

Added missing label :)

@turbaszek turbaszek linked a pull request Apr 29, 2020 that will close this issue
@turbaszek turbaszek removed a link to a pull request Apr 29, 2020
@habibdhif

Here is another example of a Docker Compose file that I've been working on. The Compose file defines multiple services to run Airflow.
There is an init service, an ephemeral container that initializes the database and creates a user if necessary.
The init service command tries to run airflow list_users, and if that fails it initializes the database and creates a user. Different approaches were considered, but this one is simple enough and only involves airflow commands (no database-specific commands).

Extension fields are used for airflow environment variables to reduce code duplication.

I added a Makefile alongside the docker-compose.yml in my repo, so all you have to do to start everything is run make run.

version: "3.7"
x-airflow-environment: &airflow-environment
  AIRFLOW__CORE__EXECUTOR: CeleryExecutor
  AIRFLOW__WEBSERVER__RBAC: "True"
  AIRFLOW__CORE__LOAD_EXAMPLES: "False"
  AIRFLOW__CELERY__BROKER_URL: "redis://:@redis:6379/0"
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow

services:
  postgres:
    image: postgres:11.5
    environment:
      POSTGRES_USER: airflow
      POSTGRES_DB: airflow
      POSTGRES_PASSWORD: airflow
  redis:
    image: redis:5
    environment:
      REDIS_HOST: redis
      REDIS_PORT: 6379
    ports:
      - 6379:6379
  init:
    image: apache/airflow:1.10.10
    environment:
      <<: *airflow-environment
    depends_on:
      - redis
      - postgres
    volumes:
      - ./dags:/opt/airflow/dags
    entrypoint: /bin/bash
    command: >
      -c "airflow list_users || (airflow initdb
      && airflow create_user --role Admin --username airflow --password airflow -e airflow@airflow.com -f airflow -l airflow)"
    restart: on-failure
  webserver:
    image: apache/airflow:1.10.10
    ports:
      - 8080:8080
    environment:
      <<: *airflow-environment
    depends_on:
      - init
    volumes:
      - ./dags:/opt/airflow/dags
    command: "webserver"
    restart: always
  flower:
    image: apache/airflow:1.10.10
    ports:
      - 5555:5555
    environment:
      <<: *airflow-environment
    depends_on:
      - redis
    command: flower
    restart: always
  scheduler:
    image: apache/airflow:1.10.10
    environment:
      <<: *airflow-environment
    depends_on:
      - webserver
    volumes:
      - ./dags:/opt/airflow/dags
    command: scheduler
    restart: always
  worker:
    image: apache/airflow:1.10.10
    environment:
      <<: *airflow-environment
    depends_on:
      - scheduler
    volumes:
      - ./dags:/opt/airflow/dags
    command: worker
    restart: always

@infused-kim
Contributor

Here's my docker-compose config using LocalExecutor...

docker-compose.airflow.yml:

version: '2.1'
services:
    airflow:
        # image: apache/airflow:1.10.10
        build:
            context: .
            args:
                - DOCKER_UID=${DOCKER_UID-1000} 
            dockerfile: Dockerfile
        restart: always
        environment:
            - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:${POSTGRES_PW-airflow}@postgres:5432/airflow
            - AIRFLOW__CORE__FERNET_KEY=${AF_FERNET_KEY-GUYoGcG5xdn5K3ysGG3LQzOt3cc0UBOEibEPxugDwas=}
            - AIRFLOW__CORE__EXECUTOR=LocalExecutor
            - AIRFLOW__CORE__AIRFLOW_HOME=/opt/airflow/
            - AIRFLOW__CORE__LOAD_EXAMPLES=False
            - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
            - AIRFLOW__CORE__LOGGING_LEVEL=${AF_LOGGING_LEVEL-info}
        volumes:
            - ../airflow/dags:/opt/airflow/dags:z
            - ../airflow/plugins:/opt/airflow/plugins:z
            - ./volumes/airflow_data_dump:/opt/airflow/data_dump:z
            - ./volumes/airflow_logs:/opt/airflow/logs:z
        healthcheck:
            test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3

docker-compose.yml:

version: '2.1'
services:
    postgres:
        image: postgres:9.6
        container_name: af_postgres
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=${POSTGRES_PW-airflow}
            - POSTGRES_DB=airflow
            - PGDATA=/var/lib/postgresql/data/pgdata
        volumes:
            - ./volumes/postgres_data:/var/lib/postgresql/data/pgdata:Z
        ports:
            -  127.0.0.1:5432:5432

    webserver:
        extends:
            file: docker-compose.airflow.yml
            service: airflow
        container_name: af_webserver
        command: webserver
        depends_on:
            - postgres
        ports:
            - ${DOCKER_PORTS-8080}
        networks:
            - proxy
            - default
        environment:
            # Web Server Config
            - AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW=graph
            - AIRFLOW__WEBSERVER__HIDE_PAUSED_DAGS_BY_DEFAULT=true
            - AIRFLOW__WEBSERVER__RBAC=true

            # Web Server Performance tweaks
            # 2 * NUM_CPU_CORES + 1
            - AIRFLOW__WEBSERVER__WORKERS=${AF_WORKERS-2}
            # Restart workers every 30min instead of 30seconds
            - AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800
        labels:
            - "traefik.enable=true"
            - "traefik.http.routers.airflow.rule=Host(`af.example.com`)"
            - "traefik.http.routers.airflow.middlewares=admin-auth@file"

    scheduler:
        extends:
            file: docker-compose.airflow.yml
            service: airflow
        container_name: af_scheduler
        command: scheduler
        depends_on:
            - postgres
        environment:
            # Performance Tweaks
            # Reduce how often DAGs are reloaded to dramatically reduce CPU use
            - AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=${AF_MIN_FILE_PROCESS_INTERVAL-60} 
            - AIRFLOW__SCHEDULER__MAX_THREADS=${AF_THREADS-1}

networks:
    proxy:
        external: true

Dockerfile:

# Custom Dockerfile
FROM apache/airflow:1.10.10

# Install mssql support & dag dependencies
USER root
RUN apt-get update -yqq \
    && apt-get install -y gcc freetds-dev \
    && apt-get install -y git procps \ 
    && apt-get install -y vim
RUN pip install apache-airflow[mssql,ssh,s3,slack]
RUN pip install azure-storage-blob sshtunnel google-api-python-client oauth2client \
    && pip install git+https://github.com/infusionsoft/Official-API-Python-Library.git \
    && pip install rocketchat_API

# This fixes permission issues on Linux.
# The airflow user should have the same UID as the user running docker on the host system.
# "make build" adjusts this value automatically.
ARG DOCKER_UID
RUN \
    : "${DOCKER_UID:?Build argument DOCKER_UID needs to be set and non-empty. Use 'make build' to set it automatically.}" \
    && usermod -u ${DOCKER_UID} airflow \
    && find / -path /proc -prune -o -user 50000 -exec chown -h airflow {} \; \
    && echo "Set airflow's uid to ${DOCKER_UID}"

USER airflow

Makefile

And here's my Makefile to control the containers with commands like make run:

SERVICE = "scheduler"
TITLE = "airflow containers"
ACCESS = "http://af.example.com"

.PHONY: run

build:
	docker-compose build

run:
	@echo "Starting $(TITLE)"
	docker-compose up -d
	@echo "$(TITLE) running on $(ACCESS)"

runf:
	@echo "Starting $(TITLE)"
	docker-compose up

stop:
	@echo "Stopping $(TITLE)"
	docker-compose down

restart: stop print-newline run

tty:
	docker-compose run --rm --entrypoint='' $(SERVICE) bash

ttyr:
	docker-compose run --rm --entrypoint='' -u root $(SERVICE) bash

attach:
	docker-compose exec $(SERVICE) bash

attachr:
	docker-compose exec -u root $(SERVICE) bash

logs:
	docker-compose logs --tail 50 --follow $(SERVICE)

conf:
	docker-compose config

initdb:
	docker-compose run --rm $(SERVICE) initdb

upgradedb:
	docker-compose run --rm $(SERVICE) upgradedb

print-newline:
	@echo ""
	@echo ""

@potiuk potiuk self-assigned this May 10, 2020
@wittfabian
Contributor

@potiuk Is this the preferred way to add dependencies (airflow-mssql)?

# Custom Dockerfile
FROM apache/airflow:1.10.10

# Install mssql support & dag dependencies
USER root
RUN apt-get update -yqq \
    && apt-get install -y gcc freetds-dev \
    && apt-get install -y git procps \ 
    && apt-get install -y vim
RUN pip install apache-airflow[mssql,ssh,s3,slack]
RUN pip install azure-storage-blob sshtunnel google-api-python-client oauth2client \
    && pip install git+https://github.com/infusionsoft/Official-API-Python-Library.git \
    && pip install rocketchat_API

# This fixes permission issues on Linux.
# The airflow user should have the same UID as the user running docker on the host system.
# "make build" adjusts this value automatically.
ARG DOCKER_UID
RUN \
    : "${DOCKER_UID:?Build argument DOCKER_UID needs to be set and non-empty. Use 'make build' to set it automatically.}" \
    && usermod -u ${DOCKER_UID} airflow \
    && find / -path /proc -prune -o -user 50000 -exec chown -h airflow {} \; \
    && echo "Set airflow's uid to ${DOCKER_UID}"

USER airflow

@potiuk
Member Author

potiuk commented May 14, 2020

I think the preferred way will be to properly set the AIRFLOW_EXTRAS variable and pass it as a --build-arg.

They are defined like this in the Dockerfile:

ARG AIRFLOW_EXTRAS="async,aws,azure,celery,dask,elasticsearch,gcp,kubernetes,mysql,postgres,redis,slack,ssh,statsd,virtualenv"

and when building the Dockerfile you can override them with --build-arg AIRFLOW_EXTRAS="..."

I think it might be worth having "additional extras" that get appended, though.
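
As a concrete example of the --build-arg approach (a sketch only - the service and target image names are made up, and the build context assumes you run it from a checked-out Airflow source tree, since the Dockerfile needs repository files in its context):

# Hypothetical docker-compose fragment that rebuilds the image with a different set of extras.
services:
  airflow:
    build:
      context: .              # root of the checked-out apache/airflow sources
      dockerfile: Dockerfile
      args:
        AIRFLOW_EXTRAS: "async,celery,postgres,redis,ssh,statsd,virtualenv,mssql"
    image: my-airflow:1.10.10-mssql

Note that this rebuilds the image from sources rather than changing the already published apache/airflow image.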

@infused-kim
Contributor

Oh, that's super cool.
But for that, do you have to rebuild the entire airflow image? Or can you just add the build arg in the docker-compose and have it propagate through to the published airflow image?

@potiuk
Member Author

potiuk commented May 14, 2020

You should also be able to build a new image using the ONBUILD feature - for building images that depend on the base one. I added a separate issue here: #8872

@wittfabian
Contributor

The same applies to additional Python packages.
https://github.com/puckel/docker-airflow/blob/master/Dockerfile#L64

if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi

@feluelle
Member

feluelle commented Jun 12, 2020

My Apache Airflow docker-compose file for running LocalExecutor with Postgres, using the official production Dockerfile

Moved to gist

@potiuk
Member Author

potiuk commented Oct 7, 2020

In this link (https://hub.docker.com/r/apache/airflow/dockerfile) there is no production-level Dockerfile. Yesterday I saw the production version.

I see it there (even in incognito mode). Must have been a temporary glitch of DockerHub.

  1. Do we need to pass the Dockerfile with the -f flag? I have tried the above command and observed that it is looking for a Dockerfile.

As mentioned in the docs above, if you want to customize the image you need to check out the Airflow sources and run the docker command inside them. As is the case for most Dockerfiles, they need a context ("." in the command) and some extra files (for example entrypoint scripts) that have to be available in this context, and the easiest way is to check out the Airflow sources at the right version and customize the image from there.

You can find a nice description here: https://airflow.readthedocs.io/en/latest/production-deployment.html - we moved the documentation to "docs" and it has not yet been released (but it will be in 1.10.13), so you can use the "latest" version. It contains a detailed description of customizing vs. extending, and even a nice table showing the differences - one point there is that you need to use the Airflow sources to customize the image.

  2. If AIRFLOW_INSTALL_SOURCES=".", it points the installation at local sources (as per the documentation). How does it work?

See above - you need to run it inside the checked-out sources of Airflow.

  3. When I use the above command with -f Dockerfile, during the build process I am facing this error while running the step COPY scripts/docker scripts/docker: "COPY failed: stat /var/lib/docker/tmp/docker-builder125807076/scripts/docker: no such file or directory". Do I have to clone the git repo and then use the docker build command?

Yes. That's the whole point - customisation only works if you have the sources of Airflow checked out.

@potiuk potiuk added area:production-image Production image improvements and fixes and removed invalid wish:mssql-support labels Nov 11, 2020
@potiuk potiuk removed their assignment Nov 11, 2020
@kaxil
Member

kaxil commented Nov 11, 2020

I think we should get this one in sooner, before 2.0.0rc1 - is someone willing to work on it?

@kaxil
Member

kaxil commented Nov 11, 2020

Also, I don't think the docker-compose files need to be production-ready. They should just be meant for local development, or to quickly start and work on Airflow locally with different executors.

@potiuk
Member Author

potiuk commented Nov 12, 2020

Also, I don't think the docker-compose files need to be production-ready. They should just be meant for local development, or to quickly start and work on Airflow locally with different executors.

Agreed. Starting small is good.

@ryw
Member

ryw commented Nov 30, 2020

@potiuk should we move the milestone to 2.1 for this?

@potiuk
Member Author

potiuk commented Nov 30, 2020

Yep. Just did :).

@mik-laj
Member

mik-laj commented Dec 10, 2020

My docker compose:

version: '3'
x-airflow-common:
  &airflow-common
  image: apache/airflow:1.10.12
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql://root@mysql/airflow?charset=utf8mb4
    - AIRFLOW__CORE__SQL_ENGINE_COLLATION_FOR_IDS=utf8mb3_general_ci
    - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
    - AIRFLOW__CELERY__RESULT_BACKEND=redis://:@redis:6379/0
    - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
    - AIRFLOW__CORE__LOAD_EXAMPLES=False
    - AIRFLOW__CORE__LOGGING_LEVEL=Debug
    - AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=False
    - AIRFLOW__WEBSERVER__RBAC=True
    - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
    - AIRFLOW__CORE__STORE_DAG_CODE=True
  volumes:
    - ./dags:/opt/airflow/dags
    - ./airflow-data/logs:/opt/airflow/logs
    - ./airflow-data/plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - mysql

services:
  mysql:
    image: mysql:5.7
    environment:
      - MYSQL_ALLOW_EMPTY_PASSWORD=true
      - MYSQL_ROOT_HOST=%
      - MYSQL_DATABASE=airflow
    volumes:
      - ./mysql/conf.d:/etc/mysql/conf.d:ro
      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
      - ./airflow-data/mysql-db-volume:/var/lib/mysql
    ports:
      - "3306:3306"
    command:
      - mysqld
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_unicode_ci

  redis:
    image: redis:latest
    ports:
      - 6379:6379

  flower:
    << : *airflow-common
    command: flower
    ports:
      - 5555:5555

  airflow-init:
    << : *airflow-common
    container_name: airflow_init
    entrypoint: /bin/bash
    command:
      - -c
      - airflow list_users || (
          airflow initdb &&
          airflow create_user
            --role Admin
            --username airflow
            --password airflow
            --email airflow@airflow.com
            --firstname airflow
            --lastname airflow
        )
    restart: on-failure

  airflow-webserver:
    << : *airflow-common
    command: webserver
    ports:
      - 8080:8080
    restart: always

  airflow-scheduler:
    << : *airflow-common
    container_name: airflow_scheduler
    command:
      - scheduler
      - --run-duration
      - '30'
    restart: always

  airflow-worker:
    << : *airflow-common
    container_name: airflow_worker1
    command: worker
    restart: always

@mik-laj
Member

mik-laj commented Dec 16, 2020

@BasPH shared a one-line command on Slack to start Airflow in Docker:

In case you’ve ever wondered how to get the Airflow image to work in a one-liner (for demo purposes), here’s how:

docker run -ti -p 8080:8080 -v yourdag.py:/opt/airflow/dags/yourdag.py --entrypoint=/bin/bash apache/airflow:2.0.0b3-python3.8 -c '(airflow db init && airflow users create --username admin --password admin --firstname Anonymous --lastname Admin --role Admin --email admin@example.org); airflow webserver & airflow scheduler'

It creates an admin/admin user and uses a SQLite metastore inside the container.

https://apache-airflow.slack.com/archives/CQAMHKWSJ/p1608152276070500

@mik-laj
Member

mik-laj commented Jan 13, 2021

I have prepared some docker-compose files with some common configurations.

Postgres - Redis - Airflow 2.0

version: '3'
x-airflow-common:
  &airflow-common
  image: apache/airflow:2.0.0
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@postgres/airflow
    # - AIRFLOW__CELERY__RESULT_BACKEND=redis://:@redis:6379/0
    - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
    - AIRFLOW__WEBSERVER__RBAC=True
    - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
    - AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=True
  volumes:
    - ./dags:/opt/airflow/dags
    - ./airflow-data/logs:/opt/airflow/logs
    - ./airflow-data/plugins:/opt/airflow/plugins
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:9.5
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - ./airflow-data/postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 30s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    << : *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    << : *airflow-common
    command: scheduler
    restart: always

  airflow-worker:
    << : *airflow-common
    command: celery worker
    restart: always

  airflow-init:
    << : *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - airflow users list || (
        airflow db init &&
        airflow users create
        --role Admin
        --username airflow
        --password airflow
        --email airflow@airflow.com
        --firstname airflow
        --lastname airflow
        )
    restart: on-failure

  flower:
    << : *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

Postgres - Redis - Airflow 1.10.14

version: '3'
x-airflow-common:
  &airflow-common
  image: apache/airflow:1.10.14
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
    - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@postgres/airflow
    - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
    - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
    - AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=True
  volumes:
    - ./dags:/opt/airflow/dags
    - ./airflow-data/logs:/opt/airflow/logs
    - ./airflow-data/plugins:/opt/airflow/plugins
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:9.5
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - ./airflow-data/postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 30s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    << : *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    << : *airflow-common
    command: scheduler
    restart: always

  airflow-worker:
    << : *airflow-common
    command: worker
    restart: always

  airflow-init:
    << : *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - airflow list_users || (
        airflow initdb &&
        airflow create_user
        --role Admin
        --username airflow
        --password airflow
        --email airflow@airflow.com
        --firstname airflow
        --lastname airflow
        )
    restart: on-failure

  flower:
    << : *airflow-common
    command: flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/healthcheck"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

MySQL 8.0 - Redis - Airflow 2.0

# Migrations are broken.

MySQL 8.0 - Redis - Airflow 1.10.14

version: '3'
x-airflow-common:
  &airflow-common
  image: apache/airflow:1.10.14
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql://root:airflow@mysql/airflow?charset=utf8mb4
    - AIRFLOW__CORE__SQL_ENGINE_COLLATION_FOR_IDS=utf8mb3_general_ci
    - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
    - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
    - AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=True
  volumes:
    - ./dags:/opt/airflow/dags
    - ./airflow-data/logs:/opt/airflow/logs
    - ./airflow-data/plugins:/opt/airflow/plugins
  depends_on:
    redis:
      condition: service_healthy
    mysql:
      condition: service_healthy

services:
  mysql:
    image: mysql:8.0
    environment:
      - MYSQL_ROOT_PASSWORD=airflow
      - MYSQL_ROOT_HOST=%
      - MYSQL_DATABASE=airflow
    volumes:
      - ./airflow-data/mysql-db-volume:/var/lib/mysql
    ports:
      - "3306:3306"
    command:
      - mysqld
      - --explicit-defaults-for-timestamp
      - --default-authentication-plugin=mysql_native_password
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_unicode_ci
    healthcheck:
      test: ["CMD-SHELL", "mysql -h localhost -P 3306 -u root -pairflow -e 'SELECT 1'"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    << : *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    << : *airflow-common
    command: scheduler
    restart: always

  airflow-worker:
    << : *airflow-common
    command: worker
    restart: always

  airflow-init:
    << : *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - airflow list_users || (
        airflow initdb &&
        airflow create_user
        --role Admin
        --username airflow
        --password airflow
        --email airflow@airflow.com
        --firstname airflow
        --lastname airflow
        )
    restart: on-failure

  flower:
    << : *airflow-common
    command: flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

I added health checks where it was simple. Does anyone have an idea for health checks for airflow-scheduler / airflow-worker? That would improve stability.
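
One possible direction for the scheduler and worker could be (a sketch only, untested here - it assumes a newer Airflow 2.x that ships the airflow jobs check command, and Celery's inspect ping for the worker):

  airflow-scheduler:
    << : *airflow-common
    command: scheduler
    healthcheck:
      # assumes a recent Airflow 2.x where "airflow jobs check" is available
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    << : *airflow-common
    command: celery worker
    healthcheck:
      # pings this worker's Celery node; $$ escapes the dollar sign for docker-compose
      test: ["CMD-SHELL", 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always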

Besides, I am planning to prepare a tool that generates docker-compose files using a simple wizard. I am thinking of something similar to the PyTorch project.
https://pytorch.org/get-started/locally/

[Screenshot: the PyTorch "get started" configuration selector]

@potiuk
Member Author

potiuk commented Jan 13, 2021

Besides, I am planning to prepare a tool that generates docker-compose files using a simple wizard. I am thinking of something similar to the PyTorch project.

Very good idea! ❤️

@ldacey
Contributor

ldacey commented Feb 20, 2021

Has anyone successfully gotten turbodbc installed using pip? I have had to install miniconda and use conda-forge to get turbodbc + pyarrow working correctly. This adds a little complication to my Dockerfile, although I do kind of like the conda-env.yml file approach.

@mik-laj wow, I knew I could share common environment variables, but I had no idea you could also share the volumes and images - that is super clean. Any reason why you have the scheduler restart every 30 seconds like that?

@ldealmei

Thank you all for the docker-compose files :)
I'm sharing mine as it addresses some aspects that I couldn't find in this thread and that took me some time to get working. These are:

  • Working with DockerOperator
  • Deploy behind a proxy (Traefik)
  • Deploy DAGs on push with git-sync (this one is optional but quite convenient).

@mik-laj I also have a working healthcheck on the scheduler. Not the most expressive, but it works.

This configuration relies on an existing and initialized database.

External database - LocalExecutor - Airflow 2.0.0 - Traefik - Dags mostly based on DockerOperator.

version: "3.7"
x-airflow-environment: &airflow-environment
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__CORE__LOAD_EXAMPLES: "False"
  AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS: "False"
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: ${DB_CONNECTION_STRING}
  AIRFLOW__CORE__FERNET_KEY: ${ENCRYPTION_KEY}
  AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/sync/git/dags
  AIRFLOW__CORE__ENABLE_XCOM_PICKLING: "True"  # because of https://github.com/apache/airflow/issues/13487
  AIRFLOW__WEBSERVER__BASE_URL: https://airflow.example.com
  AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX: "True"
  AIRFLOW__WEBSERVER__RBAC: "True"

services:
  traefik:
    image: traefik:v2.4
    container_name: traefik
    command:
      - --ping=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      # HTTP -> HTTPS redirect
      - --entrypoints.web.http.redirections.entrypoint.to=websecure
      - --entrypoints.web.http.redirections.entrypoint.scheme=https
      # TLS config
      - --certificatesresolvers.myresolver.acme.dnschallenge=true
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
      ## Comment following line for a production deployment
      - --certificatesresolvers.myresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
      ## See https://doc.traefik.io/traefik/https/acme/#providers for other providers
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=digitalocean
      - --certificatesresolvers.myresolver.acme.email=user@example.com
    ports:
      - 80:80
      - 443:443
    environment:
      # See https://doc.traefik.io/traefik/https/acme/#providers for other providers
      DO_AUTH_TOKEN:
    restart: always
    healthcheck:
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 10s
      retries: 5
    volumes:
      - certs:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro

  # Required because of DockerOperator. For secure access and handling permissions.
  docker-socket-proxy:
    image: tecnativa/docker-socket-proxy:0.1.1
    environment:
      CONTAINERS: 1
      IMAGES: 1
      AUTH: 1
      POST: 1
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: always

  # Allows to deploy Dags on pushes to master
  git-sync:
    image: k8s.gcr.io/git-sync/git-sync:v3.2.2
    container_name: dags-sync
    environment:
      GIT_SYNC_USERNAME:
      GIT_SYNC_PASSWORD:
      GIT_SYNC_REPO: https://example.com/my/repo.git
      GIT_SYNC_DEST: dags
      GIT_SYNC_BRANCH: master
      GIT_SYNC_WAIT: 60
    volumes:
      - dags:/tmp:rw
    restart: always

  webserver:
    image: apache/airflow:2.0.0
    container_name: airflow_webserver
    environment:
      <<: *airflow-environment
    command: webserver
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    volumes:
      - dags:/opt/airflow/sync
      - logs:/opt/airflow/logs
    depends_on:
      - git-sync
      - traefik
    labels:
      - traefik.enable=true
      - traefik.http.routers.webserver.rule=Host(`airflow.example.com`)
      - traefik.http.routers.webserver.entrypoints=websecure
      - traefik.http.routers.webserver.tls.certresolver=myresolver
      - traefik.http.services.webserver.loadbalancer.server.port=8080

  scheduler:
    image: apache/airflow:2.0.0
    container_name: airflow_scheduler
    environment:
      <<: *airflow-environment
    command: scheduler
    restart: always
    healthcheck:
      test: ["CMD-SHELL", 'curl --silent http://airflow_webserver:8080/health | grep -A 1 scheduler | grep \"healthy\"']
      interval: 10s
      timeout: 10s
      retries: 5
    volumes:
      - dags:/opt/airflow/sync
      - logs:/opt/airflow/logs
    depends_on:
      - git-sync
      - webserver

volumes:
  dags:
  logs:
  certs:

I have an extra container (not shown) to handle rotating the logs that are written directly to files. It is based on logrotate. I'm not sharing it here because it is a custom image and beyond the scope of this thread, but if anybody is interested, message me.

Hope it helps!

@mik-laj
Member

mik-laj commented Feb 28, 2021

I added some improvements to the docker-compose file to make it more stable.
#14519
#14522
Now we have health-checks for all components.

@kaxil
Member

kaxil commented Mar 30, 2021

@mik-laj Can we close this one since we already added the docker-compose files?

@potiuk
Member Author

potiuk commented Mar 30, 2021

@kaxil -> I believe so. I do not think "production-ready" docker-compose is even a thing :)
