Executing a Docker Decorator task results in Jinja Error #26718

Closed
noah-gil opened this issue Sep 27, 2022 · 7 comments
Labels: area:providers, kind:bug

noah-gil commented Sep 27, 2022

Apache Airflow Provider(s)

docker

Versions of Apache Airflow Providers

apache-airflow-providers-docker==3.1.0

Apache Airflow version

2.4.0

Operating System

Debian GNU/Linux 11 (bullseye)

Deployment

Docker-Compose

Deployment details

Client: Docker Engine - Community
Cloud integration: v1.0.28
Version: 20.10.17
API version: 1.41
Go version: go1.17.11
Git commit: 100c701
Built: Mon Jun 6 23:03:17 2022
OS/Arch: linux/amd64
Context: default
Experimental: true

Docker Compose: v2.7.0

I'm using a slightly modified version of the example docker-compose.yaml:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
#                                Default: apache/airflow:2.4.0
# AIRFLOW_UID                  - User ID in Airflow containers
#                                Default: 50000
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
#                                Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
#                                Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
#                                Default: ''
#
# Feel free to modify this file to suit your needs.
---
version: '3'
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, then run `docker-compose build` to build the images.
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.4.0}
  # build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # For backward compatibility with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    IS_LOCAL: 'true'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./kube.conf:/opt/airflow/kube.conf
    - /var/run/docker.sock:/var/run/docker.sock
  user: "${AIRFLOW_UID:-50000}:0"
  group_add:
    - '1001' # Add user to docker group. Change value depending on gid of docker on your machine
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    expose:
      - 6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    # yamllint disable rule:line-length
    command:
      - -c
      - |
        function ver() {
          printf "%04d%04d%04d%04d" $${1//./ }
        }
        airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version)
        airflow_version_comparable=$$(ver $${airflow_version})
        min_airflow_version=2.2.0
        min_airflow_version_comparable=$$(ver $${min_airflow_version})
        if (( airflow_version_comparable < min_airflow_version_comparable )); then
          echo
          echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
          echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
          echo
          exit 1
        fi
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
          echo
        fi
        one_meg=1048576
        mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
        cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
        disk_available=$$(df / | tail -1 | awk '{print $$4}')
        warning_resources="false"
        if (( mem_available < 4000 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
          echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
          echo
          warning_resources="true"
        fi
        if (( cpus_available < 2 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
          echo "At least 2 CPUs recommended. You have $${cpus_available}"
          echo
          warning_resources="true"
        fi
        if (( disk_available < one_meg * 10 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
          echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
          echo
          warning_resources="true"
        fi
        if [[ $${warning_resources} == "true" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
          echo "Please follow the instructions to increase amount of resources available:"
          echo "   https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
          echo
        fi
        mkdir -p /sources/logs /sources/dags /sources/plugins
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
        exec /entrypoint airflow version
    # yamllint enable rule:line-length
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
      _PIP_ADDITIONAL_REQUIREMENTS: ''
    user: "0:0"
    volumes:
      - .:/sources

  airflow-cli:
    <<: *airflow-common
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
    command:
      - bash
      - -c
      - airflow

volumes:
  postgres-db-volume:

What happened

I was trying to test running a task using the @task.docker decorator, so I set up the following DAG with a series of Docker tasks.

from airflow.decorators import task, dag
from datetime import datetime

@dag(
    description='Run a series of Docker containers with outputs',
    start_date=datetime(2022, 1, 1),
    catchup=False,
    schedule_interval=None,
)
def docker_parallel_decorator():
    @task.docker(image="python:3.9-slim-bullseye")
    def container_a():
        print("Hello from Container A")
        return None

    @task.docker(image="python:3.9-slim-bullseye")
    def container_b():
        print("Hello from Container B")
        return None

    @task.docker(image="python:3.9-slim-bullseye")
    def container_c():
        print("Hello from Container C")
        return None

    container_a() >> container_b() >> container_c()

docker_parallel_decorator()

In the past, I've had success with the classic DockerOperator, so I expected no difference.
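
For reference, the kind of DAG that had worked for me looks roughly like this (a sketch only; the dag_id and command here are illustrative, not my original code):

from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="docker_classic_operator",  # illustrative name
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    # Runs the container and prints from inside it, like container_a above.
    DockerOperator(
        task_id="container_a",
        image="python:3.9-slim-bullseye",
        command='python -c "print(\'Hello from Container A\')"',
        auto_remove=True,
        docker_url="unix://var/run/docker.sock",
    )

However, with the decorated version I received the following error in the output log: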

[2022-09-27, 17:37:03 UTC] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py", line 111, in execute
    filename=script_filename,
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/python_virtualenv.py", line 128, in write_python_script
    template.stream(**jinja_context).dump(filename)
  File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1618, in dump
    fp.writelines(iterable)
  File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1613, in <genexpr>
    iterable = (x.encode(encoding, errors) for x in self)  # type: ignore
  File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1662, in __next__
    return self._next()  # type: ignore
  File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1354, in generate
    yield self.environment.handle_exception()
  File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/python_virtualenv_script.jinja2", line 23, in top-level template code
    {% if expect_airflow %}
jinja2.exceptions.UndefinedError: 'expect_***' is undefined
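
(The *** in the last line appears to be Airflow's log masking redacting the string "airflow"; the undefined variable is expect_airflow, as the template line in the traceback shows.)

For context, a minimal standalone sketch of why the missing key raises instead of rendering as empty: write_python_script builds its Jinja environment with StrictUndefined, which raises UndefinedError as soon as an undefined name is evaluated, even inside an {% if %} test.

import jinja2

# Mirrors airflow.utils.python_virtualenv.write_python_script, which
# renders the script template with undefined=jinja2.StrictUndefined.
env = jinja2.Environment(undefined=jinja2.StrictUndefined)
template = env.from_string("{% if expect_airflow %}# airflow expected{% endif %}")

try:
    # No expect_airflow key, like the jinja_context built by the
    # docker decorator in provider 3.1.0.
    template.render({})
except jinja2.exceptions.UndefinedError as err:
    print(err)  # -> 'expect_airflow' is undefined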

What you think should happen instead

I expected the Docker tasks to run the code in the provided Python function.

How to reproduce

  1. Deploy Airflow from the provided docker-compose.yaml file
  2. Place the provided DAG into the ./dags folder
  3. Manually trigger the docker_parallel_decorator DAG from the web UI (or via the REST API, as sketched below)
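
For anyone who prefers the REST API to the web UI, a sketch of step 3 (this assumes the compose defaults above: the basic-auth API backend, port 8080, and the airflow/airflow admin user; the requests library is an extra dependency, not part of the repro):

import requests

BASE = "http://localhost:8080/api/v1"
AUTH = ("airflow", "airflow")  # _AIRFLOW_WWW_USER_* defaults from the compose file

# The compose file sets DAGS_ARE_PAUSED_AT_CREATION: 'true', so unpause first.
requests.patch(
    f"{BASE}/dags/docker_parallel_decorator",
    auth=AUTH,
    json={"is_paused": False},
)

# Queue a manual run, equivalent to the "Trigger DAG" button.
resp = requests.post(
    f"{BASE}/dags/docker_parallel_decorator/dagRuns",
    auth=AUTH,
    json={},
)
print(resp.status_code, resp.json())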

Anything else

I have no experience with Jinja, so I don't know the specifics, but I was able to work around the error by patching the /home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py file in the airflow-scheduler service.

First, I copied the file out of the container.

docker compose cp airflow-scheduler:/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py ./docker.py

Then I changed the following snippet starting on line 101:

            write_python_script(
                jinja_context=dict(
                    op_args=self.op_args,
                    op_kwargs=self.op_kwargs,
                    pickling_library=self.pickling_library.__name__,
                    python_callable=self.python_callable.__name__,
                    python_callable_source=py_source,
                    string_args_global=False,
                ),
                filename=script_filename,
            )

To this:

            write_python_script(
                jinja_context=dict(
                    op_args=self.op_args,
                    op_kwargs=self.op_kwargs,
                    pickling_library=self.pickling_library.__name__,
                    python_callable=self.python_callable.__name__,
                    python_callable_source=py_source,
                    string_args_global=False,
                    expect_airflow=False, # Added this line
                ),
                filename=script_filename,
            )

Then I copied the file back into the container.

docker compose cp ./docker.py airflow-scheduler:/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py

After that, running the DAG resulted in no errors with the expected output in the logs.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

noah-gil added the area:providers and kind:bug labels on Sep 27, 2022

boring-cyborg bot commented Sep 27, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

noah-gil changed the title from "Executing a decorated Docker task results in Jinja Error" to "Executing a Docker Decorator task results in Jinja Error" on Sep 27, 2022
potiuk added this to the Airflow 2.4.1 milestone on Sep 27, 2022
potiuk (Member) commented Sep 27, 2022

Yeah. I think the problem is that we did not release the docker provider before releasing 2.4.0/2.4.1. We will fix it.

potiuk (Member) commented Sep 27, 2022

Sorry for that - I had not realised we had this implicit coupling.

potiuk (Member) commented Sep 28, 2022

Hey @noah-gil - can you please install the release candidate of the 3.2.0 docker provider and confirm that the problem is fixed? https://pypi.org/project/apache-airflow-providers-docker/3.2.0rc1/ - you can also comment here :) #26752

potiuk modified the milestone: Airflow 2.4.1 on Sep 28, 2022
potiuk (Member) commented Sep 28, 2022

Closing as the docker provider is in the voting period.

potiuk closed this as completed on Sep 28, 2022
noah-gil (Author) commented

I can confirm that the problem is fixed for the RC version. Thanks for the quick response!

potiuk (Member) commented Sep 29, 2022

Glad it worked :)
