S3 DAG bundle sync sometimes skips updated objects; local /tmp copy remains stale until folder is deleted #60223

@marcosmartinezfco

Description

Apache Airflow version

3.1.5

If "Other Airflow 3 version" selected, which one?

No response

What happened?

We are using DAG bundles stored in S3. The Airflow “control plane” (scheduler / DAG processor) downloads bundles to a local folder under /tmp/airflow/<bundle-name>/ for parsing and UI display. Celery workers also download the bundle so they can execute tasks.

We are seeing intermittent but sticky behavior: updated DAG files are successfully uploaded to S3, but Airflow does not download the new version. Instead, Airflow logs:

Local file ... is up-to-date with S3 object ... Skipping download.

Even after waiting several minutes and multiple DAG processor loops, the local files under /tmp/airflow/... do not change. If we manually delete the local bundle directory, the next loop re-downloads the bundle and picks up changes.

This can impact:

  • Control plane: UI shows stale DAG code until the cache is manually deleted / container restarted.
  • Workers: tasks may execute with stale DAG code (we expected workers to re-download on each run in our setup, but they can also appear stale).
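
For reference, the manual workaround mentioned above boils down to the following (a minimal sketch; /tmp/airflow/<bundle-name> is the cache location from our setup, and BUNDLE_NAME is a placeholder):

import shutil
from pathlib import Path

BUNDLE_NAME = "example-bundle"  # placeholder: substitute the affected bundle's name

# Deleting the cached copy forces the next DAG processor loop to
# re-download the whole bundle from S3, which picks up the changes.
bundle_dir = Path("/tmp/airflow") / BUNDLE_NAME
if bundle_dir.exists():
    shutil.rmtree(bundle_dir)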

What you think should happen instead?

When the object in S3 changes (new upload), the next bundle sync should download the updated object and refresh the local bundle directory without requiring manual deletion of local files.
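
For illustration only (this is a sketch, not the provider's actual implementation): a content-based freshness check, e.g. comparing the local file's MD5 against the S3 ETag, would catch updates even when the byte size is unchanged:

import hashlib

import boto3

def is_stale(local_path: str, bucket: str, key: str) -> bool:
    """Return True if the local copy differs from the S3 object.

    For single-part uploads the ETag is the object's MD5 hex digest;
    multipart ETags are not, so a real implementation would need a
    fallback signal (e.g. LastModified or a stored checksum).
    """
    s3 = boto3.client("s3")
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    with open(local_path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest() != etag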

How to reproduce

We were able to reproduce this more consistently when the DAG change is only within templated fields, specifically the bash_command argument of BashOperator (i.e. a change inside the string that gets templated at runtime).

Empirically:

  • If we make a change that is only inside BashOperator(bash_command=...), the S3 bundle sync sometimes logs that the local file is “up-to-date” and does not re-download the updated DAG file (stale /tmp/airflow/...).
  • If we make a change outside of the templated bash_command string (e.g., a comment, a constant, changing a non-templated field), the change is much more likely to be detected and the updated file gets downloaded.

This makes the issue appear correlated with updates that only affect templated sections of the DAG file (though we have not proven causation).

from datetime import datetime, timedelta

from airflow.providers.standard.operators.bash import BashOperator
from airflow.sdk import DAG

default_args = {
    "owner": "owner",
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
    "execution_timeout": timedelta(minutes=5),
    "start_date": datetime(2026, 1, 1),
    "queue": "queue",
}

# dummy comment
with DAG(
    dag_id="dummy",
    default_args=default_args,
    schedule="0 0 * * *",
    catchup=False,
    tags=["test"],
):
    hello_world = BashOperator(
        task_id="print_hello_world",
        bash_command="echo 'Hello World from dummy DAG bundle test! ;)'",
    )
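
A toy illustration of our (unconfirmed) hypothesis, consistent with the size-comparison log line quoted under "Anything else?" below: an edit confined to the templated string can leave the file's byte size unchanged, so a size-only freshness check would pass and the download would be skipped:

# Two edits that differ only inside the templated bash_command string
# yet produce files of identical byte length.
old = "bash_command=\"echo 'Hello World from dummy DAG bundle test! ;)'\","
new = "bash_command=\"echo 'Hello Earth from dummy DAG bundle test! ;)'\","

assert len(old.encode()) == len(new.encode())  # same size, different content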

Operating System

AlmaLinux 9.5 (Teal Serval)

Versions of Apache Airflow Providers

apache-airflow 3.1.5
apache-airflow-core 3.1.5
apache-airflow-providers-amazon 9.18.0
apache-airflow-providers-celery 3.14.0
apache-airflow-providers-cncf-kubernetes 10.11.0
apache-airflow-providers-common-compat 1.10.0
apache-airflow-providers-common-io 1.7.0
apache-airflow-providers-common-messaging 2.0.1
apache-airflow-providers-common-sql 1.30.0
apache-airflow-providers-docker 4.5.0
apache-airflow-providers-elasticsearch 6.4.0
apache-airflow-providers-fab 3.0.3
apache-airflow-providers-ftp 3.14.0
apache-airflow-providers-git 0.1.0
apache-airflow-providers-google 19.1.0
apache-airflow-providers-grpc 3.9.0
apache-airflow-providers-hashicorp 4.4.0
apache-airflow-providers-http 5.6.0
apache-airflow-providers-microsoft-azure 12.9.0
apache-airflow-providers-mysql 6.4.0
apache-airflow-providers-odbc 4.11.0
apache-airflow-providers-openlineage 2.9.0
apache-airflow-providers-postgres 6.5.0
apache-airflow-providers-redis 4.4.0
apache-airflow-providers-sendgrid 4.2.0
apache-airflow-providers-sftp 5.5.0
apache-airflow-providers-slack 9.6.0
apache-airflow-providers-smtp 2.4.0
apache-airflow-providers-snowflake 6.7.0
apache-airflow-providers-ssh 3.14.0
apache-airflow-providers-standard 1.10.0
apache-airflow-task-sdk 1.1.5
google-cloud-orchestration-airflow 1.18.0

Deployment

Docker-Compose

Deployment details

We have one control plane running the Airflow components, and N Celery queues with one Celery worker per queue.

The metadata DB runs in RDS; Redis runs on the control plane as the Celery broker.

Anything else?

We have multiple S3 bundles (different prefixes / bundle names). The issue reproduces for some bundles more often than others, but we were eventually able to reproduce it for multiple bundles.

In logs, when the change is detected correctly, we see something like:

S3 object size (20372) and local file size (20371) differ. Downloaded <dag>.py to /tmp/airflow/<bundle>/<dag>.py
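
That log line suggests the sync includes a size comparison. Assuming (we have not confirmed this in the provider code) the skip decision reduces to a size-only check, something like the sketch below would explain why same-size edits are missed:

import os

import boto3

def looks_up_to_date(local_path: str, bucket: str, key: str) -> bool:
    """Size-only freshness check (our assumption, not the actual code).

    A same-size edit slips past this check, matching the
    "up-to-date ... Skipping download" log we observe.
    """
    s3 = boto3.client("s3")
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return os.path.getsize(local_path) == size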

Labels

area:core, kind:bug, needs-triage, provider:amazon
