Description
Apache Airflow version
3.1.5
If "Other Airflow 3 version" selected, which one?
No response
What happened?
We are using DAG bundles stored in S3. The Airflow “control plane” (scheduler / DAG processor) downloads bundles to a local folder under /tmp/airflow/<bundle-name>/ for parsing and UI display. Celery workers also download the bundle so they can execute tasks.
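For reference, our bundle configuration looks roughly like the sketch below (bundle names, bucket and prefixes are placeholders, and the exact config key, classpath and kwargs are our assumptions based on the amazon provider's S3DagBundle, so check the provider docs rather than copying this verbatim):

import json

# Rough sketch of our dag_bundle_config_list; every name/bucket/prefix here is a placeholder,
# and the S3DagBundle classpath/kwargs are assumed from the amazon provider, not verified.
dag_bundle_config_list = [
    {
        "name": "bundle-a",
        "classpath": "airflow.providers.amazon.aws.bundles.s3.S3DagBundle",
        "kwargs": {"bucket_name": "my-dags-bucket", "prefix": "bundle-a/", "aws_conn_id": "aws_default"},
    },
    {
        "name": "bundle-b",
        "classpath": "airflow.providers.amazon.aws.bundles.s3.S3DagBundle",
        "kwargs": {"bucket_name": "my-dags-bucket", "prefix": "bundle-b/", "aws_conn_id": "aws_default"},
    },
]

# Exported as a JSON string to the control plane and workers, e.g. via
# AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST.
print(json.dumps(dag_bundle_config_list))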
We are seeing intermittent/sticky behavior where updated DAG files are successfully uploaded to S3, but Airflow does not download the new version. Instead, Airflow logs:
Local file ... is up-to-date with S3 object ... Skipping download.
Even after waiting several minutes and multiple DAG processor loops, the local files under /tmp/airflow/... do not change. If we manually delete the local bundle directory, the next loop re-downloads the bundle and picks up changes.
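To confirm the local cache is stale we use a small check along these lines (a minimal sketch, assuming boto3 is available; bucket, key and local path are placeholders for our real values):

import hashlib
import os

import boto3

# Compare the S3 object's metadata with the locally cached copy under /tmp/airflow/<bundle>/.
BUCKET = "my-dags-bucket"
KEY = "bundle-a/dummy.py"
LOCAL = "/tmp/airflow/bundle-a/dummy.py"

s3 = boto3.client("s3")
head = s3.head_object(Bucket=BUCKET, Key=KEY)

local_size = os.path.getsize(LOCAL)
with open(LOCAL, "rb") as f:
    local_md5 = hashlib.md5(f.read()).hexdigest()

print("s3 size:", head["ContentLength"], "local size:", local_size)
# Note: the ETag equals the MD5 only for single-part, unencrypted uploads.
print("s3 etag:", head["ETag"].strip('"'), "local md5:", local_md5)
print("s3 last modified:", head["LastModified"])

When the issue occurs, the sizes match but the content of the cached file no longer matches what we uploaded.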
This can impact:
- Control plane: UI shows stale DAG code until the cache is manually deleted / container restarted.
- Workers: tasks may execute with stale DAG code (we expected workers to re-download on each run in our setup, but they can also appear stale).
What you think should happen instead?
When the object in S3 changes (new upload), the next bundle sync should download the updated object and refresh the local bundle directory without requiring manual deletion of local files.
How to reproduce
We were able to reproduce this more consistently when the DAG change is only within templated fields, specifically the bash_command argument of BashOperator (i.e. a change inside the string that gets templated at runtime).
Empirically:
- If we make a change that is only inside BashOperator(bash_command=...), the S3 bundle sync sometimes logs that the local file is “up-to-date” and does not re-download the updated DAG file (stale /tmp/airflow/...).
- If we make a change outside of the templated bash_command string (e.g. a comment, a constant, or a non-templated field), the change is much more likely to be detected and the updated file gets downloaded.
This makes the issue appear correlated with updates that only affect templated sections of the DAG file (though we have not proven causation).
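One thing worth noting (our assumption, based on the size-based comparison in the log quoted under "Anything else?" below): an edit confined to the bash_command string can easily keep the file's byte size unchanged, and a freshness check based only on size would then skip the download. A trivial illustration:

# Two variants of the reproduction DAG's operator line; only characters inside the
# templated bash_command string differ, and the byte length is identical.
old_line = 'bash_command="echo \'Hello World from dummy DAG bundle test! ;)\'",'
new_line = 'bash_command="echo \'Hello There from dummy DAG bundle test! ;)\'",'
assert len(old_line.encode()) == len(new_line.encode())  # same size, different content

The full reproduction DAG we used: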
from datetime import datetime, timedelta

from airflow.providers.standard.operators.bash import BashOperator
from airflow.sdk import DAG

default_args = {
    "owner": "owner",
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
    "execution_timeout": timedelta(minutes=5),
    "start_date": datetime(2026, 1, 1),
    "queue": "queue",
}

# dummy comment
with DAG(
    dag_id="dummy",
    default_args=default_args,
    schedule="0 0 * * *",
    catchup=False,
    tags=["test"],
):
    hello_world = BashOperator(
        task_id="print_hello_world",
        bash_command="echo 'Hello World from dummy DAG bundle test! ;)'",
    )

Operating System
AlmaLinux 9.5 (Teal Serval)
Versions of Apache Airflow Providers
apache-airflow 3.1.5
apache-airflow-core 3.1.5
apache-airflow-providers-amazon 9.18.0
apache-airflow-providers-celery 3.14.0
apache-airflow-providers-cncf-kubernetes 10.11.0
apache-airflow-providers-common-compat 1.10.0
apache-airflow-providers-common-io 1.7.0
apache-airflow-providers-common-messaging 2.0.1
apache-airflow-providers-common-sql 1.30.0
apache-airflow-providers-docker 4.5.0
apache-airflow-providers-elasticsearch 6.4.0
apache-airflow-providers-fab 3.0.3
apache-airflow-providers-ftp 3.14.0
apache-airflow-providers-git 0.1.0
apache-airflow-providers-google 19.1.0
apache-airflow-providers-grpc 3.9.0
apache-airflow-providers-hashicorp 4.4.0
apache-airflow-providers-http 5.6.0
apache-airflow-providers-microsoft-azure 12.9.0
apache-airflow-providers-mysql 6.4.0
apache-airflow-providers-odbc 4.11.0
apache-airflow-providers-openlineage 2.9.0
apache-airflow-providers-postgres 6.5.0
apache-airflow-providers-redis 4.4.0
apache-airflow-providers-sendgrid 4.2.0
apache-airflow-providers-sftp 5.5.0
apache-airflow-providers-slack 9.6.0
apache-airflow-providers-smtp 2.4.0
apache-airflow-providers-snowflake 6.7.0
apache-airflow-providers-ssh 3.14.0
apache-airflow-providers-standard 1.10.0
apache-airflow-task-sdk 1.1.5
google-cloud-orchestration-airflow 1.18.0
Deployment
Docker-Compose
Deployment details
We have one control plane running the Airflow components and N Celery queues, with one Celery worker per queue.
The metadata DB is in RDS, and Redis for the Celery queues runs on the control plane.
Anything else?
We have multiple S3 bundles (different prefixes / bundle names). The issue reproduces for some bundles more often than others, but we were eventually able to reproduce it for multiple bundles.
In logs, when the change is detected correctly, we see something like:
S3 object size (20372) and local file size (20371) differ. Downloaded <dag>.py to /tmp/airflow/<bundle>/<dag>.py