Non-deterministic DAG serialization for dynamically generated DAGs leads to excessive versions #56471

@wolvery

Apache Airflow version

3.1.0

If "Other Airflow 2 version" selected, which one?

No response

What happened?

We are experiencing an issue where the dag processor creates a new version of a DAG in the serialized_dag table on nearly every parsing cycle, even when the underlying DAG file is functionally unchanged.

The root cause appears to be non-deterministic serialization: the order of dictionary keys and list elements in the JSON column of the serialized_dag table is inconsistent between parses. As a result, logically identical DAGs get new versions even though only the order of keys inside the JSON changes.

Example:

--- version_1
+++ version_2
@@ -1,13 +1,13 @@
-"dag_id": "test",
-"max_consecutive_failed_dag_runs": 7,
-"timetable": { ... },
-"relative_fileloc": "revision_dags/test.py",
-"task_group": { ... },
-"fileloc": "/opt/airflow/dags/revision_dags/test.py",
-"timezone": "UTC",
-"default_args": { ... },
-"description": "DAG for [domain='test', data_product='polaroid_input_features', pipeline='main']",
-"max_active_runs": 1,
-"tags": [ ... ],
-"start_date": 1640995200.0
+"max_consecutive_failed_dag_runs": 7,
+"task_group": { ... },
+"timezone": "UTC",
+"max_active_runs": 1,
+"fileloc": "/opt/airflow/dags/revision_dags/test.py",
+"timetable": { ... },
+"start_date": 1640995200.0,
+"description": "DAG for [domain='test', data_product='polaroid_input_features', pipeline='main']",
+"default_args": { ... },
+"tags": [ ... ],
+"relative_fileloc": "revision_dags/test.py",
+"dag_id": "test"
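
The effect in the diff above can be reproduced outside Airflow. The following is a minimal sketch, assuming the version check compares a hash of the JSON text. That is an assumption for illustration, not a statement about Airflow's internals, and `naive_hash` is a hypothetical helper.

```python
import hashlib
import json

# Same content, inserted in a different order (as in the diff above).
version_1 = {"dag_id": "test", "max_active_runs": 1, "timezone": "UTC"}
version_2 = {"timezone": "UTC", "max_active_runs": 1, "dag_id": "test"}


def naive_hash(data: dict) -> str:
    """Hash the JSON text as emitted, so key order leaks into the digest."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()


# Dict equality ignores key order, so the two payloads are logically equal.
print(version_1 == version_2)  # True

# json.dumps follows insertion order, so the digests differ.
print(naive_hash(version_1) == naive_hash(version_2))  # False
```

Any comparison based on the raw JSON text would treat these two payloads as different versions.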

What you think should happen instead?

The serializer should sort the keys internally before comparing serialized DAGs, so that a change in key order alone does not create a new version.
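
A minimal sketch of such an order-insensitive comparison, using only the standard library. `canonical_hash` and `is_new_version` are hypothetical names for illustration and are not part of Airflow's API.

```python
import hashlib
import json
from typing import Any


def canonical_hash(data: Any) -> str:
    """Hash a canonical JSON form: dict keys sorted at every nesting level."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_new_version(stored: Any, parsed: Any) -> bool:
    """Report a new version only when the canonical forms differ."""
    return canonical_hash(stored) != canonical_hash(parsed)


stored = {"dag_id": "test", "default_args": {"retries": 1, "owner": "a"}}
reordered = {"default_args": {"owner": "a", "retries": 1}, "dag_id": "test"}
changed = {"dag_id": "test", "default_args": {"retries": 2, "owner": "a"}}

print(is_new_version(stored, reordered))  # False: only key order differs
print(is_new_version(stored, changed))    # True: a value actually changed
```

`sort_keys=True` only normalizes dictionary keys. List order is left untouched because it can be meaningful, so lists whose order is irrelevant would need to be sorted explicitly before hashing.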

How to reproduce

Creating a simple dynamically generated DAG and exposing it in the module's global namespace seems to trigger the problem.

Operating System

airflow:3.1.0 python 3.10 image

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

Official Apache Airflow Helm Chart

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
