DAGs disappear from UI when a task gets stuck running in Airflow 3 #55029

@ldacey

Apache Airflow version

3.0.5

If "Other Airflow 2 version" selected, which one?

No response

What happened?

I am running Airflow with the official helm chart on GKE. I use a template which generates multiple DAGs for a source in a single file:

  1. Extract (cron)
  2. Raw to bronze (asset trigger)
  3. Bronze to silver (asset trigger)
  4. Silver to gold (asset trigger)
  5. Delta maintenance (cron)
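
For readers unfamiliar with this layout, here is a rough sketch of how five such DAGs chain together for one source. It is not the actual template: `DagSpec`, `build_specs`, the DAG ids, the asset names, and the cron strings are all hypothetical, and the sketch only models the relationships in plain Python rather than calling Airflow. In a real DAG file, each spec would presumably be turned into an Airflow DAG with either a cron schedule or an asset schedule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DagSpec:
    """Hypothetical description of one generated DAG (not an Airflow class)."""

    dag_id: str
    cron: Optional[str] = None           # set for time-scheduled DAGs
    trigger_asset: Optional[str] = None  # set for asset-triggered DAGs
    outlet_asset: Optional[str] = None   # asset this DAG updates on success


def build_specs(
    source: str,
    extract_cron: str = "0 * * * *",      # assumed schedule, illustration only
    maintenance_cron: str = "0 3 * * *",  # assumed schedule, illustration only
) -> list[DagSpec]:
    """Return the five DAG descriptions generated for a single source."""
    raw, bronze, silver, gold = (
        f"{source}/{layer}" for layer in ("raw", "bronze", "silver", "gold")
    )
    return [
        DagSpec(f"{source}_extract", cron=extract_cron, outlet_asset=raw),
        DagSpec(f"{source}_raw_to_bronze", trigger_asset=raw, outlet_asset=bronze),
        DagSpec(f"{source}_bronze_to_silver", trigger_asset=bronze, outlet_asset=silver),
        DagSpec(f"{source}_silver_to_gold", trigger_asset=silver, outlet_asset=gold),
        DagSpec(f"{source}_delta_maintenance", cron=maintenance_cron),
    ]


if __name__ == "__main__":
    for spec in build_specs("history"):
        print(spec)
```

Because all five DAGs come from one file, a parse failure for that file affects every DAG in the chain, not only the one with the stuck task.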

If a task gets stuck running due to pod OOM errors then the DAG disappears from the UI. I am unable to see it when I search for the DAG or Asset name.

Image Image

Checking the DAG processor logs shows there is an error:

Bundle               File Path                                                     PID  Current Duration      # DAGs    # Errors  Last Duration    Last Run At
-------------------  ---------------------------------------------------------  ------  ------------------  --------  ----------  ---------------  -------------------
gitlab-sources-main  dags/history_dag.py                                                                           0           1  0.27s            2025-08-28T14:40:58

However, if I check the Task Instances tab and filter for running tasks, I can see the tasks that hit OOM. They are stuck in the running state, which causes the DAG processor to fail to parse the other DAGs in the same file:

Image

After I mark those tasks as failed and wait a few minutes, the DAGs are parsed successfully:

Bundle               File Path                                                     PID  Current Duration      # DAGs    # Errors  Last Duration    Last Run At
-------------------  ---------------------------------------------------------  ------  ------------------  --------  ----------  ---------------  -------------------
gitlab-sources-main  dags/history_dag.py                                                                           5           0  0.81s            2025-08-28T14:59:28
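
Since the DAGs vanish from the UI, this stats table is the main signal that something is wrong. Below is a minimal sketch for flagging files with parse errors from such output. The helper names (`parse_stats_row`, `files_with_errors`) are hypothetical. It assumes file paths contain no spaces and that the last four columns (# DAGs, # Errors, Last Duration, Last Run At) are always populated, so it reads them from the right and tolerates the empty PID and Current Duration cells seen above.

```python
def parse_stats_row(row: str) -> dict:
    """Parse one data row of the DAG processor stats table.

    Assumption: the last four whitespace-separated tokens are
    # DAGs, # Errors, Last Duration and Last Run At.
    """
    tokens = row.split()
    if len(tokens) < 6:
        raise ValueError(f"unexpected row layout: {row!r}")
    num_dags, num_errors, last_duration, last_run_at = tokens[-4:]
    return {
        "bundle": tokens[0],
        "file_path": tokens[1],
        "num_dags": int(num_dags),
        "num_errors": int(num_errors),
        "last_duration": last_duration,
        "last_run_at": last_run_at,
    }


def files_with_errors(table: str) -> list[str]:
    """Return the file paths whose # Errors column is greater than zero."""
    flagged = []
    for line in table.splitlines():
        stripped = line.strip()
        # Skip blank lines, the header row, and the dashed separator.
        if not stripped or stripped.startswith(("Bundle", "-")):
            continue
        row = parse_stats_row(stripped)
        if row["num_errors"] > 0:
            flagged.append(row["file_path"])
    return flagged


# The two data rows from this report, before and after marking the tasks failed.
BEFORE = "gitlab-sources-main  dags/history_dag.py  0  1  0.27s  2025-08-28T14:40:58"
AFTER = "gitlab-sources-main  dags/history_dag.py  5  0  0.81s  2025-08-28T14:59:28"

if __name__ == "__main__":
    print(files_with_errors(BEFORE))
    print(files_with_errors(AFTER))
```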

And the DAGs are now viewable in the UI:

Image

I should add that the tasks are not searchable either:

Image

But if I remove the search filter and only select Running tasks, I can see the root cause:

Image

What you think should happen instead?

The DAG processor should be able to parse the DAG file whether or not a task is stuck running. If I cannot see the DAG in the UI, it is very hard to know there is an issue. Not being able to search for the DAG or asset is also strange, because it clearly exists in the DB and shows up on the Task Instances page.

The fact that only one out of five DAGs had a problem but all five disappear is also not ideal.

How to reproduce

  • Generate multiple related (asset trigger) DAGs in a single DAG file
  • Hit OOM errors in a Kubernetes pod which cause the task to get stuck running instead of failing outright
  • The DAG processor will eventually fail to parse the DAG file as a whole, and then the related DAGs will disappear from the UI

Operating System

Debian Bookworm

Versions of Apache Airflow Providers

This has been an issue since upgrading to 3.0.0, so it spans several provider versions. Here is the current list:

apache-airflow-providers-cncf-kubernetes==10.6.2
apache-airflow-providers-common-compat==1.7.3
apache-airflow-providers-common-io==1.6.2
apache-airflow-providers-common-sql==1.27.4
apache-airflow-providers-docker==4.4.2
apache-airflow-providers-fab==2.3.1
apache-airflow-providers-ftp==3.13.2
apache-airflow-providers-git==0.0.5
apache-airflow-providers-google==17.0.0
apache-airflow-providers-http==5.3.3
apache-airflow-providers-microsoft-azure==12.6.0
apache-airflow-providers-odbc==4.10.2
apache-airflow-providers-postgres==6.2.2
apache-airflow-providers-sftp==5.3.3
apache-airflow-providers-smtp==2.1.2
apache-airflow-providers-ssh==4.1.2
apache-airflow-providers-standard==1.5.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

ArgoCD deploys the Helm chart via Kustomize

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
