Optimize DAG list query for users with limited access #57460

kaxil · 2025-10-29T00:17:44Z

When users have limited DAG access, the DAG list query was inefficiently grouping all DagRuns in the database before filtering. This caused severe performance degradation in large deployments where a user might access only a few DAGs out of hundreds or thousands.

The fix filters both the main DAG query and the DagRun subquery by accessible dag_ids before performing the expensive GROUP BY operation.

Before (queries all dagruns):

  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id

After (filters to accessible dags):

  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    WHERE dag_run.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
  WHERE dag.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
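
To illustrate why the extra filter does not change what a limited-access user sees, here is a minimal sketch using an in-memory SQLite database. The two-table schema is a drastic simplification, and the helper names (`build_db`, `latest_runs_before`, `latest_runs_after`) are hypothetical; this is not Airflow's actual query-building code. It assumes the old path narrowed the result to accessible DAGs only after the full GROUP BY had run.

```python
import sqlite3


def build_db(num_dags=5, runs_per_dag=3):
    """Create a toy schema (assumed, much simpler than Airflow's) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE dag_run (id INTEGER PRIMARY KEY AUTOINCREMENT, dag_id TEXT NOT NULL)"
    )
    for d in range(num_dags):
        dag_id = f"dag_{d}"
        conn.execute("INSERT INTO dag (dag_id) VALUES (?)", (dag_id,))
        conn.executemany(
            "INSERT INTO dag_run (dag_id) VALUES (?)", [(dag_id,)] * runs_per_dag
        )
    return conn


# "Before" shape: the subquery groups every row in dag_run.
BEFORE_SQL = """
SELECT dag.dag_id, mrq.max_dag_run_id FROM dag
LEFT OUTER JOIN (
  SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
  FROM dag_run
  GROUP BY dag_run.dag_id
) AS mrq ON dag.dag_id = mrq.dag_id
ORDER BY dag.dag_id
"""


def after_sql(n):
    """"After" shape: both the subquery and the outer query filter on n dag_ids."""
    placeholders = ", ".join("?" * n)
    return f"""
SELECT dag.dag_id, mrq.max_dag_run_id FROM dag
LEFT OUTER JOIN (
  SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
  FROM dag_run
  WHERE dag_run.dag_id IN ({placeholders})
  GROUP BY dag_run.dag_id
) AS mrq ON dag.dag_id = mrq.dag_id
WHERE dag.dag_id IN ({placeholders})
ORDER BY dag.dag_id
"""


def latest_runs_before(conn, accessible):
    # Group everything in the database, then discard inaccessible DAGs afterwards.
    rows = conn.execute(BEFORE_SQL).fetchall()
    return [row for row in rows if row[0] in accessible]


def latest_runs_after(conn, accessible):
    # Push the accessible dag_ids into the SQL so the GROUP BY sees fewer rows.
    ids = sorted(accessible)
    if not ids:
        return []
    # The id list is bound twice: once for the subquery, once for the outer WHERE.
    return conn.execute(after_sql(len(ids)), ids + ids).fetchall()


if __name__ == "__main__":
    conn = build_db()
    accessible = {"dag_1", "dag_3"}
    print(latest_runs_before(conn, accessible))
    print(latest_runs_after(conn, accessible))
```

Both helpers return the same rows for the accessible DAGs; the difference is only in how many `dag_run` rows the database has to group to get there.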

Performance impact: In a deployment with 100 DAGs (100 runs each) where a user has access to only 2 DAGs, the subquery groups 200 rows instead of 10,000 (50x fewer rows), and the main query no longer fetches 98 unnecessary DAG models.
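
The arithmetic behind those figures can be checked with a short sketch. The function name `subquery_rows` is hypothetical, and the calculation assumes every DAG has the same number of runs, as in the example above; real deployments will be less uniform.

```python
def subquery_rows(num_dags, runs_per_dag):
    """Rows the GROUP BY subquery must read, assuming uniform runs per DAG."""
    return num_dags * runs_per_dag


total_dags = 100
accessible_dags = 2
runs_per_dag = 100

rows_before = subquery_rows(total_dags, runs_per_dag)      # all DAGs grouped
rows_after = subquery_rows(accessible_dags, runs_per_dag)  # accessible DAGs only
row_reduction = rows_before / rows_after
dag_models_skipped = total_dags - accessible_dags

if __name__ == "__main__":
    print(rows_before, rows_after, row_reduction, dag_models_skipped)
```

The ratio describes rows grouped, not wall-clock time; the actual speedup depends on indexes and the database engine.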

Fixes #57427

tirkarthi

Thanks @kaxil, we tested a very similar patch on a shared cluster used by multiple teams with varying numbers of accessible dags. After the fix, the GROUP BY queries that examined all the rows stopped occurring in the MySQL slow query logs.

github-actions · 2025-10-29T13:03:44Z

Backport failed to create: v3-1-test.

Status	Branch	Result
❌	v3-1-test

You can attempt to backport this manually by running:

cherry_picker f271f2b v3-1-test

This should apply the commit to the v3-1-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

kaxil requested a review from tirkarthi October 29, 2025 00:17

kaxil requested review from bugraoz93, ephraimbuddy, jason810496, pierrejeambrun, rawwar and shubhamraj-git as code owners October 29, 2025 00:17

boring-cyborg bot added the area:API Airflow's REST/HTTP API label Oct 29, 2025

kaxil mentioned this pull request Oct 29, 2025

generate_dag_with_latest_run_query queries all dagruns irrespective of the dag ids accessible #57427

Closed

2 tasks

kaxil force-pushed the dag-run-filter branch from 62b3461 to 5b3012c Compare October 29, 2025 11:03

kaxil force-pushed the dag-run-filter branch from 5b3012c to 9ea5854 Compare October 29, 2025 11:20

eladkal approved these changes Oct 29, 2025

tirkarthi approved these changes Oct 29, 2025

kaxil added this to the Airflow 3.1.2 milestone Oct 29, 2025

kaxil added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Oct 29, 2025

kaxil merged commit f271f2b into apache:main Oct 29, 2025
115 checks passed

kaxil deleted the dag-run-filter branch October 29, 2025 13:02

kaxil mentioned this pull request Oct 31, 2025

Status of testing of Apache Airflow 3.1.2rc2 & Task SDK 1.1.2rc2 #57648

Closed

ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Nov 10, 2025

pierrejeambrun mentioned this pull request Jan 9, 2026

Dags list is not loading in the UI, espically when filtering with tags #56219

Closed

2 tasks
