Skip to content

Conversation

@kaxil
Copy link
Member

@kaxil kaxil commented Oct 29, 2025

When users have limited DAG access, the DAG list query was inefficiently grouping all DagRuns in the database before filtering. This caused severe performance degradation in large deployments where a user might access only a few DAGs out of hundreds or thousands.

The fix filters both the main DAG query and the DagRun subquery by accessible dag_ids before performing the expensive GROUP BY operation.

Before (queries all dagruns):

  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id

After (filters to accessible dags):

  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    WHERE dag_run.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
  WHERE dag.dag_id IN ('accessible_dag_1', 'accessible_dag_2')

Performance impact: In a deployment with 100 DAGs (100 runs each) where a user has access to only 2 DAGs, this reduces the subquery from grouping 10,000 rows down to 200 rows (50x improvement), and eliminates fetching 98 unnecessary DAG models.

Fixes #57427


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

When users have limited DAG access, the DAG list query was inefficiently
grouping all DagRuns in the database before filtering. This caused severe
performance degradation in large deployments where a user might access
only a few DAGs out of hundreds or thousands.

The fix filters both the main DAG query and the DagRun subquery by
accessible dag_ids before performing the expensive GROUP BY operation.

Before (queries all dagruns):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
```

After (filters to accessible dags):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    WHERE dag_run.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
  WHERE dag.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
```

Performance impact: In a deployment with 100 DAGs (100 runs each) where
a user has access to only 2 DAGs, this reduces the subquery from grouping
10,000 rows down to 200 rows (50x improvement), and eliminates fetching
98 unnecessary DAG models.

Fixes apache#57427
Copy link
Contributor

@tirkarthi tirkarthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kaxil , we had a very similar patch tested with a shared cluster used by multiple teams with varied count of accessible dags. The group by logs that examined all the rows stopped occuring in the MySQL slow query logs after the fix.

@kaxil kaxil added this to the Airflow 3.1.2 milestone Oct 29, 2025
@kaxil kaxil added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Oct 29, 2025
@kaxil kaxil merged commit f271f2b into apache:main Oct 29, 2025
115 checks passed
@kaxil kaxil deleted the dag-run-filter branch October 29, 2025 13:02
@github-actions
Copy link

Backport failed to create: v3-1-test. View the failure log Run details

Status Branch Result
v3-1-test Commit Link

You can attempt to backport this manually by running:

cherry_picker f271f2b v3-1-test

This should apply the commit to the v3-1-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

Lzzz666 pushed a commit to Lzzz666/airflow that referenced this pull request Oct 30, 2025
When users have limited DAG access, the DAG list query was inefficiently
grouping all DagRuns in the database before filtering. This caused severe
performance degradation in large deployments where a user might access
only a few DAGs out of hundreds or thousands.

The fix filters both the main DAG query and the DagRun subquery by
accessible dag_ids before performing the expensive GROUP BY operation.

Before (queries all dagruns):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
```

After (filters to accessible dags):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    WHERE dag_run.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
  WHERE dag.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
```

Performance impact: In a deployment with 100 DAGs (100 runs each) where
a user has access to only 2 DAGs, this reduces the subquery from grouping
10,000 rows down to 200 rows (50x improvement), and eliminates fetching
98 unnecessary DAG models.

Fixes apache#57427
kaxil added a commit that referenced this pull request Oct 31, 2025
When users have limited DAG access, the DAG list query was inefficiently
grouping all DagRuns in the database before filtering. This caused severe
performance degradation in large deployments where a user might access
only a few DAGs out of hundreds or thousands.

The fix filters both the main DAG query and the DagRun subquery by
accessible dag_ids before performing the expensive GROUP BY operation.

Before (queries all dagruns):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
```

After (filters to accessible dags):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    WHERE dag_run.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
  WHERE dag.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
```

Performance impact: In a deployment with 100 DAGs (100 runs each) where
a user has access to only 2 DAGs, this reduces the subquery from grouping
10,000 rows down to 200 rows (50x improvement), and eliminates fetching
98 unnecessary DAG models.

Fixes #57427

(cherry picked from commit f271f2b)
kaxil added a commit that referenced this pull request Oct 31, 2025
When users have limited DAG access, the DAG list query was inefficiently
grouping all DagRuns in the database before filtering. This caused severe
performance degradation in large deployments where a user might access
only a few DAGs out of hundreds or thousands.

The fix filters both the main DAG query and the DagRun subquery by
accessible dag_ids before performing the expensive GROUP BY operation.

Before (queries all dagruns):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
```

After (filters to accessible dags):

```sql
  SELECT ... FROM dag
  LEFT OUTER JOIN (
    SELECT dag_run.dag_id, max(dag_run.id) AS max_dag_run_id
    FROM dag_run
    WHERE dag_run.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
    GROUP BY dag_run.dag_id
  ) AS mrq ON dag.dag_id = mrq.dag_id
  WHERE dag.dag_id IN ('accessible_dag_1', 'accessible_dag_2')
```

Performance impact: In a deployment with 100 DAGs (100 runs each) where
a user has access to only 2 DAGs, this reduces the subquery from grouping
10,000 rows down to 200 rows (50x improvement), and eliminates fetching
98 unnecessary DAG models.

Fixes #57427

(cherry picked from commit f271f2b)
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

generate_dag_with_latest_run_query queries all dagruns irrespective of the dag ids accessible

4 participants