generate_dag_with_latest_run_query queries all dagruns irrespective of the dag ids accessible #57427

@tirkarthi

Description

Apache Airflow version

main (development)

If "Other Airflow 2/3 version" selected, which one?

No response

What happened?

generate_dag_with_latest_run_query queries all dagruns regardless of which dag ids the user can access, which makes the query expensive. It groups every dagrun to find the latest dagrun per dag, and the rows for inaccessible dags are thrown away later. readable_dags_filter.value already holds the permitted dag_ids, so it can be passed to this function to group dagruns only for the accessible dag ids. This will help in deployments where some users have access to only a few dag ids but the query still performs the group by over all dagruns.
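
The idea can be sketched outside Airflow with the standard-library sqlite3 module. This is only an illustration, not the Airflow implementation: the helper latest_run_subquery and its permitted_dag_ids parameter are hypothetical stand-ins for generate_dag_with_latest_run_query and readable_dags_filter.value, and the dag_run table is reduced to the two columns that appear in the mrq subquery.

```python
import sqlite3


def latest_run_subquery(permitted_dag_ids=None):
    """Build (sql, params) for the latest-run-per-dag subquery.

    permitted_dag_ids is a hypothetical parameter standing in for
    readable_dags_filter.value; None means "no restriction".
    """
    sql = "SELECT dag_id, max(id) AS max_dag_run_id FROM dag_run"
    params = []
    if permitted_dag_ids is not None:
        ids = sorted(permitted_dag_ids)
        if ids:
            # Restrict rows before grouping so only accessible dags are grouped.
            placeholders = ", ".join("?" for _ in ids)
            sql += f" WHERE dag_id IN ({placeholders})"
            params = ids
        else:
            # No accessible dags: match nothing instead of emitting "IN ()".
            sql += " WHERE 1 = 0"
    sql += " GROUP BY dag_id"
    return sql, params


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the dag_run table (assumption: only id and dag_id).
conn.execute("CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO dag_run (dag_id) VALUES (?)",
    [("dag_a",), ("dag_b",), ("dag_a",), ("dag_c",), ("dag_b",)],
)

sql_all, params_all = latest_run_subquery()
all_rows = dict(conn.execute(sql_all, params_all).fetchall())

sql_filtered, params_filtered = latest_run_subquery({"dag_a"})
filtered_rows = dict(conn.execute(sql_filtered, params_filtered).fetchall())

print(all_rows)
print(filtered_rows)
```

The restricted subquery returns the same latest dagrun id for each accessible dag as the unrestricted one, so the rest of the query would not need to change; only the grouping work shrinks.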

What you think should happen instead?

No response

How to reproduce

  1. Generate 100 dags with 100 dagruns per dag.
  2. Create a user, with an auth manager configured, who can access only a few dag ids.
  3. Visit the dags list page as that user.
  4. The generate_dag_with_latest_run_query subquery, referenced as mrq, groups the dagruns of all dag ids:
(SELECT dag_run.dag_id AS dag_id, max(dag_run.id) AS max_dag_run_id FROM dag_run GROUP BY dag_run.dag_id) AS mrq
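
The data shape from the steps above can be approximated without an Airflow deployment. The sketch below uses an in-memory SQLite table with only the id and dag_id columns (an assumption, the real dag_run table has many more) and a hypothetical list of three accessible dag ids. It counts how many groups the unrestricted mrq subquery produces against how many the user can actually see. How much of this work a given database performs depends on its query planner, so treat the numbers as an illustration of the mismatch rather than a benchmark.

```python
import sqlite3

NUM_DAGS = 100
RUNS_PER_DAG = 100

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the dag_run table.
conn.execute("CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO dag_run (dag_id) VALUES (?)",
    [(f"dag_{d}",) for d in range(NUM_DAGS) for _ in range(RUNS_PER_DAG)],
)

# Hypothetical set of dag ids the restricted user may read.
accessible = ["dag_0", "dag_1", "dag_2"]
placeholders = ", ".join("?" for _ in accessible)

# Same shape as the mrq subquery quoted in the issue.
mrq = "SELECT dag_id, max(id) AS max_dag_run_id FROM dag_run GROUP BY dag_id"

total_runs = conn.execute("SELECT count(*) FROM dag_run").fetchone()[0]
groups_computed = conn.execute(f"SELECT count(*) FROM ({mrq}) AS mrq").fetchone()[0]
groups_needed = conn.execute(
    f"SELECT count(*) FROM ({mrq}) AS mrq WHERE mrq.dag_id IN ({placeholders})",
    accessible,
).fetchone()[0]

print(f"dagruns in table: {total_runs}")
print(f"groups produced by mrq: {groups_computed}")
print(f"groups the user can see: {groups_needed}")
```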

Operating System

Ubuntu 20.04

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
