Fix DagRun execution order from queued to running not being properly followed #18061
You can greatly improve the efficiency of the queries (both your separate query and my CTE approach) by adding these indexes:
create index idx_dag_run_dag_id on dag_run (dag_id);
create index idx_dag_run_running_dags on dag_run (state, dag_id) WHERE state = 'running';
MySQL doesn't support the WHERE ... clause on indexes, but everything else does. That's why the index is on state, dag_id (the order matters) -- that way MySQL can look up all the dag runs in running state, and then filter by a specific dag_id.
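To try the two indexes out locally, here is a minimal sketch using Python's built-in sqlite3 module (SQLite also accepts the WHERE clause on indexes). The simplified dag_run table and the sample rows are assumptions made for illustration, not Airflow's real schema.

```python
import sqlite3

# In-memory database with a simplified, hypothetical dag_run table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        state TEXT NOT NULL,
        execution_date TEXT NOT NULL
    );
    CREATE INDEX idx_dag_run_dag_id ON dag_run (dag_id);
    -- Partial index: only rows in the 'running' state are indexed.
    CREATE INDEX idx_dag_run_running_dags ON dag_run (state, dag_id)
        WHERE state = 'running';
    """
)

# Illustrative rows only.
rows = [
    ("dag_one", "running", "2016-01-01"),
    ("dag_one", "queued", "2016-01-02"),
    ("dag_two", "running", "2021-01-01"),
    ("dag_two", "running", "2021-01-02"),
    ("dag_two", "queued", "2021-01-03"),
]
conn.executemany(
    "INSERT INTO dag_run (dag_id, state, execution_date) VALUES (?, ?, ?)", rows
)

# The kind of lookup the (state, dag_id) index is meant to serve:
# count running dag runs per dag.
running_per_dag = dict(
    conn.execute(
        "SELECT dag_id, COUNT(*) FROM dag_run "
        "WHERE state = 'running' GROUP BY dag_id"
    ).fetchall()
)

# Names of the indexes that now exist on the database.
index_names = {
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
}
```

Whether a given database actually uses these indexes depends on its planner and the data, so it is worth checking the query plan on the real backend.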
assert dr[0].state == State.RUNNING

def test_no_dagruns_would_stuck_in_running(self, dag_maker):
I'm not sure this test fully covers the behaviour we saw/fixed.
I think we should have:
Dag one starting in 2016 with max_active_runs=1 and 30 dag runs created (1 running, 29 queued)
Dag two starting in 2021, with some queued dag runs created
The key to my mind is to test that the queued dag runs from dag one would "fill up" the dagruns to examine if we don't exclude dags at max active runs.
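The scenario above can be modelled in plain Python to show why the exclusion matters. This is only a sketch of the idea, not Airflow's scheduler code: the Run class, the pick_queued_runs helper, the limit of 20 and dag two's max_active_runs of 16 are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Run:
    dag_id: str
    execution_date: date
    state: str  # "queued" or "running"


def pick_queued_runs(runs, max_active_runs, limit, exclude_full_dags):
    """Return up to `limit` queued runs, oldest execution_date first."""
    # Count running runs per dag.
    running = {}
    for r in runs:
        if r.state == "running":
            running[r.dag_id] = running.get(r.dag_id, 0) + 1
    # Dags that already have max_active_runs running.
    full = {d for d, n in running.items() if n >= max_active_runs[d]}
    candidates = [
        r
        for r in runs
        if r.state == "queued" and not (exclude_full_dags and r.dag_id in full)
    ]
    candidates.sort(key=lambda r: r.execution_date)
    return candidates[:limit]


# Dag one: 1 running and 29 queued runs in 2016, max_active_runs=1.
runs = [Run("dag_one", date(2016, 1, 1), "running")]
runs += [Run("dag_one", date(2016, 1, 1 + i), "queued") for i in range(1, 30)]
# Dag two: a few queued runs in 2021.
runs += [Run("dag_two", date(2021, 1, 1 + i), "queued") for i in range(5)]
max_active = {"dag_one": 1, "dag_two": 16}

# Without the exclusion, the oldest queued runs (all from dag one) use up the limit.
naive = pick_queued_runs(runs, max_active, limit=20, exclude_full_dags=False)
# With the exclusion, dag two's queued runs are the ones examined.
filtered = pick_queued_runs(runs, max_active, limit=20, exclude_full_dags=True)
```

In the naive case every examined run belongs to dag one, which cannot start any of them, so dag two never gets a chance to move to running.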
We already have a similar test from the previous PR. See:
airflow/tests/jobs/test_scheduler_job.py
Line 2655 in 13e7d4a
def test_max_active_runs_in_a_dag_doesnt_stop_running_dagruns_in_otherdags(self, dag_maker):
This one is different. I ran into it while changing some code and added this test to prevent such a bug in the future.
On current main, without this PR, the test passes.
We made a fix that resolved max_active_runs not allowing other dagruns to move to running state (see apache#17945), but it introduced a bug where dagruns did not follow the execution_date order when moving to running state.
This PR fixes it by adding a `max_active_runs` column in dagmodel. Also, an extra test not connected with this change was added because I was able to trigger the bug while working on this.
fixup! Fix DagRun execution order not being properly followed
fixup! fixup! Fix DagRun execution order not being properly followed
fixup! Fix DagRun execution order not being properly followed
fixup! fixup! Fix DagRun execution order not being properly followed
fixup! Fix DagRun execution order not being properly followed
Use subquery as mysql 5.7 doesn't support cte
fix doc error
Apply suggestions from code review
…followed (#18061)
We made a fix that resolved max_active_runs not allowing other dagruns to move to running state, see #17945 and introduced a bug that dagruns were not following the execution_date order when moving to running state.
This PR fixes it by adding a `max_active_runs` column in dagmodel. Also an extra test not connected with this change was added because I was able to trigger the bug while working on this
(cherry picked from commit ebbe2b4)
We made a fix that resolved max_active_runs not allowing other dagruns to move to
running state (see #17945), but it introduced a bug where dagruns did not follow the
execution_date order when moving to running state.
This PR fixes it by adding a `max_active_runs` column in dagmodel. Also, an extra test
not connected with this change was added because I was able to trigger the bug while
working on this.
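As a rough illustration of how a max_active_runs column on the dag table can be used, the sketch below picks queued runs in execution_date order while skipping dags that are already at their limit, using a subquery instead of a CTE. It runs on Python's built-in sqlite3 with simplified, hypothetical tables and data; it is not the query from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical tables for illustration only.
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_runs INTEGER NOT NULL);
    CREATE TABLE dag_run (
        dag_id TEXT NOT NULL,
        execution_date TEXT NOT NULL,
        state TEXT NOT NULL
    );
    INSERT INTO dag VALUES ('dag_one', 1), ('dag_two', 2);
    INSERT INTO dag_run VALUES
        ('dag_one', '2016-01-01', 'running'),
        ('dag_one', '2016-01-02', 'queued'),
        ('dag_two', '2021-01-02', 'queued'),
        ('dag_two', '2021-01-01', 'queued');
    """
)

# The subquery counts running runs per dag; the outer query keeps only queued
# runs whose dag is still below its max_active_runs, oldest first.
query = """
SELECT dr.dag_id, dr.execution_date
FROM dag_run AS dr
JOIN dag ON dag.dag_id = dr.dag_id
LEFT JOIN (
    SELECT dag_id, COUNT(*) AS running_count
    FROM dag_run
    WHERE state = 'running'
    GROUP BY dag_id
) AS active ON active.dag_id = dr.dag_id
WHERE dr.state = 'queued'
  AND COALESCE(active.running_count, 0) < dag.max_active_runs
ORDER BY dr.execution_date
LIMIT ?
"""
next_runs = conn.execute(query, (10,)).fetchall()
```

Here dag_one is skipped because its single allowed run is already running, and dag_two's queued runs come back in execution_date order.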