Fix DagRun execution order from queued to running not being properly followed #18061

Merged
merged 2 commits into from
Sep 9, 2021

Conversation

ephraimbuddy
Contributor

An earlier fix (see #17945) resolved max_active_runs preventing other dagruns from moving to
the running state, but it introduced a bug: dagruns no longer followed the
execution_date order when moving to the running state.

This PR fixes it by adding a `max_active_runs` column to dagmodel. It also adds an extra test
that is not connected with this change, because I was able to trigger the bug it covers while
working on this.
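
A minimal sketch of the idea described above, not the actual Airflow query: queued runs are picked in execution_date order, and runs belonging to a dag that already has max_active_runs running are skipped. The two-table schema, the `build_demo_db` and `next_runs_to_start` names, and the use of an in-memory SQLite database are assumptions made for illustration only.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical, simplified schema with a few dag runs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_runs INTEGER NOT NULL);
        CREATE TABLE dag_run (
            dag_id TEXT NOT NULL,
            execution_date TEXT NOT NULL,
            state TEXT NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO dag VALUES (?, ?)", [("dag_one", 1), ("dag_two", 2)])
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?)",
        [
            ("dag_one", "2016-01-01", "running"),
            ("dag_one", "2016-01-02", "queued"),
            # Inserted out of order on purpose: ORDER BY must restore the order.
            ("dag_two", "2021-01-02", "queued"),
            ("dag_two", "2021-01-01", "queued"),
        ],
    )
    return conn


def next_runs_to_start(conn, limit):
    """Return queued runs, oldest first, skipping dags at their max_active_runs."""
    query = """
        SELECT dr.dag_id, dr.execution_date
        FROM dag_run AS dr
        JOIN dag AS d ON d.dag_id = dr.dag_id
        LEFT JOIN (
            -- A subquery rather than a CTE, so it also works where CTEs are unavailable.
            SELECT dag_id, COUNT(*) AS running_count
            FROM dag_run
            WHERE state = 'running'
            GROUP BY dag_id
        ) AS running ON running.dag_id = dr.dag_id
        WHERE dr.state = 'queued'
          AND COALESCE(running.running_count, 0) < d.max_active_runs
        ORDER BY dr.execution_date
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

In this sketch the query only filters out dags that are already full. Anything that moves the returned runs to running would still have to count active runs as it goes.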



@boring-cyborg boring-cyborg bot added the area:Scheduler (including HA (high availability) scheduler) and kind:documentation labels Sep 7, 2021
@ephraimbuddy ephraimbuddy changed the title Fix DagRun execution order not being properly followed Fix DagRun execution order from queued to running not being properly followed Sep 7, 2021
Member

@ashb ashb left a comment


You can greatly improve the efficiency of the queries (both your separate query and my CTE approach) by adding these indexes:

create index idx_dag_run_dag_id on dag_run (dag_id);
create index idx_dag_run_running_dags on dag_run (state, dag_id) WHERE state = 'running';

MySQL doesn't support WHERE ... on indexes, but the other databases do. That's why the index is on (state, dag_id), and the order matters: that way MySQL can look up all the dag runs in the running state, and then filter by a specific dag_id.
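
As a hedged illustration of the two indexes above, the following sketch creates them against an in-memory SQLite database, which is used here only because it supports partial indexes and ships with Python. The stripped-down `dag_run` table and the helper names are hypothetical and are not Airflow's schema or migration code.

```python
import sqlite3


def create_dag_run_indexes(conn):
    """Create a simplified dag_run table plus the two suggested indexes."""
    conn.execute(
        "CREATE TABLE dag_run ("
        "id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL, state TEXT NOT NULL)"
    )
    # Plain index for lookups by dag_id.
    conn.execute("CREATE INDEX idx_dag_run_dag_id ON dag_run (dag_id)")
    # Partial index: only rows in the running state are indexed.
    # state comes first so a backend without partial indexes can still
    # narrow to running rows before filtering by dag_id.
    conn.execute(
        "CREATE INDEX idx_dag_run_running_dags ON dag_run (state, dag_id) "
        "WHERE state = 'running'"
    )


def index_definitions(conn):
    """Return {index_name: CREATE INDEX statement} for the dag_run table."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'dag_run' ORDER BY name"
    ).fetchall()
    return dict(rows)
```

On a backend without partial indexes, the same composite index would be created without the WHERE clause.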

airflow/models/dagrun.py (resolved)
airflow/models/dagrun.py (outdated, resolved)
tests/jobs/test_scheduler_job.py (outdated, resolved)

assert dr[0].state == State.RUNNING

def test_no_dagruns_would_stuck_in_running(self, dag_maker):
Member


I'm not sure this test fully covers the behaviour we saw/fixed.

I think we should have:

Dag one, starting in 2016 with max_active_runs=1, with 30 dag runs created (1 running, 29 queued)
Dag two, starting in 2021, with some queued dag runs created

The key, to my mind, is to test that the queued dag runs from dag one would "fill up" the dagruns to examine if we don't exclude dags that are at max active runs.
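
The scenario can be sketched in plain Python, without the scheduler or `dag_maker`. This is a toy model under stated assumptions: the `make_runs` and `runs_to_examine` helpers, the limit of 20, and the five queued runs for dag two are all made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def make_runs():
    """Dag one: 30 runs from 2016 (1 running, 29 queued). Dag two: 5 queued from 2021."""
    runs = [
        {
            "dag_id": "dag_one",
            "execution_date": date(2016, 1, 1) + timedelta(days=i),
            "state": "running" if i == 0 else "queued",
        }
        for i in range(30)
    ]
    runs += [
        {
            "dag_id": "dag_two",
            "execution_date": date(2021, 1, 1) + timedelta(days=i),
            "state": "queued",
        }
        for i in range(5)
    ]
    return runs


def runs_to_examine(runs, max_active_runs, limit, exclude_full_dags):
    """Pick up to `limit` queued runs, oldest first.

    With exclude_full_dags=True, runs of dags that already have
    max_active_runs running are dropped before the limit is applied.
    """
    running = Counter(r["dag_id"] for r in runs if r["state"] == "running")
    queued = sorted(
        (r for r in runs if r["state"] == "queued"),
        key=lambda r: r["execution_date"],
    )
    if exclude_full_dags:
        queued = [r for r in queued if running[r["dag_id"]] < max_active_runs[r["dag_id"]]]
    return queued[:limit]
```

With `max_active_runs = {"dag_one": 1, "dag_two": 1}` and a limit of 20, the unfiltered selection contains only dag one's queued runs, so dag two is never examined. With the exclusion enabled, dag two's runs are returned in execution_date order.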

Contributor Author


We already have a similar test from the previous PR. See:

def test_max_active_runs_in_a_dag_doesnt_stop_running_dagruns_in_otherdags(self, dag_maker):

This one is different. I ran into it while changing some code and added this test to prevent such a bug in the future.
On the current main, without this PR, the test passes.

fixup! Fix DagRun execution order not being properly followed

fixup! fixup! Fix DagRun execution order not being properly followed

fixup! Fix DagRun execution order not being properly followed

fixup! fixup! Fix DagRun execution order not being properly followed

fixup! Fix DagRun execution order not being properly followed

Use subquery as mysql 5.7 doesn't support cte

fix doc error

Apply suggestions from code review
@ephraimbuddy ephraimbuddy merged commit ebbe2b4 into apache:main Sep 9, 2021
@ephraimbuddy ephraimbuddy deleted the fix-queued-running-order branch September 9, 2021 11:03
kaxil pushed a commit that referenced this pull request Sep 10, 2021
…followed (#18061)

(cherry picked from commit ebbe2b4)
Labels
area:Scheduler (including HA (high availability) scheduler), full tests needed (We need to run full set of tests for this PR to merge), kind:documentation

3 participants