Fix backfill max_active_runs race condition with concurrent schedulers #58807
Merged
ephraimbuddy
merged 3 commits into
apache:main
from
astronomer:fix-backfill-max-active-runs
Dec 2, 2025
When two schedulers run concurrently, both could start more backfill dag runs than max_active_runs allows. This happened because each scheduler read the count of running dag runs before either committed, causing both to see stale counts and start runs simultaneously.

The fix adds row-level locking on the Backfill table. When a scheduler processes backfill dag runs, it first locks the relevant Backfill rows. If another scheduler already holds the lock, the current scheduler skips those backfills rather than potentially violating the max_active_runs constraint.

This ensures that only one scheduler can process a given backfill's dag runs at a time, preventing the race condition while remaining non-blocking (schedulers don't wait on each other).
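The race and the lock-or-skip behaviour described above can be illustrated with a small, standard-library-only simulation. This is a minimal sketch, not the code from this PR: the `Backfill` class, `capacity`, `start_runs`, and `scheduler_pass` are hypothetical names, and a non-blocking `threading.Lock` stands in for the database row lock. In a real database the equivalent is usually a `SELECT ... FOR UPDATE SKIP LOCKED` query, but the exact query used by the fix is not shown here, so treat that as an assumption.

```python
import threading


class Backfill:
    """Hypothetical stand-in for one Backfill row (not Airflow's model)."""

    def __init__(self, max_active_runs, queued):
        self.max_active_runs = max_active_runs
        self.queued = queued
        self.running = 0
        # Plays the role of the row-level lock on the Backfill row.
        self.row_lock = threading.Lock()


def capacity(backfill):
    """How many more runs may start, based on the current running count."""
    return max(backfill.max_active_runs - backfill.running, 0)


def start_runs(backfill, n):
    """Move up to n queued runs to running; return how many were started."""
    n = min(n, backfill.queued)
    backfill.queued -= n
    backfill.running += n
    return n


def scheduler_pass(backfill):
    """Lock-or-skip: return runs started, or None if the row was locked."""
    if not backfill.row_lock.acquire(blocking=False):
        return None  # another scheduler holds the row; skip, do not wait
    try:
        return start_runs(backfill, capacity(backfill))
    finally:
        backfill.row_lock.release()


def demo_unsafe():
    """Both schedulers read the count before either one writes."""
    bf = Backfill(max_active_runs=2, queued=10)
    cap_a = capacity(bf)  # scheduler A sees room for 2
    cap_b = capacity(bf)  # scheduler B sees the same stale value
    start_runs(bf, cap_a)
    start_runs(bf, cap_b)
    return bf.running


def demo_locked():
    """Scheduler B arrives while scheduler A holds the row lock."""
    bf = Backfill(max_active_runs=2, queued=10)
    bf.row_lock.acquire(blocking=False)   # scheduler A locks the row
    b_first = scheduler_pass(bf)          # scheduler B skips this backfill
    start_runs(bf, capacity(bf))          # A works under the lock
    bf.row_lock.release()                 # A "commits"
    b_retry = scheduler_pass(bf)          # B's next loop sees a fresh count
    return bf.running, b_first, b_retry


print(demo_unsafe())   # 4, which exceeds max_active_runs=2
print(demo_locked())   # (2, None, 0)
```

The skipped scheduler loses nothing permanently: it simply picks the backfill up on a later loop, by which time the running count reflects the other scheduler's work.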
ephraimbuddy
commented
Nov 28, 2025
2 tasks
Lee-W
approved these changes
Dec 2, 2025
vatsrahul1001
approved these changes
Dec 2, 2025
Co-authored-by: Wei Lee <weilee.rx@gmail.com>
ephraimbuddy
commented
Dec 2, 2025
phanikumv
approved these changes
Dec 2, 2025
Backport failed to create: v3-1-test.
You can attempt to backport this manually by running:
cherry_picker 22af27e v3-1-test
This should apply the commit to the v3-1-test branch and leave the commit in a conflict state. After you have resolved the conflicts, you can continue the backport process by running:
cherry_picker --continue
ephraimbuddy
added a commit
that referenced
this pull request
Dec 2, 2025
#58807) * Fix backfill max_active_runs race condition with concurrent schedulers (cherry picked from commit 22af27e)
Lee-W
pushed a commit
that referenced
this pull request
Dec 2, 2025
ephraimbuddy
added a commit
that referenced
this pull request
Dec 3, 2025
RoyLee1224
pushed a commit
to RoyLee1224/airflow
that referenced
this pull request
Dec 3, 2025
apache#58807) * Fix backfill max_active_runs race condition with concurrent schedulers
itayweb
pushed a commit
to itayweb/airflow
that referenced
this pull request
Dec 6, 2025
apache#58807) * Fix backfill max_active_runs race condition with concurrent schedulers
Labels
area:Scheduler
including HA (high availability) scheduler
backport-to-v3-1-test
Mark PR with this label to backport to v3-1-test branch
Related: #58752