Add best-effort cleanup to EcsRunTaskOperator on post-start failure #61051
Merged
shahar1 merged 1 commit into apache:main on Feb 10, 2026
vincbeck approved these changes on Jan 26, 2026
occur after successful task start (e.g. waiter failures due to missing DescribeTasks permissions). This change adds best-effort cleanup when post-start steps fail by attempting to stop tasks started by the operator. Cleanup errors are logged but do not mask the original exception. Tests cover successful cleanup on failure and ensure cleanup failures do not override the original error.
vincbeck approved these changes on Jan 29, 2026
Alok-kumar-priyadarshi pushed a commit to Alok-kumar-priyadarshi/airflow that referenced this pull request on Feb 11, 2026
Ratasa143 pushed a commit to Ratasa143/airflow that referenced this pull request on Feb 15, 2026
Description
Added best-effort cleanup to `EcsRunTaskOperator` to ensure ECS tasks are stopped when failures occur after a task has been successfully started. Cleanup behavior is guarded by a flag and is enabled by default.

Previously, the operator could successfully start an ECS task via `RunTask` and then fail during post-start steps (for example, when waiting for task completion with `wait_for_completion=True` and missing `ecs:DescribeTasks` permissions). In these cases, the Airflow task failed while the ECS task continued running in AWS.

The operator now attempts to stop any ECS task that was started by the current task instance if an exception is raised after task start. Cleanup is best-effort: if stopping the task fails, that error does not mask or replace the original exception. Cleanup is triggered only for post-start failures that raise `WaiterError`, so it runs only when an ECS task has been created and does not intercept non-AWS exceptions.

Rationale
`EcsRunTaskOperator` manages the lifecycle of an external resource whose execution extends beyond the lifetime of the Airflow task. If task start succeeds but subsequent execution steps fail, Airflow can no longer reliably observe or manage the running ECS task, potentially leaving resources running unexpectedly.

Failures after task start can occur for multiple reasons, including IAM permission errors (for example, missing `ecs:DescribeTasks`) or loss of access to systems used during task execution. Attempting best-effort cleanup in these scenarios avoids leaving unmanaged ECS tasks running while preserving existing failure semantics.

Cleanup is only attempted when the operator can confidently determine that the ECS task was started by the current execution. This is achieved by tracking whether the task was started during the current run and using the task ARN returned by `RunTask`. This avoids interfering with pre-existing tasks in reattach scenarios while still preventing resource leaks on post-start failures.

Restricting cleanup to `WaiterError` prevents unintended side effects from catching unrelated failures while still addressing orphaned ECS tasks created during execution.

Tests
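The behaviours described in this PR (stop the task when the waiter fails, and never let a failed cleanup hide the original error) can be illustrated with a minimal sketch. This is not the operator's actual code or its test suite: `run_and_wait`, `make_client`, and `attempt` are hypothetical helpers, the `WaiterError` class is a local stand-in for `botocore.exceptions.WaiterError`, and the ECS client is a `MagicMock` so nothing talks to AWS.

```python
import logging
from unittest.mock import MagicMock

log = logging.getLogger(__name__)


class WaiterError(Exception):
    """Local stand-in for botocore.exceptions.WaiterError (assumption for this sketch)."""


def run_and_wait(client, cluster, stop_task_on_failure=True):
    """Hypothetical start-then-wait flow with best-effort cleanup."""
    # RunTask succeeded: from here on a task exists that this run owns.
    arn = client.run_task(cluster=cluster)["tasks"][0]["taskArn"]
    try:
        client.get_waiter("tasks_stopped").wait(cluster=cluster, tasks=[arn])
    except WaiterError:
        if stop_task_on_failure:
            try:
                client.stop_task(cluster=cluster, task=arn, reason="post-start failure")
            except Exception:
                # Cleanup errors are logged only; they must not replace the WaiterError.
                log.exception("Best-effort cleanup failed for %s", arn)
        raise  # re-raises the original WaiterError
    return arn


def make_client(stop_side_effect=None):
    """Mock ECS client whose waiter always fails after a successful RunTask."""
    client = MagicMock()
    client.run_task.return_value = {"tasks": [{"taskArn": "arn:aws:ecs:example"}]}
    client.get_waiter.return_value.wait.side_effect = WaiterError("DescribeTasks denied")
    client.stop_task.side_effect = stop_side_effect
    return client


def attempt(client, **kwargs):
    """Run the flow and report (raised exception or None, number of StopTask calls)."""
    try:
        run_and_wait(client, "demo-cluster", **kwargs)
    except Exception as exc:
        return exc, client.stop_task.call_count
    return None, client.stop_task.call_count
```

Calling `attempt(make_client())` exercises the successful-cleanup path, while `attempt(make_client(stop_side_effect=RuntimeError("denied")))` exercises the case where `StopTask` itself fails; in both the caller should still see the `WaiterError`.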
Documentation
The docstring for `EcsRunTaskOperator` has been updated with a brief description of the new flag `stop_task_on_failure`.

Backwards Compatibility
A new flag called `stop_task_on_failure` has been added to `EcsRunTaskOperator` with a default setting of `True`. Cleanup will now be attempted on a best-effort basis if `WaiterError` is encountered.

Closes: #61050
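The conditions under which cleanup runs, as described in this PR, can be summarised as a single predicate. The sketch below is illustrative only: `should_stop_task` and its `started_by_this_run` argument are hypothetical names, and `WaiterError` is a local stand-in for `botocore.exceptions.WaiterError`. Only the flag name `stop_task_on_failure` and its default of `True` come from the PR.

```python
class WaiterError(Exception):
    """Local stand-in for botocore.exceptions.WaiterError (assumption for this sketch)."""


def should_stop_task(exc, started_by_this_run, stop_task_on_failure=True):
    """Return True only when all three conditions from the PR description hold.

    - the flag is enabled (it defaults to True),
    - the ECS task was started by the current run, not reattached,
    - the failure is a WaiterError rather than an unrelated exception.
    """
    return (
        stop_task_on_failure
        and started_by_this_run
        and isinstance(exc, WaiterError)
    )
```

Under this reading, passing `stop_task_on_failure=False` would presumably keep the behavior that existed before the change, since no stop attempt is made.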