Skip to content

Conversation

@ashb
Copy link
Member

@ashb ashb commented Aug 12, 2025

(Manual backport of #53435)

In issue #51301, it was reported that failure callbacks do not run for task instances that get stuck in queued and fail in Airflow 2.10.5. This is happening due to the changes introduced in PR #43520 . In this PR, logic was introduced to requeue tasks that get stuck in queued (up to two times by default) before failing them.

Previously, the executor's fail method was called when the task needed to be failed after max requeue attempts. This was replaced by the task instance's set_state method in the PR ti.set_state(TaskInstanceState.FAILED, session=session). Without the executor's fail method being called, failure callbacks will not be executed for such task instances. Therefore, I changed the code to call the executor's fail method instead in Airflow 3.
(cherry picked from commit 6da77b1)

Co-authored-by: Karen Braganza karenbraganza15@gmail.com


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@ashb ashb requested a review from XD-DENG as a code owner August 12, 2025 10:32
@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Aug 12, 2025
@ashb ashb force-pushed the backport-6da77b1-v3-0-test branch from eb446bb to c856a17 Compare August 12, 2025 21:36
…#53435)

In issue #51301, it was reported that failure callbacks do not run for task instances that get stuck in queued and fail in Airflow 2.10.5. This is happening due to the changes introduced in PR #43520 . In this PR, logic was introduced to requeue tasks that get stuck in queued (up to two times by default) before failing them.

Previously, the executor's fail method was called when the task needed to be failed after max requeue attempts. This was replaced by the task instance's set_state method in the PR ti.set_state(TaskInstanceState.FAILED, session=session). Without the executor's fail method being called, failure callbacks will not be executed for such task instances. Therefore, I changed the code to call the executor's fail method instead in Airflow 3.
(cherry picked from commit 6da77b1)

Co-authored-by: Karen Braganza <karenbraganza15@gmail.com>
@ashb ashb force-pushed the backport-6da77b1-v3-0-test branch from c856a17 to 9c8b8ac Compare August 13, 2025 09:21
@kaxil kaxil merged commit 60ee14c into v3-0-test Aug 13, 2025
54 checks passed
@kaxil kaxil deleted the backport-6da77b1-v3-0-test branch August 13, 2025 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Scheduler including HA (high availability) scheduler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants