Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increment try_number while clearing deferred tasks. #38984

Closed
wants to merge 2 commits into from

Conversation

tirkarthi
Copy link
Contributor

closes: #38735
related: #38735

This is already done while marking tasks as success/failure. Should this be done for up_for_reschedule tasks too? See also #30669

# The try_number was decremented when setting to up_for_reschedule and deferred.
# Increment it back when changing the state again
if task_instance.state in (TaskInstanceState.DEFERRED, TaskInstanceState.UP_FOR_RESCHEDULE):
task_instance._try_number += 1

@boring-cyborg boring-cyborg bot added the area:webserver Webserver related Issues label Apr 13, 2024
@eladkal eladkal added this to the Airflow 2.9.1 milestone Apr 13, 2024
@eladkal eladkal added the type:bug-fix Changelog: Bug Fixes label Apr 13, 2024
# When the task is deferred the try_number is decremented so that the same try
# number is used when the task resumes execution to process the event. But in case of
# clearing the try number should be incremented so that the next run doesn't reuse the same try_number
if ti.state == TaskInstanceState.DEFERRED:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe clear action is meant to quickly restart things. However, here we are increasing the try_number nothing else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dirrao Sorry, I don't understand your comment. When the task moves from running to deferred the try number is decremented so that when task resumes from the trigger to running the try number is incremented so that same try number is used and helps with logging etc. When the task is cleared then the try number is incremented but since it's in deferred state with try number decremented it causes issues where same log file is used, retry count mismatch etc. for different attempts.

Copy link
Member

@jedcunningham jedcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but we should definitely get a second approval on this one.

@bbovenzi
Copy link
Contributor

bbovenzi commented May 2, 2024

cc: @dstandish

@dstandish
Copy link
Contributor

dstandish commented May 2, 2024

this should be obsoleted by this PR #39336

i'm just working through bunch of test fixes cus we make a lot of assertions about try_number because it's always been so problematic

i would suggest we close this one

@tirkarthi
Copy link
Contributor Author

@dstandish I agree the linked PR is a good long term solution but this PR is good enough for a bugfix in 2.9.x series with 2.10 release few months away.

@dstandish
Copy link
Contributor

@dstandish I agree the linked PR is a good long term solution but this PR is good enough for a bugfix in 2.9.x series with 2.10 release few months away.

Sure no problem

@tirkarthi tirkarthi force-pushed the fix-try-number-clear-defer branch from 46f3b89 to f450542 Compare May 3, 2024 01:04
@eladkal
Copy link
Contributor

eladkal commented Jun 3, 2024

@tirkarthi @dstandish do we want to merge this PR for 2.9.2?

@dstandish
Copy link
Contributor

dstandish commented Jun 3, 2024

@tirkarthi @dstandish do we want to merge this PR for 2.9.2?

@eladkal I think we cannot merge this anymore now that #39336 is merged. At least not into main. I think the only way this could be released is if it's patched on a 2.9.2 branch specifically. WDYT?

@eladkal
Copy link
Contributor

eladkal commented Jun 3, 2024

Then its up to @ephraimbuddy
I think the real question here is if do we think there is value? If so then @tirkarthi will need to change the base branch of this PR from main to v2.9 stable

@ephraimbuddy
Copy link
Contributor

Then its up to @ephraimbuddy I think the real question here is if do we think there is value? If so then @tirkarthi will need to change the base branch of this PR from main to v2.9 stable

We can only PR it to v2-9-test but not stable. However, we don't usually do that. We only PR against the test branch if we have issues during the release process and it's mostly CI changes. cc @potiuk

@tirkarthi
Copy link
Contributor Author

I am okay with closing this since main has a long term fix and 2.10 will be released in few months hopefully. We had this patch since we found this during our upgrade to 2.7.0 using deferrable operators and the patch has been working well.

@potiuk potiuk closed this Jun 4, 2024
@potiuk
Copy link
Member

potiuk commented Jun 4, 2024

Yeah. 2.10 is not that far away - and until then @tirkarthi can simply run their forked Airlfow until they upgrade to 2.10.

@ephraimbuddy ephraimbuddy removed this from the Airflow 2.9.2 milestone Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:webserver Webserver related Issues type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clear a deferred task do not increment the tries
8 participants