Send explicit task logs when marking tasks stuck in queued as failed #35857
Using the feature built in apache#32646, when the scheduler marks tasks stuck in queued as failed, send an explicit message stating this action to the task logs, so that users can see exactly why the task was marked failed.
"Marking task instance %s stuck in queued as failed. "
"If the task instance has available retries, it will be retried.",
ti,
ti=ti,
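To make the scenario concrete, here is a minimal Python sketch of the scheduler-side check. It assumes a task instance counts as stuck once it has been queued longer than a timeout (Airflow has a `[scheduler] task_queued_timeout` option for this), while `QueuedTI` and `fail_tasks_stuck_in_queued` are hypothetical names. It emits the same warning text as the diff, but only to the scheduler's own logger, which is the gap this PR closes. Retry handling is not modelled; the sketch simply sets the state to `failed`.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta

log = logging.getLogger("scheduler_sketch")


@dataclass
class QueuedTI:
    """Hypothetical stand-in for a task instance sitting in the queued state."""
    key: str
    queued_at: datetime
    state: str = "queued"


def fail_tasks_stuck_in_queued(tis, now, timeout):
    """Mark every queued TI older than `timeout` as failed and return those TIs."""
    stuck = [ti for ti in tis if ti.state == "queued" and now - ti.queued_at > timeout]
    for ti in stuck:
        # Same text as the PR, but here it reaches only the scheduler's logger,
        # so a user reading the task's own log would not see it.
        log.warning(
            "Marking task instance %s stuck in queued as failed. "
            "If the task instance has available retries, it will be retried.",
            ti.key,
        )
        ti.state = "failed"  # retries are deliberately not modelled in this sketch
    return stuck
```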
Looks good, but we would lose this information if the user disables the feature. I think both the scheduler and the task logs should have the information for the time being.
Another thing that worries me is the performance implication, but in theory I don't think there would be many tasks stuck in the queued state.
We won't lose this information. Even if the user disables the feature, the TCL would still log this information using the call site's logger in the scheduler itself: https://github.com/apache/airflow/pull/32646/files#diff-fb48bd1344270ccbaadb60b2b7fbc5d74bb5440f908eedd384bd25ada648c05dR91
Yes, we can test the performance. If it hurts performance badly, we can disable the feature and still have the scheduler logs mentioned above.
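The fallback described above can be sketched as a logger that always writes to the call-site logger and, only when the feature is enabled, also copies the message into the task's own log. This is a hypothetical stdlib-only illustration: `TaskContextLoggerSketch`, `TaskLogSink` and the `enabled` flag are made-up names, not Airflow's actual `TaskContextLogger` implementation.

```python
class TaskLogSink:
    """Hypothetical per-task log store (real Airflow writes through its task log handler)."""

    def __init__(self):
        self.lines = {}

    def write(self, ti_key, line):
        self.lines.setdefault(ti_key, []).append(line)


class TaskContextLoggerSketch:
    """Hypothetical sketch of the behaviour discussed in review.

    Every message always goes to the call-site logger (e.g. the scheduler's
    logging.Logger); when `enabled` is True it is also copied to the task log.
    """

    def __init__(self, call_site_logger, sink, enabled=True):
        self.call_site_logger = call_site_logger
        self.sink = sink
        self.enabled = enabled

    def warning(self, msg, *args, ti_key):
        # Always log at the call site, so disabling the feature loses nothing there.
        self.call_site_logger.warning(msg, *args)
        if self.enabled:
            # %-format the message the same way the logging module would.
            self.sink.write(ti_key, msg % args if args else msg)
```

Keeping the call-site write unconditional is what makes disabling the task-log copy (for example, if it turns out to be slow) safe.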
Nice
And the call site logger is set by the scheduler component while initialising the TCL instance here: https://github.com/apache/airflow/pull/32646/files#diff-b0491913f69327937706aea8fc77a71efeb979897898e405ade2b162ad862476R239
It might be a good idea to log both separately. One reason is clarity. The other is that the information you need differs slightly between the two contexts. For example, in the task instance log the message doesn't need to reference the task instance details, because they are implied by the context. But these are things we can tweak later, since everything is private.
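A minimal sketch of that suggestion, using a hypothetical helper that builds the two messages separately: the scheduler-side text keeps the task instance reference, while the task-log text drops it because the log file already identifies the task instance. The task-log wording is illustrative and not taken from the PR.

```python
def stuck_in_queued_messages(ti_repr):
    """Return (scheduler_message, task_log_message) for one stuck task instance.

    Hypothetical helper: the scheduler log must name the task instance,
    whereas the task's own log already implies which one it is.
    """
    scheduler_msg = (
        f"Marking task instance {ti_repr} stuck in queued as failed. "
        "If the task instance has available retries, it will be retried."
    )
    # Illustrative wording only; no task instance details are needed here.
    task_msg = (
        "Marking task stuck in queued as failed. "
        "If the task has available retries, it will be retried."
    )
    return scheduler_msg, task_msg
```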
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>