[BUG] python task retries=n causes "Requests overridden" error log message for interruptible task #5455
Comments
The error message you are encountering seems to be related to how Flyte handles retries for interruptible tasks. According to the Flyte documentation, setting the … Would you like more detail on how to configure retries for interruptible tasks or how to handle resource requests and limits in Flyte? (This is an AI-generated response.)
@tblom do you have a reproducible example?
Here is the output of running this locally with
Note that the issue described only occurs when
@tblom thanks for sharing. So if you set
Describe the bug
With flytekit/flyte-binary 1.12.0, if a Python task is given the retries=n argument, an error is logged during execution:
Requests overridden on node dn1 (<FlyteLiteral(NodeMetadata) name: "<task name appears here>" retries { retries: 2 }>) without specifying limits. Requests are clamped to original limits.
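The message says that overridden requests "are clamped to original limits". As a rough illustration of what that wording implies, and not flytekit's actual code, the sketch below caps each overridden request at the original limit for the same resource. The function name `clamp_requests` and the integer units (millicores, MiB) are hypothetical.

```python
from typing import Dict


def clamp_requests(requested: Dict[str, int], original_limits: Dict[str, int]) -> Dict[str, int]:
    """Cap each overridden request at the original limit for that resource.

    Hypothetical illustration of the log message's wording; a resource
    with no original limit is passed through unchanged.
    """
    clamped = {}
    for resource, value in requested.items():
        limit = original_limits.get(resource)
        clamped[resource] = value if limit is None else min(value, limit)
    return clamped


# The CPU request exceeds its original limit and is capped;
# memory is already below its limit and is kept as-is.
print(clamp_requests({"cpu_millicores": 4000, "memory_mib": 2048},
                     {"cpu_millicores": 2000, "memory_mib": 8192}))
# → {'cpu_millicores': 2000, 'memory_mib': 2048}
```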
More details, as described in the Flyte Slack community:
I am running interruptible workflows by setting the interruptible flag at the workflow level, which causes all tasks in the workflow to run as interruptible (even if they have no interruptible flag of their own set), which is my intention.
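The inheritance described here, where a task with no interruptible flag of its own picks up the workflow-level setting, can be modeled as a simple default lookup. This is a hypothetical sketch (`effective_interruptible` is not a Flyte API). The case where a task sets its own flag and that flag wins over the workflow default is an assumption, not something stated in this issue.

```python
from typing import Optional


def effective_interruptible(task_flag: Optional[bool], workflow_flag: bool) -> bool:
    """Return whether a task runs as interruptible.

    Assumption: an explicit task-level flag takes precedence; a task
    with no flag (None) inherits the workflow-level default.
    """
    return workflow_flag if task_flag is None else task_flag


# Workflow marked interruptible, task with no flag of its own:
print(effective_interruptible(None, True))   # True
# Assumed override: a task explicitly marked non-interruptible.
print(effective_interruptible(False, True))  # False
```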
This works, but I see an error for each task called by this workflow, referring to the task's retries parameter:
Requests overridden on node dn1 (<FlyteLiteral(NodeMetadata) name: "rf_v2_train_and_test_task" retries { retries: 2 }>) without specifying limits. Requests are clamped to original limits.
I have supplied the argument retries=2 to my tasks because I want to ensure that if the spot instance is reclaimed, the task will be retried once more on a spot instance, and finally, if it is reclaimed again, it will run on an on-demand instance. That is my understanding of these docs: "If you set retries=n, for instance, and the task gets preempted repeatedly, Flyte will retry on a preemptible/spot instance n-1 times and for the last attempt will retry your task on a non-spot (regular) instance. Please note that tasks will only be retried if at least one retry is allowed using the retries parameter in the task decorator."
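The documented rule can be laid out per attempt. This is a minimal sketch that assumes every attempt is lost to preemption and that the original attempt runs on spot; `attempt_plan` is a hypothetical helper, not part of flytekit.

```python
from typing import List


def attempt_plan(retries: int) -> List[str]:
    """Instance type used for each attempt of an interruptible task
    that keeps getting preempted, following the quoted docs.

    With retries=n there are n + 1 attempts: the original attempt and
    n - 1 retries run on spot, and the final retry runs on a regular
    (on-demand) instance. With retries=0 nothing is retried, so the
    single attempt stays on spot.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    if retries == 0:
        return ["spot"]
    return ["spot"] * retries + ["on-demand"]


for n in (0, 1, 2):
    print(n, attempt_plan(n))
# 0 ['spot']
# 1 ['spot', 'on-demand']
# 2 ['spot', 'spot', 'on-demand']
```

For retries=2 this gives spot, spot, on-demand, which matches the reporter's expectation of one spot retry before falling back to an on-demand instance.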
I don't understand what the error message means or what I should be doing differently. A member of Union suggested in the Flyte Slack community that this looks like a bug and that I should file an issue.
Expected behavior
I expect the error message not to be logged.
Additional context to reproduce
No response
Screenshots
No response