-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature sbg task retries #5
base: main
Are you sure you want to change the base?
Conversation
We already have a function call get_task_state(). This PR proposes a higher level method called monitor_task() that will return when the task succeeds or fails. |
] | ||
settings = { | ||
0: { | ||
"instance_type": "m5.8xlarge;ebs-gp2;4096", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there instance types hard-coded here? This assumes that is what should be used. There doesn't seem to be a way to override that either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @golharam and @yfan2012,
I see what you're saying here. When I wrote this, I just had the 10X launcher in mind. I forgot that the sevenbridges_platform.py
should work for all the launchers. I can implement something similar to how the launcher functioned previously and move the instance types to the config file. I can also add arguments to specify whether or not a task should be retried, as well as a specific instance type to re-run the task on.
Is this something you'd like us to work on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I want to make sure this package stays generic enough to not include anything specific to an individual pipeline. The config settings should be part of the launcher for the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @golharam,
I rewrote this logic to be launcher agnostic, but need permissions to create a PR on this repo. Would you mind granting me those? And the rest of our team if that's okay with you?
Ryan
This PR adds task retry functionality to the SBG platform as the launcher waits for tasks to complete. This is implemented in the
monitor_task()
function which does the following:SevenBridgesInstance
class), it will resubmit the task up to three times on progressively larger instances also defined in theSevenBridgesInstance
class. It returns either a completed task or a failed task that has been retried.