-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a timeout to the workflow API #937
base: main
Are you sure you want to change the base?
Conversation
6cbb975
to
96b3c55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new parameter in the schema should have a unit associated with it, imo. Either sec
or min
.
# Maybe move to the job service? | ||
def _stop_job(self, job_id: UUID): | ||
resp = requests.post(urljoin(settings.RAY_JOBS_URL, f"{job_id}/stop"), timeout=5) # 5 seconds | ||
if resp.status_code == HTTPStatus.NOT_FOUND: | ||
raise JobUpstreamError("ray", "job_id not found when retrieving logs") from None | ||
elif resp.status_code != HTTPStatus.OK: | ||
raise JobUpstreamError( | ||
"ray", | ||
f"Unexpected status code getting job logs: {resp.status_code}, error: {resp.text or ''}", | ||
) from None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Yeah this does seem like something that should be in the job service.
What's changing
Jobs in a workflow currently expire at a fixed timeout of 600 seconds. Also, timed out jobs are not stopped. This PR provides a timeout field in the workflow that sets the waiting time for all jobs in the workflow, with the expected default value of 600 seconds. It should also stop jobs that have timed out.
Closes #934
How to test it
Steps to test the changes:
Additional notes for reviewers
N/A
I already...
/docs
)