[Utils] Make worker connection timeout parameters configurable #58372
Code Review
This pull request makes the Ray worker connection timeout parameters configurable through environment variables, which is a valuable addition for environments with slower startup times. The change correctly centralizes the configuration at the module level and removes redundant code.
My main feedback is regarding the robustness of parsing the environment variables. The current implementation can lead to a ValueError and crash the application if the environment variables are set to non-numeric values. I've left a comment with a suggestion to make this more robust by using os.environ.get() and adding error handling, ideally by leveraging existing helper functions within Ray to ensure consistency and prevent crashes from misconfiguration.
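A minimal sketch of the kind of defensive parsing suggested above, assuming nothing about Ray's actual helper functions or variable names. The helper names `env_float` and `env_int` here are standalone illustrations, and the environment variable names and default values are hypothetical, not the ones this pull request introduces.

```python
import logging
import os

logger = logging.getLogger(__name__)


def env_float(name: str, default: float) -> float:
    """Read a float from the environment; fall back to `default` if unset or invalid."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        # A misconfigured value should not crash the process at import time.
        logger.warning("Invalid value %r for %s; using default %s", raw, name, default)
        return default


def env_int(name: str, default: int) -> int:
    """Read an integer from the environment; fall back to `default` if unset or invalid."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        logger.warning("Invalid value %r for %s; using default %s", raw, name, default)
        return default


# Hypothetical module-level constants; the names and defaults are assumptions.
EXAMPLE_CONNECT_TIMEOUT_S = env_float("EXAMPLE_WORKER_CONNECT_TIMEOUT_S", 30.0)
EXAMPLE_CONNECT_RETRIES = env_int("EXAMPLE_WORKER_CONNECT_RETRIES", 5)
```

With this shape, an unset or non-numeric variable yields the default and a warning instead of a ValueError. Reusing an existing project helper, as the review suggests, would be preferable to duplicating this logic.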
edoakes
left a comment
Thanks for the contribution. Left an adjustment on the naming to make it consistent w/ other constants we have.
Please also sign off your commits to fix the DCO build.
The linter is failing in CI: https://buildkite.com/ray-project/premerge/builds/53116 Instructions for running it locally are here: https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting
@edoakes Thanks a lot for reviewing this. It has been fixed!
edoakes
left a comment
Thanks for the contribution
…roject#58372) When starting a Ray cluster in a Kuberay environment, the startup process may sometimes be slow. In such cases, it is necessary to increase the timeout duration for proper startup, otherwise, the error "ray client connection timeout" will occur. Therefore, we need to make the timeout and retry policies for the Ray worker configurable. --------- Signed-off-by: OneSizeFitsQuorum <txypotato@gmail.com>
…roject#58372) When starting a Ray cluster in a Kuberay environment, the startup process may sometimes be slow. In such cases, it is necessary to increase the timeout duration for proper startup, otherwise, the error "ray client connection timeout" will occur. Therefore, we need to make the timeout and retry policies for the Ray worker configurable. --------- Signed-off-by: OneSizeFitsQuorum <txypotato@gmail.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…roject#58372) When starting a Ray cluster in a Kuberay environment, the startup process may sometimes be slow. In such cases, it is necessary to increase the timeout duration for proper startup, otherwise, the error "ray client connection timeout" will occur. Therefore, we need to make the timeout and retry policies for the Ray worker configurable. --------- Signed-off-by: OneSizeFitsQuorum <txypotato@gmail.com> Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
Description
When starting a Ray cluster in a KubeRay environment, the startup process can sometimes be slow. In such cases, the timeout must be increased for startup to succeed; otherwise, the error "ray client connection timeout" occurs. This change therefore makes the timeout and retry policies for the Ray worker configurable.