Speed up refresh: delay the slower ray status call & use cached IPs. #2079
I noticed that `status -r` is taking way too long. After some investigation with @Michaelvll, this PR implements two optimizations:

- Delay the slower `ray status` call.
- Use cached IPs. `handle.external_ips(use_cached_ips=False)` -> `ray get head-ip/worker-ips` is too slow; replace with NodeProvider calls or cloud CLI/SDK calls?

All tests below are on 1-node on-demand GCP clusters. These are all "normal" cases where the runtime on the cluster did not become problematic.

Results:
- Do `ray status` last + `use_cached_ips=True`:
  - STOPPED -> STOPPED
  - UP, autostop not set -> UP
  - UP, autostopped -> STOPPED
  - UP, autostop set -> UP
  - INIT -> INIT
- (If we do the first optimization only) do `ray status` last:
  - STOPPED -> STOPPED
  - UP, autostop not set -> UP
  - UP, autostopped -> STOPPED
  - UP, autostop set -> UP
  - INIT -> INIT
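
For readers unfamiliar with the idea, here is a minimal sketch of the two optimizations in isolation. It is not the code in this PR: `ClusterHandle`, `refresh_status`, and the `cloud_status` / `ray_is_healthy` callables are hypothetical stand-ins, and falling back to `INIT` when the runtime check fails is an assumption made only for illustration. The sketch shows the cheap check running first, the slow runtime check running last, and IPs being reused from a cache unless a fresh lookup is requested.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ClusterHandle:
    """Hypothetical stand-in for a cluster handle (not the real class)."""
    fetch_ips: Callable[[], List[str]]  # slow lookup against the cluster
    cached_ips: Optional[List[str]] = None

    def external_ips(self, use_cached_ips: bool = True) -> List[str]:
        # Reuse stored IPs when allowed; otherwise do the slow lookup
        # and remember the result for later calls.
        if use_cached_ips and self.cached_ips is not None:
            return self.cached_ips
        self.cached_ips = self.fetch_ips()
        return self.cached_ips


def refresh_status(
    handle: ClusterHandle,
    cloud_status: Callable[[], str],
    ray_is_healthy: Callable[[List[str]], bool],
) -> str:
    """Return the refreshed status, running the slow check last."""
    # Cheap check first (assumed to be a fast cloud-side query).
    status = cloud_status()
    if status != 'UP':
        # STOPPED or INIT: no need for the slow runtime check at all.
        return status
    # Only clusters that look UP pay for the slow runtime check,
    # and they reuse cached IPs instead of looking them up again.
    ips = handle.external_ips(use_cached_ips=True)
    # Assumption: an unhealthy runtime is reported as INIT.
    return 'UP' if ray_is_healthy(ips) else 'INIT'
```

With this ordering, a stopped cluster never triggers the runtime check or an IP lookup, and repeated refreshes of an UP cluster perform the IP lookup only once.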
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py --generic-cloud aws
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh