Fix unnecessary ssh hanging issue on Ray (#851)
* Fix ray hanging ssh issue

* Fix

* change the order back

* Update node status after first attempt
infwinston authored Oct 25, 2022
1 parent 173b819 commit 6545cf5
Showing 2 changed files with 10 additions and 0 deletions.
3 changes: 3 additions & 0 deletions sky/skylet/ray_patches/__init__.py
@@ -82,6 +82,9 @@ def patch() -> None:
     _run_patch(resource_demand_scheduler.__file__,
                _to_absolute('resource_demand_scheduler.py.patch'))
 
+    from ray.autoscaler._private import updater
+    _run_patch(updater.__file__, _to_absolute('updater.py.patch'))
+
     # Fix the Azure get-access-token (used by ray azure node_provider) timeout issue,
     # by increasing the timeout.
     # Tracked in https://github.com/Azure/azure-cli/issues/20404#issuecomment-1249575110
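
The patch files registered in `patch()` are written in the classic "normal" diff format, where a command such as `NaM` or `NaM,K` means "append the following `>` lines after line N of the original file". The sketch below is only an illustration of how such append-only commands can be applied; `apply_append_patch` is a hypothetical helper written for this note, not the `_run_patch` function from this repository, whose implementation is not shown in this commit.

```python
import re
from typing import List


def apply_append_patch(original: List[str], patch_text: str) -> List[str]:
    """Apply the append ('a') commands of a normal-format diff.

    Hypothetical helper for illustration only. It handles only commands of
    the form 'NaM' or 'NaM,K' and uses N (the original line number) to place
    the new lines.
    """
    result = list(original)
    offset = 0  # Number of lines inserted so far by earlier commands.
    lines = patch_text.splitlines()
    i = 0
    while i < len(lines):
        match = re.fullmatch(r'(\d+)a(\d+)(?:,(\d+))?', lines[i])
        if match is None:
            raise ValueError(f'unsupported diff command: {lines[i]!r}')
        after = int(match.group(1))
        i += 1
        added = []
        # Collect the '>' lines that belong to this command.
        while i < len(lines) and lines[i].startswith('>'):
            added.append(lines[i][2:])  # Drop the '> ' prefix; a bare '>' is an empty line.
            i += 1
        position = after + offset
        result[position:position] = added
        offset += len(added)
    return result


original = ['a', 'b', 'c']
patch_text = '0a1,2\n> # header\n>\n2a5\n> inserted'
patched = apply_append_patch(original, patch_text)
```

Here the first command prepends two lines to the file and the second appends one line after the original line 2, so `patched` is `['# header', '', 'a', 'b', 'inserted', 'c']`.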
7 changes: 7 additions & 0 deletions sky/skylet/ray_patches/updater.py.patch
@@ -0,0 +1,7 @@
+0a1,4
+> # From https://github.com/ray-project/ray/blob/releases/2.0.1/python/ray/autoscaler/_private/updater.py
+> # Sky patch changes:
+> # - Ensure the node state is refreshed before checking the node is terminated.
+>
+318a319
+> self.provider.non_terminated_nodes({})
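
The patch comment says the node state must be refreshed before the termination check. A plausible reading is that a provider may answer `is_terminated` from a cached snapshot, so a node that has already gone away can still look alive until `non_terminated_nodes` is called again. The sketch below illustrates that idea only; `FakeCloud` and `CachingProvider` are hypothetical stand-ins written for this note and are not Ray's or SkyPilot's actual provider code.

```python
from typing import Dict, List


class FakeCloud:
    """Hypothetical source of truth for node states."""

    def __init__(self) -> None:
        self.states: Dict[str, str] = {}


class CachingProvider:
    """Hypothetical provider that answers is_terminated() from a cache."""

    def __init__(self, cloud: FakeCloud) -> None:
        self._cloud = cloud
        self._cached: Dict[str, str] = dict(cloud.states)

    def non_terminated_nodes(self, tag_filters: Dict[str, str]) -> List[str]:
        # Assumption: listing nodes is what refreshes the cached states.
        # tag_filters is ignored in this sketch.
        self._cached = dict(self._cloud.states)
        return [n for n, s in self._cached.items() if s != 'terminated']

    def is_terminated(self, node_id: str) -> bool:
        # Reads the cache only, so the answer can be stale.
        return self._cached.get(node_id, 'terminated') == 'terminated'


cloud = FakeCloud()
cloud.states['node-1'] = 'running'
provider = CachingProvider(cloud)

cloud.states['node-1'] = 'terminated'  # The node goes away behind the cache.
stale = provider.is_terminated('node-1')   # Still reports the old state.

provider.non_terminated_nodes({})  # Refresh, as the patched line does.
fresh = provider.is_terminated('node-1')
```

In this sketch `stale` is `False` and `fresh` is `True`: without the refresh, a caller polling `is_terminated` would keep treating the node as reachable, which is the kind of situation in which an SSH retry loop could hang unnecessarily.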
