Head high-availability (HA) feature, which reduces the impact of head node failover (FO) in Ray clusters.
Implementation:
1. Start two or more head nodes at the same time.
2. Before initializing the node and launching the head node processes, the startup process connects to Redis and competes for leadership through a Redis distributed lock.
3. Only the node that wins leadership goes on to start the gcs_server, dashboard, and other head processes normally.
4. The standby nodes block in the competition loop until the current leader fails.
5. After a normal startup, the leader's startup process periodically renews the Redis distributed lock to keep its leader status. It then runs as a daemon that keeps checking whether this head node is still the leader.
6. If the leader node's entire pod fails, or its lease renewal fails, the node considers itself a standby: the startup process kills all head processes and then exits. The exit of the startup process causes KubeRay to restart the pod.
7. A standby node that finds itself the leader leaves the competition loop and starts the GCS, dashboard, and other head processes.
8. The process newly started in step 6 then blocks in the competition loop as a standby node until the current leader from step 7 fails.
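The leader-election and lease-renewal cycle in the steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the feature's actual code: `FakeRedisLock` is a hypothetical in-memory stand-in for the Redis key, so the sketch runs without a server. Against real Redis, acquisition would typically be `SET key token NX PX ttl_ms`, and renewal an atomic check-and-extend (for example a Lua script) so a node never extends a lock it no longer owns. `HeadCandidate`, `ttl_s`, and `retry_s` are also hypothetical names.

```python
import threading
import time
import uuid


class FakeRedisLock:
    """Hypothetical in-memory stand-in for a Redis key used as a lease lock."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owner = None
        self._expires_at = 0.0

    def try_acquire(self, token, ttl_s):
        # Like SET key token NX PX ttl: succeeds only if the key is free or expired.
        with self._mutex:
            now = time.monotonic()
            if self._owner is None or now >= self._expires_at:
                self._owner = token
                self._expires_at = now + ttl_s
                return True
            return False

    def renew(self, token, ttl_s):
        # Extend the lease only if this token still owns an unexpired lock.
        with self._mutex:
            now = time.monotonic()
            if self._owner == token and now < self._expires_at:
                self._expires_at = now + ttl_s
                return True
            return False


class HeadCandidate:
    """Hypothetical model of one head node's startup process."""

    def __init__(self, lock, ttl_s=0.3, retry_s=0.05):
        self.lock = lock
        self.ttl_s = ttl_s
        self.retry_s = retry_s
        self.token = uuid.uuid4().hex  # unique identity of this head node
        self.started = []  # names of the head processes this node has "started"

    def compete(self, timeout_s=None):
        """Stay a standby, retrying until the lock is won (or the timeout passes)."""
        deadline = None if timeout_s is None else time.monotonic() + timeout_s
        while not self.lock.try_acquire(self.token, self.ttl_s):
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(self.retry_s)
        # Only the leader starts the head processes.
        self.started = ["gcs_server", "dashboard"]
        return True

    def keep_leadership(self):
        """One renewal tick of the daemon; on failure, step down."""
        if self.lock.renew(self.token, self.ttl_s):
            return True
        # In the described design, the startup process would kill the head
        # processes and exit here, and KubeRay would restart the pod.
        self.started = []
        return False
```

In a real deployment `keep_leadership` would run on a timer well inside the TTL (for example every third of it), so a single slow renewal does not cost the leader its lease; once renewals stop, a standby's `compete` succeeds within about one TTL.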
Use case
Set the environment variable `RAY_ENABLE_HEAD_HA` to `True` to enable the feature.
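A startup script might gate the leader competition on this flag roughly as below. This is a hedged sketch: `head_ha_enabled` is a hypothetical helper, and treating `"1"` and case-insensitive `"true"` as enabled is an assumption about how the value is parsed, not confirmed behavior.

```python
import os


def head_ha_enabled(environ=None):
    """Hypothetical helper: True when RAY_ENABLE_HEAD_HA holds a truthy value.

    Assumption: "true" (any case) and "1" count as enabled; anything else,
    including an unset variable, leaves head HA off.
    """
    env = os.environ if environ is None else environ
    return env.get("RAY_ENABLE_HEAD_HA", "").strip().lower() in {"true", "1"}
```

When the helper returns `False`, the startup process would skip the Redis competition and start the head processes directly, as it does today.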
Dependency
Requires the related KubeRay modification for creating multiple head nodes.
Worker nodes must access the head node through the domain name provided by KubeRay.
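Because a failover moves the leader to a different pod (and therefore a different IP), workers need a stable DNS name rather than a pod address. A minimal sketch of building such an address, assuming KubeRay exposes the head through a Service named `<cluster>-head-svc` and the head listens on Ray's default GCS port 6379; `head_address` is a hypothetical helper, and whether the modified KubeRay Service routes only to the current leader is an assumption here. The `<service>.<namespace>.svc.cluster.local` form is standard Kubernetes Service DNS.

```python
def head_address(cluster_name, namespace="default", port=6379):
    """Hypothetical helper: stable head address for workers to connect to.

    Assumption: the head Service is named "<cluster_name>-head-svc".
    """
    return f"{cluster_name}-head-svc.{namespace}.svc.cluster.local:{port}"
```

A worker would then join with something like `ray start --address=<result>`, so it reaches whichever head currently holds leadership without reconfiguration.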