[opt](scheduler) Improve Graceful Shutdown Behavior for BE and FE, and Optimize Query Retry During BE Shutdown (#56601)
Related PR: #23865
This PR includes the following main changes:
1. New BE Parameter: `grace_shutdown_post_delay_seconds`
With the BE graceful stop feature, once the main process has waited for
all locally running tasks to complete, it now waits for an additional
`grace_shutdown_post_delay_seconds` so that queries still running on
other nodes can finish as well.
A BE node cannot detect the execution status of tasks on other BE
nodes, so this delay is a fixed wait rather than a check. Increase it if
queries on other nodes need more time to finish.
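The two-phase wait described above can be sketched as follows. This is only an illustrative model in Python, not the BE implementation: `graceful_stop`, `running_local_tasks`, and `poll_interval` are hypothetical names, and only the post delay corresponds to the new parameter.

```python
import time
from typing import Callable


def graceful_stop(
    running_local_tasks: Callable[[], int],
    post_delay_seconds: float,
    poll_interval: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Hypothetical model of the two shutdown phases; returns the phases run."""
    phases = []
    # Phase 1: wait until this node has no running tasks left.
    while running_local_tasks() > 0:
        sleep(poll_interval)
    phases.append("local_tasks_drained")
    # Phase 2: tasks on other nodes are invisible from here, so the only
    # option is a fixed extra wait (the role played by
    # grace_shutdown_post_delay_seconds in this sketch).
    sleep(post_delay_seconds)
    phases.append("post_delay_elapsed")
    return phases
```

Because the second phase is a plain sleep, the delay is always paid in full, even when no remote queries remain.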
2. Enhanced BE `api/health` Endpoint
* When the BE has not yet fully started or is in the process of shutting
  down, the endpoint will return:
  * Message: `"Server is not available"`
  * HTTP Code: `200`
* Under normal circumstances:
  * Message: `"OK"`
  * HTTP Code: `200`
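Since both states listed above answer with HTTP `200`, a health checker has to inspect the message rather than the status code. A minimal sketch of such a check, assuming the message string appears somewhere in the response body (the exact body layout is not specified here, and `be_is_ready` is a hypothetical helper):

```python
def be_is_ready(http_code: int, body: str) -> bool:
    """Return True only when the health response reports a serving BE."""
    if http_code != 200:
        return False
    # Both states answer 200, so the message is the only distinguishing signal.
    if "Server is not available" in body:
        return False
    return "OK" in body
```

A probe that only looks at the HTTP code would treat a starting or stopping BE as healthy.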
3. FE Graceful Shutdown
When using `stop_fe.sh --grace`, the FE waits for currently running
queries to finish before exiting.
Note: currently, only query tasks are waited for; import and other types
of tasks are not yet included.
4. Query Retry in Cloud Mode
In cloud mode, when a query fails with the error `"No backend available
as scan node"`, the FE now retries it internally so that it can be
reassigned to other available BE nodes.
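The retry behavior can be pictured with the sketch below. It is a simplified Python model, not the FE code: `run_with_retry`, `run_query`, and the `max_retries` limit are assumptions made for illustration, and the real retry policy may differ.

```python
from typing import Callable

NO_BACKEND_ERROR = "No backend available as scan node"


def run_with_retry(run_query: Callable[[], str], max_retries: int = 3) -> str:
    """Re-run the query when it fails with the scan-node error (hypothetical)."""
    attempt = 0
    while True:
        try:
            # Each call is assumed to re-plan the query, so it can land on
            # BE nodes that are still available.
            return run_query()
        except RuntimeError as err:
            # Unrelated errors, or too many attempts, are surfaced as-is.
            if NO_BACKEND_ERROR not in str(err) or attempt >= max_retries:
                raise
            attempt += 1
```

Only the named error triggers a retry; any other failure is returned to the caller immediately.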