Add search backpressure cancellation at the coordinator level #5173
Comments
With the existing SearchShardTask trackers and minimum threshold values, the parent task (SearchTask) gets cancelled when the node is in duress. Below are the observed logs:
We will introduce new trackers and define new thresholds for parent task cancellation.
Introduced these settings which will be dynamically configurable:
After adding the above settings,
Added
After multiple iterations, we have introduced these settings which will be dynamically configurable:
In addition to the above settings, we have also deprecated a few settings as mentioned below:
We have also introduced replacement settings for the above settings:
After adding the above settings,
This is fixed as a part of #5605.
Is your feature request related to a problem? Please describe.
#1042 aims to build back-pressure support for search requests. As part of #4575, we have already added cancellation of SearchShardTasks based on resource consumption. This feature aims to cancel resource-guzzling queries. As part of #3982, we already track the resource consumption of SearchTasks, which we will use to make the cancellation decision for a query.
Describe the solution you'd like
Cancel the most resource-intensive in-flight search requests on a coordinator node, based on the resource consumption of their SearchTask, when the node's resource usage has started breaching its assigned limits and has not recovered within a certain time threshold. The back-pressure model should identify the most resource-guzzling queries while minimizing wasted work. Moreover, if partial results are not enabled for a query, cancelling the parent task will also cancel all of its child tasks.
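To make the proposal concrete, here is a minimal Python sketch of the decision flow described above. It is only an illustration under stated assumptions and not the OpenSearch implementation: `TaskUsage`, `DuressTracker`, `select_tasks_to_cancel`, and all thresholds are hypothetical names and values. "No recovery for a certain time threshold" is approximated as a number of consecutive breaching observations, and heap usage stands in for whichever resources the real trackers measure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskUsage:
    """Hypothetical snapshot of one coordinator-level search task."""
    task_id: str
    heap_bytes: int
    cancellable: bool = True


class DuressTracker:
    """Reports duress only after `required_breaches` consecutive breaches.

    Assumption: the time threshold is modeled as a streak of observations.
    """

    def __init__(self, limit: float, required_breaches: int) -> None:
        self.limit = limit
        self.required_breaches = required_breaches
        self.streak = 0

    def record(self, usage: float) -> bool:
        # A single observation below the limit counts as recovery.
        self.streak = self.streak + 1 if usage >= self.limit else 0
        return self.streak >= self.required_breaches


def select_tasks_to_cancel(
    tasks: List[TaskUsage],
    per_task_heap_threshold: int,
    max_cancellations: int,
) -> List[TaskUsage]:
    """Pick the heaviest eligible tasks, capped to limit wasted work."""
    candidates = [
        t for t in tasks
        if t.cancellable and t.heap_bytes >= per_task_heap_threshold
    ]
    candidates.sort(key=lambda t: t.heap_bytes, reverse=True)
    return candidates[:max_cancellations]
```

In this sketch, a monitoring loop would call `tracker.record(node_usage)` on each tick and invoke `select_tasks_to_cancel` only once it returns `True`. Capping the result with `max_cancellations` is one possible way to keep wasted work low. Cascading the cancellation to child tasks is left out.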
Describe alternatives you've considered
Another alternative we considered is to take the resource consumption of the child tasks into account as well, rather than only the resource stats of the parent task. However, the consumption of the child tasks alone does not let us correctly estimate the resources required by the parent task, so we will consider only the parent task's resource consumption.
Additional context
Looking at or aggregating the resource stats of the child tasks does not give us an estimate of the coordinator task's resource consumption. Hence we cannot use them to estimate whether a search task will cause the node to go into duress.