Closed
Description
Admins faced with busy nodes have no way of knowing what bad queries users are sending to their cluster. The _top/searches API should provide a list of all currently executing queries, how long they have been executing, and the ability to kill a query (where possible).
This could be implemented as follows:
- a coordinating node adds an ID to each search, and keeps the search request in some data structure until it is complete
- a `GET _top/search` request will reach out to all nodes to retrieve currently running requests, their elapsed execution time, and which nodes they are running on
- `POST _top/search/_kill/[searchid]` will cause the coordinating node to update the timeout for the request to 0, killing the request as soon as possible (if possible)
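
To make the three steps above concrete, here is a minimal sketch of the bookkeeping a coordinating node might do. It is not Elasticsearch code: `SearchRegistry`, `RunningSearch`, and every method name are hypothetical, and the injectable `clock` exists only so the behaviour can be checked deterministically. Killing is modelled exactly as proposed, by setting the request's timeout to 0, so it only takes effect at points where the executing code checks the timeout.

```python
import itertools
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RunningSearch:
    """One in-flight search as seen by the coordinating node (hypothetical)."""
    search_id: int
    query: str
    nodes: List[str]
    started_at: float
    timeout: Optional[float]  # seconds; None means "no timeout"


class SearchRegistry:
    """Hypothetical per-node registry of currently executing searches."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._running: Dict[int, RunningSearch] = {}

    def register(self, query: str, nodes: List[str],
                 timeout: Optional[float] = None) -> int:
        """Assign an ID to a new search and keep it until it completes."""
        with self._lock:
            search_id = next(self._ids)
            self._running[search_id] = RunningSearch(
                search_id, query, list(nodes), self._clock(), timeout)
            return search_id

    def complete(self, search_id: int) -> None:
        """Forget a search once it has finished."""
        with self._lock:
            self._running.pop(search_id, None)

    def top(self) -> List[dict]:
        """Running searches with elapsed time, longest-running first."""
        now = self._clock()
        with self._lock:
            rows = [{"id": s.search_id, "query": s.query, "nodes": s.nodes,
                     "elapsed": now - s.started_at}
                    for s in self._running.values()]
        return sorted(rows, key=lambda r: r["elapsed"], reverse=True)

    def kill(self, search_id: int) -> bool:
        """Set the timeout to 0 so the next timeout check aborts the search."""
        with self._lock:
            search = self._running.get(search_id)
            if search is None:
                return False
            search.timeout = 0.0
            return True

    def timed_out(self, search_id: int) -> bool:
        """What the executing code would poll between units of work."""
        with self._lock:
            search = self._running.get(search_id)
            if search is None or search.timeout is None:
                return False
            return self._clock() - search.started_at >= search.timeout
```

In this sketch a cluster-wide listing would call `top()` on each node's registry and merge the results, and a kill request would call `kill(search_id)` on the coordinating node that owns the search.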
NOTE: a script like `while (1) {...}` is not killable without restarting affected nodes. We can't use thread interrupts because they are buggy.
Inspired by #4329