Closed
Description
Admins faced with busy nodes have no way of knowing which expensive queries users are sending to their cluster. The _top/searches API should provide a list of all currently executing queries, how long each has been executing, and the ability to kill a query (where possible).
This could be implemented as follows:
- a coordinating node assigns an ID to each search and keeps the search request in a data structure until it completes
- a `GET _top/search` request will reach out to all nodes to retrieve the currently running requests, their elapsed execution time, and the nodes they are running on
- a `POST _top/search/_kill/[searchid]` request will cause the coordinating node to update the request's timeout to 0, killing the request as soon as possible (if possible)
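A minimal Python sketch of the three steps above, assuming each node keeps an in-memory registry of the searches it coordinates. `RunningSearch`, `SearchRegistry` and `top_searches` are hypothetical names for illustration only. Killing is modelled exactly as proposed: the search's deadline is moved to "now" (a timeout of 0), and the search itself must notice this.

```python
import itertools
import threading
import time
from dataclasses import dataclass


@dataclass
class RunningSearch:
    """A search tracked by its coordinating node until it completes (hypothetical)."""
    search_id: int
    request: dict
    nodes: list
    start: float
    deadline: float  # monotonic time after which the search should stop

    def timed_out(self):
        return time.monotonic() >= self.deadline


class SearchRegistry:
    """Hypothetical per-node registry of in-flight searches."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._running = {}

    def register(self, request, nodes, timeout_s=float("inf")):
        # Assign an ID and keep the request until complete() is called.
        now = time.monotonic()
        with self._lock:
            search = RunningSearch(next(self._ids), request, list(nodes), now, now + timeout_s)
            self._running[search.search_id] = search
        return search

    def complete(self, search_id):
        with self._lock:
            self._running.pop(search_id, None)

    def list_running(self):
        # This node's contribution to a GET _top/search fan-out.
        now = time.monotonic()
        with self._lock:
            return [
                {"id": s.search_id, "elapsed_s": now - s.start, "nodes": s.nodes}
                for s in self._running.values()
            ]

    def kill(self, search_id):
        # POST _top/search/_kill/[searchid]: set the remaining timeout to 0.
        with self._lock:
            search = self._running.get(search_id)
            if search is None:
                return False
            search.deadline = time.monotonic()
            return True


def top_searches(registries):
    """Hypothetical GET _top/search handler: gather from every node, longest-running first."""
    results = []
    for node_name, registry in registries.items():
        for entry in registry.list_running():
            results.append({**entry, "coordinating_node": node_name})
    return sorted(results, key=lambda e: e["elapsed_s"], reverse=True)
```

Because IDs are only unique per coordinating node in this sketch, the listing also reports which node coordinates each search. A kill request would therefore need to be routed to that node.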
NOTE: a script like `while (1) {...}` is not killable without restarting the affected nodes. We can't use thread interrupts because they are buggy.
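Without thread interrupts, a kill can only be cooperative: the executing code has to check its deadline at regular points. The following Python sketch assumes a hypothetical per-shard loop that checks once per document. When the deadline passes, it returns the partial hits and a timed-out flag. It also shows why a runaway script cannot be stopped. If `matches` never returns, control never reaches the next check.

```python
import time


def run_search(docs, matches, deadline):
    """Hypothetical shard-level loop with cooperative cancellation checks."""
    hits = []
    for doc in docs:
        # The only point where a kill (deadline moved to "now") is observed.
        if time.monotonic() >= deadline:
            return hits, True  # stop early, returning partial hits
        # If matches() never returns (e.g. a script running while (1) {...}),
        # the check above is never reached again and the search cannot be stopped.
        if matches(doc):
            hits.append(doc)
    return hits, False
```

Finer-grained checks make kills more responsive but add overhead. No check placement helps once control is inside user code that never yields.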
Inspired by #4329