Add _top/searches API #12187

Closed
clintongormley opened this issue Jul 10, 2015 · 10 comments

Comments

@clintongormley
Contributor

Admins faced with busy nodes have no way of knowing what bad queries users are sending to their cluster. The _top/searches API should provide a list of all currently executing queries, how long they have been executing, and the ability to kill a query (where possible).

This could be implemented as follows:

  • a coordinating node adds an ID to each search, and keeps the search request in some data structure until it is complete
  • a GET _top/search request will reach out to all nodes to retrieve currently running requests, their elapsed execution time, and which nodes they are running on
  • POST _top/search/_kill/[searchid] will cause the coordinating node to update the timeout for the request to 0, killing the request as soon as possible (if possible)

NOTE: a script like while (1) {...} is not killable without restarting affected nodes. We can't use thread interrupts because they are buggy.
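To make the three steps above concrete, here is a minimal sketch of what the coordinating-node bookkeeping could look like. It is written in Python purely for illustration and is not Elasticsearch code: `SearchRegistry`, `RunningSearch`, and every method name are hypothetical. It assumes the search loop checks `should_stop()` cooperatively between units of work.

```python
import itertools
import threading
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunningSearch:
    """One in-flight search tracked by the coordinating node (hypothetical)."""
    search_id: int
    query: str
    node: str
    started_at: float = field(default_factory=time.monotonic)
    timeout_s: Optional[float] = None  # None means "no timeout"

    def elapsed(self) -> float:
        return time.monotonic() - self.started_at

    def should_stop(self) -> bool:
        # Cooperative check: the search loop has to call this itself.
        return self.timeout_s is not None and self.elapsed() >= self.timeout_s


class SearchRegistry:
    """Hypothetical per-node structure holding searches until they complete."""

    def __init__(self, node: str):
        self._node = node
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._running = {}

    def register(self, query: str, timeout_s: Optional[float] = None) -> RunningSearch:
        # Step 1: assign an ID and keep the request until it is complete.
        with self._lock:
            search = RunningSearch(next(self._ids), query, self._node,
                                   timeout_s=timeout_s)
            self._running[search.search_id] = search
            return search

    def complete(self, search_id: int) -> None:
        with self._lock:
            self._running.pop(search_id, None)

    def top(self) -> list:
        # Step 2: report running searches, longest-running first.
        with self._lock:
            snapshot = list(self._running.values())
        rows = [{"id": s.search_id, "node": s.node, "query": s.query,
                 "elapsed_s": s.elapsed()} for s in snapshot]
        return sorted(rows, key=lambda r: r["elapsed_s"], reverse=True)

    def kill(self, search_id: int) -> bool:
        # Step 3: set the timeout to 0 so the next cooperative check stops it.
        with self._lock:
            search = self._running.get(search_id)
            if search is None:
                return False
            search.timeout_s = 0.0
            return True
```

Because the kill only takes effect at the next `should_stop()` check, a search stuck inside a tight loop that never reaches such a check would keep running, which matches the caveat in the NOTE.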

Inspired by #4329

@nik9000
Member

nik9000 commented Jul 10, 2015

It's a fine proposal, and I'd love to try my hand at it, unless someone else
wants it. I won't be able to give it my full attention for a few weeks and
don't want to cookie-lick it, but I do care deeply about the issue.

@jtharpla

This sounds very much like MongoDB's db.currentOp() and db.killOp(), both of which are immensely useful in the MongoDB world. Can't wait to see this functionality in Elasticsearch as well.

@eskibars
Contributor

From an HTTP verb perspective, does it make sense to use DELETE rather than POST to kill the query?

@srikanthbirada

Can we expect this feature in the upcoming Elasticsearch 2.0 release?

@clintongormley
Contributor Author

It looks like the task management API will fulfill this need. Closing in favour of #15117

@nik9000
Member

nik9000 commented Jan 18, 2016

It might be useful to keep this around as an explicit use case for task management. It may be we get all these "for free" using stuff built into task management but it'd be nice to talk about this feature explicitly in docs, maybe have tests for it, etc.

@pickypg
Member

pickypg commented Mar 25, 2016

Reopening based on @nik9000's comment.

@pickypg reopened this Mar 25, 2016
@ppf2
Member

ppf2 commented Mar 25, 2016

+1 This is a fairly common request, so it would be nice to track it separately, or to close it once we have confirmed that it has been released with the task management API. Thanks!

@evanvolgas

Being able to list running queries would be helpful; being able to kill bad ones if needed would be really helpful.

> Admins faced with busy nodes have no way of knowing what bad queries users are sending to their cluster.

The key thing that stands out to me about this statement is that it has to do with logging queries at the cluster level. The slow query log doesn't do this. Without a view of queries being sent to the cluster, it's impossible to do any kind of reasoning about where you can get the most bang for your buck helping devs rewrite their queries and/or revising schemas to support the queries they need to run.

If the _top/searches API also had the option to log queries to ES, you could also start to do something like Percona does for MySQL with their Percona Query Digest tool (xref #9172).

IMO, adding a _top/searches API (or perhaps an active/_searches API would make more sense, naming-wise?) would be a godsend. If we can log those searches to Elastic when they complete, that would be even better and would remove a tremendous blind spot that ops people have when trying to help devs reason about their queries and schemas. I really, really like what @clintongormley is suggesting here.

@clintongormley
Contributor Author

This feature is being implemented as part of the task management API in #20405.

> The key thing that stands out to me about this statement is that it has to do with logging queries at the cluster level. The slow query log doesn't do this.

Agreed - this will be tackled separately.

Closing in favour of #20405
