Support pagination in REST APIs #64099
Comments
When there are many indices and shards, running _cat/indices or _cat/shards from a client is very time-consuming, and the results are difficult to read. |
Pinging @elastic/es-core-features (:Core/Features/CAT APIs) |
I remember this (or similar) coming up some time ago, when there was no point in time API. But nowadays I think that the internal functionality behind the point in time API can be used for pagination with sorting. I think it's just a matter of settling on parameter and URL names consistently across our APIs. There's currently one example that we could follow, the relatively recently introduced query watches API, in addition to the old get watch API. Consequently, I think we could pull https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/utils/persistence/SearchAfterDocumentsIterator.java from ML into server code. @martijnvg or @jimczi can you please confirm this is a reasonable approach to extend to other APIs that need to return potentially long lists? |
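To make the "pagination with sorting" idea concrete, here is a minimal sketch of cursor-style (search_after-like) paging over an in-memory list. It is not the ML `SearchAfterDocumentsIterator` and does not use a point in time; the names `page_after` and `iter_pages` are hypothetical, and the sketch assumes sort keys are unique so the cursor never skips or repeats a row.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple


def page_after(
    rows: List[Any],
    key: Callable[[Any], Any],
    size: int,
    after: Optional[Any] = None,
) -> Tuple[List[Any], Optional[Any]]:
    """Return one page of rows sorted by `key`, starting strictly after `after`.

    The second element is the cursor for the next page, or None when
    there are no further rows. Assumes sort keys are unique.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    ordered = sorted(rows, key=key)
    if after is not None:
        # Keep only rows that sort strictly after the previous page's last key.
        ordered = [row for row in ordered if key(row) > after]
    page = ordered[:size]
    # Only hand back a cursor if rows remain beyond this page.
    cursor = key(page[-1]) if len(ordered) > size else None
    return page, cursor


def iter_pages(
    rows: List[Any], key: Callable[[Any], Any], size: int
) -> Iterator[List[Any]]:
    """Yield successive pages until the cursor is exhausted."""
    after = None
    while True:
        page, after = page_after(rows, key, size, after)
        if page:
            yield page
        if after is None:
            return


if __name__ == "__main__":
    names = ["idx-c", "idx-a", "idx-b", "idx-e", "idx-d"]
    for page in iter_pages(names, key=lambda n: n, size=2):
        print(page)
```

Unlike an offset, a cursor of this kind stays valid when rows are added or removed before the current position, which is the main reason to prefer it for long, changing lists.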
I think having general parameters for some basic form of server-side pagination makes sense. I think it comes down to coming up with terminology that makes sense across APIs. For some APIs that return a large response, pagination doesn't make sense, because the response can't be broken down into units (for example cluster state). However, in the case of the cat APIs the unit is clear: a row. Also, I suspect that pagination parameters in the request body aren't possible in all places, due to conflicts with existing request body formats, but having common params in the query string should work, I think. One additional note about the cat APIs: these APIs should only be used via the terminal. Kibana and other systems should use the JSON-based APIs. I know that the cat APIs provide functionality that isn't supported in the core JSON APIs, but that is something we should address separately. |
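As a rough illustration of row-based pagination driven by common query-string parameters, the sketch below slices a list of rows using `from` and `size`. These parameter names are an assumption borrowed from the search API, not something agreed in this issue, and `paginate_rows` is a hypothetical helper.

```python
from typing import List
from urllib.parse import parse_qs


def paginate_rows(rows: List[str], query_string: str, default_size: int = 10) -> List[str]:
    """Return the slice of `rows` selected by `from` and `size` in the query string.

    `from` and `size` are assumed parameter names; unrelated parameters
    (for example `format=json`) are ignored.
    """
    params = parse_qs(query_string)
    start = int(params.get("from", ["0"])[0])
    size = int(params.get("size", [str(default_size)])[0])
    if start < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    return rows[start:start + size]


if __name__ == "__main__":
    rows = [f"index-{i}" for i in range(5)]
    print(paginate_rows(rows, "format=json&from=2&size=2"))
```

Keeping the parameters in the query string means the same names could be reused by APIs whose request bodies already have a fixed format.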
Thanks @martijnvg! In Security we're imminently concerned about the get API Keys API, and others will follow, eg get users, privileges, roles. For these APIs the unit is clear (eg an API key, a user def). To clarify, I think we have a choice to make between extending the existing Get APIs or introducing new Query APIs. |
I'm not sure if this is what you're suggesting, but I like the idea of a query API where cluster metadata such as API keys, cluster state, ingest pipelines, stats, etc., are exposed as documents in an index with all the features of the ES query DSL available. This would be much like system tables in relational databases where table metadata, system configuration, stats, etc. are exposed through SQL and has the advantage of allowing users to interact with ES through a single unified API. It's not low-hanging fruit, but it would address a variety of requests that we get for being able to sort, paginate, summarize, etc. the responses to various APIs that return large results. |
I like the idea of system indices for Elasticsearch configuration and stats. However most of these things aren't really stored in an index, but we can make them appear as retrievable/queryable? This is just an idea, but we could create a generic system query api:
Implementing the query parameter for things that aren't stored in a system index is tricky. The goal would be a unified way to retrieve config and stats. Maybe something like this would help to get there. |
I would like to push on the requirements a bit, before we embark too far on solving this because it's going to add complexity and increase the API surface area, etc. I start with the premise:
I struggle with this. If there are enough indices in the cluster that even a relatively small response like the list of indices is going to cause Kibana to run out of memory and crash, then how is Kibana going to deal with processing search and aggregation results into Discover or Lens? I say relatively small because as soon as we have "enough indices" in the cluster to cause this problem, then surely there's enough data in the deployment to cause problems elsewhere in Kibana. If the Elasticsearch deployment is large, then Kibana needs to be large too. Responses to management APIs like listing indices are going to be small relative to the data that Elasticsearch could return in search results. |
Point taken that paginating data that ES holds in memory in the cluster state is unimportant in practice. Maybe supporting the combination of Range and Content-Range HTTP headers at the network layer is appropriate in such cases, but I agree I don't feel the urgency of it. But this case does arise in Security, where the get API Keys API returns docs from the … It could very well be a niche use case, so maybe we come up with a tailored approach and work from it to see if and how it generalizes? |
Related to #74350, in which we're implementing pagination for the Get Snapshots API. |
Pinging @elastic/es-data-management (Team:Data Management) |
Situation
Management UIs such as Index Management currently provide client-side pagination of tabular information. For example, to show the user a table of the indices in the cluster, the client requests a complete list of indices and renders a subset of them into the table. As the user applies filters/search input and paginates through the table, the client performs the necessary logic to determine which indices to render to the table.
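The client-side flow described above can be sketched roughly as follows. `render_page` is a hypothetical name, and the substring match stands in for whatever filter/search logic the UI actually applies. The point is that the client already holds the complete list before any filtering or slicing happens.

```python
from typing import List, Tuple


def render_page(
    all_indices: List[str], search: str, page: int, page_size: int
) -> Tuple[List[str], int]:
    """Filter the full index list, then return one zero-based page of it.

    Also returns the total number of matches so a UI could render
    page controls.
    """
    # Filtering runs over the complete list held in memory.
    matching = [name for name in all_indices if search in name]
    start = page * page_size
    return matching[start:start + page_size], len(matching)


if __name__ == "__main__":
    indices = ["logs-1", "logs-2", "metrics-1", "logs-3"]
    print(render_page(indices, "logs", page=1, page_size=2))
```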
Problem
This is problematic for clusters that contain many indices because it requires the Kibana server to store the full set of tabular information in memory before returning it to the client. In the above example, this would be the full set of indices returned by
GET /*?expand_wildcards=hidden,all
as well as those returned by
GET /_cat/indices?format=json&h=health,status,index,uuid,pri,rep,docs.count,sth,store.size&expand_wildcards=hidden,all&index=*
If this occupies more memory than is allocated to the Kibana server, it will cause the server to run out of memory and crash. Note that we need to gather more information about the frequency of this problem before we can prioritize a solution.
Severity
Based on some conversation with members of the Kibana team, I think it's unlikely that we'll encounter a scenario where the size of an ES REST API response is large enough to cause Kibana to OOM. Given that, I think this is a very low priority feature.
Complexities
If we decide to move forward with implementing server-side pagination, we'll need to consider how the ES APIs will support the current client-side logic that impacts pagination behavior, such as filtering and searching. We'll have to audit our various Management UIs to get a complete picture of what kind of logic the ES APIs will need to support.
Related issues