Support pagination in REST APIs #64099

cjcenizal · 2020-10-23T17:28:49Z

Situation

Management UIs such as Index Management currently provide client-side pagination of tabular information. For example, to show the user a table of the indices in the cluster, the client requests a complete list of indices and renders a subset of them into the table. As the user applies filters/search input and paginates through the table, the client performs the necessary logic to determine which indices to render to the table.
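As a rough illustration of the client-side approach described above, here is a minimal sketch. The function name, the `name` field, and the sample data are assumptions made for the example, not the Index Management code.

```python
from typing import Dict, List


def paginate_client_side(
    indices: List[Dict[str, str]],
    search: str,
    page_index: int,
    page_size: int,
) -> List[Dict[str, str]]:
    """Filter the full list by name, then return one page of it."""
    # Filtering runs over the complete list, which is why the client
    # needs every index up front.
    matching = [ix for ix in indices if search in ix["name"]]
    start = page_index * page_size
    return matching[start:start + page_size]


# Hypothetical data standing in for the complete list fetched from the server.
all_indices = [{"name": f"logs-{i:03d}"} for i in range(25)]
page = paginate_client_side(all_indices, search="logs-01", page_index=0, page_size=5)
print([ix["name"] for ix in page])
```

The point to notice is that the search filter and the page slice both operate on the full list, so nothing can be rendered until everything has been transferred.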

Problem

This is problematic for clusters that contain many indices because it requires the Kibana server to store the full set of tabular information in memory before returning it to the client. In the above example, this would be the full set of indices returned by GET /*?expand_wildcards=hidden,all as well as those returned by GET /_cat/indices?format=json&h=health,status,index,uuid,pri,rep,docs.count,sth,store.size&expand_wildcards=hidden,all&index=*. If this occupies more memory than is allocated to the Kibana server, it will cause the server to run out of memory and crash. Note that we need to gather more information about the frequency of this problem before we can prioritize a solution:

We need to measure the relationship between response size and memory occupied
We need to determine an upper bound on the size of these responses for most users
We can compare this upper bound with the 1.4 GB default memory limit for the Kibana server to determine the frequency of this problem
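One way to approach the first measurement is to parse a synthetic response and compare the peak memory used with the payload size. The sketch below only illustrates the method: the Kibana server runs on Node.js, so the ratio printed here says nothing about Kibana itself, and the row shape is an invented stand-in for a `_cat/indices`-style response.

```python
import json
import tracemalloc


def parsed_size_ratio(payload: str) -> float:
    """Return (peak bytes allocated while parsing) / (bytes in the payload)."""
    tracemalloc.start()
    parsed = json.loads(payload)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del parsed
    return peak / len(payload.encode("utf-8"))


# Synthetic rows; the field names are assumptions made for the example.
rows = [
    {"index": f"logs-{i:06d}", "health": "green", "docs.count": str(i)}
    for i in range(10_000)
]
payload = json.dumps(rows)
ratio = parsed_size_ratio(payload)
print(f"{len(payload)} bytes of JSON, peak parse memory is {ratio:.1f}x the payload size")
```

Repeating the same measurement in the server's own runtime, with real response shapes, would give the multiplier needed to compare against the memory limit.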

Severity

Based on some conversation with members of the Kibana team, I think it's unlikely that we'll encounter a scenario where the size of an ES REST API response is large enough to cause Kibana to OOM. Given that, I think this is a very low priority feature.

Complexities

If we decide to move forward with implementing server-side pagination, we'll need to consider how the ES APIs will support the current client-side logic that impacts pagination behavior, such as filtering and searching. We'll have to audit our various Management UIs to get a complete picture of what kind of logic the ES APIs will need to support.

Related issues

Sort and truncate pipeline aggregation (#14928)
Pagination and/or filtering for GET /_snapshot/<repo>/_all endpoint (#19167)
Consider adding support for OFFSET (#31549)

calm4wei · 2020-10-24T03:22:02Z

When there are many indices and shards, running _cat/indices or _cat/shards from the client is very time-consuming, and the results are difficult to read.
I think the server could support paging parameters, for example _cat/indices?v&p=page:${pageNumber},size:${pageSize}, and the REST API could then use the incoming paging parameters to request only that page of results from the server.
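To make the proposed parameter concrete, here is a minimal sketch of how a handler might parse it and slice the rows. The `p=page:N,size:M` syntax comes from the comment above and is not an existing Elasticsearch parameter; the 1-based page numbering, the default size, and the function names are assumptions.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_paging(url: str, default_size: int = 50) -> Tuple[int, int]:
    """Extract (page, size) from a hypothetical p=page:N,size:M parameter."""
    query = parse_qs(urlsplit(url).query)
    page, size = 1, default_size
    for part in query.get("p", [""])[0].split(","):
        key, _, value = part.partition(":")
        if key == "page":
            page = int(value)
        elif key == "size":
            size = int(value)
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return page, size


def slice_rows(rows: List[str], page: int, size: int) -> List[str]:
    """Return the rows for a 1-based page number."""
    start = (page - 1) * size
    return rows[start:start + size]


rows = [f"index-{i}" for i in range(1, 11)]
page, size = parse_paging("/_cat/indices?v&p=page:2,size:3")
print(slice_rows(rows, page, size))
```

Page-number paging like this is only stable if the row order is deterministic between requests, which is one reason sorting comes up later in the thread.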

elasticmachine · 2020-10-26T20:52:17Z

Pinging @elastic/es-core-features (:Core/Features/CAT APIs)

albertzaharovits · 2021-04-12T20:29:41Z

I remember this (or similar) coming up some time ago, when there was no point in time API.

But nowadays I think that the internal functionality behind the point in time API can be used for pagination with sorting. I think it's just a matter of settling on parameter and URL names consistently across our APIs. There's currently one example that we could follow, the relatively recently introduced query watches API, in addition to the old get watch API. Consequently, I think we could pull https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/utils/persistence/SearchAfterDocumentsIterator.java from ML into server code.
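The general shape of pagination with sorting over a fixed view can be sketched as below. This is not the ML `SearchAfterDocumentsIterator` and does not call Elasticsearch; an in-memory list stands in for the point-in-time view, and all names are made up for the example.

```python
from typing import Iterator, List, Optional, Tuple

Row = Tuple[str, int]  # (sort key, value); the sort key is assumed unique


def fetch_page(snapshot: List[Row], size: int, search_after: Optional[str]) -> List[Row]:
    """Return up to `size` rows whose sort key is strictly greater than the cursor."""
    ordered = sorted(snapshot)
    if search_after is not None:
        ordered = [row for row in ordered if row[0] > search_after]
    return ordered[:size]


def iterate_all(snapshot: List[Row], size: int) -> Iterator[List[Row]]:
    """Walk the snapshot page by page, carrying the last sort key forward."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(snapshot, size, cursor)
        if not page:
            return
        yield page
        cursor = page[-1][0]


snapshot = [("watch-c", 3), ("watch-a", 1), ("watch-e", 5), ("watch-b", 2), ("watch-d", 4)]
pages = list(iterate_all(snapshot, size=2))
print(pages)
```

Because each request carries the last sort key rather than an offset, a page boundary does not shift when earlier rows are added or removed, provided the underlying view is held constant.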

@martijnvg or @jimczi can you please confirm this is a reasonable approach to extend to other APIs that need to return potentially long lists?

martijnvg · 2021-04-13T14:19:50Z

I think having general parameters to do some basic form of server-side pagination makes sense. I think it comes down to coming up with terminology that makes sense across APIs. For some APIs that return a large response, it doesn't make sense to support pagination, because the response can't be broken down into units (for example, cluster state). However, in the case of the cat APIs, the unit is clear: a row. Also, I suspect that pagination parameters in the request body aren't possible in all places, due to conflicts with existing request body formats, but having common params in the query string should, I think, work.

One additional note about the cat APIs. These APIs should only be used via the terminal. Kibana and other systems should use the JSON-based APIs. I know that the cat APIs provide functionality that isn't supported in the core JSON APIs, but that is something we should address separately.

albertzaharovits · 2021-04-13T15:00:24Z

Thanks @martijnvg!

In Security we're imminently concerned about the get API Keys API, and others will follow, eg get users, privileges, roles. For these APIs the unit is clear (eg an API key, a user def).

To clarify, I think we have a choice to make between extending the existing Get APIs or introducing new Query APIs.
I think the existing Get APIs will become bloated if they now take generic query, sort and search_after parameters in the request body. For this reason I favor the new Query APIs approach.
For the Get APIs we might get by with building a term query in the request handler and basic asc/desc sorting params, which I think is enough for the basic table views in Kibana, but after a certain scale that's most likely not enough. Overall this feels counter-productive to me, better to reuse common powerful filter and paging parameters, that users might be familiar with from our search APIs.
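For illustration, a request body that reuses the search-style `query`, `sort`, `size` and `search_after` keys might be assembled as in the sketch below. The helper name and the field names (`username`, `creation`) are hypothetical, and nothing here describes an existing endpoint.

```python
import json
from typing import Any, Dict, List, Optional


def build_query_body(
    field: str,
    value: str,
    sort_field: str,
    order: str = "asc",
    size: int = 100,
    search_after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build a search-style body: one term filter, one sort key, optional cursor."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    body: Dict[str, Any] = {
        "query": {"term": {field: value}},
        "sort": [{sort_field: order}],
        "size": size,
    }
    if search_after is not None:
        # The cursor is the sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


# Hypothetical field names and cursor value, for illustration only.
body = build_query_body(
    "username", "alice", "creation", order="desc", size=2, search_after=[1618329600000]
)
print(json.dumps(body, indent=2))
```

The same body shape works for the first page by leaving out `search_after`, which is part of the appeal of reusing parameters users already know from the search APIs.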

danhermann · 2021-04-13T15:29:09Z

> To clarify, I think we have a choice to make between extending the existing Get APIs or introducing new Query APIs.

I'm not sure if this is what you're suggesting, but I like the idea of a query API where cluster metadata such as API keys, cluster state, ingest pipelines, stats, etc., are exposed as documents in an index with all the features of the ES query DSL available. This would be much like system tables in relational databases where table metadata, system configuration, stats, etc. are exposed through SQL and has the advantage of allowing users to interact with ES through a single unified API. It's not low-hanging fruit, but it would address a variety of requests that we get for being able to sort, paginate, summarize, etc. the responses to various APIs that return large results.

martijnvg · 2021-04-14T09:33:18Z

I like the idea of system indices for Elasticsearch configuration and stats. However most of these things aren't really stored in an index, but we can make them appear as retrievable/queryable? This is just an idea, but we could create a generic system query api:

Watches: GET /_system/_query/watches
Pipelines: GET /_system/_query/pipelines
Api keys: GET /_system/_query/api-keys

Implementing the query parameter for things that aren't stored in a system index is tricky,
perhaps we should also have a generic system list api that implements just pagination and sorting,
and the system query api is an extension of that.

The goal would be a unified way to retrieve config and stats. Maybe something like this would help to get there.

jasontedor · 2021-04-14T17:02:47Z

I would like to push on the requirements a bit, before we embark too far on solving this because it's going to add complexity and increase the API surface area, etc. I start with the premise:

> This is problematic for clusters that contain many indices because it requires the Kibana server to store the full set of tabular information in memory before returning it to the client. In the above example, this would be the full set of indices returned by GET /*?expand_wildcards=hidden,all as well as those returned by GET /_cat/indices?format=json&h=health,status,index,uuid,pri,rep,docs.count,sth,store.size&expand_wildcards=hidden,all&index=*. If this occupies more memory than is allocated to the Kibana server, it will cause the server to run out of memory and crash.

I struggle with this. If there are enough indices in the cluster that even a relatively small response like the list of indices is going to cause Kibana to run out of memory and crash, then how is Kibana going to deal with processing search and aggregation results into Discover, or Lens? That is, I say relatively small because as soon as we have "enough indices" in the cluster to cause this problem, then surely there's enough data in the deployment to cause problems elsewhere in Kibana. If the Elasticsearch deployment is large, then Kibana needs to be large too. Responses to management APIs like listing indices are going to be small relative to the data that Elasticsearch could return in search results.

albertzaharovits · 2021-04-15T19:48:08Z

Point taken that paginating data that ES holds in memory in the cluster state is unimportant in practice. Maybe supporting the combination of Range and Content-Range HTTP headers at the network layer is appropriate in such cases, but I agree I don't feel the urgency of it.
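As a rough sketch of the Range and Content-Range idea: HTTP allows range units other than `bytes`, so a response could in principle be sliced by item. The `items` unit, the header format (which mirrors the `bytes` one), and the function below are assumptions made for the example, not something Elasticsearch supports.

```python
import re
from typing import List, Tuple

# Accepts only a single closed range in a hypothetical "items" unit.
RANGE_PATTERN = re.compile(r"^items=(\d+)-(\d+)$")


def apply_range(header: str, rows: List[str]) -> Tuple[List[str], str]:
    """Return the requested slice and a matching Content-Range value."""
    match = RANGE_PATTERN.match(header)
    if match is None:
        raise ValueError(f"unsupported Range header: {header!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if first > last or first >= len(rows):
        raise ValueError("range not satisfiable")
    # Clamp the end of the range to the last available row.
    last = min(last, len(rows) - 1)
    return rows[first:last + 1], f"items {first}-{last}/{len(rows)}"


rows = [f"index-{i}" for i in range(25)]
body, content_range = apply_range("items=20-29", rows)
print(content_range)
```

The total after the slash lets a client work out how many further requests it needs without a separate count call.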

But there is this case in Security, where the get API Keys API returns docs from the .security index and we should prepare for tens of thousands of results. Also, unlike the other Security APIs, this one uses field conjunction to filter down the results. We need to find a way to break down the response (for the ES node's sake too), which probably entails ordering. At that point, it sounds close to the point-in-time query API.

It could very well be a niche use case, so maybe we come up with a tailored approach and work from it to see if and how it generalizes?

cjcenizal · 2021-07-15T16:10:11Z

Related to #74350, in which we're implementing pagination for the Get Snapshots API.

elasticsearchmachine · 2023-11-16T18:12:49Z

Pinging @elastic/es-data-management (Team:Data Management)

calm4wei mentioned this issue on Oct 24, 2020: Request to support quota ratelimit #64102 (Closed)

jtibshirani mentioned this issue on May 13, 2021: Add search_after parameter to new terms_enum api #72910 (Closed)

cjcenizal mentioned this issue on Jul 16, 2021: Optimize Index Management's indices list network performance elastic/kibana#106041 (Closed)

cjcenizal mentioned this issue on Sep 28, 2021: Migrate Index Management away from using the cat APIs elastic/kibana#57286 (Closed)

yuliacech mentioned this issue on Feb 23, 2022: [Index Management] Large number of indices take a long time to load elastic/kibana#126242 (Open, 3 tasks)
