Replace `_scroll` with the ability to acquire point-in-time views + search_after #26472

jpountz · 2017-09-01T13:29:33Z

`_scroll` comes with limitations. For example, it only allows moving forward, which is a problem if connectivity fails while the client is retrieving a page, because that particular page cannot be retried.

We would like to replace `_scroll` with an API that opens point-in-time views of the data and then lets clients paginate through them using `search_after`. We could even keep the sort-by-`_doc` optimization if users provide `"track_total_hits": false` in their request.

In FixitFriday it was suggested that we call this API Jim's API.


mayya-sharipova · 2017-10-24T11:01:27Z

Will this API also allow going backward from a certain doc, for example using `search_before`? Or, to retrieve a previous page, can we use `search_after` with a reversed sort order?

jpountz · 2017-10-24T14:13:08Z

I think your idea to go backwards is the right one. Alternatively, clients could remember sort values associated with previously visited pages so that they can go back efficiently. So we don't need to add any new API/option?

mayya-sharipova · 2017-10-24T22:11:54Z

Right, we don't need another API/option for going backwards if docs are returned in a stable sorted order; in this case we can just reverse the sort order.
When you were saying "clients could remember sort values associated with previously visited pages", what are these "sort values"? Are these docs as in search_after parameter?
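A minimal sketch of the "reverse the sort order" idea, in Python. It only builds a request body and assumes the sort is given in the short `{"field": "asc"}` form and includes a tiebreaker so that the order is total; `reverse_sort` and `previous_page_request` are hypothetical helper names, not part of any client library.

```python
from typing import Any, Dict, List


def reverse_sort(sort: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Flip every clause, e.g. {"price": "asc"} becomes {"price": "desc"}."""
    flipped = {"asc": "desc", "desc": "asc"}
    return [
        {field: flipped[order] for field, order in clause.items()}
        for clause in sort
    ]


def previous_page_request(
    sort: List[Dict[str, str]],
    first_hit_sort_values: List[Any],
    size: int,
) -> Dict[str, Any]:
    """Build a search body that fetches the page before the current one.

    `first_hit_sort_values` are the sort values of the first hit on the
    current page. The hits come back in reversed order, so the caller has
    to reverse them again before displaying them.
    """
    return {
        "size": size,
        "sort": reverse_sort(sort),
        "search_after": first_hit_sort_values,
    }
```

For example, `previous_page_request([{"price": "asc"}, {"id": "asc"}], [100, "doc-7"], 10)` asks for the 10 documents that sort immediately before `[100, "doc-7"]`.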

jpountz · 2017-10-25T09:03:02Z

Sort values are the values that each document was sorted against, which we return in the response. For instance, if you sort by score then price, those could be `[2.2, 100]`. If the client application stores somewhere in the session the sort values that were passed to `search_after` to get the hits for page X, then it can easily go to page X again by reusing those sort values. It does not need to cache the entire response.
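To make the "remember sort values per page" idea concrete, here is a small client-side sketch in Python. `PageCursors` is a hypothetical class; the only thing it assumes about the response is that each hit carries its sort values under a `sort` key.

```python
from typing import Any, Dict, List, Optional


class PageCursors:
    """Remembers which search_after values lead to each visited page."""

    def __init__(self) -> None:
        # Page 0 needs no search_after at all.
        self._cursors: Dict[int, Optional[List[Any]]] = {0: None}

    def record(self, page: int, hits: List[Dict[str, Any]]) -> None:
        """After fetching `page`, store the cursor that leads to page + 1."""
        if hits:
            self._cursors[page + 1] = hits[-1]["sort"]

    def search_after_for(self, page: int) -> Optional[List[Any]]:
        """Return the stored cursor; raises KeyError for unvisited pages."""
        return self._cursors[page]


cursors = PageCursors()
cursors.record(0, [{"_id": "a", "sort": [3.1, 250]}, {"_id": "b", "sort": [2.2, 100]}])
print(cursors.search_after_for(1))
```

Only one small list per visited page is kept in the session, rather than the hits themselves.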

This change adds a dynamic cluster setting named `indices.id_field_data.enabled`. When set to `false` any attempt to load the fielddata for the `_id` field will fail with an exception. The default value in this change is set to `false` in order to prevent fielddata usage on this field for future versions but it will be set to `true` when backporting to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue a deprecation warning since we want to disallow fielddata entirely when elastic#26472 is implemented. Closes elastic#43599
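As a rough illustration, a dynamic cluster setting like this one would normally be changed through the cluster update settings API (`PUT /_cluster/settings`). The Python sketch below only builds the request body; `id_fielddata_setting` is a hypothetical helper, and sending the request is left to whichever HTTP client is in use.

```python
import json
from typing import Any, Dict


def id_fielddata_setting(enabled: bool, persistent: bool = True) -> Dict[str, Any]:
    """Body for PUT /_cluster/settings toggling fielddata on the _id field."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"indices.id_field_data.enabled": enabled}}


print(json.dumps(id_fielddata_setting(False)))
```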


ssmelov · 2020-03-27T19:01:57Z

Do you know if there is any design document for this new API?

This commit introduces a new API that manages point-in-times in x-pack basic. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. A search request by default executes against the most recent point in time. In some cases, it is preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time.

A point in time must be opened before being used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep a point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes an `id`, which should be passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
  "query": {
    "match" : { "title" : "elasticsearch" }
  },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}
```

Point-in-times are automatically closed when the `keep_alive` has elapsed. However, keeping point-in-times has a cost; hence, point-in-times should be closed as soon as they are no longer used in search requests.

```
DELETE /_pit
{
  "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
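The open, search, close sequence from this commit message can be sketched from the client side as follows. This is an illustration under assumptions rather than the actual client implementation: `iterate_hits` and the `send(method, path, body)` callable are hypothetical, the transport is injected so the sketch stays independent of any HTTP library, and the sort is assumed to include a tiebreaker. Picking up a `pit_id` from the search response is also an assumption that goes beyond what the commit message states.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def iterate_hits(
    send: Send,
    index: str,
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    page_size: int = 100,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit of `query`, reading from a single point in time."""
    pit_id = send("POST", f"/{index}/_pit?keep_alive={keep_alive}", {})["id"]
    try:
        search_after = None
        while True:
            body: Dict[str, Any] = {
                "size": page_size,
                "query": query,
                "sort": sort,
                # Passing keep_alive again extends the lifetime of the PIT.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                body["search_after"] = search_after
            response = send("POST", "/_search", body)
            # Assumption: use a refreshed id if the response carries one.
            pit_id = response.get("pit_id", pit_id)
            hits = response["hits"]["hits"]
            if not hits:
                return
            yield from hits
            # Resume after the sort values of the last hit on this page.
            search_after = hits[-1]["sort"]
    finally:
        # Close the PIT as soon as it is no longer needed.
        send("DELETE", "/_pit", {"id": pit_id})
```

Because each page request is stateless apart from the PIT id and the `search_after` values, a request that fails with a network error can simply be sent again, which is the limitation of `_scroll` that this issue set out to address.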

jimczi · 2020-09-08T11:06:50Z

This feature has been merged in #61062, hence closing.


jpountz added :Scroll high hanging fruit labels Sep 1, 2017

jpountz mentioned this issue Sep 1, 2017

retryable scroll #26433

Closed

jpountz changed the title ~~Replace _scroll with the ability to acquite point-in-time views + search_after~~ Replace _scroll with the ability to acquire point-in-time views + search_after Sep 1, 2017

nik9000 mentioned this issue Sep 5, 2017

Reindex API: Reindex task es_rejected_execution_exception search queue failure #26153

Closed

jordanlibrande mentioned this issue Sep 20, 2017

Retries are dangerous for scrolled search queries olivere/elastic#610

Closed

clintongormley added :Search/Search and removed :Scroll labels Feb 14, 2018

jpountz mentioned this issue Mar 15, 2018

Java RestHighLevelClient throws ParsingException if scroll response contains suggests #28873

Closed

jimczi mentioned this issue Apr 8, 2019

Point in time reader context for multiple (scroll) queries #25674

Closed

polyfractal mentioned this issue Jul 3, 2019

Terms agg: calculate aggs on 'other' bucket #12411

Closed

jimczi mentioned this issue Sep 10, 2019

Move the state of search requests to the coordinator node #46523

Closed

8 tasks

This was referenced Sep 13, 2019

[ML] Consider using search_after instead of scroll in datafeeds #29781

Open

[ML] Make sort order for datafeeds deterministic #39187

Open

jimczi mentioned this issue Nov 15, 2019

Add a cluster setting to disallow loading fielddata on _id field #49166

Merged

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

rjernst added the Team:Search label May 4, 2020

dnhatn mentioned this issue May 9, 2020

Introduce search context - point in time view of indices #56480

Closed

dnhatn mentioned this issue Aug 12, 2020

Introduce point in time APIs in x-pack basic #61062

Merged

dnhatn mentioned this issue Aug 24, 2020

Network-safe scroll API #61449

Closed

dnhatn mentioned this issue Sep 2, 2020

Introduce point in time APIs in x-pack basic #61872

Closed

bpintea mentioned this issue Sep 2, 2020

SQL: replace the scroll with PIT for data batching #61873

Closed

jimczi closed this as completed Sep 8, 2020

Mpdreamz mentioned this issue Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

61 tasks

pquentin mentioned this issue Feb 2, 2024

Too Many Requests /_search/scroll when using helpers.scan elastic/elasticsearch-py#2426

Closed
