Paging past 10k items #111

Closed
matthewhanson opened this issue Aug 24, 2021 · 3 comments
Labels
medium Effort

@matthewhanson
Member

matthewhanson commented Aug 24, 2021

Paging past 10k items throws a meaningless (to users) error: search_phase_execution_exception

up to 10k items

https://earth-search.aws.element84.com/v0/search?limit=100&page=100

past 10k items

https://earth-search.aws.element84.com/v0/search?limit=100&page=101

This limit can be changed:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-max-result-window

Alternatively, search_after can be used, which would change how pagination works in stac-server:
https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after

Even if the limit were raised (which would consume more memory and may reduce performance; performance testing would be required), there would still be a limit, only a higher one. At the very least, stac-server should return a meaningful error message when a client tries to page past 10k items.
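
A minimal sketch of such a check, for illustration only. It assumes `page` is 1-based (consistent with the two URLs above: page 100 at limit 100 ends exactly at item 10,000, while page 101 crosses it) and that the index uses the Elasticsearch default `index.max_result_window` of 10,000. `PagingLimitError` and `page_offset` are hypothetical names, not part of stac-server.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


class PagingLimitError(ValueError):
    """Hypothetical error type for requests that page too deep."""


def page_offset(limit: int, page: int, max_window: int = MAX_RESULT_WINDOW) -> int:
    """Return the Elasticsearch `from` offset for a 1-based page.

    Raises PagingLimitError with a readable message when from + size
    would exceed the result window, instead of letting Elasticsearch
    fail with search_phase_execution_exception.
    """
    if limit < 1 or page < 1:
        raise PagingLimitError("limit and page must both be >= 1")
    offset = (page - 1) * limit
    if offset + limit > max_window:
        raise PagingLimitError(
            f"Cannot page past {max_window} items "
            f"(requested items {offset + 1}-{offset + limit}); "
            "narrow the search instead."
        )
    return offset
```

With these assumptions, `page_offset(100, 100)` returns 9900, and `page_offset(100, 101)` raises the readable error.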

@matthewhanson matthewhanson added this to the 0.4.0 milestone Aug 24, 2021
@banesullivan

Is there any plan to use search_after so that stac-server can properly handle deep pages?

@matthewhanson
Member Author

It hasn't been determined yet, but likely yes as it keeps coming up. Don't have an ETA on this though.

The next version of pystac-client, which will come in Q1 of 2022, will allow for splitting up searches and making async requests, so that would be another way, and perhaps a better way, to get around this limit.
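
As a rough illustration of the "splitting up searches" idea, a client could divide a datetime range into contiguous sub-intervals and run one search per interval. This sketch does not use or describe the pystac-client API; `split_range` is a hypothetical helper, and the approach only helps if each sub-interval matches fewer than 10k items.

```python
from datetime import datetime, timedelta, timezone


def split_range(start: datetime, end: datetime, parts: int):
    """Split [start, end) into `parts` contiguous, equal-length sub-intervals."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if end <= start:
        raise ValueError("end must be after start")
    step = (end - start) / parts
    # Use the exact `end` for the last edge to avoid rounding drift.
    edges = [start + step * i for i in range(parts)] + [end]
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    windows = split_range(
        datetime(2021, 1, 1, tzinfo=timezone.utc),
        datetime(2021, 1, 5, tzinfo=timezone.utc),
        4,
    )
    for lo, hi in windows:
        # Each pair could become the datetime filter of one sub-search.
        print(f"{lo.isoformat()}/{hi.isoformat()}")
```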

@philvarner
Collaborator

philvarner commented Feb 14, 2022

search_after with a stable sort like created ascending is the preferred way to do this.

One note: ES uses milliseconds since the epoch for the search_after value of datetime fields (which is unclear from their docs).

A next value that would be guaranteed to be stable would look like this. It also includes the item ID and collection, so that documents with the same creation timestamp don't get arbitrarily reordered between queries and break pagination:

?next={created_ms_since_epoch},{itemId},{collection}
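
A sketch of how such a token could map onto an Elasticsearch request body, under stated assumptions: the field names in `SORT_FIELDS` are guesses at the index mapping, item IDs are assumed to contain no commas, and all function names are hypothetical. It is not the stac-server implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumed field names; the real index mapping may differ.
SORT_FIELDS = ("properties.created", "id", "collection")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(created: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (created - EPOCH) // timedelta(milliseconds=1)


def encode_next(created: datetime, item_id: str, collection: str) -> str:
    """Build the `next` token from the last item of the current page."""
    return f"{to_epoch_ms(created)},{item_id},{collection}"


def decode_next(token: str):
    """Parse a `next` token back into search_after values."""
    parts = token.split(",")
    if len(parts) != 3:
        raise ValueError("next must be {created_ms},{itemId},{collection}")
    created_ms, item_id, collection = parts
    return [int(created_ms), item_id, collection]


def build_search_body(limit: int, next_token=None) -> dict:
    """Build a request body sorted on created, then ID and collection as tie-breakers."""
    body = {"size": limit, "sort": [{field: "asc"} for field in SORT_FIELDS]}
    if next_token is not None:
        # search_after values must line up one-to-one with the sort fields.
        body["search_after"] = decode_next(next_token)
    return body
```

The first page is requested with `build_search_body(limit)`; each later page passes the token built from the last item of the previous page.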

Update: it looks like ES now supports specifying an ISO 8601 datetime instead of converting it to milliseconds, but I don't know if that's supported in the 7.12 we're using.
