[ML] Handle large numbers of jobs in the GET jobs response #34864

Closed
davidkyle opened this issue Oct 25, 2018 · 7 comments
Labels
:ml Machine learning team-discuss

Comments

@davidkyle
Member

davidkyle commented Oct 25, 2018

Relates to the Job in Index feature branch.

The current GET jobs API returns all jobs, no matter how many there are, when the wildcard _all is used, because they are all available in the cluster state. This is not the case when the jobs are index documents, as the number returned is limited by the size of the search. The current default search size of 10 is not sufficient. Alternatives are to increase the search size to an arbitrarily large number or to use scan and scroll to collect the jobs.
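
To make the truncation concrete, here is a minimal sketch that uses an in-memory list as a stand-in for the jobs index. The names `search_jobs` and `scroll_all_jobs` are hypothetical and are not Elasticsearch APIs; the sketch only assumes that a single search returns at most `size` hits, while a scroll-style loop keeps fetching batches until none remain. A real scroll tracks its position with a server-side cursor rather than an offset.

```python
from typing import Iterator, List

DEFAULT_SEARCH_SIZE = 10  # default search size mentioned in the issue


def search_jobs(index: List[str], size: int = DEFAULT_SEARCH_SIZE, offset: int = 0) -> List[str]:
    """Hypothetical stand-in for one search request: returns at most `size` hits."""
    return index[offset:offset + size]


def scroll_all_jobs(index: List[str], batch_size: int = 1000) -> Iterator[str]:
    """Hypothetical scroll-style loop: fetch batches until one comes back empty."""
    offset = 0  # a real scroll would use a cursor kept by the server
    while True:
        batch = search_jobs(index, size=batch_size, offset=offset)
        if not batch:
            return
        yield from batch
        offset += len(batch)


jobs = [f"job-{i}" for i in range(25)]
truncated = search_jobs(jobs)                          # only the first 10 jobs
complete = list(scroll_all_jobs(jobs, batch_size=10))  # all 25 jobs
```

With 25 jobs in the index, the single default-size search silently drops 15 of them, while the batch loop collects every job regardless of how many exist.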

Page parameters could be added to the GET jobs request, leaving it up to the client to page through the jobs. However, changing GET jobs to return a subset of the jobs (one page) instead of all of them would be a breaking change. Additionally, page parameters do not work well with scan and scroll, which is the preferred way to page through results.
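
As a rough illustration of why paging would be a breaking change, the sketch below models a paged GET jobs response. The function names, the `from_`/`size` parameters and the response shape (`count` plus `jobs`) are all assumptions made for this example, not the actual API. A client that reads only the first response would see a subset; it has to loop to see everything.

```python
from typing import Dict, List


def get_jobs_page(all_jobs: List[str], from_: int = 0, size: int = 100) -> Dict:
    """Hypothetical paged response: the total count plus one page of jobs."""
    return {"count": len(all_jobs), "jobs": all_jobs[from_:from_ + size]}


def fetch_all_jobs(all_jobs: List[str], page_size: int = 100) -> List[str]:
    """Client-side loop that requests pages until every job has been collected."""
    collected: List[str] = []
    while True:
        page = get_jobs_page(all_jobs, from_=len(collected), size=page_size)
        collected.extend(page["jobs"])
        # Stop on an empty page or once the reported total has been reached.
        if not page["jobs"] or len(collected) >= page["count"]:
            return collected


server_jobs = [f"job-{i}" for i in range(250)]
first_page_only = get_jobs_page(server_jobs)["jobs"]  # what an unchanged client would see
everything = fetch_all_jobs(server_jobs)              # what a paging-aware client collects
```

An existing client that expects one response to contain all jobs would end up with `first_page_only`, which is the behaviour change the issue is concerned about.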

@davidkyle davidkyle added :ml Machine learning team-discuss labels Oct 25, 2018
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@davidkyle davidkyle changed the title from "[ML] Page the GET jobs response" to "[ML] Handle large numbers of jobs in the GET jobs response" Oct 25, 2018
@droberts195
Contributor

Since adding paging to the GET jobs endpoint is a breaking change, I think we should do the following:

  1. In 6.x, change the default search size to something big like 100000. This will make the endpoint behave as it does today, except for users with extreme, unheard-of numbers of jobs.
  2. In 7.0 introduce paging with a lowish default page size, and document it as a breaking change. This will also require a major overhaul of how the jobs list in the UI works though, so we will have to check that time is available to do the UI work before adding paging.

Alternatively, in 7.0 the UI could be changed to search the .ml-config index directly to get jobs. Then the UI's ability to search and filter the jobs list could be preserved without us having to add a large amount of functionality to the GET jobs endpoint.

@sophiec20
Contributor

The current implementation is limited to the first 10 jobs. This means that the UI only ever shows the first 10 jobs in the Job List page and the job list picker. You can never see more than these same 10 jobs.

Having such a small page size is a blocker to testing in the short term.

Can we consider increasing default search size in the short term?

@dimitris-athanasiou
Contributor

There is another way to approach this without changing the search size setting. We can bite the bullet and implement search/scroll in 6.x. Then we just remove it in master and instead add a way to query jobs, along with pagination.

@davidkyle
Member Author

The search size has been bumped to 10,000 and the index.max_result_window setting explicitly set to 10,000 in the template and when created by the migrator. This prevents the case where a search errors if the size is above index.max_result_window.
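
The interaction between the two values can be sketched as follows. Elasticsearch rejects a search whose `from` plus `size` exceeds `index.max_result_window`; the check below only mirrors that rule locally. The function name `check_search_window` and the exact shape of the settings fragment are assumptions for illustration.

```python
JOB_SEARCH_SIZE = 10_000    # search size the issue says was chosen
MAX_RESULT_WINDOW = 10_000  # value the issue says is set explicitly

# Settings fragment as it might appear in the index template (shape assumed).
template_settings = {"index": {"max_result_window": MAX_RESULT_WINDOW}}


def check_search_window(from_: int, size: int, settings: dict) -> None:
    """Raise if from + size exceeds the window, mirroring the server-side rejection."""
    window = settings["index"]["max_result_window"]
    if from_ + size > window:
        raise ValueError(
            f"from + size ({from_ + size}) must be less than or equal to {window}"
        )


# The chosen search size fits exactly inside the explicitly configured window.
check_search_window(0, JOB_SEARCH_SIZE, template_settings)
```

Pinning the window in the template means the search size and the limit cannot drift apart if the index is created with different settings.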

Leaving this issue open as scan and scroll or pagination may be preferred in 7

@dimitris-athanasiou
Contributor

I think we might have to revisit the limit before 6.6 goes out and consider setting it even higher but it shouldn't be a blocker for feature freeze. I'll raise a task in the meta issue to make sure we remember.

@davidkyle
Member Author

Closing as ancient and superseded by #59405
