-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Response Ops][Task Manager] Add more functional tests for resource based task claiming #189111
Labels
Feature:Task Manager
Team:ResponseOps
Label for the ResponseOps team (formerly the Cases and Alerting teams)
Comments
ymao1
added
Feature:Task Manager
Team:ResponseOps
Label for the ResponseOps team (formerly the Cases and Alerting teams)
labels
Jul 24, 2024
Pinging @elastic/response-ops (Team:ResponseOps) |
ymao1
added a commit
that referenced
this issue
Jul 24, 2024
Resolves #185043 ## Summary ### Task types can define a `cost` associated with running it - Optional definition that defaults to `Normal` cost ### New `xpack.task_manager.capacity` setting - Previous `xpack.task_manager.max_workers` setting is deprecated, changed to optional, and a warning will be logged if used - New optional `xpack.task_manager.capacity` setting is added. This represents the number of normal cost tasks that can be run at one time. - When `xpack.task_manager.max_workers` is defined and `xpack.task_manager.capacity` is not defined, a deprecation warning is logged and the value for max workers will be used as the capacity value. - When `xpack.task_manager.capacity` is defined and `xpack.task_manager.max_workers` is not defined, the capacity value will be used. For the `default` claiming strategy, this capacity value will be used as the `max_workers` value - When both values are set, a warning will be logged and the value for `xpack.task_manager.capacity` will be used - When neither value is set, the `DEFAULT_CAPACITY` value will be used. ### Updates to `TaskPool` class - Moves the logic to determine used and available capacity so that we can switch between capacity calculators based on claim strategy. For the `default` claim strategy, the capacity will be in units of workers. For the `mget` claim strategy, the capacity will be in units of task cost. ### Updates to `mget` task claimer - Updated `taskStore.fetch` call to take a new parameter that will return a slimmer task document that excludes that task state and task params. This will improve the I/O efficiency of returning up to 400 task docs in one query - Applies capacity constraint to the candidate tasks. - Bulk gets the full task documents for the tasks we have capacity for in order to update them to `claiming` status. Uses the `SavedObjectsClient.bulkGet` which uses an `mget` under the hood. ### Updates the monitoring stats - Emitting capacity config value and also capacity as translated into workers and cost. - Added total cost of running and overdue tasks to the health report ## Tasks for followup issues - Update mget functional tests to include tasks with different costs. - #189111 - Update cost of indicator match rule to be Extra Large - #189112 - Set `xpack.task_manager.capacity` on ECH based on the node size - #189117 --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
48 tasks
ymao1
added a commit
that referenced
this issue
Aug 22, 2024
…acity based claiming (#189431) Resolves #189111 ## Summary Adds jest integration test to test cost capacity based claiming with the `mget` claim strategy. Using this integration test, we can exclude running other tasks other than our test types. We register a normal cost task and an XL cost task. We test both that we can claim tasks up to 100% capacity and that we will stop claiming tasks if the next task puts us over capacity, even if that means we're leaving capacity on the table. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Feature:Task Manager
Team:ResponseOps
Label for the ResponseOps team (formerly the Cases and Alerting teams)
Followup to #185043
Basic mget functional tests were updated so that they would pass but we should add more FT for claiming tasks with different costs. Not sure if there is a straightforward way of orchestrating specific tasks to claim but it would be interesting to try to add a functional test where a large cost task is left unclaimed due to capacity constraints and then claimed in the next cycle.
The text was updated successfully, but these errors were encountered: