Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops][Task Manager] Add more functional tests for resource based task claiming #189111

Closed
ymao1 opened this issue Jul 24, 2024 · 1 comment · Fixed by #189431
Closed
Assignees
Labels
Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@ymao1
Copy link
Contributor

ymao1 commented Jul 24, 2024

Followup to #185043

Basic mget functional tests were updated so that they would pass but we should add more FT for claiming tasks with different costs. Not sure if there is a straightforward way of orchestrating specific tasks to claim but it would be interesting to try to add a functional test where a large cost task is left unclaimed due to capacity constraints and then claimed in the next cycle.

@ymao1 ymao1 added Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jul 24, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

ymao1 added a commit that referenced this issue Jul 24, 2024
Resolves #185043

## Summary

### Task types can define a `cost` associated with running it

- Optional definition that defaults to `Normal` cost

### New `xpack.task_manager.capacity` setting

- Previous `xpack.task_manager.max_workers` setting is deprecated,
changed to optional, and a warning will be logged if used
- New optional `xpack.task_manager.capacity` setting is added. This
represents the number of normal cost tasks that can be run at one time.
- When `xpack.task_manager.max_workers` is defined and
`xpack.task_manager.capacity` is not defined, a deprecation warning is
logged and the value for max workers will be used as the capacity value.
- When `xpack.task_manager.capacity` is defined and
`xpack.task_manager.max_workers` is not defined, the capacity value will
be used. For the `default` claiming strategy, this capacity value will
be used as the `max_workers` value
- When both values are set, a warning will be logged and the value for
`xpack.task_manager.capacity` will be used
- When neither value is set, the `DEFAULT_CAPACITY` value will be used.

### Updates to `TaskPool` class

- Moves the logic to determine used and available capacity so that we
can switch between capacity calculators based on claim strategy. For the
`default` claim strategy, the capacity will be in units of workers. For
the `mget` claim strategy, the capacity will be in units of task cost.

### Updates to `mget` task claimer

- Updated `taskStore.fetch` call to take a new parameter that will
return a slimmer task document that excludes that task state and task
params. This will improve the I/O efficiency of returning up to 400 task
docs in one query
- Applies capacity constraint to the candidate tasks.
- Bulk gets the full task documents for the tasks we have capacity for
in order to update them to `claiming` status. Uses the
`SavedObjectsClient.bulkGet` which uses an `mget` under the hood.

### Updates the monitoring stats

- Emitting capacity config value and also capacity as translated into
workers and cost.
- Added total cost of running and overdue tasks to the health report

## Tasks for followup issues

- Update mget functional tests to include tasks with different costs. -
#189111
- Update cost of indicator match rule to be Extra Large -
#189112
- Set `xpack.task_manager.capacity` on ECH based on the node size -
#189117

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
@ymao1 ymao1 self-assigned this Jul 26, 2024
ymao1 added a commit that referenced this issue Aug 22, 2024
…acity based claiming (#189431)

Resolves #189111

## Summary

Adds jest integration test to test cost capacity based claiming with the
`mget` claim strategy. Using this integration test, we can exclude
running other tasks other than our test types. We register a normal cost
task and an XL cost task. We test both that we can claim tasks up to
100% capacity and that we will stop claiming tasks if the next task puts
us over capacity, even if that means we're leaving capacity on the
table.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
2 participants