
[Response Ops][Alerting] Exposing background worker utilization load metric #153600

Merged
merged 21 commits, May 2, 2023

Conversation

@ymao1 (Contributor) commented Mar 23, 2023

Resolves #155762, #155761

Summary

This PR exposes a metric that represents the background task worker utilization load at the end of each polling cycle. It is calculated as (# of workers already busy + claimed tasks) / max workers, i.e. the number of workers in use at the end of each claim cycle divided by the max worker count. The metric is then averaged over the previous 15 seconds (or 5 polling cycles). The window size is configurable via xpack.task_manager.worker_utilization_running_average_window.
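For illustration, here is a minimal TypeScript sketch of that calculation, assuming a percentage scale (0-100) and a default window of 5 polling cycles; the function and class names are hypothetical and do not mirror the actual Task Manager source:

```ts
/** Worker utilization at the end of one claim cycle, as a percentage of max workers. */
function calculateWorkerUtilization(
  busyWorkers: number, // workers still running tasks when the cycle ends
  claimedTasks: number, // tasks claimed during this polling cycle
  maxWorkers: number
): number {
  return Math.min(100, Math.round(((busyWorkers + claimedTasks) / maxWorkers) * 100));
}

/**
 * Running average over the last N polling cycles. With a 3s poll interval,
 * a window of 5 cycles covers roughly 15 seconds; the window size is what
 * xpack.task_manager.worker_utilization_running_average_window controls.
 */
class UtilizationRunningAverage {
  private readonly values: number[] = [];

  constructor(private readonly windowSize: number = 5) {}

  push(utilization: number): void {
    this.values.push(utilization);
    if (this.values.length > this.windowSize) {
      this.values.shift(); // drop the oldest cycle once the window is full
    }
  }

  load(): number {
    if (this.values.length === 0) return 0;
    return this.values.reduce((sum, v) => sum + v, 0) / this.values.length;
  }
}
```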

This PR exposes this metric in the existing /internal/task_manager/_background_task_utilization API and also adds a public version of this API (/api/task_manager/_background_task_utilization) that exposes only this metric. We need the public API for serverless, but I thought we could keep the private route as well to expose experimental metrics without the overhead of supporting them long term.
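As a rough example of consuming the public endpoint from a script, the sketch below assumes the response exposes the averaged metric under stats.value.load (per the verification steps) and uses a placeholder Kibana URL and credentials; it is not part of this PR:

```ts
// Hypothetical helper for reading the load metric from the public API.
// Assumes Node 18+ (global fetch); the response shape, credentials, and URL
// are placeholders, not guaranteed by this PR.
interface BackgroundTaskUtilizationResponse {
  stats?: {
    value?: {
      load?: number; // averaged worker utilization, 0-100
    };
  };
}

async function fetchBackgroundTaskLoad(
  kibanaUrl = 'http://localhost:5601',
  credentials = 'elastic:changeme'
): Promise<number | undefined> {
  const res = await fetch(`${kibanaUrl}/api/task_manager/_background_task_utilization`, {
    headers: { Authorization: `Basic ${Buffer.from(credentials).toString('base64')}` },
  });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  const body = (await res.json()) as BackgroundTaskUtilizationResponse;
  return body.stats?.value?.load;
}

// Example usage: poll the metric every 15 seconds.
// setInterval(() => fetchBackgroundTaskLoad().then((load) => console.log({ load })), 15_000);
```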

To Verify

  1. Run ES and Kibana with this branch
  2. Navigate to /internal/task_manager/_background_task_utilization and see the new metric exposed as stats.value.load along with the existing adhoc and recurring metrics
  3. Navigate to /api/task_manager/_background_task_utilization and see only the load metric returned from the public API
  4. You can also create some rules to see the load metric increase.

@ymao1 (Contributor, Author) commented Mar 29, 2023

@elasticmachine merge upstream

@ymao1 (Contributor, Author) commented Apr 17, 2023

@elasticmachine merge upstream

@ymao1 ymao1 changed the title Adding worker utilization event to calculate load at end of claim cyc… [Response Ops][Alerting] Exposing background worker utilization load metric Apr 21, 2023
@ymao1 ymao1 self-assigned this Apr 24, 2023
@ymao1 ymao1 added the Feature:Alerting, release_note:skip, Feature:Task Manager, Team:ResponseOps, and v8.9.0 labels Apr 24, 2023
@ymao1 ymao1 marked this pull request as ready for review April 24, 2023 22:49
@ymao1 ymao1 requested a review from a team as a code owner April 24, 2023 22:49
@elasticmachine (Contributor) commented

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1 ymao1 requested a review from kobelb April 24, 2023 22:49
@ymao1 (Contributor, Author) commented Apr 24, 2023

@elasticmachine merge upstream

@ymao1 (Contributor, Author) commented Apr 25, 2023

@kobelb I made an /api and an /internal endpoint so we could have experimental metrics without the overhead of having to support them, but if you think that's overkill, I can remove the /internal endpoint and remove the adhoc and recurring counters.

@kobelb (Contributor) commented Apr 25, 2023

> I made an /api and an /internal endpoint so we could have experimental metrics without the overhead of having to support them, but if you think that's overkill, I can remove the /internal endpoint and remove the adhoc and recurring counters.

Works for me!

@ymao1 (Contributor, Author) commented May 1, 2023

@elasticmachine merge upstream

@ymao1 ymao1 removed the ci:cloud-deploy Create or update a Cloud deployment label May 1, 2023
@ymao1 (Contributor, Author) commented May 1, 2023

@elasticmachine merge upstream

@mikecote (Contributor) left a comment

Tested locally and changes LGTM! Saw the load go up to 100 when overloaded and back down to ~6 when idle from alerting rules.

@ymao1 ymao1 requested a review from a team as a code owner May 2, 2023 13:33
@jbudz (Member) left a comment

kibana-docker

@kibana-ci (Collaborator) commented

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 19 | 21 | +2 |
| securitySolution | 398 | 401 | +3 |
| taskManager | 24 | 23 | -1 |
| total | | | +4 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 20 | 22 | +2 |
| securitySolution | 478 | 481 | +3 |
| taskManager | 24 | 23 | -1 |
| total | | | +4 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

Labels
backport:skip (This commit does not require backporting), Feature:Alerting, Feature:Task Manager, release_note:skip (Skip the PR/issue when compiling release notes), Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams), v8.9.0
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Change background task utilization endpoint to be public and no longer internal
7 participants