[Response Ops][Alerting] Exposing background worker utilization load metric #153600
Pinging @elastic/response-ops (Team:ResponseOps)
@kobelb I made an
Works for me!
Tested locally and changes LGTM! Saw the `load` go up to 100 when overloaded and back down to ~6 when idle from alerting rules.
x-pack/plugins/task_manager/server/monitoring/background_task_utilization_statistics.ts
kibana-docker
💚 Build Succeeded
cc @ymao1
Resolves #155762, #155761
Summary
This PR exposes a metric that represents the background task worker utilization load at the end of each polling cycle. It is calculated as `(# of workers already busy + claimed tasks) / max workers`, which comes out to the fraction of workers in use at the end of each claim cycle. This metric is then averaged over the previous 15 seconds (or 5 polling cycles). The window size is configurable using `xpack.task_manager.worker_utilization_running_average_window`.
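The calculation described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `cycleLoad` and `RunningAverage` are hypothetical names, and scaling the ratio to a percentage is an assumption based on the reviewer seeing the load reach 100 when overloaded.

```typescript
// Hypothetical helpers for illustration only; not the PR's implementation.

/**
 * Load for a single polling cycle:
 * (workers already busy + claimed tasks) / max workers.
 * Scaling to a percentage is an assumption.
 */
function cycleLoad(busyWorkers: number, claimedTasks: number, maxWorkers: number): number {
  if (maxWorkers <= 0) {
    throw new Error('maxWorkers must be positive');
  }
  return ((busyWorkers + claimedTasks) * 100) / maxWorkers;
}

/** Keeps the most recent `windowSize` samples and reports their mean. */
class RunningAverage {
  private readonly samples: number[] = [];
  private readonly windowSize: number;

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  /** Records a sample and returns the mean over the current window. */
  add(sample: number): number {
    this.samples.push(sample);
    if (this.samples.length > this.windowSize) {
      this.samples.shift(); // drop the oldest sample
    }
    const sum = this.samples.reduce((acc, value) => acc + value, 0);
    return sum / this.samples.length;
  }
}

// Five polling cycles with 10 max workers; each pair is [busy, claimed].
const cycles: number[][] = [[2, 3], [5, 5], [0, 1], [10, 0], [4, 2]];
const average = new RunningAverage(5);
let load = 0;
for (const [busy, claimed] of cycles) {
  load = average.add(cycleLoad(busy, claimed, 10));
}
console.log(load);
```

A window of 5 samples mirrors the default described above (5 polling cycles, about 15 seconds); a different value of `xpack.task_manager.worker_utilization_running_average_window` would correspond to a different `windowSize` in this sketch.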
This PR exposes this metric in the existing `/internal/task_manager/_background_task_utilization` API and also adds a public version of this API (`/api/task_manager/_background_task_utilization`) that only exposes this metric. We need the public API for serverless, but I thought we could keep the private route as well to expose experimental metrics without the overhead of supporting them long term.
To Verify
- Request `/internal/task_manager/_background_task_utilization` and see the new metric exposed as `stats.value.load` along with the existing `adhoc` and `recurring` metrics.
- Request `/api/task_manager/_background_task_utilization` and see only the load metric returned from the public API.
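The verification steps above could be scripted along these lines. This is a rough sketch under stated assumptions: the response shape is inferred only from the `stats.value.load` path mentioned above (with `adhoc` and `recurring` assumed to sit beside it), and `extractLoad`, `metricKeys`, and `checkRoute` are hypothetical helpers, not part of Kibana.

```typescript
// Assumed response shape, inferred from the `stats.value.load` path in the description.
interface UtilizationResponse {
  stats?: { value?: Record<string, unknown> };
}

/** Returns `stats.value.load` if it is present and numeric, otherwise undefined. */
function extractLoad(body: UtilizationResponse): number | undefined {
  const load = body.stats?.value?.load;
  return typeof load === 'number' ? load : undefined;
}

/** Lists the keys under `stats.value`, e.g. to confirm the public route returns only `load`. */
function metricKeys(body: UtilizationResponse): string[] {
  return Object.keys(body.stats?.value ?? {});
}

/**
 * Hypothetical local check. `baseUrl` and `authHeader` are placeholders
 * supplied by the caller; nothing here is called automatically.
 */
async function checkRoute(baseUrl: string, path: string, authHeader: string): Promise<void> {
  const response = await fetch(`${baseUrl}${path}`, {
    headers: { Authorization: authHeader },
  });
  if (!response.ok) {
    throw new Error(`Request to ${path} failed with status ${response.status}`);
  }
  const body = (await response.json()) as UtilizationResponse;
  console.log(path, metricKeys(body), extractLoad(body));
}
```

Against a local instance, one might call `checkRoute` once with the internal path and once with the public path, then compare the logged keys: the internal route should list `load`, `adhoc`, and `recurring`, while the public route should list only `load`.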