CONVENTIONS: Update CPU query sum_irate #988
Catching up with [1], which landed in OpenShift 4.9 and later via [2].

[1]: kubernetes-monitoring/kubernetes-mixin#619
[2]: https://github.com/openshift/cluster-monitoring-operator/pull/1214/files#diff-3125af8c4a74a5a372c15a821e3c53b7f5710c3ebd5af1fb05f4d7294e2f1afdL529
  # CPU usage of each container in the openshift-monitoring namespace
- max by (pod, container) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace="openshift-monitoring"})
+ max by (pod, container) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="openshift-monitoring"})
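For readers unfamiliar with the difference behind the rename, here is a minimal Python sketch of how a rate-style and an irate-style calculation diverge on the same counter samples. It is an illustration only: the sample values and the helper names `simple_rate` and `simple_irate` are hypothetical, and the helpers skip the boundary extrapolation and counter-reset handling that Prometheus itself applies.

```python
from typing import List, Tuple

# (timestamp in seconds, cumulative CPU-seconds); hypothetical scrape samples.
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Steady 0.1 cores for 90 seconds, then a burst in the final 30-second interval.
samples: List[Sample] = [(0.0, 0.0), (30.0, 3.0), (60.0, 6.0), (90.0, 9.0), (120.0, 36.0)]

window_rate = simple_rate(samples)     # smooths the burst over the whole window
instant_rate = simple_irate(samples)   # reflects only the most recent interval

print(f"rate-style:  {window_rate:.2f} cores")
print(f"irate-style: {instant_rate:.2f} cores")
```

The irate-style value tracks the most recent interval, so short bursts show up at full height rather than being averaged across the window.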
More broadly, trying to mix openshift-monitoring and openshift-sdn results doesn't make sense to me. Perhaps this was intended to be commented out as an example of changing namespaces and dropping over-time aggregation? I'd expect something like:
sort_desc(
  # Calculate the 90th percentile of CPU usage over the past hour and add 10% to that
  1.1 * (max by (pod, container) (
    quantile_over_time(0.9, node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=~"openshift-.*", container!="POD", container!=""}[60m]))
  ) /
  # Calculate the maximum requested CPU per pod and container
  max by (pod, container) (kube_pod_container_resource_requests{namespace=~"openshift-.*", resource="cpu", container!="", container!="POD"})
)
Or, if folks don't want to weight for bursts, dropping to avg_over_time:
sort_desc(
  # Calculate the average CPU usage over the past hour
  (avg by (pod, container) (
    avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=~"openshift-.*", container!="POD", container!=""}[60m]))
  ) /
  # Calculate the maximum requested CPU per pod and container
  max by (pod, container) (kube_pod_container_resource_requests{namespace=~"openshift-.*", resource="cpu", container!="", container!="POD"})
)
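To make the trade-off between the two suggestions concrete, here is a rough Python sketch that computes both usage-to-request ratios for a single container. Everything in it is assumed for illustration: the per-sample usage values, the 0.5-core request, and the `quantile` helper, which uses linear interpolation between ranks and is only meant to approximate what quantile_over_time does.

```python
import math
from statistics import mean
from typing import List


def quantile(values: List[float], q: float) -> float:
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Hypothetical hour of CPU usage samples (cores): mostly idle with a short burst.
usage_cores = [0.1] * 8 + [0.9, 1.0]
# Hypothetical CPU request for the container (cores).
request_cores = 0.5

# Burst-weighted: 90th percentile plus 10% headroom, relative to the request.
p90_ratio = 1.1 * quantile(usage_cores, 0.9) / request_cores
# Unweighted: plain average, relative to the request.
avg_ratio = mean(usage_cores) / request_cores

print(f"p90 + 10% vs request: {p90_ratio:.3f}")
print(f"average vs request:   {avg_ratio:.3f}")
```

With these made-up numbers the percentile-based ratio lands well above 1 (the request looks too small for bursts), while the average-based ratio stays below 1 (the request looks generous), which is the difference the two queries are choosing between.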
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
/remove-lifecycle stale
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dhellmann
/lgtm
@wking: all tests passed!