Stacking of work queue metrics in grafana dashboard is confusing #4000

Open
david-martin opened this issue Jul 3, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

david-martin commented Jul 3, 2024

What broke? What's expected?

In the Controller Runtime Metrics dashboard, the 'Seconds for items stay in queue (before being requested)' panel shows the different percentiles for all resources being reconciled.
However, the graph stacks all the values together (see screenshot), which can be very confusing at first glance, particularly if your controller is reconciling multiple resources.
[Screenshot: the panel with all values stacked]
In this example, I was wondering why some items spent longer than 40 seconds in the queue before being requested, but that wasn't actually the case.
I think it would be clearer to have no stacking in this panel. I don't see what the use case for having all the values stacked would be.

If there's agreement on having no stacking, I can make the change.
I believe it's this JSON that would need to change:
https://github.com/kubernetes-sigs/kubebuilder/blob/master/pkg/plugins/optional/grafana/v1alpha/scaffolds/internal/templates/runtime.go#L443-L445
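
For illustration only, here is a minimal Python sketch of what "no stacking" means in a Grafana dashboard model. It does not reproduce the template linked above, and which panel type that template uses is an assumption here. In Grafana's time series panel, stacking is controlled by `fieldConfig.defaults.custom.stacking.mode` (`"none"`, `"normal"`, or `"percent"`); in the legacy graph panel it is a top-level boolean `stack`. The `disable_stacking` helper and the trimmed `example` dashboard are hypothetical.

```python
import copy
import json

PANEL_TITLE = "Seconds for items stay in queue (before being requested)"


def disable_stacking(dashboard: dict, panel_title: str) -> dict:
    """Return a copy of the dashboard with stacking turned off for one panel.

    Hypothetical helper: only top-level panels are inspected, and the panel
    is matched by its title.
    """
    fixed = copy.deepcopy(dashboard)
    for panel in fixed.get("panels", []):
        if panel.get("title") != panel_title:
            continue
        if panel.get("type") == "graph":
            # Legacy graph panel: stacking is a top-level boolean.
            panel["stack"] = False
        else:
            # Time series panel: stacking lives under fieldConfig.defaults.custom.
            custom = (
                panel.setdefault("fieldConfig", {})
                .setdefault("defaults", {})
                .setdefault("custom", {})
            )
            custom.setdefault("stacking", {})["mode"] = "none"
    return fixed


# Hypothetical, heavily trimmed dashboard used only to exercise the function.
example = {
    "panels": [
        {
            "title": PANEL_TITLE,
            "type": "timeseries",
            "fieldConfig": {
                "defaults": {"custom": {"stacking": {"group": "A", "mode": "normal"}}}
            },
        }
    ]
}

fixed = disable_stacking(example, PANEL_TITLE)
print(json.dumps(fixed["panels"][0]["fieldConfig"]["defaults"]["custom"]["stacking"]))
```

In the scaffolded dashboard the equivalent change would be made directly in the template JSON rather than by post-processing; the sketch only shows which field carries the setting.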

Reproducing this issue

Load the Controller Runtime Metrics dashboard into Grafana and visualise metrics for a controller that's reconciling some resource.
See that the values are stacked in the panel mentioned.

KubeBuilder (CLI) Version

3.14.2

PROJECT version

3

Plugin versions

layout:
- go.kubebuilder.io/v3
plugins:
  grafana.kubebuilder.io/v1-alpha: {}
  manifests.sdk.operatorframework.io/v2: {}
  scorecard.sdk.operatorframework.io/v2: {}

Other versions

go version
go version go1.21.7 darwin/amd64

sigs.k8s.io/controller-runtime v0.16.3

kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-02-22T13:39:03Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.2", GitCommit:"4b8e819355d791d96b7e9d9efe4cbafae2311c88", GitTreeState:"clean", BuildDate:"2024-02-14T22:25:42Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/arm64"}
WARNING: version difference between client (1.26) and server (1.29) exceeds the supported minor version skew of +/-1

Extra Labels

No response

@david-martin david-martin added the kind/bug Categorizes issue or PR as related to a bug. label Jul 3, 2024
@camilamacedo86
Member

Hi @Kavinjsir,

I think you are the best person to help us take a look at this one. WDYT?

@camilamacedo86
Member

Hi @david-martin

Can you add a screenshot of how it would look with the proposed change? Also, feel free to open a PR with it.

@Kavinjsir
Contributor

/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 14, 2024
@Kavinjsir
Contributor

/assign

@camilamacedo86 camilamacedo86 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 14, 2024