Add metrics to reflect controller health #53

Closed
Tracked by #43
ahg-g opened this issue Apr 22, 2023 · 6 comments · Fixed by #99 or #103
Comments

@ahg-g
Contributor

ahg-g commented Apr 22, 2023

Possible metrics:

  • Reconciliation latency
  • Reconciliation failures
  • JobSets waiting to be processed (we need to look at controller-runtime if it exposes something)
@charles-chenzz
Member

Is this issue urgent? I might need some time to check how to implement metrics.

@ahg-g
Contributor Author

ahg-g commented Apr 25, 2023

#60 is more urgent. Can you remove the CEL validation and just validate inside the webhook?

@charles-chenzz
Member

I need to check first, not sure if I know how to do it :(

@danielvegamyhre
Contributor

/assign

@danielvegamyhre
Contributor

It seems that reconciliation latency and reconciliation failures are already captured by controller-runtime's Prometheus metrics here: https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/internal/controller/metrics/metrics.go#L37-L44
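
For illustration, here is a rough sketch of how those two signals could be read back from a scraped `/metrics` body. It assumes the metric names `controller_runtime_reconcile_errors_total` and `controller_runtime_reconcile_time_seconds` and a controller label value of `jobset`; the sample values are made up. The parser is deliberately simplified (no timestamps, exact label match) and is not a replacement for a real Prometheus client.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleMetrics mimics a fragment of a /metrics response. The values and the
// "jobset" controller label are assumptions made up for this example.
const sampleMetrics = `# TYPE controller_runtime_reconcile_errors_total counter
controller_runtime_reconcile_errors_total{controller="jobset"} 3
# TYPE controller_runtime_reconcile_time_seconds histogram
controller_runtime_reconcile_time_seconds_sum{controller="jobset"} 1.5
controller_runtime_reconcile_time_seconds_count{controller="jobset"} 60
`

// parseSamples reads Prometheus text exposition lines into a map keyed by
// "name{labels}". Comment lines and lines whose last field is not a number
// are skipped. It assumes samples carry no trailing timestamp.
func parseSamples(body string) map[string]float64 {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		samples[line[:i]] = v
	}
	return samples
}

// reconcileHealth returns the reconcile error count and the mean reconcile
// latency in seconds (histogram sum divided by count) for one controller.
// Missing series are treated as zero.
func reconcileHealth(samples map[string]float64, controller string) (errCount, meanLatency float64) {
	labels := fmt.Sprintf(`{controller=%q}`, controller)
	errCount = samples["controller_runtime_reconcile_errors_total"+labels]
	sum := samples["controller_runtime_reconcile_time_seconds_sum"+labels]
	count := samples["controller_runtime_reconcile_time_seconds_count"+labels]
	if count > 0 {
		meanLatency = sum / count
	}
	return errCount, meanLatency
}

func main() {
	errCount, meanLatency := reconcileHealth(parseSamples(sampleMetrics), "jobset")
	fmt.Printf("errors=%v meanLatency=%.3fs\n", errCount, meanLatency)
}
```

In practice these series would more likely be queried through Prometheus itself; the sketch only shows that failures and latency can be derived from what is already exported, without adding new instrumentation.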

@ahg-g
Contributor Author

ahg-g commented May 2, 2023

/re-open

We need to add docs explaining the couple of metrics you mentioned above, similar to https://kueue.sigs.k8s.io/docs/reference/metrics/
