Support for flux custom metrics #4128

Closed
10 tasks done
darkowlzz opened this issue Aug 2, 2023 · 0 comments
Labels
area/monitoring Monitoring related issues and pull requests umbrella-issue Umbrella issue for tracking progress of a larger effort

Comments

Contributor

darkowlzz commented Aug 2, 2023

We have received a lot of feedback about Flux metrics from users over time. Much of it asks for more information about particular Flux resources (refer #3106, #3769, #2674, #1796), for example the Git branch or tag that a resource uses, or the Helm chart version that is currently deployed, so that alerts can be set when these configurations change. Some users asked to be able to add custom fields to the metrics (refer #2632) so that they can categorize resources by team or cluster. A few also wanted to expose the reason an object has been in a non-ready state for an extended period of time so that they could be alerted about it. Others wanted higher-quality metrics about Flux objects that are more useful to them (refer #1479).
Most of these requests couldn't be implemented within Flux, as they would increase the cardinality of the Flux metrics and fragment the common metrics shared among the different Flux resource kinds. The Flux controllers also export many controller-runtime metrics, which help monitor the state of the controllers themselves.

Most of the data needed for the above metrics requirements is already present in the spec and status fields of the Flux custom resources. Exporting it does not require the Flux controllers. The controllers export metrics about the operation of the reconcilers that manage the Flux resources, and these metrics are used in the Flux control-plane dashboard to show the state of the Flux components. The custom metrics that most users ask for can instead be exported by an external system that queries the data from kube-apiserver, independent of the Flux controllers. kube-state-metrics (KSM) is one such tool. The KSM custom-resource state metrics docs show some examples of how to expose such metrics.
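To illustrate the idea of deriving a metric purely from an object's spec and status, here is a minimal Python sketch. It is not how kube-state-metrics is implemented or configured. The metric name `flux_resource_info`, the label names, the helper functions, and the sample GitRepository object are all assumptions made for this example.

```python
# Hypothetical sketch: build an info-style metric line from a Flux object's
# spec/status fields. Metric name, label names, and sample data are assumptions.

def get_path(obj, path, default=""):
    """Walk a nested dict along `path`; return `default` if a key is missing."""
    cur = obj
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur


def ready_status(obj):
    """Return the status of the Ready condition, or "Unknown" if it is absent."""
    for cond in get_path(obj, ["status", "conditions"], default=[]):
        if cond.get("type") == "Ready":
            return cond.get("status", "Unknown")
    return "Unknown"


def escape(value):
    """Escape a label value for the Prometheus text exposition format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def info_metric(obj, name="flux_resource_info"):
    """Render one sample with constant value 1 and the object's fields as labels."""
    labels = {
        "kind": get_path(obj, ["kind"]),
        "name": get_path(obj, ["metadata", "name"]),
        "exported_namespace": get_path(obj, ["metadata", "namespace"]),
        "ready": ready_status(obj),
        "branch": get_path(obj, ["spec", "ref", "branch"]),
        "revision": get_path(obj, ["status", "artifact", "revision"]),
    }
    rendered = ",".join(f'{k}="{escape(v)}"' for k, v in labels.items())
    return f"{name}{{{rendered}}} 1"


# Sample object shaped like a GitRepository as returned by kube-apiserver.
git_repository = {
    "kind": "GitRepository",
    "metadata": {"name": "podinfo", "namespace": "flux-system"},
    "spec": {"ref": {"branch": "main"}},
    "status": {
        "artifact": {"revision": "main@sha1:abc123"},
        "conditions": [{"type": "Ready", "status": "True"}],
    },
}

print(info_metric(git_repository))
```

Because the labels come straight from the object, an alert on a changed branch or revision only needs to watch for a change in the corresponding label value.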

In Flux v2.1, to let users set up and configure their own metrics based on their needs, the monitoring setup and configuration provided by Flux will no longer be recommended. New guides and examples will show how kube-state-metrics can be used to achieve the same. A new version of the Flux cluster dashboard will be provided as an example; it will use the metrics from kube-state-metrics to create the same panels.

In addition, the Flux readiness condition metric (gotk_reconcile_condition) exported by the controllers will be modified to report only live resources. Previously, this metric would show a Deleted status for Ready when an object was deleted. Flux v2.1 will only report the metric for objects that exist; metrics for deleted objects will no longer be exported. This is done to reduce stale metrics for deleted objects and to gradually stop exporting such metrics altogether in a future release. This may affect the old cluster dashboard. Use of kube-state-metrics for similar metrics and dashboards is encouraged, and the new custom metrics guide will contain examples for it.
A similar change will be implemented for the HelmRepository cache event metrics (gotk_cache_events_total) to delete the stale metrics of deleted HelmRepositories.
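The difference between marking an object as Deleted and dropping its series can be sketched with a small in-memory stand-in for a labelled gauge. This is illustrative Python only, not the controllers' implementation. The `ConditionGauge` class, its methods, and the label layout are assumptions made for this example.

```python
# Illustrative sketch only: an in-memory stand-in for a labelled gauge.
# The class API and label layout are assumptions, not Flux's implementation.

READY_STATUSES = ("True", "False", "Unknown")


class ConditionGauge:
    def __init__(self):
        # Maps (kind, namespace, name, status) -> 0 or 1.
        self.series = {}

    def record(self, kind, namespace, name, ready_status):
        """Set the series matching the current Ready status to 1, the rest to 0."""
        for status in READY_STATUSES:
            key = (kind, namespace, name, status)
            self.series[key] = 1 if status == ready_status else 0

    def delete(self, kind, namespace, name):
        """Drop every series of a deleted object instead of marking it Deleted."""
        stale = [k for k in self.series if k[:3] == (kind, namespace, name)]
        for key in stale:
            del self.series[key]


gauge = ConditionGauge()
gauge.record("HelmRepository", "flux-system", "podinfo", "True")
gauge.record("HelmRepository", "flux-system", "bitnami", "False")

# The podinfo object is removed from the cluster: its series disappear entirely.
gauge.delete("HelmRepository", "flux-system", "podinfo")
```

After the `delete` call only the series for the object that still exists remain, so nothing stale is left behind to be scraped.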

To-do:
