add alerts #18
Comments
Hi, at present this repository only provides the basic Flux monitoring setup using kube-prometheus-stack for the Flux monitoring docs https://fluxcd.io/flux/monitoring. The alerts page in the docs refers to alerting using the Flux notification-controller. Since you mentioned Prometheus alerts, I'm assuming you would like to set up alerts on Prometheus metrics. Alertmanager is disabled in the example configuration. I believe alerting is subjective and depends on the user and their environment. Some may like to use Prometheus Alertmanager, others may prefer Grafana for the same. I think we assume here that the users of these monitoring systems know how to configure them themselves, so we only provide a minimal example to get started. This repository only serves as an example and shouldn't be consumed directly, as we don't offer a compatibility guarantee. I think we would prefer to avoid silently breaking alerts for users with an update to this repository. It is recommended to use this repository only as a reference and build your own monitoring configuration for your environment.
Okay, sure. On the opinionated point, I think using Alertmanager (with different backends there) is the default monitoring pipeline for kube-prometheus-stack users, and since the KSM config is already present in this repo, I think it is just a matter of enabling Alertmanager and adding PrometheusRules. Feel free to close this issue if it doesn't fit the scope of this minimal example repo.
@TheKangaroo - I'd love to see your PR. awesome-prometheus-alerts has sample rules for ArgoCD: https://samber.github.io/awesome-prometheus-alerts/rules.html#argocd, so Flux2 CD rules will fit there just fine.
@antonblr I don't know if it's possible to add these alerts to awesome-prometheus-alerts as they rely on the custom kube-state-metrics config in this repo.
@TheKangaroo - I see. Yeah, looks like all samples there are built around already exposed metrics. But let's wait for what they say.
I have just seen your post, sorry for the slow response!

There actually used to be an Alertmanager example in the Flux docs, but it was lost in a refactor some time ago. It was a bit problematic because the example did not come with detailed instructions on how to configure Alertmanager; it was just an alert that assumed you had already done that. We discussed this one week at Bug Scrub and understood that if I am a new Kubernetes and Flux user following our Prometheus guide, I most certainly have not already configured Alertmanager for myself 😆 Without that addendum, the alert addition to the guide is incomplete.

I still have one of my clusters configured to use Alertmanager, with some custom alerts and other configuration based on the earlier Flux monitoring example here: https://github.com/kingdonb/flux2/tree/monitoring

It is very far behind and cannot easily be rebased now because of the refactor into a separate repo. But I will try to cobble something together out of this experience and make a minimum viable guide for setting up Alertmanager with Flux on a new cluster.

In the meantime, the examples I can already contribute are mixed in here with a deprecation notice:
https://github.com/kingdonb/flux2/tree/monitoring/manifests/monitoring
https://github.com/kingdonb/flux2/blob/ddf3c495133a2e49e20c97588887f01bb2f6b104/manifests/monitoring/kube-prometheus-stack/release.yaml#L460-L468

    - name: GitOpsToolkit
      rules:
      - alert: ReconciliationFailure
        expr: max(gotk_reconcile_condition{status="False",type="Ready"}) by (exported_namespace, name, kind) + on(exported_namespace, name, kind) (max(gotk_reconcile_condition{status="Deleted"}) by (exported_namespace, name, kind)) * 2 == 1
        for: 15m
        labels:
          severity: page
        annotations:
          summary: '{{ $labels.kind }} {{ $labels.exported_namespace }}/{{ $labels.name }} reconciliation has been failing for more than ten minutes.'

which you can find historically in the flux2 docs, if you dig past the genesis of the flux2-monitoring-example repo in the website history, where that doc once lived.

Edit: I will have to update that one, as it still uses the deprecated resource metric.
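
For anyone puzzling over the `expr` in the rule above, here is a minimal Python sketch of its arithmetic. It is not taken from the Flux docs. The function name `alert_condition` and its parameters are hypothetical, and it assumes each `gotk_reconcile_condition` gauge is either 0 or 1 for a given `(exported_namespace, name, kind)` group. In PromQL, `*` binds tighter than `+`, and `==` is applied last, so the expression reads as `(ready_false + deleted * 2) == 1`.

```python
from itertools import product
from typing import Optional


def alert_condition(ready_false: Optional[int], deleted: Optional[int]) -> bool:
    """Model the rule's expr for one (exported_namespace, name, kind) group.

    ready_false: value of the Ready/status="False" series (assumed 0 or 1)
    deleted:     value of the status="Deleted" series (assumed 0 or 1)
    None models a series that is missing on that side of the `+`.
    """
    # PromQL vector matching drops groups without a partner on the other side.
    if ready_false is None or deleted is None:
        return False
    # Weighting "deleted" by 2 pushes the sum to 2 or 3 whenever the object
    # is deleted, so only "not ready and not deleted" sums to exactly 1.
    return ready_false + deleted * 2 == 1


# Enumerate every combination of the two gauges.
truth_table = {
    (rf, d): alert_condition(rf, d) for rf, d in product((0, 1), repeat=2)
}

for (rf, d), matches in truth_table.items():
    print(f"ready_false={rf} deleted={d} -> matches={matches}")
```

Under those assumptions the expression matches only when the object is reporting Ready=False and has not been deleted. The `for:` clause then requires that match to hold continuously before the alert fires.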
Hey, we wrote some alerts for fluxcd, which we are actively using in the Cozystack project.
@kvaps thank you!
@kvaps Thank you for those examples! I added them to my existing PS. I find your ideas intriguing and I wish to subscribe to your newsletter.
I had a hard time finding reference Prometheus alerts (PrometheusRules) to set up actual alerting in addition to my Flux monitoring and dashboards.
So I decided to build some alerts for our setup myself.
If this is something you are interested in adding to this repo, I'll be happy to send you a PR with some basic PrometheusRules.