[kube-prometheus-stack] Unexpected alerts firing - Why? #2720
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
@jgagnon44 did you manage to find a solution for this issue?
Hi Team,
We are facing the same issue here. I will investigate when I have some free time.
@jkroepke I saw you working around this at #4460. Could you maybe please reopen this issue? I think this issue is still valid and should be easily solvable by adding the pod label to the etcd alerts. I closed my previous PR because I recognized that the alerts are getting generated...
* added "pod" prometheus label to etcd alerts to prevent false positive alerts on downtime of one etcd member (#2720)
* update chart.yaml
* added reference

Signed-off-by: Julian Schreiner <20794518+jusch23@users.noreply.github.com>
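For context, here is a rough sketch of the kind of change that commit describes: the etcd availability rules aggregate the `up` series across members, and adding `pod` to the `without (...)` clause keeps a single restarted or unreachable member from tripping a quorum-style condition on its own. The rule name and expression below are illustrative assumptions, not the exact rules shipped by the chart:

```yaml
# Illustrative sketch only -- the rule name and expression are assumptions;
# the chart's actual etcd rule group may differ.
groups:
  - name: etcd
    rules:
      - alert: etcdInsufficientMembers
        # Count members reporting down, aggregating away the per-member
        # "instance" and "pod" labels, and fire only on quorum loss.
        expr: |
          sum without (instance, pod) (up{job=~".*etcd.*"} == bool 0)
            > (count without (instance, pod) (up{job=~".*etcd.*"}) - 1) / 2
        for: 3m
        labels:
          severity: critical
```

Without `pod` in the aggregation, each member's series stays separate, so one down member forms a group of size one and satisfies the inequality by itself.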
I have a 4-node K8s cluster set up via kubeadm on a local VM cluster. I am using the following:
When I go into either Prometheus or Alertmanager, there are many alerts that are always firing. Another thing to note is that Alertmanager "cluster status" is reporting as "disabled". Not sure what bearing (if any) that may have on this. I have not added any new alerts of my own - everything was presumably deployed with the Helm chart.
I do not understand why these alerts are firing, other than what I can glean from their names. It does not seem good that they are firing: either there is something seriously wrong with the cluster, or something is poorly configured in the Helm chart's alerting configuration. I'm leaning toward the second, but I'll admit I really don't know.
Here is a listing of the firing alerts, along with label info:
Here is my values.yaml:
Is there something wrong with this configuration? Are there any Kubernetes objects that might be missing or misconfigured? It seems very odd that one could install this Helm chart and experience this many "failures". Is there perhaps a major problem with my cluster? I would think that if there were really something wrong with etcd, kube-scheduler, or kube-proxy, I would be experiencing problems everywhere, but I am not.
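As a point of reference (an explanation commonly given for this pattern, not a confirmed diagnosis of this cluster): on kubeadm installs, etcd, kube-scheduler, kube-controller-manager, and kube-proxy typically bind their metrics endpoints to 127.0.0.1 or require client certificates, so the chart's scrape targets for them fail and the corresponding "...Down"/TargetDown alerts fire even on a healthy cluster. If monitoring those components is not needed, one option is to turn their scrape configs off in values.yaml; the keys below exist in recent chart versions, but check your chart's values.yaml to confirm:

```yaml
# Hedged sketch: disables the chart's scrape targets (and, in recent chart
# versions, the associated default rules) for control-plane components that
# are not reachable from inside the cluster. Only do this if you accept not
# monitoring these components.
kubeScheduler:
  enabled: false
kubeControllerManager:
  enabled: false
kubeProxy:
  enabled: false
kubeEtcd:
  enabled: false
```

The alternative is to reconfigure those components to listen on a reachable address and, for etcd, to give Prometheus the client certificates (see the sketch further down).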
If there is any other information I can pull from the cluster or related artifacts that might help, let me know and I will include it.
Here are some examples of the alerts:
Here's another interesting piece of the picture. I opened Prometheus and went to the targets tab. Below is an example of what I found. All of the unhealthy targets have this type of problem.
This seems like a security issue; probably certificate information is missing. If that is true, how do I fix it?
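One commonly used approach, if the chart is scraping etcd's client port (2379) rather than a plain-HTTP metrics port: mount the kubeadm-generated etcd client certificates into Prometheus and point the etcd ServiceMonitor at them. A hedged sketch; the secret name `etcd-client-cert` and namespace are assumptions, the certificate paths are kubeadm's defaults on a control-plane node, and the `kubeEtcd.serviceMonitor.*` keys should be checked against your chart version:

```yaml
# Create the secret first from the certs on a control-plane node, e.g.:
#   kubectl -n monitoring create secret generic etcd-client-cert \
#     --from-file=/etc/kubernetes/pki/etcd/ca.crt \
#     --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
#     --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
prometheus:
  prometheusSpec:
    secrets:
      - etcd-client-cert  # mounted at /etc/prometheus/secrets/etcd-client-cert/

kubeEtcd:
  serviceMonitor:
    scheme: https
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
```

For kube-scheduler, kube-controller-manager, and kube-proxy the failing scrapes are usually a bind-address problem rather than missing certificates, so the fix there is either changing their bind address or disabling those targets as sketched above.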