Notifications for both firing and resolved items doubts #591
Comments
With send_resolved disabled, there shouldn't have been mention of the resolved alert in any emails.
It's within 5m, so group_interval still applies. Is your alertmanager or prometheus under heavy load? |
No heavy load ... we're using the email and slack notifier and from the docs / code the send_resolved should be false. With the script below, and a group_interval=2m I'm getting the following mails/slack msgs:
I thought the
|
Can you share your full alertmanager config for that? |
Sure...
|
@brian-brazil : if you need me to do some further testing just let me know... Here's some debug logging output. From this output I got 4 mails where I expected only 2 mails. The last 2 mails are redundant, as they both say that alert b3 is active (I knew that from the second mail).
Our operators are only interested in the first 2 emails. (as they know now that they need to investigate a1,a2,a3, b1,b2,b3). The third / fourth email is noise to them. According to you with |
Is this only happening with email, and are you sure the config is what the alertmanager has loaded? |
Using Slack, Email and webhooks. For every email there is also a slack and a webhook call. So don't think that it's related to the receivers. Can you reproduce it using the curl commands above ? |
@brian-brazil: Could this be the problem? https://github.com/prometheus/alertmanager/blob/master/notify/impl.go#L82
This is the first time I'm looking at Go code, but shouldn't it be passing the filtered alerts (res) instead of the original alerts (alerts) to the notify function? I'll do some testing with it tomorrow, now that I know how to compile and run Go code :) |
Great, that looks to be it. Didn't notice that when I was perusing the code. |
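For readers following along, here is a minimal sketch of the suspected bug pattern. It is not the actual Alertmanager code: the `Alert` type and the `filterResolved` and `notify` functions are hypothetical stand-ins, and only the variable names `alerts` and `res` come from the comment above. The point is that the filtering step produces `res`, but the original `alerts` slice is what gets handed to the notifier.

```go
package main

import "fmt"

// Alert is a hypothetical stand-in for an Alertmanager alert.
type Alert struct {
	Name     string
	Resolved bool
}

// filterResolved returns only the firing alerts when sendResolved is false.
// With sendResolved set to true, the input is returned unchanged.
func filterResolved(alerts []Alert, sendResolved bool) []Alert {
	if sendResolved {
		return alerts
	}
	res := make([]Alert, 0, len(alerts))
	for _, a := range alerts {
		if !a.Resolved {
			res = append(res, a)
		}
	}
	return res
}

// notify is a hypothetical receiver call; it returns the alert names
// that would end up in the email / Slack message.
func notify(alerts []Alert) []string {
	names := make([]string, 0, len(alerts))
	for _, a := range alerts {
		names = append(names, a.Name)
	}
	return names
}

func main() {
	alerts := []Alert{
		{Name: "a1", Resolved: true},
		{Name: "b3", Resolved: false},
	}
	res := filterResolved(alerts, false)

	// Suspected bug: the unfiltered slice is passed on.
	fmt.Println(notify(alerts)) // prints [a1 b3]
	// Intended behaviour: the filtered slice is passed on.
	fmt.Println(notify(res)) // prints [b3]
}
```

If this matches what the linked line does, it would explain why resolved alerts show up in notifications even though send_resolved is false.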
Perfect. I've never used Go and don't know how high priority this is, but what are the chances of somebody from the core team doing a PR? I can do the code fix, but I would lose a lot of time figuring out how to write the tests in Go and such...
But I'm more than happy to test / validate and give feedback.
…On Wed, 4 Jan 2017 at 01:17, Brian Brazil ***@***.***> wrote:
Great, that looks to be it. Didn't notice that when I was perusing the
code.
|
Thanks for the detailed report and discussion, I'll prepare the fix and some tests around this shortly! |
Hi,
When an alert is active, the alertmanager will send out notifications periodically (as per the group_wait or group_interval value) whenever new items are firing for that group, but also when items get resolved in that group.
AFAIK the fact that it also sends separate notifications when items get resolved in the group for an active alert is not related to the global send_resolved flag, but I was wondering if there is a way to disable these notifications. (We're currently receiving lots of emails because of it.)
I've written down the following scenario and listed some questions where I don't understand the behaviour. I've tried to make it as clear / simple as possible, only using 1 target / 1 metric (2 label values) / 1 alert definition and the following configuration
Every block below indicates a change in the alerting (new alert fired / alert getting resolved). I've included the timestamps of the actions that have happened and listed down the areas where I'm still missing something. I started with a clean prometheus / alertmanager with no previous alerts.
I've written down my assumptions and my questions.
metric1{v=a} == 0 FOR 1min
Assumption: Notification is sent after the interval following 06:25:38 + the group_wait: 30s.
metric1{v=b} == 0 FOR 1min
Assumption: Alertmanager now waits group_interval: 5m after the previous email was sent. It does not use group_wait.
metric1{v=a} == 1
Assumption: It again waits group_interval: 5m after the previous email was sent.
Q1: Is there a way to suppress this email? I already know metric1_down{v=b} is firing. I don't want a notification that metric1_down{v=a} is resolved.
metric1{v=b} == 1
Assumption : everything was resolved, no alerts active anymore and send_resolved=false so no email is sent ?
metric1{v=a} == 0 FOR 1min
Q2: A bit surprised to see "resolved(metric1_down{v=b})" in this notification. At 06:41 everything was resolved and the system was OK. It's also unclear why it got sent at 06:46:25. Does it take group_interval or group_wait into account here?
metric1{v=a} == 1
Assumption : Alertmanager considers the system healthy now ? ... all issues have been resolved.
metric1{v=b} == 0 FOR 1min
Q3: 07:01:30 - I was not expecting it to say "resolved: metric1{v=b}" in the 07:01:30 message. Same as Q2.
Q4 : 07:11:30 - Not sure why this email is sent. Adds little value. I already know metric1_down{v=b} is firing. Why is this sent 10 minutes after the previous one ?
If you've made it this far thanks a lot ! :) Would appreciate some feedback to get a better understanding of it.