[gossip] Don't merge expired gossip messages #1631
Merged
stuartnelson3 merged 3 commits into master on Nov 21, 2018
mxinden (Member) reviewed on Nov 20, 2018 and left a comment:
Cool that you caught the bug. Would you mind adding a unit test?
Force-pushed from d1f7ce2 to 066e9ac
If they're expired, they should be cleaned up on the next GC cycle, but merging them in means that they'll probably be gossip'd continually between the cluster members.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
Force-pushed from 066e9ac to 5f811ad
Contributor
Author
Oh dear, my force-pushings are being reported!
Contributor
Author
I expect so; we've just never noticed because they don't appear in the UI.
The code for nflog was also constantly re-adding nflogs to the internal memory store, the same as the silence code was.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
Contributor
Author
The silence and nflog code is so similar that the difference between them is just a generic away...
With the default 0 retention, the alerts would not be merged.

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>
grobie
approved these changes
Nov 21, 2018
If they're expired, they should be cleaned up on
the next GC cycle, but merging them in means that
they'll probably be gossip'd continually between
the cluster members.
fixes #1624
The issue was in the `merge` function: https://github.com/prometheus/alertmanager/blob/master/silence/silence.go#L752-L756

Expired silences would get deleted every 15 minutes (the hard-coded maintenance interval), but on each full-state sync an alertmanager would receive a bunch of expired silences from its peers. Since they no longer existed in its state cache, they would be merged back in. Since the merge happens every minute, they were all but guaranteed to stick around.

If all the alertmanagers were started at the same time, they might GC all the silences before a full-state sync, but if their maintenance timing is offset, they keep sharing the same expired silences with each other. This explains why I saw the silences "GC'd" and then saw them come back: I had looked within the magical window of "maintenance has run, but a full-state sync hasn't yet".