Add time-based muting to routing tree #2393

Merged 47 commits on Mar 1, 2021
Commits
c2c0c55
Add gotime library
bsattelb Sep 22, 2020
5df6616
Allow time intervals in global config
bsattelb Sep 22, 2020
cbfbf07
Allow routes to reference time intervals
bsattelb Sep 22, 2020
1d912fa
Add config timeinterval that allows intervals to be dumped back out o…
bsattelb Sep 27, 2020
ea5b925
Add buildah script for test container
bsattelb Sep 27, 2020
fe4b839
Move time intervals to own section, add config validation
bsattelb Oct 6, 2020
44d8cb4
Update gotime to v0.0.2
bsattelb Oct 6, 2020
d1f5e07
Add mute time stage and pipeline
bsattelb Oct 6, 2020
f53e7a9
Add tests for TimeMuteStage
benridley Oct 12, 2020
a3cb125
Move timeinterval library into locally maintained package
benridley Oct 13, 2020
93e0117
Change logging to debug when notifications aren't sent due to route mute
benridley Oct 13, 2020
c51e598
Remove container testing script
benridley Oct 13, 2020
11c24d4
Correct case from renaming
benridley Oct 13, 2020
3d97ee5
Update docs to include mute time sections
benridley Oct 13, 2020
58c1ef5
Clarify boundaries of ranges in docs
benridley Oct 13, 2020
0e9838c
Fix formatting
benridley Oct 13, 2020
e052540
Add license header
benridley Oct 13, 2020
3ece406
Remove unused test function
benridley Oct 13, 2020
44e9aa9
Change undefined route to use quoted string formatting
benridley Oct 14, 2020
ad385c2
Add check for undefined name in a mute time interval
benridley Oct 14, 2020
dd7a35e
Add tests for configuration of mute times
benridley Oct 14, 2020
4c881ef
Tidy up error message formatting
benridley Oct 14, 2020
adf94c3
Correct errors, test for specific error messages, improve formatting
benridley Nov 15, 2020
2c86b69
Prevent clamping dates that start after the end of the month
benridley Nov 17, 2020
1957e3c
Add some more complete test cases, test for broken clamping
benridley Nov 17, 2020
2c5c17a
Update timeinterval/timeinterval.go
benridley Nov 23, 2020
70b138a
Apply formatting suggestions from code review
benridley Nov 24, 2020
fb60329
Add fullstops to comments
benridley Nov 24, 2020
11b643d
Remove superfluous test case
benridley Nov 24, 2020
1cc736c
Return from errors earlier
benridley Nov 24, 2020
8e03268
Add additional test cases from code review
benridley Nov 24, 2020
fa2fab6
Simplify logging on time mute
benridley Nov 24, 2020
c34003f
Pre-allocate mute time config slice
benridley Nov 24, 2020
ae116cf
Fix comment formatting
benridley Nov 24, 2020
5152a2f
Ensure mute time intervals are unique in config, add associated test
benridley Nov 24, 2020
5d4231b
Use consistent naming for mute time intervals
benridley Nov 24, 2020
24804e6
Improve comment fomatting
benridley Nov 24, 2020
7ffd4ca
Change docs to reflect mute_time_intervals in routes instead of mute_…
benridley Nov 24, 2020
0fe51de
Clarify timezone support in mute time intervals
benridley Nov 24, 2020
d177471
Improve comment formatting
benridley Nov 24, 2020
5983d20
Fix formatting
benridley Nov 24, 2020
253e28a
Remove unnecessary json tag in MuteTimeInterval
benridley Dec 16, 2020
bcab6aa
Change wording in root route mute time interval error to align with o…
benridley Dec 16, 2020
d9d7511
Revert unwanted formatting change
benridley Dec 16, 2020
d0217a8
Add JSON marshalling and unmarshalling support for time intervals
benridley Jan 24, 2021
3a5f4e5
Fix formatting after merge conflict in config.go
benridley Jan 24, 2021
df54b4b
Improve documentation wording and formatting in response to maintaine…
benridley Feb 16, 2021
8 changes: 8 additions & 0 deletions cmd/alertmanager/main.go
@@ -60,6 +60,7 @@ import (
 	"github.com/prometheus/alertmanager/provider/mem"
 	"github.com/prometheus/alertmanager/silence"
 	"github.com/prometheus/alertmanager/template"
+	"github.com/prometheus/alertmanager/timeinterval"
 	"github.com/prometheus/alertmanager/types"
 	"github.com/prometheus/alertmanager/ui"
 )
@@ -413,6 +414,12 @@ func run() int {
 			integrationsNum += len(integrations)
 		}

+		// Build the map of time interval names to mute time definitions.
+		muteTimes := make(map[string][]timeinterval.TimeInterval, len(conf.MuteTimeIntervals))
+		for _, ti := range conf.MuteTimeIntervals {
+			muteTimes[ti.Name] = ti.TimeIntervals
+		}
+
 		inhibitor.Stop()
 		disp.Stop()

@@ -423,6 +430,7 @@ func run() int {
 			waitFunc,
 			inhibitor,
 			silencer,
+			muteTimes,
 			notificationLog,
 			peer,
 		)
72 changes: 62 additions & 10 deletions config/config.go
@@ -31,6 +31,7 @@ import (
 	"gopkg.in/yaml.v2"

 	"github.com/prometheus/alertmanager/pkg/labels"
+	"github.com/prometheus/alertmanager/timeinterval"
 )

 const secretToken = "<secret>"
@@ -219,13 +220,32 @@ func resolveFilepaths(baseDir string, cfg *Config) {
 	}
 }

+// MuteTimeInterval represents a named set of time intervals for which a route should be muted.
+type MuteTimeInterval struct {
+	Name          string                      `yaml:"name"`
+	TimeIntervals []timeinterval.TimeInterval `yaml:"time_intervals"`
+}
+
+// UnmarshalYAML implements the yaml.Unmarshaler interface for MuteTimeInterval.
+func (mt *MuteTimeInterval) UnmarshalYAML(unmarshal func(interface{}) error) error {
+	type plain MuteTimeInterval
+	if err := unmarshal((*plain)(mt)); err != nil {
+		return err
+	}
+	if mt.Name == "" {
+		return fmt.Errorf("missing name in mute time interval")
+	}
+	return nil
+}
+
 // Config is the top-level configuration for Alertmanager's config files.
 type Config struct {
-	Global       *GlobalConfig  `yaml:"global,omitempty" json:"global,omitempty"`
-	Route        *Route         `yaml:"route,omitempty" json:"route,omitempty"`
-	InhibitRules []*InhibitRule `yaml:"inhibit_rules,omitempty" json:"inhibit_rules,omitempty"`
-	Receivers    []*Receiver    `yaml:"receivers,omitempty" json:"receivers,omitempty"`
-	Templates    []string       `yaml:"templates" json:"templates"`
+	Global            *GlobalConfig      `yaml:"global,omitempty" json:"global,omitempty"`
+	Route             *Route             `yaml:"route,omitempty" json:"route,omitempty"`
+	InhibitRules      []*InhibitRule     `yaml:"inhibit_rules,omitempty" json:"inhibit_rules,omitempty"`
+	Receivers         []*Receiver        `yaml:"receivers,omitempty" json:"receivers,omitempty"`
+	Templates         []string           `yaml:"templates" json:"templates"`
+	MuteTimeIntervals []MuteTimeInterval `yaml:"mute_time_intervals,omitempty" json:"mute_time_intervals,omitempty"`

 	// original is the input from which the config was parsed.
 	original string
@@ -411,9 +431,23 @@ func (c *Config) UnmarshalYAML(unmarshal func(interface{}) error) error {
 	if len(c.Route.Match) > 0 || len(c.Route.MatchRE) > 0 {
 		return fmt.Errorf("root route must not have any matchers")
 	}
+	if len(c.Route.MuteTimeIntervals) > 0 {
+		return fmt.Errorf("root route must not have any mute time intervals")
+	}

 	// Validate that all receivers used in the routing tree are defined.
-	return checkReceiver(c.Route, names)
+	if err := checkReceiver(c.Route, names); err != nil {
+		return err
+	}
+
+	tiNames := make(map[string]struct{})
+	for _, mt := range c.MuteTimeIntervals {
+		if _, ok := tiNames[mt.Name]; ok {
+			return fmt.Errorf("mute time interval %q is not unique", mt.Name)
+		}
+		tiNames[mt.Name] = struct{}{}
+	}
+	return checkTimeInterval(c.Route, tiNames)
 }

 // checkReceiver returns an error if a node in the routing tree
@@ -433,6 +467,23 @@ func checkReceiver(r *Route, receivers map[string]struct{}) error {
 	return nil
 }

+func checkTimeInterval(r *Route, timeIntervals map[string]struct{}) error {
+	for _, sr := range r.Routes {
+		if err := checkTimeInterval(sr, timeIntervals); err != nil {
+			return err
+		}
+	}
+	if len(r.MuteTimeIntervals) == 0 {
+		return nil
+	}
+	for _, mt := range r.MuteTimeIntervals {
+		if _, ok := timeIntervals[mt]; !ok {
+			return fmt.Errorf("undefined time interval %q used in route", mt)
+		}
+	}
+	return nil
+}
+
 // DefaultGlobalConfig returns GlobalConfig with default values.
 func DefaultGlobalConfig() GlobalConfig {
 	return GlobalConfig{
@@ -582,10 +633,11 @@ type Route struct {
 	// Deprecated. Remove before v1.0 release.
 	Match map[string]string `yaml:"match,omitempty" json:"match,omitempty"`
 	// Deprecated. Remove before v1.0 release.
-	MatchRE  MatchRegexps `yaml:"match_re,omitempty" json:"match_re,omitempty"`
-	Matchers Matchers     `yaml:"matchers,omitempty" json:"matchers,omitempty"`
-	Continue bool         `yaml:"continue" json:"continue,omitempty"`
-	Routes   []*Route     `yaml:"routes,omitempty" json:"routes,omitempty"`
+	MatchRE           MatchRegexps `yaml:"match_re,omitempty" json:"match_re,omitempty"`
+	Matchers          Matchers     `yaml:"matchers,omitempty" json:"matchers,omitempty"`
+	MuteTimeIntervals []string     `yaml:"mute_time_intervals,omitempty" json:"mute_time_intervals,omitempty"`
+	Continue          bool         `yaml:"continue" json:"continue,omitempty"`
+	Routes            []*Route     `yaml:"routes,omitempty" json:"routes,omitempty"`

 	GroupWait      *model.Duration `yaml:"group_wait,omitempty" json:"group_wait,omitempty"`
 	GroupInterval  *model.Duration `yaml:"group_interval,omitempty" json:"group_interval,omitempty"`
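The validation added to config.go has two parts: interval names must be unique, and every name referenced by a route must be defined. The sketch below reproduces that logic in isolation, assuming a reduced Route type with only the two fields the check needs; collectNames is a hypothetical helper name, while the error strings are copied from the diff.

```go
package main

import "fmt"

// Route is a reduced, hypothetical stand-in for config.Route.
type Route struct {
	MuteTimeIntervals []string
	Routes            []*Route
}

// collectNames returns the set of defined interval names and rejects duplicates.
func collectNames(names []string) (map[string]struct{}, error) {
	seen := make(map[string]struct{}, len(names))
	for _, n := range names {
		if _, ok := seen[n]; ok {
			return nil, fmt.Errorf("mute time interval %q is not unique", n)
		}
		seen[n] = struct{}{}
	}
	return seen, nil
}

// checkTimeInterval walks the routing tree depth-first, visiting child
// routes before the node itself, and fails on the first undefined name.
func checkTimeInterval(r *Route, defined map[string]struct{}) error {
	for _, sr := range r.Routes {
		if err := checkTimeInterval(sr, defined); err != nil {
			return err
		}
	}
	for _, mt := range r.MuteTimeIntervals {
		if _, ok := defined[mt]; !ok {
			return fmt.Errorf("undefined time interval %q used in route", mt)
		}
	}
	return nil
}

func main() {
	defined, err := collectNames([]string{"weekends"})
	if err != nil {
		fmt.Println(err)
		return
	}
	root := &Route{Routes: []*Route{
		{MuteTimeIntervals: []string{"weekends"}},
		{MuteTimeIntervals: []string{"business_hours"}},
	}}
	fmt.Println(checkTimeInterval(root, defined))
	// prints: undefined time interval "business_hours" used in route
}
```

Because children are visited before their parent, the first error reported comes from the deepest route along the first failing branch. The early return on an empty MuteTimeIntervals slice in the diff does not change the result, since ranging over an empty slice is a no-op.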
127 changes: 127 additions & 0 deletions config/config_test.go
@@ -151,6 +151,103 @@ receivers:

 }

+func TestMuteTimeExists(t *testing.T) {
+	in := `
+route:
+  receiver: team-Y
+  routes:
+  - match:
+      severity: critical
+    mute_time_intervals:
+    - business_hours
+
+receivers:
+- name: 'team-Y'
+`
+	_, err := Load(in)
+
+	expected := "undefined time interval \"business_hours\" used in route"
+
+	if err == nil {
+		t.Fatalf("no error returned, expected:\n%q", expected)
+	}
+	if err.Error() != expected {
+		t.Errorf("\nexpected:\n%q\ngot:\n%q", expected, err.Error())
+	}
+
+}
+
+func TestMuteTimeHasName(t *testing.T) {
+	in := `
+mute_time_intervals:
+- name:
+  time_intervals:
+  - times:
+    - start_time: '09:00'
+      end_time: '17:00'
+
+receivers:
+- name: 'team-X-mails'
+
+route:
+  receiver: 'team-X-mails'
+  routes:
+  - match:
+      severity: critical
+    mute_time_intervals:
+    - business_hours
+`
+	_, err := Load(in)
+
+	expected := "missing name in mute time interval"
+
+	if err == nil {
+		t.Fatalf("no error returned, expected:\n%q", expected)
+	}
+	if err.Error() != expected {
+		t.Errorf("\nexpected:\n%q\ngot:\n%q", expected, err.Error())
+	}
+
+}
+
+func TestMuteTimeNoDuplicates(t *testing.T) {
+	in := `
+mute_time_intervals:
+- name: duplicate
+  time_intervals:
+  - times:
+    - start_time: '09:00'
+      end_time: '17:00'
+- name: duplicate
+  time_intervals:
+  - times:
+    - start_time: '10:00'
+      end_time: '14:00'
+
+receivers:
+- name: 'team-X-mails'
+
+route:
+  receiver: 'team-X-mails'
+  routes:
+  - match:
+      severity: critical
+    mute_time_intervals:
+    - business_hours
+`
+	_, err := Load(in)
+
+	expected := "mute time interval \"duplicate\" is not unique"
+
+	if err == nil {
+		t.Fatalf("no error returned, expected:\n%q", expected)
+	}
+	if err.Error() != expected {
+		t.Errorf("\nexpected:\n%q\ngot:\n%q", expected, err.Error())
+	}
+
+}
+
 func TestGroupByHasNoDuplicatedLabels(t *testing.T) {
 	in := `
 route:
@@ -231,6 +328,36 @@ receivers:

 }

+func TestRootRouteNoMuteTimes(t *testing.T) {
+	in := `
+mute_time_intervals:
+- name: my_mute_time
+  time_intervals:
+  - times:
+    - start_time: '09:00'
+      end_time: '17:00'
+
+receivers:
+- name: 'team-X-mails'
+
+route:
+  receiver: 'team-X-mails'
+  mute_time_intervals:
+  - my_mute_time
+`
+	_, err := Load(in)
+
+	expected := "root route must not have any mute time intervals"
+
+	if err == nil {
+		t.Fatalf("no error returned, expected:\n%q", expected)
+	}
+	if err.Error() != expected {
+		t.Errorf("\nexpected:\n%q\ngot:\n%q", expected, err.Error())
+	}
+
+}
+
 func TestRootRouteHasNoMatcher(t *testing.T) {
 	in := `
 route:
1 change: 1 addition & 0 deletions dispatch/dispatch.go
@@ -404,6 +404,7 @@ func (ag *aggrGroup) run(nf notifyFunc) {
 			ctx = notify.WithGroupLabels(ctx, ag.labels)
 			ctx = notify.WithReceiverName(ctx, ag.opts.Receiver)
 			ctx = notify.WithRepeatInterval(ctx, ag.opts.RepeatInterval)
+			ctx = notify.WithMuteTimeIntervals(ctx, ag.opts.MuteTimeIntervals)

 			// Wait the configured interval before calling flush again.
 			ag.mtx.Lock()
17 changes: 12 additions & 5 deletions dispatch/route.go
@@ -29,11 +29,12 @@ import (
 // DefaultRouteOpts are the defaulting routing options which apply
 // to the root route of a routing tree.
 var DefaultRouteOpts = RouteOpts{
-	GroupWait:      30 * time.Second,
-	GroupInterval:  5 * time.Minute,
-	RepeatInterval: 4 * time.Hour,
-	GroupBy:        map[model.LabelName]struct{}{},
-	GroupByAll:     false,
+	GroupWait:         30 * time.Second,
+	GroupInterval:     5 * time.Minute,
+	RepeatInterval:    4 * time.Hour,
+	GroupBy:           map[model.LabelName]struct{}{},
+	GroupByAll:        false,
+	MuteTimeIntervals: []string{},
 }

 // A Route is a node that contains definitions of how to handle alerts.
@@ -65,6 +66,7 @@ func NewRoute(cr *config.Route, parent *Route) *Route {
 	if cr.Receiver != "" {
 		opts.Receiver = cr.Receiver
 	}
+
 	if cr.GroupBy != nil {
 		opts.GroupBy = map[model.LabelName]struct{}{}
 		for _, ln := range cr.GroupBy {
@@ -115,6 +117,8 @@ func NewRoute(cr *config.Route, parent *Route) *Route {

 	sort.Sort(matchers)

+	opts.MuteTimeIntervals = cr.MuteTimeIntervals
+
 	route := &Route{
 		parent:    parent,
 		RouteOpts: opts,
@@ -203,6 +207,9 @@ type RouteOpts struct {
 	GroupWait      time.Duration
 	GroupInterval  time.Duration
 	RepeatInterval time.Duration
+
+	// A list of time intervals for which the route is muted.
+	MuteTimeIntervals []string
 }

 func (ro *RouteOpts) String() string {
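One consequence of the NewRoute change is worth spelling out: opts.MuteTimeIntervals is assigned unconditionally, whereas options such as Receiver are overwritten only when the child sets them. Assuming opts starts as a copy of the parent's options (that part of NewRoute lies outside the hunks shown), a child route would therefore not inherit its parent's mute time intervals. The sketch below illustrates the difference with reduced, hypothetical types.

```go
package main

import "fmt"

// routeConfig is a reduced, hypothetical stand-in for config.Route.
type routeConfig struct {
	Receiver          string
	MuteTimeIntervals []string
}

// routeOpts is a reduced, hypothetical stand-in for dispatch.RouteOpts.
type routeOpts struct {
	Receiver          string
	MuteTimeIntervals []string
}

// newOpts assumes child options start as a copy of the parent's options.
func newOpts(cr routeConfig, parent *routeOpts) routeOpts {
	var opts routeOpts
	if parent != nil {
		opts = *parent
	}
	// Conditional override: an empty value keeps the inherited receiver.
	if cr.Receiver != "" {
		opts.Receiver = cr.Receiver
	}
	// Unconditional assignment, as in the diff: the parent's value is replaced
	// even when the child configures no mute time intervals.
	opts.MuteTimeIntervals = cr.MuteTimeIntervals
	return opts
}

func main() {
	parent := newOpts(routeConfig{Receiver: "team-X", MuteTimeIntervals: []string{"weekends"}}, nil)
	child := newOpts(routeConfig{}, &parent)
	fmt.Println(child.Receiver)               // prints: team-X
	fmt.Println(len(child.MuteTimeIntervals)) // prints: 0
}
```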