Alertmanager: Change timestamp label to .StartsAt #795

Merged 1 commit on Apr 29, 2024
23 changes: 19 additions & 4 deletions docs/spec/v1beta3/providers.md
@@ -878,22 +878,37 @@ an [Event](events.md#event-structure) to the provided Prometheus Alertmanager
[Address](#address).

The Event will be formatted into a `firing` [Prometheus Alertmanager
-alert](https://prometheus.io/docs/alerting/latest/notifications/#alert),
-with the metadata added to the `labels` fields, and the `message` (and optional
-`.metadata.summary`) added as annotations.
alert](https://prometheus.io/docs/alerting/latest/notifications/#alert), with
the metadata added to the `labels` fields, and the `message` (and optional
`.metadata.summary`) added as annotations. The Event timestamp is used to set
the alert start time (`.StartsAt`).

In addition to the metadata from the Event, the following labels will be added:

| Label | Description |
|-----------|------------------------------------------------------------------------------------------------------|
| alertname | The string Flux followed by the Kind and the reason for the event, e.g. `FluxKustomizationProgressing` |
| severity | The severity of the event (`error` or `info`) |
-| timestamp | The timestamp of the event |
| reason | The machine-readable reason for the object's transition into the current status |
| kind | The kind of the involved object associated with the event |
| name | The name of the involved object associated with the event |
| namespace | The namespace of the involved object associated with the event |

Note that, due to the way other Flux controllers currently emit events,
notification-controller cannot determine when an event ends, and therefore
cannot set `.EndsAt`, without a Kubernetes API roundtrip (a reasonable estimate
would be double the reconciliation interval of the resource involved). A
possible workaround is to set
[`global.resolve_timeout`][am_config_global] to an interval large enough for
events to reoccur:

[am_config_global]: https://prometheus.io/docs/alerting/latest/configuration/#file-layout-and-global-settings

```yaml
global:
  resolve_timeout: 1h
```

This Provider type does support the configuration of a [proxy URL](#https-proxy)
and [TLS certificates](#tls-certificates).

46 changes: 45 additions & 1 deletion internal/notifier/alertmanager.go
@@ -19,8 +19,10 @@ package notifier
import (
	"context"
	"crypto/x509"
	"encoding/json"
	"fmt"
	"net/url"
	"time"

	"golang.org/x/text/cases"
	"golang.org/x/text/language"
@@ -38,6 +40,36 @@ type AlertManagerAlert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`

	StartsAt AlertManagerTime `json:"startsAt"`
	EndsAt   AlertManagerTime `json:"endsAt,omitempty"`
}

// AlertManagerTime takes care of representing time.Time as RFC3339.
// See https://prometheus.io/docs/alerting/0.27/clients/
type AlertManagerTime time.Time

func (a AlertManagerTime) String() string {
	return time.Time(a).Format(time.RFC3339)
}

func (a AlertManagerTime) MarshalJSON() ([]byte, error) {
	return json.Marshal(a.String())
}

func (a *AlertManagerTime) UnmarshalJSON(jsonRepr []byte) error {
	var serializedTime string
	if err := json.Unmarshal(jsonRepr, &serializedTime); err != nil {
		return err
	}

	t, err := time.Parse(time.RFC3339, serializedTime)
	if err != nil {
		return err
	}

	*a = AlertManagerTime(t)
	return nil
}

func NewAlertmanager(hookURL string, proxyURL string, certPool *x509.CertPool) (*Alertmanager, error) {
@@ -75,18 +107,30 @@ func (s *Alertmanager) Post(ctx context.Context, event eventv1.Event) error {
	labels["alertname"] = "Flux" + event.InvolvedObject.Kind + cases.Title(language.Und).String(event.Reason)
	labels["severity"] = event.Severity
	labels["reason"] = event.Reason
-	labels["timestamp"] = event.Timestamp.String()

	labels["kind"] = event.InvolvedObject.Kind
	labels["name"] = event.InvolvedObject.Name
	labels["namespace"] = event.InvolvedObject.Namespace
	labels["reportingcontroller"] = event.ReportingController

	// The best reasonable `endsAt` value would be multiplying
	// InvolvedObject's reconciliation interval by 2 then adding that to
	// `startsAt` (the next successful reconciliation would make sure
	// the alert is cleared after the timeout). Due to
	// event.InvolvedObject only containing the object reference (namely
	// the GVKNN) best we can do is leave it unset up to Alertmanager's
	// default `resolve_timeout`.
	//
	// https://prometheus.io/docs/alerting/0.27/configuration/#file-layout-and-global-settings
	startsAt := AlertManagerTime(event.Timestamp.Time)

	payload := []AlertManagerAlert{
		{
			Labels:      labels,
			Annotations: annotations,
			Status:      "firing",

			StartsAt: startsAt,
		},
	}
