Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore!: adopt log/slog, drop go-kit/log #4089

Merged
merged 2 commits into from
Nov 6, 2024

Conversation

tjhop
Copy link
Contributor

@tjhop tjhop commented Oct 31, 2024

The bulk of this change set was automated by the following script which is being used to aid in converting the various exporters/projects to use slog:

https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434

This commit includes several changes:

  • bump exporter-tookit to v0.13.1 for log/slog support
  • updates golangci-lint deprecated configs
  • enables sloglint linter
  • removes old go-kit/log linter configs
  • introduce some if logger == nil { $newLogger } additions to prevent nil references
  • converts cluster membership config to use a stdlib compatible slog adapter, rather than creating a custom io.Writer for use as the membership logOutput config

The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:

https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434

This commit includes several changes:
- bump exporter-tookit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduce some `if logger == nil { $newLogger }` additions to prevent
  nil references
- converts cluster membership config to use a stdlib compatible slog
  adapter, rather than creating a custom io.Writer for use as the
membership `logOutput` config

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@tjhop
Copy link
Contributor Author

tjhop commented Oct 31, 2024

cc: @SuperQ

@tjhop
Copy link
Contributor Author

tjhop commented Oct 31, 2024

Not sure why the lint stage took so long, but it failed due to a timeout:

  level=error msg="Timeout exceeded: try increasing it by passing --timeout option"
  
  Error: golangci-lint exit with code 4
  Ran golangci-lint in 60433ms

I don't run into any lint issue locally -- don't know how much caching we do for golangci-lint, but maybe a rerun will work 🤞

@tjhop
Copy link
Contributor Author

tjhop commented Oct 31, 2024

Similarly, I don't run into this test issue locally, either 🤔

        time=2024-10-31T03:41:02.745Z level=DEBUG source=dispatch.go:165 msg="Received alert" component=dispatcher alert=test1[5ae90ff][active]
        time=2024-10-31T03:41:03.746Z level=DEBUG source=dispatch.go:531 msg=flushing component=dispatcher aggrGroup="{}:{alertname=\"test1\"}" alerts=[test1[5ae90ff][active]]
        time=2024-10-31T03:41:03.747Z level=DEBUG source=notify.go:877 msg="Notify success" component=dispatcher receiver=default integration=webhook[0] aggrGroup="{}:{alertname=\"test1\"}" attempts=1 duration=897.598µs alerts=[test1[5ae90ff][active]]
    cli_test.go:80: 
        collector "webhook":
        
        interval [1,2]
        ---
        - &{map[] 0001-01-01T00:00:00.000Z <nil> [] 0001-01-01T00:00:01.000Z <nil> <nil> { map[alertname:test1]}}[-9.223372036854776e+09:]
          [ ✓ ]
        
        Expected total of 1 alerts, got 2
        received:
        @ 1.019406235
        - &{map[] 0001-01-01T00:00:00.000Z <nil> [] 2024-10-31T03:41:03.728Z <nil> <nil> { map[alertname:test1]}}[0.99986554:]
        @ 2.087403109
        - &{map[] 2024-10-31T03:41:04.728Z <nil> [] 2024-10-31T03:41:03.728Z <nil> <nil> { map[alertname:test1]}}[0.99986554:1.99986554]
        
FAIL
FAIL    github.com/prometheus/alertmanager/test/cli/acceptance  2.814s

https://app.circleci.com/pipelines/github/prometheus/alertmanager/5716/workflows/65d324ed-2660-4a25-b232-ae8d100d7797/jobs/24650/parallel-runs/0/steps/0-105

@SuperQ
Copy link
Member

SuperQ commented Oct 31, 2024

Looks like flaky tests. Re-ran and everything is green now.

@SuperQ SuperQ requested review from w0rm and gotjosh October 31, 2024 09:22
cluster/cluster.go Show resolved Hide resolved
cluster/tls_transport_test.go Outdated Show resolved Hide resolved
dispatch/dispatch.go Outdated Show resolved Hide resolved
notify/notify.go Outdated Show resolved Hide resolved
@@ -75,7 +74,8 @@ func (n *Notifier) Notify(ctx context.Context, as ...*types.Alert) (bool, error)
}
data := notify.GetTemplateData(ctx, n.tmpl, as, n.logger)

level.Debug(n.logger).Log("incident", key)
// @tjhop: should this use `group` for the keyval like most other notify implementations?
n.logger.Debug("extracted group key", "incident", key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have mixed use of incident and group_key in a couple of places. It would be nice to make these consistent in a follow up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I went with group_key in most places for sake of something to key the log message on since most of the previous calls were bare; the pushover implementation already used the incident key, so I left if for consistency in that regard.

If there's a preference what to align on, we can definitely circle back

level.Error(n.logger).Log("err", err)
// @tjhop: should we `return false, err` here as we do in most
// other Notify() implementations?
n.logger.Error("error extracting group key", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so, this looks like an mistake!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack 👍

We've been trying to keep these logging conversions relatively straightforward -- I'd vote to fix this and the other s/Info/Error/ logging calls in a separate PR

@@ -446,7 +446,8 @@ Loop:
break Loop
case <-t.C:
if err := runMaintenance(doMaintenance); err != nil {
level.Info(s.logger).Log("msg", "Running maintenance failed", "err", err)
// @tjhop: this should probably log at error level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -456,7 +457,8 @@ Loop:
return
}
if err := runMaintenance(doMaintenance); err != nil {
level.Info(s.logger).Log("msg", "Creating shutdown snapshot failed", "err", err)
// @tjhop: this should probably log at error level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Copy link
Member

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gotjosh gotjosh merged commit f6b942c into prometheus:main Nov 6, 2024
11 checks passed
tjhop added a commit to tjhop/alertmanager that referenced this pull request Nov 8, 2024
Addresses feedback from another PR
prometheus#4089 (comment)

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@tjhop tjhop mentioned this pull request Nov 8, 2024
SuperQ added a commit that referenced this pull request Dec 19, 2024
* [CHANGE] Templating errors in the SNS integration now return an error. #3531 #3879
* [CHANGE] Adopt log/slog, drop go-kit/log #4089
* [FEATURE] Add a new Microsoft Teams integration based on Flows #4024
* [FEATURE] Add a new Rocket.Chat integration #3600
* [FEATURE] Add a new Jira integration #3590 #3931
* [FEATURE] Add support for `GOMEMLIMIT`, enable it via the feature flag `--enable-feature=auto-gomemlimit`. #3895
* [FEATURE] Add support for `GOMAXPROCS`, enable it via the feature flag `--enable-feature=auto-gomaxprocs`. #3837
* [FEATURE] Add support for limits of silences including the maximum number of active and pending silences, and the maximum size per silence (in bytes). You can use the flags `--silences.max-silences` and `--silences.max-silence-size-bytes` to set them accordingly #3852 #3862 #3866 #3885 #3886 #3877
* [FEATURE] Muted alerts now show whether they are suppressed or not in both the `/api/v2/alerts` endpoint and the Alertmanager UI. #3793 #3797 #3792
* [ENHANCEMENT] Add support for `content`, `username` and `avatar_url` in the Discord integration. `content` and `username` also support templating. #4007
* [ENHANCEMENT] Only invalidate the silences cache if a new silence is created or an existing silence replaced - should improve latency on both `GET api/v2/alerts` and `POST api/v2/alerts` API endpoint. #3961
* [ENHANCEMENT] Add image source label to Dockerfile. To get changelogs shown when using Renovate #4062
* [ENHANCEMENT] Build using go 1.23 #4071
* [ENHANCEMENT] Support setting a global SMTP TLS configuration. #3732
* [ENHANCEMENT] The setting `room_id` in the WebEx integration can now be templated to allow for dynamic room IDs. #3801
* [ENHANCEMENT] Enable setting `message_thread_id` for the Telegram integration. #3638
* [ENHANCEMENT] Support the `since` and `humanizeDuration` functions to templates. This means users can now format time to more human-readable text. #3863
* [ENHANCEMENT] Support the `date` and `tz` functions to templates. This means users can now format time in a specified format and also change the timezone to their specific locale. #3812
* [ENHANCEMENT] Latency metrics now support native histograms. #3737
* [ENHANCEMENT] Add timeout option for webhook notifier. #4137
* [BUGFIX] Fix the SMTP integration not correctly closing an SMTP submission, which may lead to unsuccessful dispatches being marked as successful. #4006
* [BUGFIX]  The `ParseMode` option is now set explicitly in the Telegram integration. If we don't HTML tags had not been parsed by default. #4027
* [BUGFIX] Fix a memory leak that was caused by updates silences continuously. #3930
* [BUGFIX] Fix hiding secret URLs when the URL is incorrect. #3887
* [BUGFIX] Fix a race condition in the alerts - it was more of a hypothetical race condition that could have occurred in the alert reception pipeline. #3648
* [BUGFIX] Fix a race condition in the alert delivery pipeline that would cause a firing alert that was delivered earlier to be deleted from the aggregation group when instead it should have been delivered again. #3826
* [BUGFIX] Fix version in APIv1 deprecation notice. #3815
* [BUGFIX] Fix crash errors when using `url_file` in the Webhook integration. #3800
* [BUGFIX] fix `Route.ID()` returns conflicting IDs. #3803
* [BUGFIX] Fix deadlock on the alerts memory store. #3715
* [BUGFIX] Fix `amtool template render` when using the default values. #3725
* [BUGFIX] Fix `webhook_url_file` for both the Discord and Microsoft Teams integrations. #3728 #3745
* [BUGFIX] Fix wechat api link #4084
* [BUGFIX] Fix build info metric #4166

Signed-off-by: SuperQ <superq@gmail.com>
@SuperQ SuperQ mentioned this pull request Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants