Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore!: adopt log/slog, remove go-kit/log #14906

Merged
merged 1 commit into from
Oct 7, 2024

Conversation

tjhop
Copy link
Contributor

@tjhop tjhop commented Sep 15, 2024

For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:

  • removed unused logging util func RateLimit()
  • forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
  • move some of the json file logging functionality to use prom/common package functionality
  • refactored some of the new json file logging for scraping
  • changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
  • updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
  • added a healthy amount of if logger == nil { $makeLogger } type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like With() to add keyvals on the new *slog.Logger type

@tjhop
Copy link
Contributor Author

tjhop commented Sep 15, 2024

@SuperQ if you don't mind taking a look when you have time as well, I'd appreciate it!

And sorry all for the massive change set 😅

@tjhop
Copy link
Contributor Author

tjhop commented Sep 15, 2024

Opening as draft for now so it's known that there's work in progress -- still need to fix up tests around the custom query log testing that's done in cmd/prometheus/query_log_test.go, but I believe that should cover most things.

It builds and has been running well for me in my testing so far; feedback welcome!

Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sterling stuff! Thanks for doing this; some initial comments below.
(I only read about 25% of the PR, so far)

cmd/prometheus/main.go Outdated Show resolved Hide resolved
cmd/prometheus/main.go Outdated Show resolved Hide resolved
cmd/prometheus/main.go Outdated Show resolved Hide resolved
Comment on lines -624 to -637
// Above level 6, the k8s client would log bearer tokens in clear-text.
klog.ClampLevel(6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem to have lost this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Above level 6, the k8s client would log bearer tokens in clear-text.
klog.ClampLevel(6)
klog.SetLogger(log.With(logger, "component", "k8s_client_runtime"))
klogv2.ClampLevel(6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as the other one -- though I do need to double check and see if/where we still need klog v1 in prometheus to see about creating SetSlogLogger() type redirect func like there is in klogv2 upstream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, we still have v1 things in discovery, I took inspiration from klog's upstream v1/v2 coexistence docs and implemented a small wrapper to redirect klogv1 calls to klogv2, and since klogv2 is already redirected to log/slog, that should mean both flavors of klog end up in slog in the end

cmd/prometheus/main.go Outdated Show resolved Hide resolved
log.With(logger, "component", "scrape manager"),
func(s string) (log.Logger, error) { return logging.NewJSONFileLogger(s) },
logger.With("component", "scrape manager"),
func(s string) (*logging.JSONFileLogger, error) { return logging.NewJSONFileLogger(s) },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return a more abstract interface type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can be convinced of the usefulness there.

The previous "raison d'être" for the json file logger, as far as I can tell, was to satisfy the promql.QueryLogger interface:
https://github.com/prometheus/prometheus/pull/14906/files#diff-ad2761bb162a2f12326b18f2714ef54f4cd476cfe3388f77b2b20c5ff82e4faaL127-L130

Does that feel more appropriate for our usage here?

cmd/prometheus/main.go Outdated Show resolved Hide resolved
tjhop added a commit to tjhop/common that referenced this pull request Sep 27, 2024
Simple convenience function to return an slog.Logger that writes to
io.Discard. Originally suggested by @ArthurSens
[here](prometheus#686 (comment)),
and requested again by @bboreham
[here](prometheus/prometheus#14906 (comment)).
As Bryan points out in the comment, there's 147 instances where a
discard logger is needed, so a consistent utility function to manage
them seems helpful.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@tjhop tjhop force-pushed the chore/3.0-adopt-slog branch 2 times, most recently from cf7de3e to b49314d Compare September 27, 2024 20:58
@tjhop tjhop marked this pull request as ready for review September 27, 2024 20:59
@tjhop
Copy link
Contributor Author

tjhop commented Sep 27, 2024

@bboreham I believe this is ready for a re-review. I've rebased on current upstream main, incorporated fixes for some of the feedback provided so far, and go tests working locally.

Things work locally using go workspaces, but as I mention in the updated PR description, this requires functionality in 2 PRs that haven't made their way to an upstream release yet in prometheus/common. Because of this, test failures should be expected as things fail to compile for obvious reasons.

SuperQ pushed a commit to prometheus/common that referenced this pull request Sep 27, 2024
Simple convenience function to return an slog.Logger that writes to
io.Discard. Originally suggested by @ArthurSens
[here](#686 (comment)),
and requested again by @bboreham
[here](prometheus/prometheus#14906 (comment)).
As Bryan points out in the comment, there's 147 instances where a
discard logger is needed, so a consistent utility function to manage
them seems helpful.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@bboreham
Copy link
Member

Pin to a commit in go.mod ?

@tjhop
Copy link
Contributor Author

tjhop commented Sep 29, 2024

Pin to a commit in go.mod ?

Sure, I can do that. I'd prefer not to do that at merge time, but happy to do that for now at least.

@tjhop tjhop force-pushed the chore/3.0-adopt-slog branch 2 times, most recently from 9a3c98a to d52d7f5 Compare September 29, 2024 13:46
@tjhop tjhop requested a review from aknuds1 as a code owner September 29, 2024 14:27
@tjhop tjhop force-pushed the chore/3.0-adopt-slog branch 2 times, most recently from 7bee14c to a09e3bc Compare September 29, 2024 15:57
@tjhop
Copy link
Contributor Author

tjhop commented Sep 30, 2024

All lights are green (after pinning prometheus/common to commit) 🙃

@SuperQ
Copy link
Member

SuperQ commented Sep 30, 2024

I plan to cut a prometheus/common tag after prometheus/common#700.

@tjhop tjhop force-pushed the chore/3.0-adopt-slog branch from 29fdeef to 3cd8b74 Compare October 1, 2024 14:01
@tjhop
Copy link
Contributor Author

tjhop commented Oct 1, 2024

Bumped prometheus/common to v0.60.0, rebased to solve a merge conflict, squashed commits, and updated commit description to match current state 👍

@roidelapluie
Copy link
Member

I approve this but I would appreciate if we do not have the full path to source file

@roidelapluie
Copy link
Member

time=2024-10-01T16:54:36.703+02:00 level=INFO source=/home/roidelapluie/dev/prometheus/cmd/prometheus/main.go:1363 msg="Notifier manager stopped"

didn't we go for ts ?

@SuperQ
Copy link
Member

SuperQ commented Oct 4, 2024

@roidelapluie It looks like you merged a broken change into this PR. 😕

@tjhop
Copy link
Contributor Author

tjhop commented Oct 4, 2024

resolved another merge conflict and fixed up go mods that got messed up earlier

@tjhop tjhop force-pushed the chore/3.0-adopt-slog branch from 46e9d4c to c025769 Compare October 4, 2024 19:45
@roidelapluie
Copy link
Member

Needs a rebase, but I will merge asap once this is fixed

@tjhop tjhop force-pushed the chore/3.0-adopt-slog branch from c025769 to 509638c Compare October 7, 2024 13:14
roidelapluie
roidelapluie previously approved these changes Oct 7, 2024
Copy link
Member

@roidelapluie roidelapluie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmh, I see slog does not return error, but we sill moved query logger to slog?

@tjhop
Copy link
Contributor Author

tjhop commented Oct 7, 2024

Mmh, I see slog does not return error, but we sill moved query logger to slog?

Yes, so far we've been operating with the goal of "full slog adoption" as much as possible. The QueryLogger interface has been updated to match the slog sugar logger methods available, the difference being that With() on the query logger does not return a new logger but rather updates the querylogger's internal logger.

I had some similar thoughts. Do you have any thoughts/opinions?

@roidelapluie
Copy link
Member

Let's merge for now ?

@roidelapluie
Copy link
Member

I would have prefered not to change the QueryLogger and ScrapeFailure loggers but no strong opinion

@tjhop
Copy link
Contributor Author

tjhop commented Oct 7, 2024

I'm game for merge. Like @SuperQ had mentioned a ways above, let's get it in and iterate on improvements 👍

@SuperQ
Copy link
Member

SuperQ commented Oct 7, 2024

@roidelapluie As far as I remember looking at the go-kit/log code, the Log() interface swallowed all errors anyway and always returned nil. I think it was only there for internal use or something like that. This is why we added an exception to golangci-lint configs to ignore errcheck issues.

Which reminds me, @tjhop , you can remove this from the .golanci.yml

      # Never check for logger errors.
      - (github.com/go-kit/log.Logger).Log

@tjhop
Copy link
Contributor Author

tjhop commented Oct 7, 2024

Which reminds me, @tjhop , you can remove this from the .golanci.yml

Woops, I had it in there at one point, must've lost in one of the recent rebases. Thanks!

SuperQ
SuperQ previously approved these changes Oct 7, 2024
@tjhop
Copy link
Contributor Author

tjhop commented Oct 7, 2024

Did a thorough self-review and see a few other spots to tidy up, will get things better cleaned up

For: prometheus#14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets gooooooooooooooooo

@SuperQ SuperQ merged commit 805954d into prometheus:main Oct 7, 2024
26 checks passed
tjhop added a commit to tjhop/prometheus that referenced this pull request Nov 21, 2024
Resolves: prometheus#15433

When I converted prometheus to use slog in prometheus#14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying `slog.Logger`, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.

This commit does a few things:
- It was referenced in feedback in prometheus#14906 that it would've been better
  to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
  pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
  methods in the contract.
- updates tests accordingly
- cleans up unused methods

Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
tjhop added a commit to tjhop/prometheus that referenced this pull request Nov 23, 2024
Resolves: prometheus#15433

When I converted prometheus to use slog in prometheus#14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.

This commit does a few things:

- It was referenced in feedback in prometheus#14906 that it would've been better
  to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
  pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
  methods in the contract.
- updates tests accordingly
- cleans up unused methods

Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants