fix: promtail; clean up metrics generated from logs after a config reload. #11882
@ptodev you need to rebase on (or merge in) master to pick up some CI updates.
// After Cleanup, the registry should expose no metrics at all.
pl.Cleanup()

if err := testutil.GatherAndCompare(registry, strings.NewReader("")); err != nil {
	t.Fatalf("mismatch metrics: %v", err)
}
shouldn't we specifically check for the absence of the metric?
After pl.Cleanup(), no metrics should be visible. I don't see an advantage in checking individual metrics when there shouldn't be any of them.
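The argument can be illustrated with a minimal sketch, assuming a simplified in-memory registry in place of the real Prometheus registry: each stage records every metric it registers, Cleanup unregisters all of them, so a test only needs to assert that the registry is empty. The types `registry` and `metricsStage` and the helper `newMetricsStage` are hypothetical and are not Promtail's API.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical stand-in for a Prometheus registry:
// it maps metric names to their current values.
type registry struct {
	metrics map[string]float64
}

func newRegistry() *registry {
	return &registry{metrics: map[string]float64{}}
}

// names returns the registered metric names in sorted order.
func (r *registry) names() []string {
	out := make([]string, 0, len(r.metrics))
	for name := range r.metrics {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

// metricsStage is a hypothetical pipeline stage that creates metrics
// from log lines and remembers which ones it registered.
type metricsStage struct {
	reg   *registry
	owned []string
}

func newMetricsStage(reg *registry, names ...string) *metricsStage {
	for _, n := range names {
		reg.metrics[n] = 0
	}
	return &metricsStage{reg: reg, owned: names}
}

// observe increments a metric owned by this stage.
func (s *metricsStage) observe(name string) {
	s.reg.metrics[name]++
}

// Cleanup unregisters every metric this stage created, so nothing
// from the old config survives a reload.
func (s *metricsStage) Cleanup() {
	for _, n := range s.owned {
		delete(s.reg.metrics, n)
	}
	s.owned = nil
}

func main() {
	reg := newRegistry()
	stage := newMetricsStage(reg, "log_lines_total", "error_lines_total")
	stage.observe("log_lines_total")
	fmt.Println("before cleanup:", reg.names()) // [error_lines_total log_lines_total]
	stage.Cleanup()
	fmt.Println("after cleanup:", reg.names()) // []
}
```

Because every stage owns and removes its own metrics, "registry is empty" is a stricter check than asserting on individual metric names.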
What if there are originally two metrics stages, and after the reload there should be only one? We should still check for its existence and correct value. I think right now we're deleting metrics for stages that will still exist after the reload, which would not be correct.
I'd argue that deleting all the metrics is the right thing to do. The fact that a metric exists doesn't mean that it's used in the same way in the new config. It's possible that the user changed the meaning of each metric, but retained the metric names. In that case it would make sense to start with fresh metrics.
In the future we could make the code smart and reload only the stages which changed, but I think that's a separate issue. To do that, we could avoid calling the Cleanup method for stages which are completely unchanged as a whole.
we should document this clearly in the metrics config section
#### metrics
I added this line to both doc pages:
If Promtail's configuration is reloaded, all metrics will be reset.
Branch updated from 62e6140 to a4e8ae0.
Branch updated from a4e8ae0 to 85b0900.
@ptodev sorry for the delay. I think we just need some docs. Also, have you guys been using this code downstream in Agent/Alloy already, or would merging this be the first usage of this code?
Branch updated from 85b0900 to 1d1261d.
Branch updated from 1d1261d to a300762.
Thank you for the review! This is the first usage. Alloy and Agent import the Promtail code, so in order to update them I will need to merge this PR first.
I'm going to approve/merge this with a note that if we get reports of issues, I'll revert this change. Hopefully we'll hear from the Alloy team after a short period of time about whether they have confirmed usages that are working well.
It is possible to generate metrics from log lines in Promtail. If the config file is reloaded, those metrics are currently not cleaned up. This causes a few issues:
/metrics endpoint.
This will help us fix a bug reported in the Grafana Agent.
I suppose no changelog entry is required, because PromQL is already designed to handle counter resets? But if you think it's appropriate, I could add one.
I tested this change locally using a config file like this:
Promtail config
Then I'd run curl localhost:9080/reload and check http://localhost:9080/metrics.
.I didn't add a whole lot of unit tests because TBH we should probably focus on adding tests in the Agent, given that we're sunsetting Promtail over time.