-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply error in loop should not kill the sidecar #2996
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good for me, but I would add some metric.
PTAL @s-urbaniak @paulfantom from Prometheus Operator perspective (you use config reloader with this package 🤔 )
// Critical error. | ||
// TODO(bwplotka): There is no need to get process down in this case and decrease availability, handle the error in different way. | ||
return err | ||
level.Error(r.logger).Log("msg", "apply error", "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have metric for that as well? (:
Signed-off-by: Max Neverov <neverov.max@gmail.com>
Signed-off-by: Max Neverov <neverov.max@gmail.com>
Signed-off-by: Max Neverov <neverov.max@gmail.com>
This is not the same problem I believe |
Yeah, my mistake not the same problem but the solution is somewhat converging something similar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing, solid code. Just one minor suggestion for metric name to be more explicit! 💪
Can we also remove |
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Max Neverov <neverov.max@gmail.com>
@bwplotka applied the changes. Please take a look 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome 👍
LGTM, thanks!
Changes
Fixes: #2873
Log error when applying changes in the prometheus config instead of stopping the sidecar.
Add new metric
reloader_config_errors_total
to reflect config readings errors.Add new flags
--reloader.retry-interval
and--reloader.watch-interval
to tweak corresponding configs for the reloader.NOTE
I found prometheus.ready_timeout flag with underscore, which seems to be a typo. Please, let me know if it should be changed to
prometheus.ready-timeout
in this PR (breaking change?).Verification
To verify it manually add the following flags to the quickstart.sh:
After start, remove/rename
data/prom0/prometheus.yml
.The sidecar should still run, the error is logged
component=reloader msg="apply error" err="hash file: open data/prom0/prometheus.yml: no such file or directory"
.Check metrics: the new metric
reloader_config_errors_total
is updated.