-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(operatorconditions): don't retry on 404 #2679
fix(operatorconditions): don't retry on 404 #2679
Conversation
The controllers managing OperatorConditions can churn indefinitely, periodically producing error logs, when a CSV is deleted. To prevent this, don't retry when presented with select NOT_FOUND errors during reconciliation. Signed-off-by: Nick Hale <njohnhale@gmail.com>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: njhale The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -93,39 +93,34 @@ var _ reconcile.Reconciler = &OperatorConditionReconciler{} | |||
|
|||
func (r *OperatorConditionReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { | |||
// Set up a convenient log object so we don't have to type request over and over again | |||
log := r.log.WithValues("request", req) | |||
log.V(2).Info("reconciling operatorcondition") | |||
log := r.log.WithValues("request", req).V(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V(1)
is debug level for the underlying logr implementation, zapr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol - of course it is. Why would you call it...Debug
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logr
doesn't have named levels, really. It has error logs and info logs. info logs can have varying degrees of importance, so the number reflects the verbosity. Higher number = higher verbosity.
However logr is just an interface. There's an implementation of this interface for all the major logging libraries, so if the underlying implementation has named levels, they end up being mapped to a verbosity number in logr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also... in logr, the verbosity is additive. So log.V(1).V(1)
is the same as log.V(2)
, It's somewhat misleading to say log.V(1)
is equivalent to Debug
since it depends on what the base verbosity of log
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure I'm a huge fan of the logr
api from a user perspective.
I'll try to look for some more examples of this being used in the wild. Maybe there's some wrapper/conventions around setting the level that we're missing.
log.V(1).Error(err, "Unable to find operatorcondition") | ||
return ctrl.Result{}, err | ||
if err := r.Client.Get(ctx, req.NamespacedName, operatorCondition); err != nil { | ||
log.Info("Unable to find OperatorCondition") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.Error
is always logged, regardless of level -- let the invoking controller log the error we bubble up, if any.
return ctrl.Result{}, err | ||
if err := r.Client.Get(ctx, req.NamespacedName, operatorCondition); err != nil { | ||
log.Info("Unable to find OperatorCondition") | ||
return ctrl.Result{}, client.IgnoreNotFound(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-nil errors returned from Reconcilers
are always retried, regardless of the respective Result
.
/lgtm |
The controllers managing OperatorConditions can churn indefinitely, periodically producing error logs, when a CSV is deleted. To prevent this, don't retry when presented with select NOT_FOUND errors during reconciliation.
To verify this works, I followed the reproduction steps in #2678 both before and after this patch is applied. I then confirmed occurrences of
Unable to find
in the logs of the olm-operator pod were significantly reduced.Before
This number seems to increase over time.
After
This number seems to be the steady state.
Note: There's probably a better way to test this fix (for example, compare queue prometheus metrics or pprof cpu profiles)
fixes #2678