
fix(operatorconditions): don't retry on 404 #2679

Merged

Conversation

@njhale (Member) commented Mar 2, 2022

When a CSV is deleted, the controllers managing OperatorConditions can churn indefinitely, periodically producing error logs. To prevent this, stop retrying when select NOT_FOUND errors are encountered during reconciliation.

To verify this works, I followed the reproduction steps in #2678 both before and after applying this patch, then confirmed that occurrences of `Unable to find` in the logs of the olm-operator pod were significantly reduced.

Before

$ kubectl -n olm logs olm-operator-785d845d6b-5plbq | rg 'Unable to find' | wc -l
31

This number seems to increase over time.

After

$ kubectl -n olm logs olm-operator-785d845d6b-vzsgm | rg 'Unable to find' | wc -l
 8

This number seems to be the steady state.

Note: there's probably a better way to test this fix (for example, comparing queue Prometheus metrics or pprof CPU profiles).

fixes #2678
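For reference, a minimal sketch of the NotFound-swallowing behavior the fix relies on. `client.IgnoreNotFound` comes from controller-runtime and `apierrors.NewNotFound` from apimachinery; the group, resource, and name values below are made up for illustration:

```go
package main

import (
	"errors"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// A synthetic NotFound error, similar to what Get returns once the
	// OperatorCondition (or its owning CSV) has been deleted.
	notFound := apierrors.NewNotFound(
		schema.GroupResource{Group: "operators.coreos.com", Resource: "operatorconditions"},
		"example",
	)

	// IgnoreNotFound returns nil for NotFound errors and passes every
	// other error through, so returning its result from Reconcile ends
	// the retry loop for deleted objects without hiding real failures.
	fmt.Println(client.IgnoreNotFound(notFound))           // <nil>
	fmt.Println(client.IgnoreNotFound(errors.New("boom"))) // boom
}
```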

@openshift-ci bot commented Mar 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhale

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2022
```diff
@@ -93,39 +93,34 @@ var _ reconcile.Reconciler = &OperatorConditionReconciler{}

 func (r *OperatorConditionReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
 	// Set up a convenient log object so we don't have to type request over and over again
-	log := r.log.WithValues("request", req)
-	log.V(2).Info("reconciling operatorcondition")
+	log := r.log.WithValues("request", req).V(1)
```
njhale (Member, Author) commented:

V(1) is debug level for the underlying logr implementation, zapr.

Collaborator commented:

lol - of course it is. Why would you call it...Debug...?

Member commented:
logr doesn't really have named levels. It has error logs and info logs; info logs can have varying degrees of importance, so the number reflects the verbosity. Higher number = higher verbosity.

However logr is just an interface. There's an implementation of this interface for all the major logging libraries, so if the underlying implementation has named levels, they end up being mapped to a verbosity number in logr.
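As a concrete illustration of that mapping, a minimal sketch with zapr as the sink. The zap configuration here is illustrative; the level translation (logr V(n) maps to zap level -n, so V(1) lands on zap's Debug) is zapr's documented behavior:

```go
package main

import (
	"github.com/go-logr/zapr"
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

func main() {
	// Build a zap logger whose threshold is Debug (-1); zapr translates
	// logr's V(n) into zap level -n, so V(0) -> Info and V(1) -> Debug.
	zapCfg := zap.NewDevelopmentConfig()
	zapCfg.Level = zap.NewAtomicLevelAt(zapcore.DebugLevel)
	zapLog, err := zapCfg.Build()
	if err != nil {
		panic(err)
	}

	log := zapr.NewLogger(zapLog)

	log.Info("logged at zap Info")       // V(0) == Info
	log.V(1).Info("logged at zap Debug") // V(1) == Debug
	log.V(2).Info("dropped")             // below the Debug threshold
}
```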

Member commented:
Also... in logr, the verbosity is additive, so log.V(1).V(1) is the same as log.V(2). It's somewhat misleading to say log.V(1) is equivalent to Debug, since it depends on what the base verbosity of log is.
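A small sketch of that additive behavior, using logr's funcr sink purely for demonstration (the Verbosity threshold of 2 is arbitrary):

```go
package main

import (
	"fmt"

	"github.com/go-logr/logr/funcr"
)

func main() {
	// A toy sink that only emits entries at verbosity <= 2.
	log := funcr.New(func(prefix, args string) {
		fmt.Println(prefix, args)
	}, funcr.Options{Verbosity: 2})

	log.V(2).Info("emitted: total verbosity 2")
	log.V(1).V(1).Info("also emitted: V(1).V(1) adds up to 2")
	log.V(1).V(2).Info("dropped: total verbosity 3 exceeds the threshold")
}
```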

@njhale (Member, Author) commented Mar 2, 2022:
Yeah, I'm not sure I'm a huge fan of the logr api from a user perspective.

I'll try to look for some more examples of this being used in the wild. Maybe there's some wrapper/conventions around setting the level that we're missing.

```diff
-		log.V(1).Error(err, "Unable to find operatorcondition")
-		return ctrl.Result{}, err
+	if err := r.Client.Get(ctx, req.NamespacedName, operatorCondition); err != nil {
+		log.Info("Unable to find OperatorCondition")
```
njhale (Member, Author) commented:
.Error is always logged, regardless of level -- let the invoking controller log the error we bubble up, if any.
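To illustrate, a quick sketch with a funcr sink used purely for demonstration: logr's contract is that Error entries are emitted regardless of the verbosity threshold, while V-filtered Info entries are not.

```go
package main

import (
	"errors"
	"fmt"

	"github.com/go-logr/logr/funcr"
)

func main() {
	// Sink with verbosity 0: V(1).Info is filtered out...
	log := funcr.New(func(prefix, args string) {
		fmt.Println(prefix, args)
	}, funcr.Options{Verbosity: 0})

	log.V(1).Info("dropped: verbosity 1 > threshold 0")

	// ...but Error is always emitted, no matter the V level on the logger.
	log.V(1).Error(errors.New("not found"), "still emitted")
}
```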

```diff
-		return ctrl.Result{}, err
+	if err := r.Client.Get(ctx, req.NamespacedName, operatorCondition); err != nil {
+		log.Info("Unable to find OperatorCondition")
+		return ctrl.Result{}, client.IgnoreNotFound(err)
```
njhale (Member, Author) commented:
Non-nil errors returned from Reconcilers are always retried, regardless of the respective Result.
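A minimal sketch of those return shapes in a controller-runtime Reconcile method; the reconciler and object types below are placeholders for illustration, not OLM's actual controller:

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Reconciler is a stand-in for a controller like OperatorConditionReconciler.
type Reconciler struct {
	client.Client
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Stand-in object type; the real controller fetches an OperatorCondition.
	var obj corev1.ConfigMap
	if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
		if apierrors.IsNotFound(err) {
			// A nil error means "done": the workqueue forgets the item and
			// does not requeue it, even though Result is empty.
			return ctrl.Result{}, nil
		}
		// Any non-nil error is retried with rate-limited backoff,
		// regardless of what Result says.
		return ctrl.Result{}, err
	}

	// ... reconcile the object ...
	return ctrl.Result{}, nil
}
```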

@perdasilva (Collaborator) commented:

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 2, 2022
@openshift-merge-robot openshift-merge-robot merged commit 0b77ba3 into operator-framework:master Mar 2, 2022
@njhale njhale deleted the fix/404-reconcile branch March 2, 2022 08:57
Labels: approved, lgtm

Merging this pull request closes #2678: OperatorCondition/Generator controllers churn on missing CSVs