Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACM-6596: Initialize controller for OperatorPolicy #160

Merged

Conversation

JeffeyL
Copy link
Contributor

@JeffeyL JeffeyL commented Aug 22, 2023

Copy link
Member

@JustinKuli JustinKuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good scaffold for adding more features bit by bit! The main thing I'd like to see is to put it behind a feature flag, if that's not too much to ask for this PR (if it ends up being hard, we can talk about delaying it).

Comment on lines 85 to 87
ctx context.Context, freq uint, elected <-chan struct{},
) {
_ = ctx // currently unused
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use an underscore in the function signature for the same effect:

Suggested change
ctx context.Context, freq uint, elected <-chan struct{},
) {
_ = ctx // currently unused
_ context.Context, freq uint, elected <-chan struct{},
) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But actually, you can use the given context in the r.List( call below.

Comment on lines 163 to 164
maxRetries := 3
for i := 1; i <= maxRetries; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've tried this kind of retry thing before, but it eventually got removed (you've probably been digging through the code more recently than me though, so feel free to correct me). In my opinion, retries here aren't necessary - if it fails, it will just get retried next loop anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially as the status becomes more complex (when there are more things that could cause errors) I think it will just be easier to return the error immediately, and handle the retry at a higher level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found where this kind of thing is in the normal config-policy-controller... it looks like it was useful there because of some non-idempotent stuff: if the config-policy should create something, but then can't record that it created it in its status, then there will be buggy behavior on a later reconcile because it incorrectly thinks it didn't create the resource.

So... let's keep an eye out for that kind of situation in the operator policy controller, and see if this ends up being helpful. I think a better approach to the config-policy situation might have been to add an annotation or label to the object that the config-policy-controller created, that way it could determine later whether it had created it or not even if the status didn't get updated correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, I also believe that it would get retried the next loop. I'll keep an eye out for the situations you described in your last comment.

Comment on lines 138 to 142
// when testing, evaluate at every interval regardless of policy status
if testFlag {
policy.Status.ComplianceState = policyv1.Compliant

err := r.updatePolicyStatus(policy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this interacts with the shouldEvaluate logic the way you might want... It looks like it will only ever update the status if this testFlag is true, and there is no current way to turn that on. I think the flag can just be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes definitely, having that if statement in there was an oversight from me, I had originally thought of using it in some way but it ended up not being of significant use.

Comment on lines 30 to 31
// OperatorPlcChan a channel used to pass operator policies ready for update.
OperatorPlcChan chan *policyv1beta1.OperatorPolicy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I've seen this in other controllers, but I don't think it's used. Until I see a possible use for it, I'd rather keep it out of here.

main.go Outdated
Comment on lines 352 to 356
if err = OpReconciler.SetupWithManager(mgr); err != nil {
log.Error(err, "Unable to create controller", "controller", "OperatorPolicy")
os.Exit(1)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this behind a feature flag? Something that would default to false in parseOpts. Then when we want to test stuff with it, we can turn it on by editing the deployment or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this.

main.go Outdated
Comment on lines 379 to 382
go func() {
OpReconciler.PeriodicallyExecOperatorPolicies(terminatingCtx, opts.frequency, mgr.Elected())
managerCancel()
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also go behind the feature flag.

@JeffeyL
Copy link
Contributor Author

JeffeyL commented Aug 24, 2023

Changes:

  • Removed ctx from PeriodicallyExecOperatorPolicies, I checked out the commit that added it (here) and it seems to be related to having cleaner exit/uninstall behavior. It states in a comment that ctx should not be passed so I opted to not include it. May revisit this in the future if the operator policy controller requires similar behavior.
  • Removed the retries
  • Added a feature flag --enable-operator-policy, to enable put --enable-operator-policy=true in the launch script

JustinKuli
JustinKuli previously approved these changes Aug 25, 2023
Copy link
Member

@JustinKuli JustinKuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

Looks good enough to me! Some minor things you could address, but if you want to just move forward with the next steps, feel free to unhold this.

Comment on lines 24 to 25
OperatorControllerName string = "configuration-policy-controller"
OperatorCRDName string = "configurationpolicies.policy.open-cluster-management.io"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: OperatorCRDName isn't used, and OperatorControllerName should probably be "operator-policy-controller"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, that's slightly embarrassing

Comment on lines 33 to 34
Scheme *runtime.Scheme
Recorder record.EventRecorder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't know if this reconciler will need these, I think the recorder in particular emits events asynchronously (and unreliably).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick scan through the other controller repos and they all seem to include a scheme in the controller, but proceed not to use it within the reconcile logic (some of the test code does use it, however). The kubebuilder docs have schemes in their controllers as well, but they actually use them for some function calls. Either way, the inclusion/exclusion does not seem to affect the functionality of the controller in any way that I can tell, so it seems to be safe to remove for now, and may be brought back for testing later once that is a concern.

@JustinKuli
Copy link
Member

It looks like it's failing test coverage, probably because with the feature flag, the new code never runs 😆 sorry about that.

Try adding the new flag to be true in https://github.com/open-cluster-management-io/config-policy-controller/blob/main/main_test.go#L19, that should get it running in the tests, which might just push it over the coverage threshold.

Copy link
Member

@JustinKuli JustinKuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/unhold

LGTM!

@openshift-ci
Copy link

openshift-ci bot commented Aug 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JeffeyL, JustinKuli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants