-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACM-6596: Initialize controller for OperatorPolicy #160
ACM-6596: Initialize controller for OperatorPolicy #160
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good scaffold for adding more features bit by bit! The main thing I'd like to see is to put it behind a feature flag, if that's not too much to ask for this PR (if it ends up being hard, we can talk about delaying it).
ctx context.Context, freq uint, elected <-chan struct{}, | ||
) { | ||
_ = ctx // currently unused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use an underscore in the function signature for the same effect:
ctx context.Context, freq uint, elected <-chan struct{}, | |
) { | |
_ = ctx // currently unused | |
_ context.Context, freq uint, elected <-chan struct{}, | |
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But actually, you can use the given context in the r.List(
call below.
maxRetries := 3 | ||
for i := 1; i <= maxRetries; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've tried this kind of retry thing before, but it eventually got removed (you've probably been digging through the code more recently than me though, so feel free to correct me). In my opinion, retries here aren't necessary - if it fails, it will just get retried next loop anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially as the status becomes more complex (when there are more things that could cause errors) I think it will just be easier to return the error immediately, and handle the retry at a higher level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found where this kind of thing is in the normal config-policy-controller... it looks like it was useful there because of some non-idempotent stuff: if the config-policy should create something, but then can't record that it created it in its status, then there will be buggy behavior on a later reconcile because it incorrectly thinks it didn't create the resource.
So... let's keep an eye out for that kind of situation in the operator policy controller, and see if this ends up being helpful. I think a better approach to the config-policy situation might have been to add an annotation or label to the object that the config-policy-controller created, that way it could determine later whether it had created it or not even if the status didn't get updated correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out, I also believe that it would get retried the next loop. I'll keep an eye out for the situations you described in your last comment.
// when testing, evaluate at every interval regardless of policy status | ||
if testFlag { | ||
policy.Status.ComplianceState = policyv1.Compliant | ||
|
||
err := r.updatePolicyStatus(policy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this interacts with the shouldEvaluate
logic the way you might want... It looks like it will only ever update the status if this testFlag
is true, and there is no current way to turn that on. I think the flag can just be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes definitely, having that if statement in there was an oversight from me, I had originally thought of using it in some way but it ended up not being of significant use.
// OperatorPlcChan a channel used to pass operator policies ready for update. | ||
OperatorPlcChan chan *policyv1beta1.OperatorPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I've seen this in other controllers, but I don't think it's used. Until I see a possible use for it, I'd rather keep it out of here.
main.go
Outdated
if err = OpReconciler.SetupWithManager(mgr); err != nil { | ||
log.Error(err, "Unable to create controller", "controller", "OperatorPolicy") | ||
os.Exit(1) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this behind a feature flag? Something that would default to false in parseOpts
. Then when we want to test stuff with it, we can turn it on by editing the deployment or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into this.
main.go
Outdated
go func() { | ||
OpReconciler.PeriodicallyExecOperatorPolicies(terminatingCtx, opts.frequency, mgr.Elected()) | ||
managerCancel() | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also go behind the feature flag.
Changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
Looks good enough to me! Some minor things you could address, but if you want to just move forward with the next steps, feel free to unhold this.
OperatorControllerName string = "configuration-policy-controller" | ||
OperatorCRDName string = "configurationpolicies.policy.open-cluster-management.io" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: OperatorCRDName
isn't used, and OperatorControllerName
should probably be "operator-policy-controller"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, that's slightly embarrassing
Scheme *runtime.Scheme | ||
Recorder record.EventRecorder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I don't know if this reconciler will need these, I think the recorder in particular emits events asynchronously (and unreliably).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick scan through the other controller repos and they all seem to include a scheme in the controller, but proceed not to use it within the reconcile logic (some of the test code does use it, however). The kubebuilder docs have schemes in their controllers as well, but they actually use them for some function calls. Either way, the inclusion/exclusion does not seem to affect the functionality of the controller in any way that I can tell, so it seems to be safe to remove for now, and may be brought back for testing later once that is a concern.
It looks like it's failing test coverage, probably because with the feature flag, the new code never runs 😆 sorry about that. Try adding the new flag to be true in https://github.com/open-cluster-management-io/config-policy-controller/blob/main/main_test.go#L19, that should get it running in the tests, which might just push it over the coverage threshold. |
ref: https://issues.redhat.com/browse/ACM-6596 Signed-off-by: Jeffrey Luo <jeluo@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/unhold
LGTM!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JeffeyL, JustinKuli The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
4f3c09a
into
open-cluster-management-io:main
ref: https://issues.redhat.com/browse/ACM-6596