-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uninstall scenario improvements #303
Uninstall scenario improvements #303
Conversation
With the move to being event-driven by default, finalizers were not being removed during the uninstall process because those policies were not triggered to be reconciled. Now the `trigger-uninstall` process will add an annotation to policies with finalizers, which the main controller now watches for. The value of the uninstall annotation is now a timestamp, as opposed to just being set to "true" - this helps provide visibility, and lets the `trigger-uninstall` process to update the annotation multiple times to ensure a reconcile is triggered. Refs: - https://issues.redhat.com/browse/ACM-14707 Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@@ -678,6 +678,9 @@ func handleTriggerUninstall() { | |||
triggerUninstallFlagSet.StringVar( | |||
&policyNamespace, "policy-namespace", "", "The namespace of where ConfigurationPolicy objects are stored", | |||
) | |||
triggerUninstallFlagSet.StringVar( | |||
&additionalNamespace, "additional-namespace", "", "An additional namespace for ConfigurationPolicy objects", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should config-controller watch this namespace too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This trigger-uninstall
flag is meant to match when the main controller is watching "open-cluster-management-policies": https://github.com/open-cluster-management-io/config-policy-controller/blob/main/main.go#L259
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to add it in the addon-controller
configPolicies, err := policyClient.List(ctx, metav1.ListOptions{}) | ||
if err != nil { | ||
if k8serrors.IsServerTimeout(err) || k8serrors.IsTimeout(err) { | ||
klog.Infof("Retrying listing the ConfigurationPolicy objects due to error: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this retrying? Since it uses continue, it skips the current namespace in this iteration."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch, I need to make it continue
the bigger loop I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe just mark cleanedUp = false
... but now that I'm looking at it closer, I don't quite follow why we're specially handling these specific errors. @mprahl do you remember?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JustinKuli I think it was to not continuously retry if the error was unrecoverable and let Kubernetes restart the pod in those cases. I think we probably could just always return the error and let Kubernetes restart the pod.
/hold |
This is intended to help remove finalizers on policies in the "open-cluster-management-policies" namespace. A new argument must be added to the `trigger-uninstall` process to utilize this. Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
fbcd0d0
to
83a423c
Compare
/unhold |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JustinKuli, yiraeChristineKim The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
3508c8f
into
open-cluster-management-io:main
See each commit for details.
The previous test for the uninstall scenario was incorrectly passing, I think because there were leftover re-reconciles on the policies it was using. But generally those policies would not be reconciled when the uninstall starts.