Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uninstall scenario improvements #303

Conversation

JustinKuli
Copy link
Member

See each commit for details.

The previous test for the uninstall scenario was incorrectly passing, I think because there were leftover re-reconciles on the policies it was using. But generally those policies would not be reconciled when the uninstall starts.

With the move to being event-driven by default, finalizers were not
being removed during the uninstall process because those policies were
not triggered to be reconciled. Now the `trigger-uninstall` process will
add an annotation to policies with finalizers, which the main controller
now watches for.

The value of the uninstall annotation is now a timestamp, as opposed to
just being set to "true" - this helps provide visibility, and lets the
`trigger-uninstall` process to update the annotation multiple times to
ensure a reconcile is triggered.

Refs:
 - https://issues.redhat.com/browse/ACM-14707

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@@ -678,6 +678,9 @@ func handleTriggerUninstall() {
triggerUninstallFlagSet.StringVar(
&policyNamespace, "policy-namespace", "", "The namespace of where ConfigurationPolicy objects are stored",
)
triggerUninstallFlagSet.StringVar(
&additionalNamespace, "additional-namespace", "", "An additional namespace for ConfigurationPolicy objects",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should config-controller watch this namespace too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trigger-uninstall flag is meant to match when the main controller is watching "open-cluster-management-policies": https://github.com/open-cluster-management-io/config-policy-controller/blob/main/main.go#L259

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to add it in the addon-controller

configPolicies, err := policyClient.List(ctx, metav1.ListOptions{})
if err != nil {
if k8serrors.IsServerTimeout(err) || k8serrors.IsTimeout(err) {
klog.Infof("Retrying listing the ConfigurationPolicy objects due to error: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this retrying? Since it uses continue, it skips the current namespace in this iteration."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch, I need to make it continue the bigger loop I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe just mark cleanedUp = false... but now that I'm looking at it closer, I don't quite follow why we're specially handling these specific errors. @mprahl do you remember?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinKuli I think it was to not continuously retry if the error was unrecoverable and let Kubernetes restart the pod in those cases. I think we probably could just always return the error and let Kubernetes restart the pod.

@JustinKuli
Copy link
Member Author

/hold

This is intended to help remove finalizers on policies in the
"open-cluster-management-policies" namespace. A new argument must be
added to the `trigger-uninstall` process to utilize this.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@JustinKuli JustinKuli force-pushed the 14707-watch-for-uninstall branch from fbcd0d0 to 83a423c Compare October 9, 2024 16:54
@JustinKuli
Copy link
Member Author

/unhold

Copy link

openshift-ci bot commented Oct 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, yiraeChristineKim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JustinKuli,yiraeChristineKim]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants