Helm operator constantly creates secret for a failed Operand upgrade #6494
Comments
Hi @sukhil-suresh, I am working to more precisely identify the source of the weird behavior you raised in this issue. I was able to confirm the behavior you explained thanks to the sample project provided. However, work is going on to narrow down the root cause. As I look into this, we would recommend restricting requests which could not be completed such as resizing down a PVC being sent to the We will provide updates as we gather more information on the issue. Feel free to keep us updated with any new findings. |
Thanks for looking into this @OchiengEd
Is it possible to have a validating webhook with an Operator generated using the I raised the ticket because this happened in our helm-operator-based product when a customer reduced the PVC size which is configurable via the helm value. |
@sukhil-suresh from my review, it does not seem possible to use webhooks for helm operators. For the sake of others interested in the issue, it seems when a helm rollback is executed with the However, due to the quick succession of these actions, the new service account would be recreated before the secrets associated with the deleted service account are garbage collected. This consequently causes more secrets to be created and for each retry a new set of secrets would be generated. The same behavior is observed when using the helm-operator however since the custom resource does not get to the desired state, the reconciliation process does not stop. |
@OchiengEd sorry about the late response - I was not at work last week.
Thanks for confirming.
The service account is not deleted and recreated, it is repeatedly modified. I am basing this on the watch event logs of a service account captured through a successful Operand install and an upgrade-rollback loop. The logs show that the service account
Since the service account never gets deleted, garbage collection may not be relevant. Thank you for finding the significance of helm rollback with the However, the continued constant increase of the secrets for a service account is still specific to the helm-operator since the helm-controller clears any helm revision which does not have a successful deployment status.
operator-sdk/internal/helm/release/manager.go Lines 105 to 115 in b6b3744
|
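To make the revision cleanup mentioned above easier to picture, here is a minimal toy model in Go. It is a sketch, not the operator-sdk code: the `Revision` type, the status strings, and `pruneNonDeployed` are hypothetical stand-ins, and it assumes that a failed upgrade and a failed rollback each record one failed revision. Under those assumptions it reproduces the history toggling between Revision 1 only and Revisions 1, 2 and 3 described in the bug report.

```go
package main

import "fmt"

// Revision is a toy stand-in for a Helm release revision.
// It is a hypothetical type, not the Helm SDK's release type.
type Revision struct {
	Version int
	Status  string // "deployed" or "failed" in this sketch
}

// pruneNonDeployed keeps only revisions whose status is "deployed",
// imitating a cleanup that drops revisions without a successful deployment.
func pruneNonDeployed(history []Revision) []Revision {
	kept := make([]Revision, 0, len(history))
	for _, r := range history {
		if r.Status == "deployed" {
			kept = append(kept, r)
		}
	}
	return kept
}

// versions returns just the revision numbers, for printing.
func versions(history []Revision) []int {
	out := make([]int, 0, len(history))
	for _, r := range history {
		out = append(out, r.Version)
	}
	return out
}

func main() {
	// Start from a successful install: only Revision 1, deployed.
	history := []Revision{{Version: 1, Status: "deployed"}}

	// Two reconcile passes; each one fails the upgrade and the rollback.
	for pass := 1; pass <= 2; pass++ {
		next := history[len(history)-1].Version + 1
		history = append(history,
			Revision{Version: next, Status: "failed"},     // failed upgrade
			Revision{Version: next + 1, Status: "failed"}, // failed rollback
		)
		fmt.Println("pass", pass, "after failures:", versions(history))

		// The next sync removes everything that is not deployed.
		history = pruneNonDeployed(history)
		fmt.Println("pass", pass, "after cleanup: ", versions(history))
	}
}
```

Because the failed revisions are removed on every pass, the revision numbers never grow past 3 in this model, while any side effect of the failed rollback (such as new service account secrets) would still accumulate.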
Is there a reason why the helm-controller does a rollback with the
operator-sdk/internal/helm/release/manager.go Line 220 in 888ea7f
Interestingly, the helm-controller upgrade process does not have
operator-sdk/internal/helm/controller/reconcile.go Lines 313 to 314 in 888ea7f
I built a debug helm-operator image with If the Maintainers are open to having the Operator rollback |
Skimming through the helm code:
Given that resources can be updated after they are created using the helm-generated manifest, a three-way merge should be the default behaviour for a helm-controller-based rollback operation (similar to the helm CLI). A force if required at all, could be made configurable using an annotation on the Operand (similar to the helm-controller upgrade). A helm rollback with a |
I can bring up the issue in the operator-sdk meeting and see what the collective opinion is on making the change. |
@sukhil-suresh I brought this up in a grooming meeting; making the rollback configurable by the user was welcomed. Let me know if you have something already worked out. If not, we can open an issue for a feature request. |
Thanks for the update, @OchiengEd.
Shouldn't the default rollback be changed not to use the
I haven't put in any effort yet. |
The default could still remain to be true for I should be able to push out at least a draft PR early next week if we won't have anything to expose the option to configure the rollback |
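As a rough illustration of the configurable-rollback idea discussed above, the following Go sketch reads a boolean from the Operand's annotations and falls back to true when the annotation is absent or unparsable, so existing behaviour would be unchanged. The annotation key and the `rollbackForce` helper are assumptions made for this example, not the API that was eventually implemented.

```go
package main

import (
	"fmt"
	"strconv"
)

// rollbackForceAnnotation is a hypothetical annotation key used only in this
// sketch; the real key, if any, is decided by the operator-sdk maintainers.
const rollbackForceAnnotation = "helm.sdk.operatorframework.io/rollback-force"

// rollbackForce reports whether a rollback should use force.
// It defaults to true (the behaviour discussed in the thread) when the
// annotation is missing or does not parse as a boolean.
func rollbackForce(annotations map[string]string) bool {
	const defaultForce = true
	raw, ok := annotations[rollbackForceAnnotation]
	if !ok {
		return defaultForce
	}
	value, err := strconv.ParseBool(raw)
	if err != nil {
		return defaultForce
	}
	return value
}

func main() {
	// No annotation: keep the default.
	fmt.Println(rollbackForce(nil))

	// Explicit opt-out on the Operand.
	fmt.Println(rollbackForce(map[string]string{rollbackForceAnnotation: "false"}))

	// Unparsable value: fall back to the default.
	fmt.Println(rollbackForce(map[string]string{rollbackForceAnnotation: "maybe"}))
}
```

Falling back to the default on a bad value keeps a typo in the annotation from silently changing rollback behaviour; a real implementation might prefer to log or surface the parse error instead.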
I am curious as to why the same behaviour has to be maintained when we know it is incorrect and is a bug. The bug was demonstrated with the sample app. A helm rollback with a
Thank you very much, @OchiengEd. |
It makes more sense to keep Changing the default value of I brought this up in the community meeting and this approach seems to have more support. |
Thanks for the clarification, @OchiengEd. I understand and look forward to the fix. |
Bug Report
When an Operand upgrade fails and the bundled helm chart has one or more PersistentVolumeClaim and ServiceAccount, the secrets generated by the openshift-controller-manager for each ServiceAccount increase constantly until a successful Operand update is made.
Sample helm operator demonstrating the problem: https://github.com/sukhil-suresh/openshift-operator-bug
What did you do?
I successfully deployed an Operand using a helm-based Operator and then edited the Operand to reduce the size of a PersistentVolumeClaim. (Customers sometimes reduce the storage size inadvertently.)
What did you expect to see?
I expected the Operand upgrade to fail (since reducing the storage is not allowed), followed by a successful rollback.
What did you see instead? Under which circumstances?
The helm-operator controller stays stuck in a loop of constant upgrade/rollback failure and the service account secrets increase constantly.
The screenshot below shows secrets increasing for the sample-sa service account:
The helm revision history constantly toggles between listing only Revision 1 and listing Revision 1, 2 and 3.
Revision 1 is for deployed status with Install complete as description.
Revision 2 is for the upgrade failure:
Revision 3 is for the rollback failure:
Environment
Helm-Operator
/language helm
Kubernetes cluster type:
OpenShift 4.12.32
$ operator-sdk version
$ kubectl version
Additional context
This problem does not happen with a regular helm install/upgrade on OpenShift and is specific to the helm-based Operator.