My helm operator hangs after I upgraded helm-operator from v1.33.0 to v1.34.0 #6690
Comments
@kschanrtp:
@acornett21 Same problem with 1.34.0
@acornett21:
@sudhir-kelkar I have not looked at this; I was just relating all the issues that came in and asking whether this still exists in 1.34.1, since the 1.34.0 release was incomplete. I personally will not have time to look at this for a few weeks. I'm only a contributor to this project, not a dedicated maintainer.
Could you please share the structure of your CR? The most likely reason something like this happens is that your RBAC is incorrect and the controller doesn't have permission to see all the resources it needs. Could you also post the output of your subscription (if you're using OLM)?
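For reference, a minimal sketch of how that subscription output could be gathered, assuming OLM is installed; the subscription name and namespace here are placeholders, not values from this thread:

```sh
# Replace the name and namespace with your operator's actual Subscription.
kubectl get subscription my-helm-operator -n operators -o yaml
```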
Relates to #6651.
Any update on this? Even after using 1.34.1 I am not able to see any pods after CR deployment.
@acornett21 Any input on why 1.34.1 is not working? Rolling back to 1.33.0 works perfectly fine.
Something broke when we cut 1.34. We're not sure what exactly, but we are currently investigating.
+1 for this issue. Moving from 1.33 to 1.34.1 has stopped all reconciliation.
I have verified that 1.34.2 has resolved my issue.
I am still having the same problem with 1.34.2. It does not do the reconciliation.
Sorry, I spoke too soon. It does appear there is no reconciliation occurring.
@jberkhahn Is it possible to create a 1.33.1 based on 1.33.0 but compiled with the latest UBI 8 image, to pick up the security fixes in that base image?
Hi @kschanrtp, you're in control of your operator controller image and its updates; if you want or need to update, you can update the base image in your Dockerfile.
Or, if you only want to update the libraries with CVEs, you can do those individually.
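As an illustration of the base-image route, a minimal sketch of the Dockerfile typically scaffolded for a Helm operator; the image tag and chart paths are assumptions, not taken from this thread:

```Dockerfile
# Bump the tag to pick up a newer helm-operator base image and its fixes.
FROM quay.io/operator-framework/helm-operator:v1.34.2

ENV HOME=/opt/helm
# watches.yaml and the chart directory are copied in from the project as usual.
COPY watches.yaml ${HOME}/watches.yaml
COPY helm-charts  ${HOME}/helm-charts
WORKDIR ${HOME}
```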
@acornett21 I thought I had done that and it did not work. I will try again; maybe my order of updates is not correct.
The CVEs are on the Go module side of the helm-operator.
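If the goal is only to bump vulnerable Go modules in a project you build yourself, a rough sketch of the usual workflow; the module path below is purely an example and not a dependency identified in this issue:

```sh
# Bump the affected dependency, tidy the module graph, and rebuild.
go get golang.org/x/net@latest
go mod tidy
go build ./...
```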
@joelanford Any idea when we will have the fix?
Still not working with helm-operator 1.35.0.
@kschanrtp What process are you following to update? Are you just updating the controller image, or are you also updating the version that is used for scaffolding/bundling?
@acornett21 I just updated the controller image.
You probably need to update the scaffolding to the matching version as well.
We're seeing the same problem with 1.35.0. We also use the
As mentioned in the previous message, you probably need to re-scaffold the project. See the release notes: https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.35.0/
@acornett21 I didn't get the re-scaffold part; usually we just update the FROM line to pick up the new version.
@malli31 There is an operator-sdk binary that was used to create your operator project. I'm talking about making sure the binary used to scaffold the project, and to bundle it, matches the version of the controller image you are running.
@acornett21 I get how to scaffold, but the documentation has a section on backwards compatibility when upgrading the operator-sdk version. We currently don't want any new features, and the documentation says that just bumping the helm-operator version should work out of the box, but it isn't working. The reason we don't want to upgrade the scaffolding is that we use the same Helm charts for operator/helm/YAML generation, and adding all these extra new things will complicate our build and deployments. Can you suggest whether I can deploy my operator by taking the new RBAC and a few folders from the generated config scaffold? Is this recommended? Also, we don't want to complicate our deployment; until now we have not needed operator-sdk to deploy our helm operator. The two commands below are sufficient for installation.
There were many versions of operator-sdk that were broken for helm, so if you want it to work, it would be best to re-scaffold; backwards compatibility is out the window when there is a bug. This isn't a new feature, it's a bug fix. My recommendation is to re-scaffold if you want to use the latest image at runtime; this is the only combination that has been tested.
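For anyone unsure what re-scaffolding involves, a rough sketch of the usual steps with a matching operator-sdk binary; the domain, group/version/kind, and chart path are placeholders, not values from this project:

```sh
# Install the operator-sdk release that matches the runtime image you intend to use,
# then regenerate the project layout in a clean directory and port customizations over.
operator-sdk init --plugins helm --domain example.com
operator-sdk create api --group charts --version v1alpha1 --kind MyApp \
  --helm-chart ./path/to/existing-chart
```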
I was facing the same issue with helm-operator SDK version 1.35.0.
We tried re-scaffolding our operator project, but considering the scope and complexity, it involves a lot of work, and after doing it we found it did not solve the issue for us. After removing WATCH_NAMESPACE, we saw that operator reconciliation started working, but it also required many RBAC role updates. Initially we thought we only needed to grant Role permission to access our CR across namespaces, but while going through the testing exercise we found there were many role updates we needed to make. Initially we hit an issue with the CR itself, like
Once the main CR issue was resolved by adding the required Role permissions, we started getting the same issue for multiple other resources. We should not end up granting permissions for every resource, as that can introduce a lot of vulnerabilities.
We had an internal discussion with our team and concluded that, since the problem is now narrowed down to WATCH_NAMESPACE, it would be better if we got a fix for this from the Operator SDK team.
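For context, a sketch of where WATCH_NAMESPACE typically appears in a scaffolded manager Deployment; the exact manifest in this project may differ, and the values below are illustrative:

```yaml
# config/manager/manager.yaml (excerpt, hypothetical values)
spec:
  template:
    spec:
      containers:
        - name: manager
          env:
            # An empty value (or removing the variable entirely) makes the operator
            # watch all namespaces, which is the workaround described above;
            # a comma-separated list scopes the watch to specific namespaces.
            - name: WATCH_NAMESPACE
              value: ""
```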
@acornett21 Could you please check these two comments [#6690 (comment), #6690 (comment)] and advise?
Confirming we see the same outcome: once WATCH_NAMESPACE is removed, the operator reconciles and appears to function, but the logs contain many errors related to API access, as mentioned.
As I mentioned in my earlier update, the current problem is narrowed down to WATCH_NAMESPACE. In parallel, we have also pursued making the operator work by removing WATCH_NAMESPACE and adding the required permissions to the resources through the necessary cluster roles. After adding these cluster roles, I no longer see the "failed to list" and "failed to watch" errors; the operator is successfully reconciled and installed. However, I am observing this trace in the operator logs, and I am not sure whether we can safely ignore it or not. Please guide.
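A minimal sketch of the kind of ClusterRole/ClusterRoleBinding that clears "failed to list" / "failed to watch" errors when watching all namespaces; the API group, resource names, and service account are placeholders, and the verbs should be narrowed to what the chart actually needs:

```yaml
# Hypothetical example; grant only the verbs/resources your operator requires.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-operator-watch
rules:
  - apiGroups: ["charts.example.com"]
    resources: ["myapps", "myapps/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-operator-watch
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-operator-watch
subjects:
  - kind: ServiceAccount
    name: my-operator-controller-manager
    namespace: my-operator-system
```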
@acornett21 I don't see this fixed in the latest version, 1.36.0, either.
@joelanford @acornett21 Can we reopen this issue? Based on the recent comments, #6769 doesn't seem to have fixed it completely.
Bug Report
What did you do?
I upgraded the helm-operator version from v1.33.0 to v1.34.0
What did you expect to see?
My helm operator deploys the helm chart successfully.
What did you see instead? Under which circumstances?
My helm operator hangs when doing a new install.
I did notice there is a big jump in the helm-operator-plugins version. I am not sure whether this is related or not.
I have anonymized the log output below.
Working helm operator log running v1.33.0:
Helm operator log running v1.34.0:
Environment
Operator type:
Kubernetes cluster type:
$ operator-sdk version
operator-sdk-v1.12.0+git
$ go version
go: 1.21.1
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.10+28ed2d7", GitCommit:"c725f2ce5164bf4165b22d6c28dd0ace4b3b7e9b", GitTreeState:"clean", BuildDate:"2024-01-23T03:16:21Z", GoVersion:"go1.20.12 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
Possible Solution
Additional context