Failed to run bundle: install plan is not available for the subscription #5410
Comments
@pohly do you have Prometheus metrics enabled in your operator? If so, is the Prometheus stack set up in the cluster? Just checking to see whether that is the cause of the issue. Also, could you please share the logs from the InstallPlan if it exists in the cluster? |
Hi @pohly, Could you please let us know: a) Have you enabled metrics in your project? If yes, have you installed Prometheus on the cluster? |
The operator supports metrics collection and has a "metrics" port - see the manual deployment YAML. Is that what you mean with "enabled the metrics"? Prometheus is not installed.
The project is https://github.com/intel/pmem-csi. First bring up some Kubernetes cluster without OLM installed and set Then run:
This needs a Docker registry. You can use some external one like quay.io. In this example, I am running one on port 5001 of my build machine, which can be reached via 172.17.42.1 from inside the cluster. It doesn't use TLS. Once
Which details do you need? |
This is the installplans.operators.coreos.com CRD, right? There is no object of that kind after the failure and also none while operator-sdk is running. |
Hi @pohly, Following are some comments inline:
In the default layout, metrics are enabled in config/default/kustomization.yaml: https://github.com/operator-framework/operator-sdk/blob/master/testdata/go/v3/memcached-operator/config/default/kustomization.yaml#L24-L25. However, looking at your project, you have deviated from the proposed layout. Also, please be aware of: https://sdk.operatorframework.io/docs/faqs/#can-i-customize-the-projects-initialized-with-operator-sdk
If the Operator is integrated with OLM and the bundle has a If your bundle ships a ServiceMonitor such as https://github.com/operator-framework/operator-sdk/blob/master/testdata/go/v3/memcached-operator/bundle/manifests/memcached-operator-controller-manager-metrics-monitor_monitoring.coreos.com_v1_servicemonitor.yaml#L1-L2 then the InstallPlan will fail. That happens with any CRD/API that the operator requires but that does not exist on the cluster. Could you please provide the bundle that you are using and ensure that the Operator image is published in a public space, so that we can easily reproduce the issue by running |
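One way to test the ServiceMonitor hypothesis locally is to list the resource kinds that the bundle ships and compare them with the CRDs installed on the cluster. The following is a minimal sketch, not part of operator-sdk: `list_bundle_kinds` is a hypothetical helper name, and it assumes the bundle is unpacked into a directory of `.yaml` manifests with one object per file.

```shell
# list_bundle_kinds DIR: print the distinct top-level "kind:" values found
# in the YAML manifests directly under DIR.
# Assumption: one Kubernetes object per file, with "kind:" at column 0.
list_bundle_kinds() {
  grep -h '^kind:' "$1"/*.yaml | awk '{print $2}' | sort -u
}

# Usage (the second command needs cluster access):
#   list_bundle_kinds bundle/manifests
#   kubectl get crd servicemonitors.monitoring.coreos.com
```

If `ServiceMonitor` shows up in the list while the `servicemonitors.monitoring.coreos.com` CRD is missing from the cluster, the bundle depends on an API that the cluster does not provide.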
This project existed before we started adding an operator. That determined the layout. We understand that this is not how the SDK is normally meant to be used, but rewriting the entire project wasn't ideal either. Thanks for any assistance that you can provide here despite the unusual approach. Looking at what you said about enabling metrics, my conclusion is that we don't enable those. Here's the bundle content:
I pushed the image to |
Hi @pohly, Would it be possible to just attach a zip/dir with the bundle content? Otherwise, we need to copy and paste and regenerate it manually before we can test and see if we can help you out. |
Do you still need that when the generated image is available (see But I can of course also attach the original bundle files: bundle-1.0.1.tar.gz |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale @camilamacedo86 is there anything further that I can do to investigate this? We've had less luck lately in our periodic CI runs with OLM 0.18.3 because of a new failure ( |
No particular reason for it, just staying up-to-date. operator-sdk cannot be updated because of intel#1069 = operator-framework/operator-sdk#5574 OLM cannot be updated because of operator-framework/operator-sdk#5410
Was a subscription actually created? I'm not super familiar with run bundle, but I assume that it basically generates a catalog source at runtime and then creates a subscription pointing to that catalog. If no InstallPlan was generated, it could be a resolution error or a problem getting content from that catalog. |
I don't know. How do I check? Our CI jobs capture the output of all pods, perhaps that would help? Unfortunately the ones with the more recent OLM expired. Let me kick one off once more... |
The only real way to install an operator is to create a Subscription resource -- that's the entrypoint API used to install an operator with OLM. I'm not an sdk developer (I work on OLM), but I am making the assumption that this run bundle error is happening because that Subscription is failing. So, if we look at the status of the subscription, it may give us a better idea of why the install is failing. |
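To check the Subscription as suggested here, its status conditions can be printed with `kubectl`, together with the InstallPlans and CatalogSources in the same namespace. This is a minimal sketch; `inspect_subscription` is a hypothetical helper name, and the namespace and Subscription name in the usage comment are taken from the error messages quoted in this thread.

```shell
# inspect_subscription NAMESPACE NAME: print the Subscription's status
# conditions (one per line), then list the InstallPlans and CatalogSources
# that exist in the same namespace.
inspect_subscription() {
  kubectl get subscriptions.operators.coreos.com "$2" -n "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.reason}: {.message}{"\n"}{end}'
  kubectl get installplans.operators.coreos.com -n "$1"
  kubectl get catalogsources.operators.coreos.com -n "$1"
}

# Usage (needs cluster access):
#   inspect_subscription default pmem-csi-operator-v100-0-0-sub
```

A condition such as a failed resolution or an unhealthy catalog source would narrow down whether the problem lies in dependency resolution or in fetching content from the generated catalog.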
They provided the bundle files here: #5410 (comment) @rashmigottipati @jmrodri could we try to check this one? WDYT about adding this one to a milestone so that it gets checked with the latest release? |
I tried with operator-sdk 1.19.1 and OLM 0.20.0, with Kubernetes 1.19 and 1.22. The "install plan is not available for the subscription pmem-csi-operator-v100-0-0-sub" occurred for both. Attached is the log output from https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-1071/2/artifact/joblog-jenkins-pmem-csi-PR-1071-2-test-1.22.log Beware that some output gets dumped repeatedly, for example:
|
I also tried with operator-sdk 1.18.0 and OLM 0.20.0. That worked once (Kubernetes 1.22) and failed once (1.19). It seems to be a bit random, but usually it fails reliably. |
Update: I've been successful when installing OLM 0.20.0 on a fresh test cluster. The failure only seems to occur when the cluster has been in use for a while, i.e. several other tests not involving OLM ran earlier. |
We have the same issue with |
I could also reproduce with: (SDK master branch)
Following the steps
And then, by checking the bundle logs: (kubectl logs pod/quay-io-operatorhubio-hive-operator-v2-5-3508-6cb94c6)
Also, we found the same above issue by using the It seems to be an issue associated with OPM. |
We found a way to reproduce the issue with OPM and without the SDK, so we raised an issue to get it fixed in OPM: operator-framework/operator-registry#952 |
This issue appears to be the scenario clarified and tracked in #5773. To avoid duplication and keep the information in one place, we can close this one in favour of #5773. Note that some workarounds were also proposed in #5773; please check whether they help. If you find that your problem is not the same scenario, please re-open this issue. Thank you for your attention and collaboration. |
That is not the root cause of the failure that I ran into with PMEM-CSI. If I install OLM on a fresh test cluster, running the bundle works. If I do the same thing after the cluster has been in use for a while, it fails for the same bundle. I can't tell from the log files (see my earlier comments) what might be going wrong. As I have a workaround (run OLM tests first), I am not going to reopen this issue unless it pops up again. |
Bug Report
I also reported this in operator-framework/operator-lifecycle-manager#2454 but as this might also be an issue in operator-sdk, let me also file an issue here.
What did you do?
operator-sdk olm install
operator-sdk run bundle
What did you expect to see?
The operator should start to run.
What did you see instead? Under which circumstances?
This only happens with OLM 0.19.1. The same commands work when installing OLM 0.18.3 with
operator-sdk olm install --version=v0.18.3
UPDATE: there is some randomness involved, and it may depend on cluster load and/or state; see #5410 (comment) and #5410 (comment).
Environment
Operator type:
/language go
Kubernetes cluster type:
kubeadm in VMs with Kubernetes 1.21.1
$ operator-sdk version
operator-sdk version: "v1.15.0", commit: "f6326e832a8a5e5453d0ad25e86714a0de2c0fc8", kubernetes version: "1.21", go version: "go1.16.10", GOOS: "linux", GOARCH: "amd64"
$ go version (if language is Go)
go version go1.17.2 linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-27T08:53:39Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Additional context
I encountered this in PMEM-CSI, tracked there as intel/pmem-csi#1050
More diagnostics:
Note the odd "AllCatalogSourcesHealthy: False". The catalog-operator pod might be responsible for it (not sure) and reports an error:
E1123 15:36:52.688776 1 queueinformer_operator.go:290] sync {"update" "default/pmem-csi-operator-v100-0-0-sub"} failed: Operation cannot be fulfilled on subscriptions.operators.coreos.com "pmem-csi-operator-v100-0-0-sub": the object has been modified; please apply your changes to the latest version and try again
This repeats a few times, but then stops. Deleting that pod doesn't help; the recreated one has the same problem.
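To compare how often this sync error occurs across OLM versions, the catalog-operator log can be filtered and counted. This is a rough sketch; `count_sync_failures` is a hypothetical helper name, and it assumes OLM runs in the `olm` namespace with a `catalog-operator` deployment, which may differ for non-default installations.

```shell
# count_sync_failures: count the "sync ... failed" lines in the
# catalog-operator log.
# Assumption: OLM is installed in the "olm" namespace and the deployment
# is named "catalog-operator".
count_sync_failures() {
  kubectl logs -n olm deployment/catalog-operator | grep -c 'sync .* failed'
}

# Usage (needs cluster access):
#   count_sync_failures
```

Similar counts for a working and a failing OLM version would support the suspicion that this update conflict is unrelated to the missing install plan.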
For comparison, here is the output with OLM 0.18.3. It has the same update error, so that might be a red herring: