FatalInitializationError osm upgrade #2491
OSM v0.7.0 supports the latest versions of the SMI CRDs. Because Helm does not manage CRDs beyond the initial installation, special care needs to be taken during upgrades when the CRDs have changed. To upgrade OSM, refer to this upgrade guide. Specifically, to upgrade from an older version of the SMI CRDs to the latest, delete the outdated CRDs before upgrading. If you've already upgraded without deleting the CRDs, you can fix your deployment by following this troubleshooting guide.
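A minimal sketch of that cleanup step, assuming the stock SMI CRD names; confirm the exact list against the upgrade guide before deleting anything, and note that deleting a CRD also deletes all custom resources of that type, so export any SMI policies you want to keep first.

```sh
# Helm only installs CRDs on first install, so stale SMI CRD versions
# linger across upgrades unless removed manually.
# CRD names are illustrative; confirm against the upgrade guide.
kubectl delete crd traffictargets.access.smi-spec.io
kubectl delete crd httproutegroups.specs.smi-spec.io
kubectl delete crd tcproutes.specs.smi-spec.io
kubectl delete crd trafficsplits.split.smi-spec.io

# Upgrading the chart afterwards installs the new CRD versions.
# <release> and <namespace> are placeholders for your install.
helm upgrade <release> <chart> --namespace <namespace>
```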
Reopening this issue to discuss the desirable experience when CRDs are out of date. Until the conversion webhook is in place, should the controller crash?
The controller exits because the K8s API version for the resources it expects is not available in the cluster. This is not a crash, but a voluntary exit: the controller cannot function without the necessary newer CRDs. Currently we do not have the capability to support multiple API versions for a resource, so having the controller keep running silently instead of exiting is not an option. Documentation around CRD upgrades: https://github.com/openservicemesh/osm/blob/main/docs/content/docs/upgrade_guide.md#crd-upgrades @ritazh, what do you think?
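To check which SMI API versions the cluster actually serves, you can query the API group from the error in this issue:

```sh
# List resources the API server currently serves in the SMI "specs" group.
# If TCPRoute does not appear at the version the controller expects,
# initialization fails as described above.
kubectl api-resources --api-group=specs.smi-spec.io
```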
FWIW, I think deployment failure is the right experience when the new version of the CRD is not in the cluster; otherwise the operator won't know there is an issue. The error in the controller pod log helped me understand I was missing the right version. Aside from users fishing through the pod logs, do we generate any K8s events for this type of error? A separate question: what if I'm not using the SMI traffic policy mode feature? Should this still fail?
K8s events are generated for Fatal errors from the controller, such as this one.
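A sketch of surfacing those events without scraping pod logs, assuming OSM runs in the default osm-system namespace:

```sh
# Fatal controller errors surface as Warning events in the
# controller's namespace (osm-system is an assumption here).
kubectl get events --namespace osm-system --field-selector type=Warning
```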
Per the current design, yes, because informer resource initialization happens irrespective of the traffic policy mode. Even if we deferred resource initialization based on the traffic policy mode, one could update the mode in the ConfigMap at runtime and cause the controller to exit. The current approach is simple and applies to other K8s resources as well: Ingress, cert-manager.io, etc. If there is a use case to defer SMI resource initialization, we could consider it.
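For illustration, the mode lives in the osm-config ConfigMap; the key name below matches v0.7-era documentation but should be verified against your version:

```sh
# Inspect the current traffic policy mode (key name is an assumption).
kubectl get configmap osm-config --namespace osm-system \
  -o jsonpath='{.data.permissive_traffic_policy_mode}'

# Because this can be flipped at runtime, deferring SMI informer
# initialization on the mode would only delay the same failure.
kubectl patch configmap osm-config --namespace osm-system \
  --type merge --patch '{"data":{"permissive_traffic_policy_mode":"false"}}'
```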
Closing based on #2491 (comment) and the clarification provided in #2491 (comment). With #2737, we will hopefully never run into this issue. But the desired behavior at the moment is to ensure all components within osm-controller can initialize correctly at startup; if not, the controller will exit with a FatalInitializationError.
Bug description:
Upgrading osm from v0.6.1 to v0.7.0, getting the following error on pod start:
You may also see errors like this from the osm controller pod:
```
Failed to list *v1alpha4.TCPRoute: the server could not find the requested resource (get tcproutes.specs.smi-spec.io)
```
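One way to confirm the mismatch behind that error is to check which versions the installed TCPRoute CRD serves:

```sh
# After a v0.6.1 install, v1alpha4 will likely be absent from the served
# versions, which is exactly what the controller complains about.
kubectl get crd tcproutes.specs.smi-spec.io \
  -o jsonpath='{.spec.versions[*].name}'
```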
Affected area (please mark with X where applicable):
Expected behavior:
Successfully start osm deployment
Steps to reproduce the bug (as precisely as possible):
Helm install osm chart v0.6.1, then upgrade to osm chart v0.7.0.
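A repro sketch; the Helm repo URL, release name, and chart versions are assumptions based on this report:

```sh
# Assumed chart repo and release name; adjust to your environment.
helm repo add osm https://openservicemesh.github.io/osm
helm install osm osm/osm --version 0.6.1 \
  --namespace osm-system --create-namespace

# Upgrading without first deleting the old SMI CRDs reproduces the failure.
helm upgrade osm osm/osm --version 0.7.0 --namespace osm-system
```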
How was OSM installed?:
Helm
Anything else we need to know?:
Environment:
OSM version (use osm version):
Kubernetes version (use kubectl version):