Upgrade older cluster from 1.16.4 to 1.16.11 fails with CSE exit code 35 #3618
Hi @chreichert, could you retry this upgrade, and make sure that these are the API model configuration values inside your apimodel.json:
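The code block from this comment is not preserved here; a plausible reconstruction of the suggested fragment, assuming the MCR registry defaults that aks-engine 0.53.0 uses (the exact registry path is an assumption, not confirmed in this copy of the thread):

```json
{
  "properties": {
    "orchestratorProfile": {
      "kubernetesConfig": {
        "kubernetesImageBase": "mcr.microsoft.com/",
        "kubernetesImageBaseType": "mcr"
      }
    }
  }
}
```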
Thanks @jackfrancis, this helped. After modifying kubernetesImageBase and adding kubernetesImageBaseType in our apimodel.json as mentioned by you above, I could successfully upgrade our cluster to 1.16.11. One thing to mention: kubernetes-dashboard was not cleanly reconciled during the upgrade. I ended up with two versions of the dashboard: the old one in namespace kube-system and the new one in namespace kubernetes-dashboard. I deleted the old deployment in namespace kube-system manually to clean things up. Hope that was enough to get rid of all the old dashboard artefacts?
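For reference, a minimal sketch of that manual cleanup, assuming the stale resources carry the default kubernetes-dashboard add-on names (the exact resource set in any given cluster may differ):

```sh
# Inspect what is left of the old dashboard in kube-system
kubectl --namespace kube-system get deployment,service,serviceaccount,secret | grep dashboard

# Delete the stale resources (names below are assumptions based on the default add-on)
kubectl --namespace kube-system delete deployment kubernetes-dashboard
kubectl --namespace kube-system delete service kubernetes-dashboard
```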
@chreichert Glad that got you through. This is a bug, btw, that I'll look into today. In the meantime you have a workaround :/ Correct about post-upgrade cleanup. Depending on the version-to-version path and the initial cluster configuration, there may be leftover cruft; in your example you've observed the dashboard, but metrics-server and other components may also need a nudge. You're doing the right thing to audit your cluster after upgrade. Hopefully the set of things that needs manual poking is consistent across your fleet of clusters, so that poking can be automated?
@jackfrancis We're currently working out how to upgrade to the latest 1.17 or 1.18 by testing this with our test cluster before doing our production cluster. This was the first step, going to the latest 1.16. After testing our apps, I will continue with upgrading to 1.17.7 and so on. There are still some manual steps involved, but it's manageable.
@chreichert are you able to paste the original values of kubernetesImageBase and kubernetesImageBaseType in your api model before you changed them? That will help to ensure that PR #3625 has the proper fixes so that this manual step is not needed.
@jackfrancis These were the original settings before the upgrade:
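The quoted fragment is not preserved in this copy; a sketch of what it plausibly contained, assuming the GCR registry default that ACS-Engine-era clusters shipped with (the exact value is an assumption):

```json
{
  "properties": {
    "orchestratorProfile": {
      "kubernetesConfig": {
        "kubernetesImageBase": "k8s.gcr.io/"
      }
    }
  }
}
```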
kubernetesImageBaseType was not present before. You can find the full apimodel in the original post above (folded). |
Describe the bug
Upgrading an older cluster that was initially created with ACS-Engine 0.21.2 from 1.16.4 to 1.16.11 stops while deploying the first upgraded master node with the error: "VM has reported a failure when processing extension 'cse-master-0'. Error message: 'Enable failed: failed to execute command: command terminated with exit status=35'"
Steps To Reproduce
The latest upgrade of the cluster was done with AKS-Engine version 0.45.0.
Resulting API model (full apimodel.json folded in the original issue):
Upgrade this cluster to 1.16.11 using AKS-Engine 0.53.0 with the command:
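The exact command is not preserved here; a sketch of the usual invocation, assuming service principal auth and placeholder values for the subscription, resource group, location, and output path:

```sh
aks-engine upgrade \
  --subscription-id <subscription-id> \
  --resource-group <resource-group> \
  --location <location> \
  --api-model ./_output/<cluster>/apimodel.json \
  --upgrade-version 1.16.11 \
  --auth-method client_secret \
  --client-id <client-id> \
  --client-secret <client-secret>
```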
This produces the error quoted above (CSE exit status 35 on cse-master-0).
Expected behavior
Cluster can be upgraded to 1.16.11 without errors.
AKS Engine version
0.53.0
Kubernetes version
1.16.4
Additional context
Looking at /var/log/azure/cluster-provision.log on the failing master node shows that the hyperkube image could not be pulled.
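A quick way to confirm this from the node itself, assuming Docker as the container runtime and the GCR image path from the old apimodel (both assumptions; a cluster on a different runtime or registry path would need the corresponding commands):

```sh
# Check the CSE provisioning log for the failed pull
sudo grep -i hyperkube /var/log/azure/cluster-provision.log

# Try the pull manually; with the old k8s.gcr.io base this fails for v1.16.11,
# since the fix above points the cluster at MCR instead (assumed image path)
sudo docker pull k8s.gcr.io/hyperkube-amd64:v1.16.11
```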