This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

Upgrade older cluster from 1.16.4 to 1.16.11 fails with CSE exit code 35 #3618

Closed
chreichert opened this issue Jul 20, 2020 · 6 comments · Fixed by #3625
Labels
bug Something isn't working

Comments

@chreichert

chreichert commented Jul 20, 2020

Describe the bug
Upgrading an older cluster (initially created with ACS-Engine 0.21.2) from 1.16.4 to 1.16.11 stops while deploying the first upgraded master node, with the error: "VM has reported a failure when processing extension 'cse-master-0'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=35"

Steps To Reproduce

The most recent upgrade of the cluster was done with AKS Engine version 0.45.0.
Resulting API-Model:

api-model

{
  "apiVersion": "vlabs",
  "location": "northeurope",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.16",
      "orchestratorVersion": "1.16.4",
      "kubernetesConfig": {
        "kubernetesImageBase": "k8s.gcr.io/",
        "mcrKubernetesImageBase": "mcr.microsoft.com/k8s/core/",
        "clusterSubnet": "10.244.0.0/16",
        "dnsServiceIP": "10.0.0.10",
        "serviceCidr": "10.0.0.0/16",
        "networkPolicy": "calico",
        "networkPlugin": "kubenet",
        "containerRuntime": "docker",
        "dockerBridgeSubnet": "172.17.0.1/16",
        "mobyVersion": "3.0.8",
        "useInstanceMetadata": true,
        "enableRbac": true,
        "enableSecureKubelet": true,
        "enableAggregatedAPIs": true,
        "privateCluster": {
          "enabled": true
        },
        "gchighthreshold": 85,
        "gclowthreshold": 80,
        "etcdVersion": "3.3.15",
        "etcdDiskSizeGB": "1024",
        "enablePodSecurityPolicy": true,
        "addons": [
          {
            "name": "blobfuse-flexvolume",
            "enabled": false
          },
          {
            "name": "smb-flexvolume",
            "enabled": false
          },
          {
            "name": "keyvault-flexvolume",
            "enabled": false
          },
          {
            "name": "cluster-autoscaler",
            "enabled": false
          },
          {
            "name": "heapster",
            "enabled": true,
            "containers": [
              {
                "name": "heapster",
                "image": "k8s.gcr.io/heapster-amd64:v1.5.4",
                "cpuRequests": "88m",
                "memoryRequests": "204Mi",
                "cpuLimits": "88m",
                "memoryLimits": "204Mi"
              },
              {
                "name": "heapster-nanny",
                "image": "k8s.gcr.io/addon-resizer:1.8.5",
                "cpuRequests": "88m",
                "memoryRequests": "204Mi",
                "cpuLimits": "88m",
                "memoryLimits": "204Mi"
              }
            ]
          },
          {
            "name": "tiller",
            "enabled": true,
            "containers": [
              {
                "name": "tiller",
                "image": "gcr.io/kubernetes-helm/tiller:v2.13.1",
                "cpuRequests": "50m",
                "memoryRequests": "150Mi",
                "cpuLimits": "50m",
                "memoryLimits": "150Mi"
              }
            ],
            "config": {
              "max-history": "0"
            }
          },
          {
            "name": "aci-connector",
            "enabled": false
          },
          {
            "name": "kubernetes-dashboard",
            "enabled": true,
            "containers": [
              {
                "name": "kubernetes-dashboard",
                "image": "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1",
                "cpuRequests": "300m",
                "memoryRequests": "150Mi",
                "cpuLimits": "300m",
                "memoryLimits": "150Mi"
              }
            ]
          },
          {
            "name": "rescheduler",
            "enabled": false
          },
          {
            "name": "metrics-server",
            "enabled": true,
            "containers": [
              {
                "name": "metrics-server",
                "image": "k8s.gcr.io/metrics-server-amd64:v0.3.4"
              }
            ]
          },
          {
            "name": "nvidia-device-plugin",
            "enabled": false
          },
          {
            "name": "container-monitoring",
            "enabled": false
          },
          {
            "name": "azure-cni-networkmonitor",
            "enabled": false
          },
          {
            "name": "azure-npm-daemonset",
            "enabled": false
          },
          {
            "name": "ip-masq-agent",
            "enabled": true,
            "containers": [
              {
                "name": "ip-masq-agent",
                "image": "k8s.gcr.io/ip-masq-agent-amd64:v2.5.0",
                "cpuRequests": "50m",
                "memoryRequests": "50Mi",
                "cpuLimits": "50m",
                "memoryLimits": "250Mi"
              }
            ],
            "config": {
              "enable-ipv6": "false",
              "non-masq-cni-cidr": "",
              "non-masquerade-cidr": "10.244.0.0/16",
              "secondary-non-masquerade-cidr": ""
            }
          },
          {
            "name": "dns-autoscaler",
            "enabled": false
          },
          {
            "name": "calico-daemonset",
            "enabled": true,
            "containers": [
              {
                "name": "calico-typha",
                "image": "calico/typha:v3.8.0"
              },
              {
                "name": "calico-cni",
                "image": "calico/cni:v3.8.0"
              },
              {
                "name": "calico-node",
                "image": "calico/node:v3.8.0"
              },
              {
                "name": "calico-pod2daemon",
                "image": "calico/pod2daemon-flexvol:v3.8.0"
              },
              {
                "name": "calico-cluster-proportional-autoscaler",
                "image": "k8s.gcr.io/cluster-proportional-autoscaler-amd64:1.1.2-r2"
              }
            ]
          },
          {
            "name": "cloud-node-manager",
            "enabled": false
          },
          {
            "name": "aad-pod-identity",
            "enabled": false
          },
          {
            "name": "appgw-ingress",
            "enabled": false
          },
          {
            "name": "azuredisk-csi-driver",
            "enabled": false
          },
          {
            "name": "azurefile-csi-driver",
            "enabled": false
          },
          {
            "name": "azure-policy",
            "enabled": false
          },
          {
            "name": "node-problem-detector",
            "enabled": false
          },
          {
            "name": "kube-dns",
            "enabled": false
          },
          {
            "name": "coredns",
            "enabled": true,
            "containers": [
              {
                "name": "coredns",
                "image": "k8s.gcr.io/coredns:1.6.5"
              }
            ],
            "config": {
              "clusterIP": "10.0.0.10",
              "domain": "cluster.local"
            }
          },
          {
            "name": "kube-proxy",
            "enabled": true,
            "containers": [
              {
                "name": "kube-proxy",
                "image": "k8s.gcr.io/hyperkube-amd64:v1.16.4"
              }
            ],
            "config": {
              "cluster-cidr": "10.244.0.0/16",
              "featureGates": "{}",
              "proxy-mode": "iptables"
            }
          }
        ],
        "kubeletConfig": {
          "--address": "0.0.0.0",
          "--anonymous-auth": "false",
          "--authentication-token-webhook": "true",
          "--authorization-mode": "Webhook",
          "--azure-container-registry-config": "/etc/kubernetes/azure.json",
          "--cgroups-per-qos": "true",
          "--client-ca-file": "/etc/kubernetes/certs/ca.crt",
          "--cloud-config": "/etc/kubernetes/azure.json",
          "--cloud-provider": "azure",
          "--cluster-dns": "10.0.0.10",
          "--cluster-domain": "cluster.local",
          "--enforce-node-allocatable": "pods",
          "--event-qps": "0",
          "--eviction-hard": "memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%",
          "--feature-gates": "PodPriority=true,RotateKubeletServerCertificate=true",
          "--image-gc-high-threshold": "85",
          "--image-gc-low-threshold": "80",
          "--image-pull-progress-deadline": "30m",
          "--keep-terminated-pod-volumes": "false",
          "--kubeconfig": "/var/lib/kubelet/kubeconfig",
          "--max-pods": "110",
          "--network-plugin": "cni",
          "--node-status-update-frequency": "10s",
          "--non-masquerade-cidr": "0.0.0.0/0",
          "--pod-infra-container-image": "k8s.gcr.io/pause-amd64:3.1",
          "--pod-manifest-path": "/etc/kubernetes/manifests",
          "--pod-max-pids": "-1",
          "--read-only-port": "0",
          "--rotate-certificates": "true",
          "--streaming-connection-idle-timeout": "5m",
          "--tls-cert-file": "/etc/kubernetes/certs/kubeletserver.crt",
          "--tls-cipher-suites": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256",
          "--tls-private-key-file": "/etc/kubernetes/certs/kubeletserver.key"
        },
        "controllerManagerConfig": {
          "--allocate-node-cidrs": "true",
          "--cloud-config": "/etc/kubernetes/azure.json",
          "--cloud-provider": "azure",
          "--cluster-cidr": "10.244.0.0/16",
          "--cluster-name": "qa-qknows-k8s-8164",
          "--cluster-signing-cert-file": "/etc/kubernetes/certs/ca.crt",
          "--cluster-signing-key-file": "/etc/kubernetes/certs/ca.key",
          "--configure-cloud-routes": "true",
          "--controllers": "*,bootstrapsigner,tokencleaner",
          "--feature-gates": "LocalStorageCapacityIsolation=true,ServiceNodeExclusion=true",
          "--kubeconfig": "/var/lib/kubelet/kubeconfig",
          "--leader-elect": "true",
          "--node-monitor-grace-period": "40s",
          "--pod-eviction-timeout": "5m0s",
          "--profiling": "false",
          "--root-ca-file": "/etc/kubernetes/certs/ca.crt",
          "--route-reconciliation-period": "10s",
          "--service-account-private-key-file": "/etc/kubernetes/certs/apiserver.key",
          "--terminated-pod-gc-threshold": "5000",
          "--use-service-account-credentials": "true",
          "--v": "2"
        },
        "cloudControllerManagerConfig": {
          "--allocate-node-cidrs": "true",
          "--cloud-config": "/etc/kubernetes/azure.json",
          "--cloud-provider": "azure",
          "--cluster-cidr": "10.244.0.0/16",
          "--cluster-name": "qa-qknows-k8s-8164",
          "--configure-cloud-routes": "true",
          "--controllers": "*",
          "--kubeconfig": "/var/lib/kubelet/kubeconfig",
          "--leader-elect": "true",
          "--route-reconciliation-period": "10s",
          "--v": "2"
        },
        "apiServerConfig": {
          "--advertise-address": "<advertiseAddr>",
          "--allow-privileged": "true",
          "--anonymous-auth": "false",
          "--audit-log-maxage": "30",
          "--audit-log-maxbackup": "10",
          "--audit-log-maxsize": "100",
          "--audit-log-path": "/var/log/kubeaudit/audit.log",
          "--audit-policy-file": "/etc/kubernetes/addons/audit-policy.yaml",
          "--authorization-mode": "Node,RBAC",
          "--bind-address": "0.0.0.0",
          "--client-ca-file": "/etc/kubernetes/certs/ca.crt",
          "--cloud-config": "/etc/kubernetes/azure.json",
          "--cloud-provider": "azure",
          "--enable-admission-plugins": "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,ValidatingAdmissionWebhook,ResourceQuota,ExtendedResourceToleration",
          "--enable-bootstrap-token-auth": "true",
          "--etcd-cafile": "/etc/kubernetes/certs/ca.crt",
          "--etcd-certfile": "/etc/kubernetes/certs/etcdclient.crt",
          "--etcd-keyfile": "/etc/kubernetes/certs/etcdclient.key",
          "--etcd-servers": "https://127.0.0.1:2379",
          "--insecure-port": "8080",
          "--kubelet-client-certificate": "/etc/kubernetes/certs/client.crt",
          "--kubelet-client-key": "/etc/kubernetes/certs/client.key",
          "--oidc-client-id": "***",
          "--oidc-groups-claim": "groups",
          "--oidc-issuer-url": "***",
          "--oidc-username-claim": "oid",
          "--profiling": "false",
          "--proxy-client-cert-file": "/etc/kubernetes/certs/proxy.crt",
          "--proxy-client-key-file": "/etc/kubernetes/certs/proxy.key",
          "--requestheader-allowed-names": "",
          "--requestheader-client-ca-file": "/etc/kubernetes/certs/proxy-ca.crt",
          "--requestheader-extra-headers-prefix": "X-Remote-Extra-",
          "--requestheader-group-headers": "X-Remote-Group",
          "--requestheader-username-headers": "X-Remote-User",
          "--secure-port": "443",
          "--service-account-key-file": "/etc/kubernetes/certs/apiserver.key",
          "--service-account-lookup": "true",
          "--service-cluster-ip-range": "10.0.0.0/16",
          "--storage-backend": "etcd3",
          "--tls-cert-file": "/etc/kubernetes/certs/apiserver.crt",
          "--tls-cipher-suites": "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA",
          "--tls-private-key-file": "/etc/kubernetes/certs/apiserver.key",
          "--v": "4"
        },
        "schedulerConfig": {
          "--kubeconfig": "/var/lib/kubelet/kubeconfig",
          "--leader-elect": "true",
          "--profiling": "false",
          "--v": "2"
        },
        "cloudProviderBackoffMode": "v2",
        "cloudProviderBackoff": true,
        "cloudProviderBackoffRetries": 6,
        "cloudProviderBackoffJitter": 1,
        "cloudProviderBackoffDuration": 5,
        "cloudProviderBackoffExponent": 1.5,
        "cloudProviderRateLimit": false,
        "cloudProviderRateLimitQPS": 3,
        "cloudProviderRateLimitQPSWrite": 30,
        "cloudProviderRateLimitBucket": 10,
        "cloudProviderRateLimitBucketWrite": 300,
        "cloudProviderDisableOutboundSNAT": false,
        "loadBalancerSku": "Basic",
        "maximumLoadBalancerRuleCount": 250,
        "kubeProxyMode": "iptables"
      }
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "qa-qknows-k8s-8164",
      "subjectAltNames": null,
      "vmSize": "Standard_D2s_v3",
      "osDiskSizeGB": 128,
      "vnetSubnetID": "/subscriptions/***/resourceGroups/qa_004_QKNOWS_K8s/providers/Microsoft.Network/virtualNetworks/kubernetes-vnet/subnets/kubernetes-subnet",
      "vnetCidr": "10.239.0.0/16",
      "firstConsecutiveStaticIP": "10.239.255.10",
      "storageProfile": "ManagedDisks",
      "oauthEnabled": false,
      "preProvisionExtension": null,
      "extensions": [],
      "distro": "ubuntu",
      "kubernetesConfig": {
        "kubeletConfig": {
          "--address": "0.0.0.0",
          "--anonymous-auth": "false",
          "--authentication-token-webhook": "true",
          "--authorization-mode": "Webhook",
          "--azure-container-registry-config": "/etc/kubernetes/azure.json",
          "--cgroups-per-qos": "true",
          "--client-ca-file": "/etc/kubernetes/certs/ca.crt",
          "--cloud-config": "/etc/kubernetes/azure.json",
          "--cloud-provider": "azure",
          "--cluster-dns": "10.0.0.10",
          "--cluster-domain": "cluster.local",
          "--enforce-node-allocatable": "pods",
          "--event-qps": "0",
          "--eviction-hard": "memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%",
          "--feature-gates": "PodPriority=true,RotateKubeletServerCertificate=true",
          "--image-gc-high-threshold": "85",
          "--image-gc-low-threshold": "80",
          "--image-pull-progress-deadline": "30m",
          "--keep-terminated-pod-volumes": "false",
          "--kubeconfig": "/var/lib/kubelet/kubeconfig",
          "--max-pods": "110",
          "--network-plugin": "cni",
          "--node-status-update-frequency": "10s",
          "--non-masquerade-cidr": "0.0.0.0/0",
          "--pod-infra-container-image": "k8s.gcr.io/pause-amd64:3.1",
          "--pod-manifest-path": "/etc/kubernetes/manifests",
          "--pod-max-pids": "-1",
          "--read-only-port": "0",
          "--rotate-certificates": "true",
          "--streaming-connection-idle-timeout": "5m",
          "--tls-cert-file": "/etc/kubernetes/certs/kubeletserver.crt",
          "--tls-cipher-suites": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256",
          "--tls-private-key-file": "/etc/kubernetes/certs/kubeletserver.key"
        },
        "cloudProviderBackoffMode": ""
      },
      "availabilityProfile": "AvailabilitySet",
      "platformFaultDomainCount": 2,
      "cosmosEtcd": false
    },
    "agentPoolProfiles": [
      {
        "name": "dynamic",
        "count": 1,
        "vmSize": "Standard_E16s_v3",
        "osDiskSizeGB": 128,
        "osType": "Linux",
        "availabilityProfile": "VirtualMachineScaleSets",
        "storageProfile": "ManagedDisks",
        "vnetSubnetID": "/subscriptions/***/resourceGroups/qa_004_QKNOWS_K8s/providers/Microsoft.Network/virtualNetworks/kubernetes-vnet/subnets/kubernetes-subnet",
        "distro": "ubuntu",
        "kubernetesConfig": {
          "kubeletConfig": {
            "--address": "0.0.0.0",
            "--anonymous-auth": "false",
            "--authentication-token-webhook": "true",
            "--authorization-mode": "Webhook",
            "--azure-container-registry-config": "/etc/kubernetes/azure.json",
            "--cgroups-per-qos": "true",
            "--client-ca-file": "/etc/kubernetes/certs/ca.crt",
            "--cloud-config": "/etc/kubernetes/azure.json",
            "--cloud-provider": "azure",
            "--cluster-dns": "10.0.0.10",
            "--cluster-domain": "cluster.local",
            "--enforce-node-allocatable": "pods",
            "--event-qps": "0",
            "--eviction-hard": "memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%",
            "--feature-gates": "PodPriority=true,RotateKubeletServerCertificate=true",
            "--image-gc-high-threshold": "85",
            "--image-gc-low-threshold": "80",
            "--image-pull-progress-deadline": "30m",
            "--keep-terminated-pod-volumes": "false",
            "--kubeconfig": "/var/lib/kubelet/kubeconfig",
            "--max-pods": "110",
            "--network-plugin": "cni",
            "--node-status-update-frequency": "10s",
            "--non-masquerade-cidr": "0.0.0.0/0",
            "--pod-infra-container-image": "k8s.gcr.io/pause-amd64:3.1",
            "--pod-manifest-path": "/etc/kubernetes/manifests",
            "--pod-max-pids": "-1",
            "--read-only-port": "0",
            "--rotate-certificates": "true",
            "--streaming-connection-idle-timeout": "5m",
            "--tls-cert-file": "/etc/kubernetes/certs/kubeletserver.crt",
            "--tls-cipher-suites": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256",
            "--tls-private-key-file": "/etc/kubernetes/certs/kubeletserver.key"
          },
          "cloudProviderBackoffMode": ""
        },
        "acceleratedNetworkingEnabled": true,
        "acceleratedNetworkingEnabledWindows": false,
        "vmssOverProvisioningEnabled": false,
        "auditDEnabled": false,
        "fqdn": "",
        "preProvisionExtension": null,
        "extensions": [],
        "singlePlacementGroup": true,
        "platformFaultDomainCount": null,
        "enableVMSSNodePublicIP": false
      },
      {
        "name": "graph",
        "count": 1,
        "vmSize": "Standard_E32s_v3",
        "osDiskSizeGB": 128,
        "osType": "Linux",
        "availabilityProfile": "VirtualMachineScaleSets",
        "storageProfile": "ManagedDisks",
        "vnetSubnetID": "/subscriptions/***/resourceGroups/qa_004_QKNOWS_K8s/providers/Microsoft.Network/virtualNetworks/kubernetes-vnet/subnets/kubernetes-subnet",
        "distro": "ubuntu",
        "kubernetesConfig": {
          "kubeletConfig": {
            "--address": "0.0.0.0",
            "--anonymous-auth": "false",
            "--authentication-token-webhook": "true",
            "--authorization-mode": "Webhook",
            "--azure-container-registry-config": "/etc/kubernetes/azure.json",
            "--cgroups-per-qos": "true",
            "--client-ca-file": "/etc/kubernetes/certs/ca.crt",
            "--cloud-config": "/etc/kubernetes/azure.json",
            "--cloud-provider": "azure",
            "--cluster-dns": "10.0.0.10",
            "--cluster-domain": "cluster.local",
            "--enforce-node-allocatable": "pods",
            "--event-qps": "0",
            "--eviction-hard": "memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%",
            "--feature-gates": "PodPriority=true,RotateKubeletServerCertificate=true",
            "--image-gc-high-threshold": "85",
            "--image-gc-low-threshold": "80",
            "--image-pull-progress-deadline": "30m",
            "--keep-terminated-pod-volumes": "false",
            "--kubeconfig": "/var/lib/kubelet/kubeconfig",
            "--max-pods": "110",
            "--network-plugin": "cni",
            "--node-status-update-frequency": "10s",
            "--non-masquerade-cidr": "0.0.0.0/0",
            "--pod-infra-container-image": "k8s.gcr.io/pause-amd64:3.1",
            "--pod-manifest-path": "/etc/kubernetes/manifests",
            "--pod-max-pids": "-1",
            "--read-only-port": "0",
            "--rotate-certificates": "true",
            "--streaming-connection-idle-timeout": "5m",
            "--tls-cert-file": "/etc/kubernetes/certs/kubeletserver.crt",
            "--tls-cipher-suites": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256",
            "--tls-private-key-file": "/etc/kubernetes/certs/kubeletserver.key"
          },
          "cloudProviderBackoffMode": ""
        },
        "acceleratedNetworkingEnabled": true,
        "acceleratedNetworkingEnabledWindows": false,
        "vmssOverProvisioningEnabled": false,
        "auditDEnabled": false,
        "fqdn": "",
        "preProvisionExtension": null,
        "extensions": [],
        "singlePlacementGroup": true,
        "platformFaultDomainCount": null,
        "enableVMSSNodePublicIP": false
      },
      {
        "name": "static",
        "count": 1,
        "vmSize": "Standard_E16s_v3",
        "osDiskSizeGB": 128,
        "osType": "Linux",
        "availabilityProfile": "VirtualMachineScaleSets",
        "storageProfile": "ManagedDisks",
        "vnetSubnetID": "/subscriptions/***/resourceGroups/qa_004_QKNOWS_K8s/providers/Microsoft.Network/virtualNetworks/kubernetes-vnet/subnets/kubernetes-subnet",
        "distro": "ubuntu",
        "kubernetesConfig": {
          "kubeletConfig": {
            "--address": "0.0.0.0",
            "--anonymous-auth": "false",
            "--authentication-token-webhook": "true",
            "--authorization-mode": "Webhook",
            "--azure-container-registry-config": "/etc/kubernetes/azure.json",
            "--cgroups-per-qos": "true",
            "--client-ca-file": "/etc/kubernetes/certs/ca.crt",
            "--cloud-config": "/etc/kubernetes/azure.json",
            "--cloud-provider": "azure",
            "--cluster-dns": "10.0.0.10",
            "--cluster-domain": "cluster.local",
            "--enforce-node-allocatable": "pods",
            "--event-qps": "0",
            "--eviction-hard": "memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%",
            "--feature-gates": "PodPriority=true,RotateKubeletServerCertificate=true",
            "--image-gc-high-threshold": "85",
            "--image-gc-low-threshold": "80",
            "--image-pull-progress-deadline": "30m",
            "--keep-terminated-pod-volumes": "false",
            "--kubeconfig": "/var/lib/kubelet/kubeconfig",
            "--max-pods": "110",
            "--network-plugin": "cni",
            "--node-status-update-frequency": "10s",
            "--non-masquerade-cidr": "0.0.0.0/0",
            "--pod-infra-container-image": "k8s.gcr.io/pause-amd64:3.1",
            "--pod-manifest-path": "/etc/kubernetes/manifests",
            "--pod-max-pids": "-1",
            "--read-only-port": "0",
            "--rotate-certificates": "true",
            "--streaming-connection-idle-timeout": "5m",
            "--tls-cert-file": "/etc/kubernetes/certs/kubeletserver.crt",
            "--tls-cipher-suites": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256",
            "--tls-private-key-file": "/etc/kubernetes/certs/kubeletserver.key"
          },
          "cloudProviderBackoffMode": ""
        },
        "acceleratedNetworkingEnabled": true,
        "acceleratedNetworkingEnabledWindows": false,
        "vmssOverProvisioningEnabled": false,
        "auditDEnabled": false,
        "fqdn": "",
        "preProvisionExtension": null,
        "extensions": [],
        "singlePlacementGroup": true,
        "platformFaultDomainCount": null,
        "enableVMSSNodePublicIP": false
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "***"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "***",
      "secret": "***"
    },
    "certificateProfile": {
      "caCertificate": "***",
      "caPrivateKey": "***",
      "apiServerCertificate": "***",
      "apiServerPrivateKey": "***",
      "clientCertificate": "***",
      "clientPrivateKey": "***",
      "kubeConfigCertificate": "***",
      "kubeConfigPrivateKey": "***",
      "etcdServerCertificate": "***",
      "etcdServerPrivateKey": "***",
      "etcdClientCertificate": "***",
      "etcdClientPrivateKey": "***",
      "etcdPeerCertificates": [
        "***",
        "***",
        "***"
      ],
      "etcdPeerPrivateKeys": [
        "***",
        "***",
        "***"
      ]
    },
    "aadProfile": {
      "clientAppID": "***",
      "serverAppID": "***",
      "tenantID": "***"
    },
    "telemetryProfile": {
      "applicationInsightsKey": "***"
    }
  }
}

Upgrade this cluster to 1.16.11 using AKS Engine 0.53.0 with the command:

aks-engine upgrade --subscription-id --resource-group --location northeurope --api-model deployment-20191115_131752/arm-deploy/apimodel.json --upgrade-version 1.16.11 --auth-method client_secret --client-id --client-secret --debug

Produces the following error:

INFO[0612] Finished ARM Deployment (master-20-07-20T11.40.58-410958522). Error: Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details." Details=[{"code":"Conflict","message":"{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n "code": "VMExtensionProvisioningError",\r\n "message": "VM has reported a failure when processing extension 'cse-master-0'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=35\n[stdout]\nMon Jul 20 11:42:19 UTC 2020,k8s-master-11480702 0\n\n[stderr]\n\"\r\n\r\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot "\r\n }\r\n ]\r\n }\r\n}"}]
INFO[0612] Error creating upgraded master VM: k8s-master-11480702-0

Expected behavior
Cluster can be upgraded to 1.16.11 without errors.

AKS Engine version
0.53.0

Kubernetes version
1.16.4

Additional context
The log /var/log/azure/cluster-provision.log on the failing master node shows that the hyperkube image could not be pulled:

  + timeout 1200 docker pull k8s.gcr.io/oss/kubernetes/hyperkube:v1.16.11
    Error response from daemon: manifest for k8s.gcr.io/oss/kubernetes/hyperkube:v1.16.11 not found: manifest unknown: Failed to fetch "v1.16.11" from request "/v2/oss/kubernetes/hyperkube/manifests/v1.16.11".
  + '[' 60 -eq 60 ']'
  + echo Executed '"docker' pull 'k8s.gcr.io/oss/kubernetes/hyperkube:v1.16.11"' 60 times
    Executed "docker pull k8s.gcr.io/oss/kubernetes/hyperkube:v1.16.11" 60 times
  + return 1
  + exit 35
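
The trace above suggests a retry wrapper: the pull is attempted up to 60 times, each under a 1200-second timeout, and once the counter reaches the limit the wrapper returns 1 and the script exits with code 35. A minimal Python sketch of that control flow follows. The names `retry_command`, `ERR_IMAGE_PULL`, `wait_seconds` and `runner` are hypothetical and are not taken from the actual provisioning script; only the numbers 60, 1200 and 35 come from the log.

```python
import subprocess
import time

# Hypothetical name; the log only shows the numeric exit code 35.
ERR_IMAGE_PULL = 35


def retry_command(cmd, retries=60, wait_seconds=1.0, timeout_seconds=1200,
                  runner=subprocess.run):
    """Run cmd until it succeeds or `retries` attempts are used up.

    Returns 0 on success and 1 once every attempt has failed.
    """
    for attempt in range(1, retries + 1):
        try:
            result = runner(cmd, timeout=timeout_seconds)
            if result.returncode == 0:
                return 0
        except subprocess.TimeoutExpired:
            pass  # a timed-out attempt counts as a failed attempt
        if attempt == retries:
            # Mirrors the "Executed ... 60 times" message in the log.
            print(f'Executed "{" ".join(cmd)}" {retries} times')
            return 1
        time.sleep(wait_seconds)
    return 1


# Usage (not executed here): a caller would translate the failure into
# the exit code seen in the error message, for example:
#   if retry_command(["docker", "pull", image]) != 0:
#       sys.exit(ERR_IMAGE_PULL)
```

Because the manifest does not exist at that registry path, every attempt fails the same way, so retrying cannot help here; the fix has to change the image reference itself.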
chreichert added the bug label on Jul 20, 2020
@jackfrancis
Member

Hi @chreichert, could you retry this upgrade, and make sure that these are the api model configuration values inside kubernetesConfig:

"kubernetesImageBase": "mcr.microsoft.com/",
"kubernetesImageBaseType": "mcr",
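
For anyone applying this to several clusters, the edit can be scripted. The following is a minimal sketch, assuming the two keys live under `properties.orchestratorProfile.kubernetesConfig` as in the api model posted above; the function names are hypothetical helpers, not part of AKS Engine. Back up `apimodel.json` before rewriting it.

```python
import json


def apply_image_base_workaround(api_model):
    """Set the two kubernetesConfig values suggested in this thread, in place."""
    config = (
        api_model.setdefault("properties", {})
        .setdefault("orchestratorProfile", {})
        .setdefault("kubernetesConfig", {})
    )
    config["kubernetesImageBase"] = "mcr.microsoft.com/"
    config["kubernetesImageBaseType"] = "mcr"
    return api_model


def patch_file(path):
    """Rewrite the api model stored at `path` (hypothetical helper)."""
    with open(path) as f:
        model = json.load(f)
    apply_image_base_workaround(model)
    with open(path, "w") as f:
        json.dump(model, f, indent=2)
```

All other keys, including `mcrKubernetesImageBase`, are left untouched.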

@chreichert
Author

chreichert commented Jul 21, 2020

Thanks @jackfrancis, this helped. After modifying kubernetesImageBase and adding kubernetesImageBaseType in our apimodel.json as you suggested above, I could successfully upgrade our cluster to 1.16.11.

One thing to mention: kubernetes-dashboard was not cleanly reconciled during the upgrade. I ended up with two versions of the dashboard: the old one in namespace kube-system and the new one in namespace kubernetes-dashboard. I deleted the old deployment in namespace kube-system manually to clean things up. I hope that was enough to get rid of all the old dashboard artefacts?

@jackfrancis
Member

@chreichert Glad that got you through. This is a bug, btw, that I'll look into today. In the meantime, you have a workaround :/

Correct about post-upgrade cleanup. Depending on the version-to-version path and the initial cluster configuration, there may be leftover cruft. In your example you've observed the dashboard; metrics-server and other components may also need a nudge. You're doing the right thing to audit your cluster after upgrade. Hopefully the set of things that need manual poking is consistent across your fleet of clusters, so that the poking can be automated?
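
One way such an audit might be automated is to look for deployments that exist under the same name in more than one namespace, which is how the duplicated dashboard described above would show up if both copies are named `kubernetes-dashboard` (an assumption; the thread does not state the deployment names). The sketch below works on the JSON printed by `kubectl get deployments --all-namespaces -o json`; the function name and the sample data are illustrative only.

```python
import json
from collections import defaultdict


def find_duplicate_deployments(deployment_list):
    """Return {name: sorted namespaces} for deployments found in 2+ namespaces."""
    namespaces_by_name = defaultdict(set)
    for item in deployment_list.get("items", []):
        meta = item.get("metadata", {})
        name, namespace = meta.get("name"), meta.get("namespace")
        if name and namespace:
            namespaces_by_name[name].add(namespace)
    return {
        name: sorted(namespaces)
        for name, namespaces in namespaces_by_name.items()
        if len(namespaces) > 1
    }


# Illustrative input shaped like the kubectl JSON list output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "kubernetes-dashboard", "namespace": "kube-system"}},
  {"metadata": {"name": "kubernetes-dashboard", "namespace": "kubernetes-dashboard"}},
  {"metadata": {"name": "coredns", "namespace": "kube-system"}}
]}
""")
print(find_duplicate_deployments(sample))
```

A name match is only a hint that something was left behind. Whether the older copy is safe to delete still needs a human decision per component.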

@chreichert
Author

@jackfrancis We're currently working out how to upgrade to the latest 1.17 or 1.18 by testing this on our test cluster before doing our production cluster. Going to the latest 1.16 was the first step. After testing our apps, I will continue with upgrading to 1.17.7 and so on. There are still some manual steps involved, but it's still manageable.
Thanks again for your valuable help, as always :-)

@jackfrancis
Member

@chreichert are you able to paste the original values of kubernetesImageBase and kubernetesImageBaseType in your api model before you changed them? That will help to ensure that PR #3625 has the proper fixes so that manual step is not needed.

@chreichert
Author

@jackfrancis These were the original settings before the upgrade:

    "kubernetesImageBase": "k8s.gcr.io/",
    "mcrKubernetesImageBase": "mcr.microsoft.com/k8s/core/",

kubernetesImageBaseType was not present before.

You can find the full apimodel in the original post above (folded).
