kubeadm upgrade plan not working for v1.13.5 to v1.14.0 #1469
Comments
thanks for the report.
here is what I get; this is a bit of a bug on its own, because it's telling me that I'm up to date while it should be telling me to update to v1.14.0. I will log a bug about this, but my etcd manifest looks like this:
did you happen to create this cluster using 1.12 before upgrading to 1.13? I remember that we made some changes to the etcd addresses related to HA setups.
the upgrade worked for me.
logged:
@neolit123 Thanks for the comments. The cluster was initially created using kubeadm v1.9.x, and later on rebuilt, definitely using a version before v1.12.0. No wonder the etcd static pod manifest is different. What exactly has changed? I'll try to generate new static pod manifests using the latest version from a different machine and see if I can figure it out (also the dependencies).
@terrywang more details here: that said, I think we had a way to handle this type of upgrade transparently between 1.12 and 1.13, so your 1.13 etcd manifest should have been auto-converted to use the network interface address. Possibly something went wrong in the process, but this is also the first report we are seeing related to this. Please let me know if you remember anything like modifying the etcd manifests manually, which could have broken our 1.12->1.13 logic.
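One quick way to check whether that auto-conversion happened is to look at the client URLs in the etcd static pod manifest. This is only a sketch; the path below is the kubeadm default and assumes the manifest has not been moved:

```sh
# After the 1.12 -> 1.13 migration the listen/advertise client URLs are
# expected to include the node's interface address, not just 127.0.0.1.
grep -E 'listen-client-urls|advertise-client-urls' /etc/kubernetes/manifests/etcd.yaml
```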
closing in favor of: #1471
@neolit123 Thanks again for the info. Good to know. I've regenerated the static pod manifests using the latest version of kubeadm.
However, running kubeadm upgrade plan still gives:

```
root@k8s-node-1:~# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: context deadline exceeded
```

Will follow #1471 to regenerate the certificates for etcd when I have time.
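For anyone landing here before reading #1471: the general shape of the fix is to regenerate the etcd serving certificate so its SANs match the address kubeadm dials. This is a rough sketch under assumptions (kubeadm default paths, v1.14 phase syntax), not the exact steps from #1471:

```sh
# kubeadm skips certificates that already exist, so move the old etcd serving
# cert and key aside first, then let kubeadm recreate them with the proper SANs.
mv /etc/kubernetes/pki/etcd/server.crt /etc/kubernetes/pki/etcd/server.crt.bak
mv /etc/kubernetes/pki/etcd/server.key /etc/kubernetes/pki/etcd/server.key.bak
kubeadm init phase certs etcd-server
```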
Thanks terry for the info, I had the same problem. Also following #1471. My original cluster originated from Kubernetes 1.8, and was rebuilt during the 1.11 upgrade because it broke everything. My etcd also listens on localhost only:
+1
fix should be up in 1.14.1
Update: I waited for kubeadm 1.14.1, but it didn't actually fix the issue... Luckily, simply by following the steps in #1471 mentioned by @mauilion, I was able to leverage kubeadm phases to regenerate the etcd certificates.

The reason why kubeadm upgrade plan kept failing with

```
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: context deadline exceeded
```

is that the etcd server certificate was missing the required SANs. The certificate SAN should look like below:

```
X509v3 extensions:
    X509v3 Key Usage: critical
        Digital Signature, Key Encipherment
    X509v3 Extended Key Usage:
        TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Alternative Name:
        DNS:k8s-node-1, DNS:localhost, IP Address:192.168.100.21, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:10.192.0.2
```
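For reference, SAN output like the above can be printed straight from the etcd serving certificate; the path here is the kubeadm default and assumed for this cluster:

```sh
# Dump the etcd server certificate and show its Subject Alternative Names,
# to verify the node's IP addresses are included.
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text | grep -A1 'Subject Alternative Name'
```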
hm, it should have. the PR that @fabriziopandini created was merged and tested by at least a couple of people.
and your existing cert was missing those SANs?
Yes, the existing etcd server certificate SAN was missing the required entries. I may have forgotten to restore the static pod manifest for etcd (I had added the node's private IP inside a VPC subnet to it). Anyway, the problem is well solved. Really appreciate your input and assistance, enjoyed the learning experience ;-)
yes, that may be the cause.
Is this a BUG REPORT or FEATURE REQUEST?
Bug Report
Versions
kubeadm version (use kubeadm version): v1.14.0
Environment:
Kubernetes version (use kubectl version): v1.13.5
Cloud provider or hardware configuration: AWS EC2
OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
Kernel (e.g. uname -a): Linux k8s-node-1 4.4.0-143-generic #169-Ubuntu SMP Thu Feb 7 07:56:38 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Others: kubeadm provisioned single-master k8s cluster (3 nodes); this cluster was created using kubeadm when k8s was at v1.9.0.
What happened?
Used kubeadm to upgrade the cluster; v1.13.4 to v1.13.5 was successful. Upgrading to v1.14.0 failed because the kubeadm upgrade plan pre-flight checks try to connect to etcd using the node's private IP (assigned to NIC eth0) instead of the loopback address etcd is bound to.
Error
As per the kubeadm init workflow, on a single-master k8s cluster the etcd pod is created via a static pod manifest. Looking at the manifest, etcd binds to 127.0.0.1 and is not exposed to the external world.
What you expected to happen?
kubeadm upgrade plan should work as expected and output the upgrade details, just like v1.13.4 to v1.13.5.
How to reproduce it (as minimally and precisely as possible)?
Follow the upgrade guide and upgrade any v1.13.x cluster (created using kubeadm) to v1.14.0.
I've tried to change the bind address, but it has so many dependencies that it breaks more than it fixes. I also tried exposing the pod as a NodePort service and using iptables rules to forward traffic destined for the node's IP address (192.168.100.12 in this case) on port 2379 to loopback, with no luck; a sketch of that attempt follows.
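The iptables attempt would look roughly like this; it is only a sketch of the idea (using the node IP quoted above), and as noted it did not solve the problem:

```sh
# Redirect locally generated traffic aimed at the node IP on port 2379 back to
# loopback, where etcd actually listens. REDIRECT in the OUTPUT chain rewrites
# the destination of locally generated packets to 127.0.0.1.
iptables -t nat -A OUTPUT -p tcp -d 192.168.100.12 --dport 2379 -j REDIRECT --to-ports 2379
```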
Is there a way to override the etcd endpoint when running kubeadm upgrade plan? That would be the easiest solution.

Anything else we need to know?
Hmm...