
1.15 - kubeadm join --control-plane fails on clusters created with <= 1.12 #1950

Closed
blurpy opened this issue Dec 2, 2019 · 26 comments
Labels: area/HA, help wanted, kind/bug, kind/documentation, lifecycle/rotten, priority/backlog

Comments

@blurpy

blurpy commented Dec 2, 2019

Versions

kubeadm version (use kubeadm version): v1.15.6

Environment: Dev

  • Kubernetes version (use kubectl version): v1.15.6
  • Cloud provider or hardware configuration: Virtualbox
  • OS (e.g. from /etc/os-release): CentOS 7.7
  • Kernel (e.g. uname -a): 3.10.0-957.1.3.el7.x86_64

What happened?

I have several clusters created with kubeadm v1.10 to v1.12 that have been upgraded along the way; they are currently on 1.14 and 1.15. I'm experimenting with adding more masters to set up HA. Adding masters on clusters created with kubeadm 1.15 works fine, but adding masters to older clusters upgraded to 1.15 fails while waiting for the etcd nodes to join.

This is a continuation of #1269, which doesn't seem to be properly resolved.
The original issue relates to etcd not listening on the host IP, so it wasn't possible for the new node to connect. That was fixed. However, the etcd member list seems to be untouched, so it looks as follows:

/ # export ETCDCTL_API=3
/ # etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list -w table
+-----------------+---------+-----------------+------------------------+----------------------------+
|       ID        | STATUS  |      NAME       |       PEER ADDRS       |        CLIENT ADDRS        |
+-----------------+---------+-----------------+------------------------+----------------------------+
| a874c87fd42044f | started | demomaster1test | https://127.0.0.1:2380 | https://192.168.33.10:2379 |
+-----------------+---------+-----------------+------------------------+----------------------------+

First master: demomaster1test (192.168.33.10).
Second master: demomaster2test (192.168.33.20). (To be added)

From the join output on the second control plane node we can see that it successfully adds the second etcd member to the cluster using the correct address, then receives a member list with the localhost address of the first member, and eventually times out:

[root@demomaster2test ~]# kubeadm join --v 5 --discovery-token ... --discovery-token-ca-cert-hash sha256:... --certificate-key ... --control-plane --apiserver-bind-port 443 demomaster1test:443
...
[check-etcd] Checking that the etcd cluster is healthy
I1202 10:53:45.999198    7391 local.go:66] [etcd] Checking etcd cluster health
I1202 10:53:45.999206    7391 local.go:69] creating etcd client that connects to etcd pods
I1202 10:53:46.009155    7391 etcd.go:106] etcd endpoints read from pods: https://192.168.33.10:2379
I1202 10:53:46.019954    7391 etcd.go:147] etcd endpoints read from etcd: https://192.168.33.10:2379
I1202 10:53:46.020014    7391 etcd.go:124] update etcd endpoints: https://192.168.33.10:2379
I1202 10:53:46.038590    7391 kubelet.go:105] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I1202 10:53:46.094663    7391 kubelet.go:131] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
I1202 10:53:46.157154    7391 kubelet.go:148] [kubelet-start] Starting the kubelet
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I1202 10:53:46.752724    7391 kubelet.go:166] [kubelet-start] preserving the crisocket information for the node
I1202 10:53:46.752743    7391 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "demomaster2test" as an annotation
I1202 10:54:07.768179    7391 local.go:118] creating etcd client that connects to etcd pods
I1202 10:54:07.773539    7391 etcd.go:106] etcd endpoints read from pods: https://192.168.33.10:2379
I1202 10:54:07.785011    7391 etcd.go:147] etcd endpoints read from etcd: https://192.168.33.10:2379
I1202 10:54:07.785033    7391 etcd.go:124] update etcd endpoints: https://192.168.33.10:2379
I1202 10:54:07.785094    7391 local.go:127] Adding etcd member: https://192.168.33.20:2380
[etcd] Announced new etcd member joining to the existing etcd cluster
I1202 10:54:07.822398    7391 local.go:133] Updated etcd member list: [{demomaster1test https://127.0.0.1:2380} {demomaster2test https://192.168.33.20:2380}]
I1202 10:54:07.822411    7391 local.go:135] Creating local etcd static pod manifest file
[etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
I1202 10:54:07.823387    7391 etcd.go:351] [etcd] attempting to see if all cluster endpoints ([https://192.168.33.10:2379 https://192.168.33.20:2379]) are available 1/8
I1202 10:54:12.849166    7391 etcd.go:356] [etcd] Attempt timed out
I1202 10:54:12.849182    7391 etcd.go:348] [etcd] Waiting 5s until next retry
I1202 10:54:17.849270    7391 etcd.go:351] [etcd] attempting to see if all cluster endpoints ([https://192.168.33.10:2379 https://192.168.33.20:2379]) are available 2/8
I1202 10:54:22.882184    7391 etcd.go:356] [etcd] Attempt timed out
I1202 10:54:22.882199    7391 etcd.go:348] [etcd] Waiting 5s until next retry
[kubelet-check] Initial timeout of 40s passed.
...
I1202 10:55:13.089881    7391 etcd.go:356] [etcd] Attempt timed out
I1202 10:55:13.089899    7391 etcd.go:348] [etcd] Waiting 5s until next retry
I1202 10:55:18.090404    7391 etcd.go:351] [etcd] attempting to see if all cluster endpoints ([https://192.168.33.10:2379 https://192.168.33.20:2379]) are available 8/8
I1202 10:55:23.110043    7391 etcd.go:356] [etcd] Attempt timed out
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

From the logs of the first etcd we can see the second etcd member being added, then the first etcd losing quorum because it cannot reach the second member, and eventually shutting down:

2019-11-29 13:37:10.254481 I | etcdserver/membership: added member da895d82fb090550 [https://192.168.33.20:2380] to cluster c9be114fc2da2776
2019-11-29 13:37:10.254516 I | rafthttp: starting peer da895d82fb090550...
2019-11-29 13:37:10.254535 I | rafthttp: started HTTP pipelining with peer da895d82fb090550
2019-11-29 13:37:10.258325 I | rafthttp: started peer da895d82fb090550
2019-11-29 13:37:10.258475 I | rafthttp: added peer da895d82fb090550
...
2019-11-29 13:37:12.164521 W | raft: a874c87fd42044f stepped down to follower since quorum is not active
...
2019-11-29 13:37:15.259627 W | rafthttp: health check for peer da895d82fb090550 could not connect: dial tcp 192.168.33.20:2380: connect: connection refused (prober "ROUND_TRIPPER_SNAPSHOT")
...
2019-11-29 13:38:27.923943 N | pkg/osutil: received terminated signal, shutting down...

On the second etcd we get this:

{"log":"2019-11-29 13:40:36.795497 I | etcdmain: etcd Version: 3.3.10\n","stream":"stderr","time":"2019-11-29T13:40:36.79609435Z"}
{"log":"2019-11-29 13:40:36.795961 I | etcdmain: Git SHA: 27fc7e2\n","stream":"stderr","time":"2019-11-29T13:40:36.796147993Z"}
{"log":"2019-11-29 13:40:36.795967 I | etcdmain: Go Version: go1.10.4\n","stream":"stderr","time":"2019-11-29T13:40:36.796154842Z"}
{"log":"2019-11-29 13:40:36.795970 I | etcdmain: Go OS/Arch: linux/amd64\n","stream":"stderr","time":"2019-11-29T13:40:36.796158628Z"}
{"log":"2019-11-29 13:40:36.795973 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2\n","stream":"stderr","time":"2019-11-29T13:40:36.796161734Z"}
{"log":"2019-11-29 13:40:36.796401 N | etcdmain: the server is already initialized as member before, starting as etcd member...\n","stream":"stderr","time":"2019-11-29T13:40:36.796559343Z"}
{"log":"2019-11-29 13:40:36.796454 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = \n","stream":"stderr","time":"2019-11-29T13:40:36.796579176Z"}
{"log":"2019-11-29 13:40:36.797542 I | embed: listening for peers on https://192.168.33.20:2380\n","stream":"stderr","time":"2019-11-29T13:40:36.797652533Z"}
{"log":"2019-11-29 13:40:36.797609 I | embed: listening for client requests on 127.0.0.1:2379\n","stream":"stderr","time":"2019-11-29T13:40:36.79773103Z"}
{"log":"2019-11-29 13:40:36.797710 I | embed: listening for client requests on 192.168.33.20:2379\n","stream":"stderr","time":"2019-11-29T13:40:36.797769883Z"}
{"log":"2019-11-29 13:40:36.799843 W | etcdserver: could not get cluster response from https://127.0.0.1:2380: Get https://127.0.0.1:2380/members: dial tcp 127.0.0.1:2380: connect: connection refused\n","stream":"stderr","time":"2019-11-29T13:40:36.799928537Z"}
{"log":"2019-11-29 13:40:36.800480 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given urls\n","stream":"stderr","time":"2019-11-29T13:40:36.800579653Z"}

The second etcd keeps trying to connect to the first etcd on localhost.

What we can see from the generated etcd.yaml manifest on the second master is this:

        - etcd
        - --advertise-client-urls=https://192.168.33.20:2379
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --client-cert-auth=true
        - --data-dir=/var/lib/etcd
        - --initial-advertise-peer-urls=https://192.168.33.20:2380
        - --initial-cluster=demomaster1test=https://127.0.0.1:2380,demomaster2test=https://192.168.33.20:2380
        - --initial-cluster-state=existing
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --listen-client-urls=https://127.0.0.1:2379,https://192.168.33.20:2379
        - --listen-peer-urls=https://192.168.33.20:2380
        - --name=demomaster2test

It's configured with demomaster1test at https://127.0.0.1:2380, which results in the "connection refused" we can see in the logs. I then tried changing that value to https://192.168.33.10:2380, as sketched below.
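This is roughly what the edited line in /etc/kubernetes/manifests/etcd.yaml on the second master looks like after that change (a sketch reconstructed from the manifest above; only this one line changes):

        - --initial-cluster=demomaster1test=https://192.168.33.10:2380,demomaster2test=https://192.168.33.20:2380

With that change in place, the second etcd logs the following instead: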

{"log":"2019-11-29 14:05:40.168103 I | etcdmain: etcd Version: 3.3.10\n","stream":"stderr","time":"2019-11-29T14:05:40.173075927Z"}
{"log":"2019-11-29 14:05:40.168157 I | etcdmain: Git SHA: 27fc7e2\n","stream":"stderr","time":"2019-11-29T14:05:40.173107421Z"}
{"log":"2019-11-29 14:05:40.168161 I | etcdmain: Go Version: go1.10.4\n","stream":"stderr","time":"2019-11-29T14:05:40.173111048Z"}
{"log":"2019-11-29 14:05:40.168163 I | etcdmain: Go OS/Arch: linux/amd64\n","stream":"stderr","time":"2019-11-29T14:05:40.17311336Z"}
{"log":"2019-11-29 14:05:40.168166 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2\n","stream":"stderr","time":"2019-11-29T14:05:40.173115555Z"}
{"log":"2019-11-29 14:05:40.168203 N | etcdmain: the server is already initialized as member before, starting as etcd member...\n","stream":"stderr","time":"2019-11-29T14:05:40.173117812Z"}
{"log":"2019-11-29 14:05:40.168377 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = \n","stream":"stderr","time":"2019-11-29T14:05:40.173121239Z"}
{"log":"2019-11-29 14:05:40.169662 I | embed: listening for peers on https://192.168.33.20:2380\n","stream":"stderr","time":"2019-11-29T14:05:40.173123704Z"}
{"log":"2019-11-29 14:05:40.169707 I | embed: listening for client requests on 127.0.0.1:2379\n","stream":"stderr","time":"2019-11-29T14:05:40.173125763Z"}
{"log":"2019-11-29 14:05:40.169732 I | embed: listening for client requests on 192.168.33.20:2379\n","stream":"stderr","time":"2019-11-29T14:05:40.173127846Z"}
{"log":"2019-11-29 14:05:40.195800 C | etcdmain: error validating peerURLs {ClusterID:c9be114fc2da2776 Members:[\u0026{ID:4654b06da302d871 RaftAttributes:{PeerURLs:[https://192.168.33.20:2380]} Attributes:{Name: ClientURLs:[]}} \u0026{ID:a874c87fd42044f RaftAttributes:{PeerURLs:[https://127.0.0.1:2380]} Attributes:{Name:demomaster1test ClientURLs:[https://192.168.33.10:2379]}}] RemovedMemberIDs:[]}: unmatched member while checking PeerURLs (\"https://127.0.0.1:2380\"(resolved from \"https://127.0.0.1:2380\") != \"https://192.168.33.10:2380\"(resolved from \"https://192.168.33.10:2380\"))\n","stream":"stderr","time":"2019-11-29T14:05:40.195988804Z"}

The peer address configured in the manifest doesn't match the member list, so etcd aborts.
The result in either case is that etcd on both control plane nodes shuts down, the apiserver becomes unavailable as a consequence, and the entire cluster is bricked.

A possible fix is to change the etcd member peer address before adding a second master, like this:

/ # etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member update a874c87fd42044f --peer-urls=https://192.168.33.10:2380
Member  a874c87fd42044f updated in cluster c9be114fc2da2776

/ # etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list -w table
+-----------------+---------+-----------------+----------------------------+----------------------------+
|       ID        | STATUS  |      NAME       |         PEER ADDRS         |        CLIENT ADDRS        |
+-----------------+---------+-----------------+----------------------------+----------------------------+
| a874c87fd42044f | started | demomaster1test | https://192.168.33.10:2380 | https://192.168.33.10:2379 |
+-----------------+---------+-----------------+----------------------------+----------------------------+

After doing so I was able to add a second master.

What you expected to happen?

The peer address of the first etcd should have been updated to the host IP either as part of an etcd upgrade or when adding the second control plane node.

How to reproduce it (as minimally and precisely as possible)?

Adapted from the instructions at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ (a rough command sketch for steps 3-6 follows the list).

  1. Find or set up a 1.12 cluster.
  2. Upgrade all the way to 1.15.
  3. Add controlPlaneEndpoint with the IP and port of a load balancer to the kubeadm config file and upload it to the kubeadm-config ConfigMap in kube-system.
  4. Recreate the apiserver certificates so they include the load balancer IP.
  5. Restart the apiserver.
  6. Upload the control plane certificates to a kube-system secret.
  7. Join a second control plane node.
  8. Watch your cluster go down, never to recover again (?).
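For illustration, here is a rough sketch of steps 3-6 as shell commands on the first master. This is a sketch, not commands copied from the cluster above; it assumes a kubeadm config file at /root/kubeadm.yaml that already contains the controlPlaneEndpoint, so adjust names, paths and addresses to your environment.

# 3) add controlPlaneEndpoint (load balancer IP:port) to the ClusterConfiguration
#    stored in the kubeadm-config ConfigMap
kubectl -n kube-system edit configmap kubeadm-config

# 4) recreate the apiserver certificate so it includes the load balancer IP
mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.key /root/
kubeadm init phase certs apiserver --config /root/kubeadm.yaml

# 5) restart the apiserver by removing and re-adding its static pod manifest
mv /etc/kubernetes/manifests/kube-apiserver.yaml /root/
sleep 20
mv /root/kube-apiserver.yaml /etc/kubernetes/manifests/

# 6) upload the control plane certificates as a secret and note the printed certificate key
kubeadm init phase upload-certs --upload-certs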

Anything else we need to know?

@blurpy
Author

blurpy commented Dec 2, 2019

/kind bug
/area HA

@k8s-ci-robot added the kind/bug and area/HA labels on Dec 2, 2019
@neolit123
Member

neolit123 commented Dec 2, 2019

@blurpy

FYI, 1.12 is no longer supported by the kubeadm team, which forces me to close the issue, but we can continue the discussion.

/close

with the release of 1.17 you need to have at least 1.15 to be in the support skew.

This is a continuation of #1269, which doesn't seem to be properly resolved.

did you try the workaround here: #1269

also instead of joining before upgrade, did you try upgrading to 1.13 and then joining a new member?

ping @fabriziopandini

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.


@blurpy
Author

blurpy commented Dec 2, 2019

@neolit123 This bug report is about adding more control plane nodes in a 1.15.6 cluster, initially created by kubeadm 1.12 or older, and upgraded continually. That should be within the support window, right?

As I noted, there is a workaround (changing the peer URL by hand), but I'm thinking that kubeadm should handle it more transparently, since there are probably a lot more clusters like ours out there.

@neolit123
Member

neolit123 commented Dec 2, 2019

@neolit123 This bug report is about adding more control plane nodes in a 1.15.6 cluster, initially created by kubeadm 1.12 or older, and upgraded continually. That should be within the support window, right?

looks like i misunderstood that part.

As I noted, there is a workaround (changing the peer URL by hand), but I'm thinking that kubeadm should handle it more transparently, since there are probably a lot more clusters like ours out there.

this is the second report we got about this since 1.12.

for the first one this is what we've added in the docs:
https://github.com/kubernetes/website/pull/12145/files

IIRC, kubeadm upgrade had a special case to fix this in 1.13 (which for some reason might not have worked in your case). @fabriziopandini should confirm if he remembers.

@blurpy
Author

blurpy commented Dec 2, 2019

There are no instructions on how to set up HA for an existing cluster, and HA is only beta in 1.15, so there might be more people than just me hitting this issue eventually, once instructions are posted on the website. I'm trying to make my own experience easier for when I get around to upgrading production.

I'm not sure what was done in the last round, but I think it was only making etcd listen on the host IP, not adjusting the member address.

@blurpy
Author

blurpy commented Dec 2, 2019

Related #1471

@neolit123
Member

neolit123 commented Dec 2, 2019

There are no instructions on how to set up HA for an existing cluster, and HA is only beta in 1.15, so there might be more people than just me hitting this issue eventually, once instructions are posted on the website. I'm trying to make my own experience easier for when I get around to upgrading production.

we did a survey which suggested that the kubeadm user base tries to stay in the support skew - i.e. upgrade much faster.

Related #1471

looks like a lot more affected users were present there.

@neolit123
Member

I'm not sure what was done in the last round, but I think it was only making etcd listen on the host IP, not adjusting the member address.

you are suggesting that https://github.com/kubernetes/kubernetes/pull/75956/files was not sufficient?

@blurpy
Author

blurpy commented Dec 2, 2019

we did a survey which suggested that the kubeadm user base tries to stay in the support skew - i.e. upgrade much faster.

Note that we are not upgrading from 1.12 to 1.15 right now; our clusters are almost 600 days old and have been upgraded every few months. We try our best to stay within the support period, but we have been waiting for HA to be more mature before upgrading our single master to multi master.

you are suggesting that https://github.com/kubernetes/kubernetes/pull/75956/files was not sufficient?

My Go-reading skills aren't the best, but to me it looks like it's all related to certificates, and that part works fine. But it's only half a solution for adding more masters, since the initial etcd node thinks it's listening on localhost only and tells that to the second etcd that joins.

@blurpy
Author

blurpy commented Dec 2, 2019

I've taken a look at the state before and after the upgrade from 1.13 to 1.14.

In 1.13 the etcd manifest looked like this:

    - command:
        - etcd
        - --advertise-client-urls=https://127.0.0.1:2379
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --client-cert-auth=true
        - --data-dir=/var/lib/etcd
        - --initial-advertise-peer-urls=https://127.0.0.1:2380
        - --initial-cluster=demomaster1test=https://127.0.0.1:2380
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --listen-client-urls=https://127.0.0.1:2379
        - --listen-peer-urls=https://127.0.0.1:2380
        - --name=demomaster1test
        - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
        - --peer-client-cert-auth=true
        - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --snapshot-count=10000
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      image: k8s.gcr.io/etcd:3.2.24

And it was updated to this in 1.14 (and still like this in 1.15):

    - command:
        - etcd
        - --advertise-client-urls=https://192.168.33.10:2379
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --client-cert-auth=true
        - --data-dir=/var/lib/etcd
        - --initial-advertise-peer-urls=https://192.168.33.10:2380
        - --initial-cluster=demomaster1test=https://192.168.33.10:2380
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --listen-client-urls=https://127.0.0.1:2379,https://192.168.33.10:2379
        - --listen-peer-urls=https://192.168.33.10:2380
        - --name=demomaster1test
        - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
        - --peer-client-cert-auth=true
        - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --snapshot-count=10000
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      image: k8s.gcr.io/etcd:3.3.10

So the upgrade fixed the --listen-peer-urls parameter, among others, in the manifest, but the member list still looked like this:

[root@demomaster1test ~]# kubectl -n kube-system exec -ti etcd-demomaster1test sh
/ # export ETCDCTL_API=3
/ # etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list -w table
+-----------------+---------+-----------------+------------------------+----------------------------+
|       ID        | STATUS  |      NAME       |       PEER ADDRS       |        CLIENT ADDRS        |
+-----------------+---------+-----------------+------------------------+----------------------------+
| a874c87fd42044f | started | demomaster1test | https://127.0.0.1:2380 | https://192.168.33.10:2379 |
+-----------------+---------+-----------------+------------------------+----------------------------+

And the "peer addrs" there seems to be what's being used later on, when joining more master nodes.

@neolit123
Member

So the upgrade fixed the --listen-peer-urls parameter, among others, in the manifest, but the member list still looked like this:

this seems to me as if the etcd pod was not restarted. odd.

@blurpy
Author

blurpy commented Dec 2, 2019

I think it's a persisted configuration, so it's only read from the manifest during initial setup of etcd. Changing it requires running commands with etcdctl or similar, like kubeadm does when adding a second node to the etcd cluster. That's why I was thinking that it could possibly be fixed automatically by kubeadm when it's already in the process of making changes to etcd state.

@neolit123
Member

changes in the etcd static pod manifest would trigger a pod restart, which is an etcd restart with a new argument for --listen-peer-urls. i would expect etcd to not persist this configuration and register the new peer URL.

are you sure this is persisted? i don't see how - something is not right here.

@neolit123 added the priority/awaiting-more-evidence label on Dec 2, 2019
@neolit123 reopened this on Dec 2, 2019
@neolit123 added this to the v1.18 milestone on Dec 2, 2019
@blurpy
Author

blurpy commented Dec 3, 2019

I did some tests:

  1. Reboot master. No changes.
  2. Kill etcd pod. No changes.
  3. Kill etcd container. No changes.

Then I had a look at the etcd documentation, where I found what I suspected:

Why doesn’t changing --listen-peer-urls or --initial-advertise-peer-urls update the advertised peer URLs in etcdctl member list?
A member’s advertised peer URLs come from --initial-advertise-peer-urls on initial cluster boot. Changing the listen peer URLs or the initial advertise peers after booting the member won’t affect the exported advertise peer URLs since changes must go through quorum to avoid membership configuration split brain. Use etcdctl member update to update a member’s peer URLs.

The etcdctl member update command works, as noted in my initial report, but nothing else seems to have any effect.

@blurpy
Author

blurpy commented Dec 3, 2019

Here is a log from joining a second control plane node after using etcdctl member update to change PEER ADDRS from https://127.0.0.1:2380 to https://192.168.33.10:2380:

[check-etcd] Checking that the etcd cluster is healthy
I1203 07:06:46.761803    7273 local.go:66] [etcd] Checking etcd cluster health
I1203 07:06:46.761810    7273 local.go:69] creating etcd client that connects to etcd pods
I1203 07:06:46.771371    7273 etcd.go:106] etcd endpoints read from pods: https://192.168.33.10:2379
I1203 07:06:46.779614    7273 etcd.go:147] etcd endpoints read from etcd: https://192.168.33.10:2379
I1203 07:06:46.779633    7273 etcd.go:124] update etcd endpoints: https://192.168.33.10:2379
I1203 07:06:46.794477    7273 kubelet.go:105] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I1203 07:06:46.851379    7273 kubelet.go:131] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
I1203 07:06:46.915773    7273 kubelet.go:148] [kubelet-start] Starting the kubelet
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I1203 07:06:47.505223    7273 kubelet.go:166] [kubelet-start] preserving the crisocket information for the node
I1203 07:06:47.505243    7273 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "demomaster2test" as an annotation
I1203 07:07:08.019626    7273 local.go:118] creating etcd client that connects to etcd pods
I1203 07:07:08.031476    7273 etcd.go:106] etcd endpoints read from pods: https://192.168.33.10:2379
I1203 07:07:08.045314    7273 etcd.go:147] etcd endpoints read from etcd: https://192.168.33.10:2379
I1203 07:07:08.045334    7273 etcd.go:124] update etcd endpoints: https://192.168.33.10:2379
I1203 07:07:08.045341    7273 local.go:127] Adding etcd member: https://192.168.33.20:2380
[etcd] Announced new etcd member joining to the existing etcd cluster
I1203 07:07:08.108883    7273 local.go:133] Updated etcd member list: [{demomaster1test https://192.168.33.10:2380} {demomaster2test https://192.168.33.20:2380}]
I1203 07:07:08.108897    7273 local.go:135] Creating local etcd static pod manifest file
[etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
I1203 07:07:08.110046    7273 etcd.go:351] [etcd] attempting to see if all cluster endpoints ([https://192.168.33.10:2379 https://192.168.33.20:2379]) are available 1/8
I1203 07:07:13.135666    7273 etcd.go:356] [etcd] Attempt timed out
I1203 07:07:13.135684    7273 etcd.go:348] [etcd] Waiting 5s until next retry
I1203 07:07:18.137146    7273 etcd.go:351] [etcd] attempting to see if all cluster endpoints ([https://192.168.33.10:2379 https://192.168.33.20:2379]) are available 2/8
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node demomaster2test as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node demomaster2test as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created

Apart from this attempt completing successfully, there is only one difference in the logs between the original attempt, which timed out and failed, and this last one, which succeeded.

From the original:

I1202 10:54:07.822398 7391 local.go:133] Updated etcd member list: [{demomaster1test https://127.0.0.1:2380} {demomaster2test https://192.168.33.20:2380}]

From this last one:

I1203 07:07:08.108883 7273 local.go:133] Updated etcd member list: [{demomaster1test https://192.168.33.10:2380} {demomaster2test https://192.168.33.20:2380}]

I've also compared the etcd logs from before and after changing PEER ADDRS using etcdctl (and restarting etcd), and they look identical. Relevant output:

2019-12-03 06:50:02.366715 I | embed: listening for peers on https://192.168.33.10:2380
2019-12-03 06:50:02.366769 I | embed: listening for client requests on 127.0.0.1:2379
2019-12-03 06:50:02.366793 I | embed: listening for client requests on 192.168.33.10:2379
2019-12-03 06:50:02.480507 I | etcdserver: advertise client URLs = https://192.168.33.10:2379

To summarize how it looks to me (see the sketch after this list):

  1. Changing --listen-peer-urls in the manifest changes the address etcd uses for listening to connections.
  2. Changing PEER ADDRS using etcdctl changes the address etcd tells other clients to use for connecting to it.
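
As a minimal sketch of that distinction (the flag comes from the manifests above; the member ID and certificate paths from the earlier etcdctl output):

# where etcd binds for peer traffic: a flag in /etc/kubernetes/manifests/etcd.yaml
#     - --listen-peer-urls=https://192.168.33.10:2380
# what the member advertises to its peers: stored in the etcd member list,
# and only changeable through the etcd API
export ETCDCTL_API=3
etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
        --cert /etc/kubernetes/pki/etcd/peer.crt \
        --key /etc/kubernetes/pki/etcd/peer.key \
        member update a874c87fd42044f --peer-urls=https://192.168.33.10:2380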

@neolit123
Member

Then I had a look at the etcd documentation, where I found what I suspected:

^
@fabriziopandini this might interest you.

@SataQiu
Member

SataQiu commented Dec 11, 2019

I have sent a PR for this. Not sure if it's right.
PTAL @neolit123 @fabriziopandini @blurpy

@neolit123 added the lifecycle/active and priority/backlog labels and removed the priority/awaiting-more-evidence label on Dec 31, 2019
@neolit123
Member

going back to this issue and the PR kubernetes/kubernetes#86150
i would much rather have a WARNING in the "kubeadm upgrade" docs that explains how to use etcdctl to update a member that came from 1.12.

otherwise we have to backport the PR to all branches in the current support skew: 1.15, 1.16, 1.17.
and we cannot backport to versions older than that because they are out of support.

aside from this bug, i don't think MemberUpdate() is currently needed, and for upgrades it can be considered reconfiguration, which is something upgrades should not do.
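
for reference, the recovery such a warning would describe probably boils down to the steps below (a sketch consolidating the workaround from earlier in this thread; the member ID and IP come from the example above and have to be looked up per cluster):

# on the existing control plane node, before joining a new one:
kubectl -n kube-system exec -ti etcd-demomaster1test sh
export ETCDCTL_API=3
etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list -w table
# if PEER ADDRS still shows https://127.0.0.1:2380, update it to the host IP:
etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member update a874c87fd42044f --peer-urls=https://192.168.33.10:2380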

@neolit123
Member

/kind documentation

@k8s-ci-robot added the kind/documentation label on Jan 20, 2020
@neolit123 removed the lifecycle/active label on Feb 12, 2020
@neolit123 modified the milestones: v1.18, v1.19 on Mar 8, 2020
@sfgroups-k8s

I am also having the same issue on CentOS 8 with containerd, setting up a new cluster. The first master works; when I add a second master, the etcd pod dies on the first master.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 5, 2020
@neolit123 modified the milestones: v1.19, v1.20 on Jul 27, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Aug 26, 2020
@neolit123
Member

@blurpy

hi, please send a PR to the kubeadm troubleshooting guide with some steps of how to recover from this:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/
https://github.com/kubernetes/website/blob/master/content/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md
(i decided it might be better not to include this in the upgrade page)

you know what is best here and i don't have the bandwidth to reproduce this and document what has to be done.

thank you.

@neolit123 added the help wanted label on Sep 3, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

