Control plane nodes failing to join when specific IP Address provided in etcd.local.extraArgs #1468
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
This issue appears to be the same as #1359. I only found that out while filling in the title (sorry about that; I had looked through the open issues before creating this one and did not think to look at the closed ones). In any case, that issue is closed (not sure why).
In that issue, one suggestion is not to override the etcd extraArgs, but that does not work if:
Versions
kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:51:21Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration:
DigitalOcean; hardware configuration is not relevant to the issue
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
UBUNTU_CODENAME=bionic
Kernel (e.g. uname -a): Linux HOSTNAME 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Others:
What happened?
This issue happens when all of the conditions below are met:
extraArgs that set etcd-related URLs (listen-client-urls, advertise-client-urls, listen-peer-urls, initial-advertise-peer-urls, listen-metrics-urls) to a specific IP address (not localhost or 0.0.0.0, but the IP of the first master, for instance)
Under that scenario, the local etcd on the joining masters will fail to start, since it will attempt to bind to an IP address that does not belong to its host.
What you expected to happen?
Each local etcd should listen on the appropriate IP (one that actually belongs to the host the pod is running on), so that the binding doesn't fail and extra control plane nodes can actually join the cluster.
This could be something the user can configure (for instance, by adding some etcd configuration to the JoinConfiguration), or, if not specified, we could try to find the "best match" IP address given the address that was set on the master (the configuration approach sounds much more reliable, though).
How to reproduce it (as minimally and precisely as possible)?
Given a scenario with 2 VMs, where:
Run kubeadm init --config config_principal.yaml, where config_principal.yaml refers to the YAML above.
Run kubeadm join --config config_extra_cp.yaml, where config_extra_cp.yaml refers to the YAML above.
Joining the cluster should fail, since etcd on the newly joining control plane node is not able to start. (In fact, since we are creating an etcd cluster, the whole k8s cluster is down, because etcd on the principal is waiting for its other member to come up.)
Logs can be retrieved on the joining node:
Anything else we need to know?
I think the issue comes from the getEtcdCommand function (https://github.com/kubernetes/kubernetes/blob/7dfcacd1cfcbdfe74b28f2473fb107e9a47ec905/cmd/kubeadm/app/phases/etcd/local.go#L179), where ClusterConfiguration is used, but there is a single ClusterConfiguration that contains only the "principal" master IPs.
This configuration is used when calling kubeadmutil.BuildArgumentListFromMap (https://github.com/kubernetes/kubernetes/blob/7dfcacd1cfcbdfe74b28f2473fb107e9a47ec905/cmd/kubeadm/app/phases/etcd/local.go#L212), which overrides the default arguments with the etcd.local.extraArgs arguments that contain that IP address.
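To illustrate the override behaviour described above, here is a simplified sketch. The function mergeArgs is a hypothetical stand-in written for this example, not kubeadm's actual code, and the node name and the 10.0.0.x addresses are made up. It shows how per-node defaults for a joining node would be replaced by cluster-wide extraArgs carrying the first master's IP.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeArgs is a simplified, hypothetical stand-in for the behaviour
// described in this issue: any key present in overrides replaces the
// default value for that key. It is not kubeadm's implementation.
func mergeArgs(defaults, overrides map[string]string) []string {
	merged := make(map[string]string, len(defaults)+len(overrides))
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range overrides {
		merged[k] = v
	}
	args := make([]string, 0, len(merged))
	for k, v := range merged {
		args = append(args, fmt.Sprintf("--%s=%s", k, v))
	}
	sort.Strings(args) // stable output order for readability
	return args
}

func main() {
	// Defaults as they might be computed for the joining node
	// (assumed address 10.0.0.2).
	defaults := map[string]string{
		"name":               "cp-2",
		"listen-peer-urls":   "https://10.0.0.2:2380",
		"listen-client-urls": "https://127.0.0.1:2379,https://10.0.0.2:2379",
	}
	// Cluster-wide extraArgs written with the first master's
	// address (assumed 10.0.0.1).
	extraArgs := map[string]string{
		"listen-peer-urls":   "https://10.0.0.1:2380",
		"listen-client-urls": "https://10.0.0.1:2379",
	}
	for _, a := range mergeArgs(defaults, extraArgs) {
		fmt.Println(a)
	}
}
```

In this sketch the joining node ends up with listen URLs pointing at 10.0.0.1, an address it does not own, which matches the bind failure reported here.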