This repository has been archived by the owner on Oct 10, 2023. It is now read-only.

IPv6: Some host network pods get pod IP set to kube-vip control plane endpoint address #2098

Closed
mcwumbly opened this issue Apr 11, 2022 · 0 comments · Fixed by #2103
Labels
kind/bug PR/Issue related to a bug needs-triage Indicates an issue or PR needs to be triaged

mcwumbly commented Apr 11, 2022

Bug description

Follow up to #1480 and #1964

On IPv6 clusters, three pods on the first control plane node come up with the kube-vip control plane endpoint IP address as their pod IP address. In clusters with multiple control plane nodes, the kube-vip address then fails over to another node, but these pods retain that address as their pod IP. This could cause issues for anything that needs to reach them via that IP.

These pods consistently get assigned the wrong address: etcd, kube-apiserver, kube-proxy.

The following shows the issue on a cluster with 3 control plane nodes, where those pods use the cluster endpoint IP (2013:930::161) as their pod IP:

capv@153-master-0-control-plane-gsd55 [ ~ ]$ kubectl get pod -A -o wide
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS        AGE     IP                NODE                               NOMINATED NODE   READINESS GATES
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-8df76564d-n6mdg        1/1     Running   1 (3m39s ago)   7m9s    2016:930:0:1::4   153-worker-0-78ddc7544c-vhwtg      <none>           <none>
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-5b9577cd5c-cvd8w   1/1     Running   0               7m5s    2016:930:0:2::5   153-worker-0-78ddc7544c-98dfc      <none>           <none>
capi-system                         capi-controller-manager-9b5fc4874-8k2vk                          1/1     Running   0               7m12s   2016:930:0:2::4   153-worker-0-78ddc7544c-98dfc      <none>           <none>
capv-system                         capv-controller-manager-755779d5b8-pk6zg                         1/1     Running   0               7m2s    2016:930:0:1::5   153-worker-0-78ddc7544c-vhwtg      <none>           <none>
cert-manager                        cert-manager-654fc8bbbc-7ppb6                                    1/1     Running   0               7m43s   2016:930:0:2::3   153-worker-0-78ddc7544c-98dfc      <none>           <none>
cert-manager                        cert-manager-cainjector-6c6d94f8c8-5sb2f                         1/1     Running   0               7m43s   2016:930:0:1::2   153-worker-0-78ddc7544c-vhwtg      <none>           <none>
cert-manager                        cert-manager-webhook-5fcdb7665c-vxr92                            1/1     Running   0               7m43s   2016:930:0:1::3   153-worker-0-78ddc7544c-vhwtg      <none>           <none>
kube-system                         antrea-agent-ksg6d                                               2/2     Running   0               3m49s   2013:930::7f      153-worker-0-78ddc7544c-vhwtg      <none>           <none>
kube-system                         antrea-agent-mw85l                                               2/2     Running   0               4m12s   2013:930::7e      153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         antrea-agent-q4sjf                                               2/2     Running   0               3m24s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         antrea-agent-q7mdd                                               2/2     Running   0               4m36s   2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         antrea-agent-xwpbc                                               2/2     Running   0               3m37s   2013:930::6c      153-worker-0-78ddc7544c-98dfc      <none>           <none>
kube-system                         antrea-controller-76f6dc9976-nn5ww                               1/1     Running   0               4m36s   2013:930::6c      153-worker-0-78ddc7544c-98dfc      <none>           <none>
kube-system                         coredns-67cdb6d6ff-4mddb                                         1/1     Running   0               13m     2016:930::3       153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         coredns-67cdb6d6ff-mr6kr                                         1/1     Running   0               13m     2016:930::4       153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         etcd-153-master-0-control-plane-g89df                            1/1     Running   0               7m52s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         etcd-153-master-0-control-plane-gsd55                            1/1     Running   1 (14m ago)     14m     2013:930::161     153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         etcd-153-master-0-control-plane-wnxhq                            1/1     Running   0               10m     2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         kube-apiserver-153-master-0-control-plane-g89df                  1/1     Running   1 (7m46s ago)   8m13s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         kube-apiserver-153-master-0-control-plane-gsd55                  1/1     Running   0               14m     2013:930::161     153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         kube-apiserver-153-master-0-control-plane-wnxhq                  1/1     Running   0               11m     2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         kube-controller-manager-153-master-0-control-plane-g89df         1/1     Running   0               8m14s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         kube-controller-manager-153-master-0-control-plane-gsd55         1/1     Running   1 (11m ago)     14m     2013:930::7e      153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         kube-controller-manager-153-master-0-control-plane-wnxhq         1/1     Running   0               11m     2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         kube-proxy-jkxm4                                                 1/1     Running   0               12m     2013:930::6c      153-worker-0-78ddc7544c-98dfc      <none>           <none>
kube-system                         kube-proxy-m7gg7                                                 1/1     Running   0               8m14s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         kube-proxy-nkf9r                                                 1/1     Running   0               12m     2013:930::7f      153-worker-0-78ddc7544c-vhwtg      <none>           <none>
kube-system                         kube-proxy-z9jt2                                                 1/1     Running   0               13m     2013:930::161     153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         kube-proxy-zcspg                                                 1/1     Running   0               11m     2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         kube-scheduler-153-master-0-control-plane-g89df                  1/1     Running   0               8m13s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         kube-scheduler-153-master-0-control-plane-gsd55                  1/1     Running   1 (11m ago)     14m     2013:930::7e      153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         kube-scheduler-153-master-0-control-plane-wnxhq                  1/1     Running   0               11m     2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         kube-vip-153-master-0-control-plane-g89df                        1/1     Running   0               8m13s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         kube-vip-153-master-0-control-plane-gsd55                        1/1     Running   2 (10m ago)     14m     2013:930::7e      153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         kube-vip-153-master-0-control-plane-wnxhq                        1/1     Running   0               11m     2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         metrics-server-64979b746d-xfmh4                                  1/1     Running   0               5m37s   2016:930:0:1::6   153-worker-0-78ddc7544c-vhwtg      <none>           <none>
kube-system                         vsphere-cloud-controller-manager-2lqzf                           1/1     Running   0               5m5s    2013:930::7e      153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         vsphere-cloud-controller-manager-vgc9f                           1/1     Running   0               4m58s   2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         vsphere-cloud-controller-manager-zdqw5                           1/1     Running   0               5m2s    2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         vsphere-csi-controller-7bf48d9846-4g578                          6/6     Running   0               5m34s   2016:930:0:3::2   153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         vsphere-csi-controller-7bf48d9846-65lqr                          6/6     Running   0               5m34s   2016:930::5       153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         vsphere-csi-controller-7bf48d9846-9nmkv                          6/6     Running   0               5m34s   2016:930:0:4::2   153-master-0-control-plane-g89df   <none>           <none>
kube-system                         vsphere-csi-node-4dx97                                           3/3     Running   1 (5m7s ago)    5m34s   2013:930::4e      153-master-0-control-plane-g89df   <none>           <none>
kube-system                         vsphere-csi-node-cndfv                                           3/3     Running   2 (5m10s ago)   5m34s   2013:930::6c      153-worker-0-78ddc7544c-98dfc      <none>           <none>
kube-system                         vsphere-csi-node-cnpv9                                           3/3     Running   2 (5m7s ago)    5m34s   2013:930::8d      153-master-0-control-plane-wnxhq   <none>           <none>
kube-system                         vsphere-csi-node-mdq4s                                           3/3     Running   2 (5m5s ago)    5m34s   2013:930::7e      153-master-0-control-plane-gsd55   <none>           <none>
kube-system                         vsphere-csi-node-n8l9f                                           3/3     Running   2 (5m6s ago)    5m34s   2013:930::7f      153-worker-0-78ddc7544c-vhwtg      <none>           <none>
tanzu-system                        secretgen-controller-6568b54868-p4gs9                            1/1     Running   0               4m18s   2016:930:0:1::7   153-worker-0-78ddc7544c-vhwtg      <none>           <none>
tkg-system                          kapp-controller-84cc78f478-hdbp9                                 1/1     Running   2 (10m ago)     12m     2013:930::7f      153-worker-0-78ddc7544c-vhwtg      <none>           <none>
tkg-system                          tanzu-addons-controller-manager-97b6755b4-9f6fs                  1/1     Running   0               9m2s    2013:930::6c      153-worker-0-78ddc7544c-98dfc      <none>           <none>
tkg-system                          tanzu-capabilities-controller-manager-54c564c7c6-lkhbx           1/1     Running   0               12m     2016:930::2       153-master-0-control-plane-gsd55   <none>           <none>
tkg-system                          tanzu-featuregates-controller-manager-bfdf46d77-s6tss            1/1     Running   0               6m43s   2016:930:0:2::6   153-worker-0-78ddc7544c-98dfc      <none>           <none>
tkr-system                          tkr-controller-manager-7bc9c979f5-9fnxx                          1/1     Running   1 (7m9s ago)    13m     2016:930:0:2::2   153-worker-0-78ddc7544c-98dfc      <none>           <none>

When the control plane node is first coming up, cloud-provider-vsphere is not yet running, so the feature added in #1480 and #1964 to exclude the kube-vip address from node.Addresses is not yet in effect. During this phase, kubelet uses other fallback logic to determine the node IP (see kubernetes/kubernetes#96670).

At this time, kube-vip, which gets deployed as a static manifest, is already running, so the kube-vip IP has been added to the network device. In IPv4, this has apparently never caused this problem, but in IPv6 we see it consistently. We believe this is because IP addresses are returned in the order they were added for IPv4, but in reverse order for IPv6 (via serverfault).

The workaround suggested in the kubernetes issue is to use the --node-ip flag for kubelet. However, kubeadm doesn't support doing so in a first-class way, so the suggestion there is to use KUBELET_EXTRA_ARGS (see kubernetes/kubeadm#203).

Doing so via ytt and ClusterAPI is difficult as we do not know the node IP in advance and determining it from cloud-init metadata is messy, so we need to use some heuristic to detect the IP address.

This overlay has been tested in a few cases, but we need to ensure it is robust enough to include in the product:

$ cat pkg/v1/providers/ytt/03_customizations/kube_vip.yaml
#@ load("@ytt:data", "data")
#@ load("@ytt:overlay", "overlay")

#@ if data.values.PROVIDER_TYPE in ["vsphere"]:
#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
---
spec:
  #@overlay/match-child-defaults missing_ok=True
  kubeadmConfigSpec:
    files:
      #@overlay/append
      - content: ""
        owner: root:root
        path: /etc/sysconfig/kubelet
        permissions: "0640"
    preKubeadmCommands:
      #@overlay/append
      - echo "KUBELET_EXTRA_ARGS=--node-ip=$(ip addr show dev eth0 scope global | grep /128 | cut -d ' ' -f6 | cut -d '/' -f1)" >> /etc/sysconfig/kubelet
#@ end
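
The address-detection heuristic in the overlay above can be sketched as a small shell function for testing outside cloud-init. This is only an illustration under stated assumptions: `pick_node_ip` is a hypothetical helper name (not part of the overlay or any tool), it expects the output of `ip -6 addr show dev <iface> scope global` on stdin, and it adds one check the overlay does not have, skipping an address equal to the control plane endpoint passed as the first argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of the overlay): reads the output of
# `ip -6 addr show dev <iface> scope global` on stdin and prints the first
# /128 address that differs from the endpoint address given as $1.
pick_node_ip() {
  endpoint_ip="$1"
  awk -v vip="$endpoint_ip" '
    # Address lines look like: "inet6 <addr>/<prefix> scope global ..."
    $1 == "inet6" && $2 ~ /\/128$/ {
      split($2, parts, "/")          # parts[1] is the bare address
      if (parts[1] != vip) {         # assumption: never pick the endpoint IP
        print parts[1]
        exit
      }
    }'
}
```

On a node this might be called as `ip -6 addr show dev eth0 scope global | pick_node_ip "$VSPHERE_CONTROL_PLANE_ENDPOINT"`, with the interface name and variable taken from the overlay and the reproduction steps. Whether excluding the endpoint address is needed depends on whether kube-vip can already be running when the command executes.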

Expected behavior

  1. deploy a cluster on vSphere with TKG_IP_FAMILY=ipv6 and VSPHERE_CONTROL_PLANE_ENDPOINT=$some-ip
  2. list all pods with their IP addresses: kubectl get pods -A -o wide
  3. observe that no pods have $some-ip as their IP address

Steps to reproduce the bug

  1. deploy a cluster on vSphere with TKG_IP_FAMILY=ipv6 and VSPHERE_CONTROL_PLANE_ENDPOINT=$some-ip
  2. list all pods with their IP addresses: kubectl get pods -A -o wide
  3. observe that some pods have $some-ip as their IP address
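
The check in step 3 can be automated rather than done by eye. The following is a minimal sketch, assuming the pod list is first reduced to three whitespace-separated columns (namespace, name, pod IP); `pods_with_ip` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "namespace name podIP" rows on stdin and prints
# "namespace/name" for every row whose pod IP equals the address given as $1.
pods_with_ip() {
  awk -v ip="$1" '$3 == ip { print $1 "/" $2 }'
}

# Possible usage against a live cluster (commented out so the sketch runs offline):
# kubectl get pods -A --no-headers \
#   -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,IP:.status.podIP \
#   | pods_with_ip "$VSPHERE_CONTROL_PLANE_ENDPOINT"
```

The custom-columns output is used instead of `-o wide` because the RESTARTS column in wide output can contain spaces (for example `1 (14m ago)`), which would shift the field positions. Empty output corresponds to the expected behavior; any printed pod corresponds to the bug.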

Version (include the SHA if the version is not obvious)

v0.11.4 and v0.20.0

Environment where the bug was observed (cloud, OS, etc)

IPv6 vSphere

Note: this most likely also affects TKG_IP_FAMILY=ipv6,ipv4

@mcwumbly mcwumbly added kind/bug PR/Issue related to a bug needs-triage Indicates an issue or PR needs to be triaged labels Apr 11, 2022
mcwumbly added a commit that referenced this issue Apr 11, 2022
On IPv6 only clusters or dual stack clusters with IPv6 as the primary IP
family, configure `--node-ip` for kubelet on the control plane nodes to
the first detected IP address prior to launching kube-vip.

Otherwise, prior to cloud-provider vsphere taking over, kubelet will use
the kube-vip address as the node IP and host network pods that start up
early will get this address set as their pod IP.

See: #2098

Co-authored-by: Christian Ang <angc@vmware.com>
vuil pushed a commit to vuil/tanzu-framework that referenced this issue Apr 25, 2022
On IPv6 only clusters or dual stack clusters with IPv6 as the primary IP
family, configure `--node-ip` for kubelet on the control plane nodes to
the first detected IP address prior to launching kube-vip.

Otherwise, prior to cloud-provider vsphere taking over, kubelet will use
the kube-vip address as the node IP and host network pods that start up
early will get this address set as their pod IP.

See: vmware-tanzu#2098

Co-authored-by: Christian Ang <angc@vmware.com>
mcwumbly added a commit that referenced this issue Apr 26, 2022
On IPv6 only clusters or dual stack clusters with IPv6 as the primary IP
family, configure `--node-ip` for kubelet on the control plane nodes to
the first detected IP address prior to launching kube-vip.

Otherwise, prior to cloud-provider-vsphere taking over, kubelet will use
the kube-vip address as the node IP and host network pods that start up
early will get this address set as their pod IP.

See: #2098

Co-authored-by: Christian Ang <angc@vmware.com>
mcwumbly added a commit to mcwumbly/tanzu-framework that referenced this issue Apr 26, 2022
On IPv6 only clusters or dual stack clusters with IPv6 as the primary IP
family, configure `--node-ip` for kubelet on the control plane nodes to
the first detected IP address prior to launching kube-vip.

Otherwise, prior to cloud-provider-vsphere taking over, kubelet will use
the kube-vip address as the node IP and host network pods that start up
early will get this address set as their pod IP.

See: vmware-tanzu#2098

Backport of: vmware-tanzu#2103

Co-authored-by: Christian Ang <angc@vmware.com>
chandrareddyp pushed a commit to chandrareddyp/tanzu-framework that referenced this issue Apr 28, 2022
On IPv6 only clusters or dual stack clusters with IPv6 as the primary IP
family, configure `--node-ip` for kubelet on the control plane nodes to
the first detected IP address prior to launching kube-vip.

Otherwise, prior to cloud-provider-vsphere taking over, kubelet will use
the kube-vip address as the node IP and host network pods that start up
early will get this address set as their pod IP.

See: vmware-tanzu#2098

Co-authored-by: Christian Ang <angc@vmware.com>
vuil pushed a commit that referenced this issue Apr 29, 2022
On IPv6 only clusters or dual stack clusters with IPv6 as the primary IP
family, configure `--node-ip` for kubelet on the control plane nodes to
the first detected IP address prior to launching kube-vip.

Otherwise, prior to cloud-provider-vsphere taking over, kubelet will use
the kube-vip address as the node IP and host network pods that start up
early will get this address set as their pod IP.

See: #2098

Backport of: #2103

Co-authored-by: Christian Ang <angc@vmware.com>