Dual stack: unable to communicate between nodes via ipv6 #8794

Closed

kyrofa opened this issue Nov 6, 2023 · 20 comments

@kyrofa

kyrofa commented Nov 6, 2023

Environmental Info:
K3s Version:
$ k3s -v
k3s version v1.27.6+k3s1 (bd04941)
go version go1.20.8

Node(s) CPU architecture, OS, and Version:
Debian 12 (Bookworm)
Linux s1 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux

Cluster Configuration:
3 servers

Describe the bug:
The cluster is dual stack: both ipv4 and ipv6. Each node has two NICs: one public (bond1), one private (bond0). I'm using --node-ip and --node-external-ip to specify which is which. I'm also using --flannel-ipv6-masq, since the ipv6 block I'm using isn't publicly routable (I can make it that way, I'm just experimenting at this point). As an example, here's the args for one of my nodes:

--flannel-iface bond0
--node-ip fda5:8888:9999:310::10,10.3.1.10
--node-external-ip 2603:1111:2222:2e:10::10,50.100.200.228
--cluster-cidr fda5:8888:9999:311::0/64,10.3.128.0/17
--kube-controller-manager-arg node-cidr-mask-size-ipv4=24
--kube-controller-manager-arg node-cidr-mask-size-ipv6=80
--service-cidr fda5:8888:9999:312::0/112,10.3.64.0/18
--cluster-dns fda5:8888:9999:312::10
--flannel-ipv6-masq
--bind-address fda5:8888:9999:310::10
--advertise-address fda5:8888:9999:310::10
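
For reference, the same settings can also live in /etc/rancher/k3s/config.yaml instead of on the command line. A rough sketch of the equivalent file, written here as a shell heredoc with the values copied from the flags above (illustrative, not authoritative):

sudo mkdir -p /etc/rancher/k3s
sudo tee /etc/rancher/k3s/config.yaml >/dev/null <<'EOF'
# Keys are the k3s flag names without the leading "--"; repeatable flags become lists.
flannel-iface: bond0
node-ip: fda5:8888:9999:310::10,10.3.1.10
node-external-ip: 2603:1111:2222:2e:10::10,50.100.200.228
cluster-cidr: fda5:8888:9999:311::0/64,10.3.128.0/17
kube-controller-manager-arg:
  - node-cidr-mask-size-ipv4=24
  - node-cidr-mask-size-ipv6=80
service-cidr: fda5:8888:9999:312::0/112,10.3.64.0/18
cluster-dns: fda5:8888:9999:312::10
flannel-ipv6-masq: true
bind-address: fda5:8888:9999:310::10
advertise-address: fda5:8888:9999:310::10
EOF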

While the pods can communicate with each other via ipv4, they cannot via ipv6. To simplify, let's talk about nodes 1 and 2:

  • Node 1:
    • flannel.1: 10.3.128.0/32
    • flannel-v6.1: fda5:8888:9999:311::/128
    • cni0: fda5:8888:9999:311::1/80,10.3.128.1/24
  • Node 2:
    • flannel.1: 10.3.130.0/32
    • flannel-v6.1: fda5:8888:9999:311:2::/128
    • cni0: fda5:8888:9999:311:2::1/80,10.3.130.1/24
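
If it helps, the per-node allocations can also be confirmed from the API side (a quick sketch):

# Print each node's allocated pod CIDRs; dual stack shows both families per node.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDRs}{"\n"}{end}'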

Ignoring pods entirely, from node 1, I can ping node 2's flannel.1 IP address:

$ ping 10.3.130.0
PING 10.3.130.0 (10.3.130.0) 56(84) bytes of data.
64 bytes from 10.3.130.0: icmp_seq=1 ttl=64 time=0.390 ms
64 bytes from 10.3.130.0: icmp_seq=2 ttl=64 time=0.261 ms

However, I cannot ping node 2's flannel-v6.1 IP address:

$ ping fda5:8888:9999:311:2::0
PING fda5:8888:9999:311:2::0(fda5:8888:9999:311:2::) 56 data bytes
From fda5:8888:9999:310::1 icmp_seq=1 Destination unreachable: Address unreachable
From fda5:8888:9999:310::1 icmp_seq=2 Destination unreachable: Address unreachable

Interestingly, note that the response is coming from fda5:8888:9999:310::1, which is bond0's gateway. It seems like this traffic should stay within flannel, no? ipv4 does:

$ ip route get 10.3.130.0
10.3.130.0 via 10.3.130.0 dev flannel.1 src 10.3.128.0 uid 1007 
    cache

But ipv6, obviously, does not:

$ ip route get fda5:8888:9999:311:2::0
fda5:8888:9999:311:2:: from :: via fda5:8888:9999:310::1 dev bond0 src fda5:8888:9999:310::10 metric 1024 pref medium

Here are the routes on Node 1:

# ipv4
$ sudo route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         50.100.200.225  0.0.0.0         UG    0      0        0 bond1
10.0.0.0        10.3.1.1        255.0.0.0       UG    0      0        0 bond0
10.3.1.0        0.0.0.0         255.255.255.0   U     0      0        0 bond0
10.3.128.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.3.129.0      10.3.129.0      255.255.255.0   UG    0      0        0 flannel.1
10.3.130.0      10.3.130.0      255.255.255.0   UG    0      0        0 flannel.1
50.100.200.224  0.0.0.0         255.255.255.240 U     0      0        0 bond1

# ipv6
$ sudo route -6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref  Use If
2603:1111:2222:2e::/64         [::]                       U    256 1      0 bond1
fda5:8888:9999:310::/64        [::]                       U    256 91      0 bond0
fda5:8888:9999:311::/128       [::]                       U    256 1      0 flannel-v6.1
fda5:8888:9999:311::/80        [::]                       U    256 29      0 cni0
fda5:8888:9999::/48            fda5:8888:9999:310::1      UG   1024 18      0 bond0
fe80::/64                      [::]                       U    256 1      0 bond0
fe80::/64                      [::]                       U    256 1      0 bond1
fe80::/64                      [::]                       U    256 1      0 flannel.1
fe80::/64                      [::]                       U    256 1      0 flannel-v6.1
fe80::/64                      [::]                       U    256 1      0 cni0
fe80::/64                      [::]                       U    256 1      0 veth436f4c82
fe80::/64                      [::]                       U    256 1      0 veth50dfadc6
fe80::/64                      [::]                       U    256 1      0 veth60050681
[::]/0                         2603:1111:2222:2e::1       UGH  1024 3      0 bond1
localhost/128                  [::]                       Un   0   7      0 lo
2603:1111:2222:2e::/128        [::]                       Un   0   3      0 bond1
2603:1111:2222:2e:10::10/128   [::]                       Un   0   6      0 bond1
fda5:8888:9999:310::/128       [::]                       Un   0   3      0 bond0
fda5:8888:9999:310::10/128     [::]                       Un   0   92      0 bond0
fda5:8888:9999:311::/128       [::]                       Un   0   3      0 flannel-v6.1
fda5:8888:9999:311::/128       [::]                       Un   0   3      0 cni0
fda5:8888:9999:311::1/128      [::]                       Un   0   90      0 cni0
fe80::/128                     [::]                       Un   0   5      0 bond0
fe80::/128                     [::]                       Un   0   3      0 bond1
fe80::/128                     [::]                       Un   0   3      0 cni0
fe80::/128                     [::]                       Un   0   3      0 veth50dfadc6
fe80::/128                     [::]                       Un   0   3      0 flannel.1
fe80::/128                     [::]                       Un   0   3      0 veth60050681
fe80::/128                     [::]                       Un   0   3      0 flannel-v6.1
fe80::/128                     [::]                       Un   0   3      0 veth436f4c82
fe80::e6:4aff:fe96:a2a5/128    [::]                       Un   0   66      0 cni0
fe80::343b:94ff:fe66:f6c8/128  [::]                       Un   0   2      0 veth50dfadc6
fe80::4adf:37ff:fe61:c10/128   [::]                       Un   0   10      0 bond0
fe80::68c6:2ff:fe92:441d/128   [::]                       Un   0   2      0 flannel.1
fe80::a047:29ff:fefb:7c49/128  [::]                       Un   0   2      0 veth60050681
fe80::ba83:3ff:fe50:5eb2/128   [::]                       Un   0   6      0 bond1
fe80::dc0e:4ff:fe75:67af/128   [::]                       Un   0   2      0 veth436f4c82
fe80::f8f9:a5ff:feac:9747/128  [::]                       Un   0   3      0 flannel-v6.1
ff00::/8                       [::]                       U    256 6      0 bond0
ff00::/8                       [::]                       U    256 2      0 bond1
ff00::/8                       [::]                       U    256 1      0 flannel.1
ff00::/8                       [::]                       U    256 1      0 flannel-v6.1
ff00::/8                       [::]                       U    256 7      0 cni0
ff00::/8                       [::]                       U    256 1      0 veth436f4c82
ff00::/8                       [::]                       U    256 1      0 veth50dfadc6
ff00::/8                       [::]                       U    256 1      0 veth60050681
[::]/0                         [::]                       !n   -1  1      0 lo

It seems pretty clear that flannel isn't installing all the routes it should here. I expect I've made a mistake in my configuration, but I'm not sure how to debug this any further. Note that I have no firewall enabled. Does anyone have insight into what's happening here?

@brandond
Member

brandond commented Nov 7, 2023

@manuelbuil any ideas?

@manuelbuil
Contributor

Hey, could you provide the following output:
1 - kubectl get node $NODE2 -o yaml
2 - In $NODE2: cat /run/flannel/subnet.env
3 - ip -6 route in both $NODE1 and $NODE2
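
Collected in one pass, that could look roughly like this (a sketch assuming direct SSH access and the node names s1/s2 used in this cluster):

kubectl get node s2 -o yaml
ssh s2 'cat /run/flannel/subnet.env'
ssh s1 'ip -6 route'
ssh s2 'ip -6 route'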

@kyrofa
Author

kyrofa commented Nov 8, 2023

Hey @manuelbuil, of course, thank you for your help. For completeness, I'll include what you asked for from both nodes.

Node YAML

Node 1

$ kubectl get node s1 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.3.1.10
    etcd.k3s.cattle.io/node-address: fda5:8888:9999:310::10
    etcd.k3s.cattle.io/node-name: s1-cff20337
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"6a:c6:02:92:44:1d"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/backend-v6-data: '{"VNI":1,"VtepMAC":"fa:f9:a5:ac:97:47"}'
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.3.1.10
    flannel.alpha.coreos.com/public-ipv6: fda5:8888:9999:310::10
    k3s.io/external-ip: 2603:1111:2222:2e:10::10,50.100.200.228
    k3s.io/hostname: s1
    k3s.io/internal-ip: fda5:8888:9999:310::10,10.3.1.10
    k3s.io/node-args: '["server","--node-name","","s1","--flannel-iface","bond0","--node-ip","fda5:8888:9999:310::10,10.3.1.10","--node-external-ip","2603:1111:2222:2e:10::10,50.100.200.228","--cluster-cidr","fda5:8888:9999:311::0/64,10.3.128.0/17","--kube-controller-manager-arg","node-cidr-mask-size-ipv4=24","--kube-controller-manager-arg","node-cidr-mask-size-ipv6=80","--service-cidr","fda5:8888:9999:312::0/112,10.3.64.0/18","--cluster-dns","fda5:8888:9999:312::10","--flannel-ipv6-masq","--bind-address","fda5:8888:9999:310::10","--advertise-address","fda5:8888:9999:310::10","--kube-cloud-controller-manager-arg","webhook-bind-address=fda5:8888:9999:310::10","--debug","--disable","traefik"]'
    k3s.io/node-config-hash: JUDA3F3H42MZRSKLNBYYPJEZER24NZ5ZWLCF6LL774YDZ2B6ZHAQ====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/3dfc950bd39d2e2b435291ab8c1333aa6051fcaf46325aee898819f3b99d4b21","K3S_TOKEN":"********"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2023-11-06T17:28:00Z"
  finalizers:
  - wrangler.cattle.io/managed-etcd-controller
  - wrangler.cattle.io/node
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: k3s
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: s1
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    node.kubernetes.io/instance-type: k3s
  name: s1
  resourceVersion: "895924"
  uid: f8169599-5737-49b1-bdc5-ad086a31b5e0
spec:
  podCIDR: fda5:8888:9999:311::/80
  podCIDRs:
  - fda5:8888:9999:311::/80
  - 10.3.128.0/24
  providerID: k3s://s1
status:
  addresses:
  - address: 10.3.1.10
    type: InternalIP
  - address: 2603:1111:2222:2e:10::10
    type: ExternalIP
  - address: 50.100.200.228
    type: ExternalIP
  - address: s1
    type: Hostname
  allocatable:
    cpu: "88"
    ephemeral-storage: "46414214108"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 527989644Ki
    pods: "110"
  capacity:
    cpu: "88"
    ephemeral-storage: 47711980Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 527989644Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2023-11-08T18:06:04Z"
    lastTransitionTime: "2023-11-06T20:41:43Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2023-11-08T18:06:04Z"
    lastTransitionTime: "2023-11-06T20:41:43Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2023-11-08T18:06:04Z"
    lastTransitionTime: "2023-11-06T20:41:43Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2023-11-08T18:06:04Z"
    lastTransitionTime: "2023-11-06T20:41:43Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/rancher/mirrored-metrics-server@sha256:c2dfd72bafd6406ed306d9fbd07f55c496b004293d13d3de88a4567eacc36558
    - docker.io/rancher/mirrored-metrics-server:v0.6.3
    sizeBytes: 29943298
  - names:
    - docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
    - docker.io/rancher/mirrored-coredns-coredns:1.10.1
    sizeBytes: 16190137
  - names:
    - docker.io/rancher/local-path-provisioner@sha256:5bb33992a4ec3034c28b5e0b3c4c2ac35d3613b25b79455eb4b1a95adc82cdc0
    - docker.io/rancher/local-path-provisioner:v0.0.24
    sizeBytes: 14887612
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 301463
  nodeInfo:
    architecture: amd64
    bootID: 5c0e0728-3f04-4cd2-bf07-610815de16f7
    containerRuntimeVersion: containerd://1.7.6-k3s1.27
    kernelVersion: 6.1.0-13-amd64
    kubeProxyVersion: v1.27.6+k3s1
    kubeletVersion: v1.27.6+k3s1
    machineID: 50859da1a99c45a6a1d2d33bc0b2a4e5
    operatingSystem: linux
    osImage: Debian GNU/Linux 12 (bookworm)
    systemUUID: 37383638-3330-4d32-3238-343230325339

Node 2

$ kubectl get node s2 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.3.1.20
    etcd.k3s.cattle.io/node-address: fda5:8888:9999:310::20
    etcd.k3s.cattle.io/node-name: s2-8cc6b407
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"b6:6f:32:60:13:14"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/backend-v6-data: '{"VNI":1,"VtepMAC":"42:3f:d7:49:d4:10"}'
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.3.1.20
    flannel.alpha.coreos.com/public-ipv6: fda5:8888:9999:310::20
    k3s.io/external-ip: 2603:1111:2222:2e:10::20,50.100.200.229
    k3s.io/hostname: s2
    k3s.io/internal-ip: fda5:8888:9999:310::20,10.3.1.20
    k3s.io/node-args: '["server","--node-name","","s2","--flannel-iface","bond0","--node-ip","fda5:8888:9999:310::20,10.3.1.20","--node-external-ip","2603:1111:2222:2e:10::20,50.100.200.229","--cluster-cidr","fda5:8888:9999:311::0/64,10.3.128.0/17","--kube-controller-manager-arg","node-cidr-mask-size-ipv4=24","--kube-controller-manager-arg","node-cidr-mask-size-ipv6=80","--service-cidr","fda5:8888:9999:312::0/112,10.3.64.0/18","--cluster-dns","fda5:8888:9999:312::10","--flannel-ipv6-masq","--bind-address","fda5:8888:9999:310::20","--advertise-address","fda5:8888:9999:310::20","--kube-cloud-controller-manager-arg","webhook-bind-address=fda5:8888:9999:310::20","--debug","--disable","traefik"]'
    k3s.io/node-config-hash: 6XV6RZIRYIQO65BXBP2LHW56FWENGZ2GPA3EEC5BZUR7UNIZ3ZHQ====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/3dfc950bd39d2e2b435291ab8c1333aa6051fcaf46325aee898819f3b99d4b21","K3S_TOKEN":"********"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2023-11-06T17:28:30Z"
  finalizers:
  - wrangler.cattle.io/managed-etcd-controller
  - wrangler.cattle.io/node
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: k3s
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: s2
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    node.kubernetes.io/instance-type: k3s
  name: s2
  resourceVersion: "897408"
  uid: 1d441af1-f804-4321-bf02-c5e427bfbcd9
spec:
  podCIDR: fda5:8888:9999:311:2::/80
  podCIDRs:
  - fda5:8888:9999:311:2::/80
  - 10.3.130.0/24
  providerID: k3s://s2
status:
  addresses:
  - address: 10.3.1.20
    type: InternalIP
  - address: 2603:1111:2222:2e:10::20
    type: ExternalIP
  - address: 50.100.200.229
    type: ExternalIP
  - address: s2
    type: Hostname
  allocatable:
    cpu: "88"
    ephemeral-storage: "46414214108"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 527989644Ki
    pods: "110"
  capacity:
    cpu: "88"
    ephemeral-storage: 47711980Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 527989644Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2023-11-08T18:10:55Z"
    lastTransitionTime: "2023-11-06T17:28:30Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2023-11-08T18:10:55Z"
    lastTransitionTime: "2023-11-06T17:28:30Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2023-11-08T18:10:55Z"
    lastTransitionTime: "2023-11-06T17:28:30Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2023-11-08T18:10:55Z"
    lastTransitionTime: "2023-11-06T17:28:31Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/wbitt/network-multitool@sha256:d1137e87af76ee15cd0b3d4c7e2fcd111ffbd510ccd0af076fc98dddfc50a735
    - docker.io/wbitt/network-multitool:latest
    sizeBytes: 25281012
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 301463
  nodeInfo:
    architecture: amd64
    bootID: 67fcd7f8-63d1-46e0-90b8-817843db126b
    containerRuntimeVersion: containerd://1.7.6-k3s1.27
    kernelVersion: 6.1.0-13-amd64
    kubeProxyVersion: v1.27.6+k3s1
    kubeletVersion: v1.27.6+k3s1
    machineID: f7e6a1c6728c4a8781d09f3f86f76210
    operatingSystem: linux
    osImage: Debian GNU/Linux 12 (bookworm)
    systemUUID: 37383638-3330-4d32-3238-343230325346

/run/flannel/subnet.env

Node 1

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.3.128.0/17
FLANNEL_SUBNET=10.3.128.1/24
FLANNEL_IPV6_NETWORK=fda5:8888:9999:311::/64
FLANNEL_IPV6_SUBNET=fda5:8888:9999:311::1/80
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Node 2

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.3.128.0/17
FLANNEL_SUBNET=10.3.130.1/24
FLANNEL_IPV6_NETWORK=fda5:8888:9999:311::/64
FLANNEL_IPV6_SUBNET=fda5:8888:9999:311:2::1/80
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Routes

Node 1

$ ip -6 route
2603:1111:2222:2e::/64 dev bond1 proto kernel metric 256 pref medium
fda5:8888:9999:310::/64 dev bond0 proto kernel metric 256 pref medium
fda5:8888:9999:311:: dev flannel-v6.1 proto kernel metric 256 pref medium
fda5:8888:9999:311::/80 dev cni0 proto kernel metric 256 pref medium
fda5:8888:9999::/48 via fda5:8888:9999:310::1 dev bond0 metric 1024 pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev bond1 proto kernel metric 256 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
fe80::/64 dev cni0 proto kernel metric 256 pref medium
fe80::/64 dev veth436f4c82 proto kernel metric 256 pref medium
fe80::/64 dev veth50dfadc6 proto kernel metric 256 pref medium
fe80::/64 dev veth60050681 proto kernel metric 256 pref medium
default via 2603:1111:2222:2e::1 dev bond1 metric 1024 onlink pref medium

Node 2

$ ip -6 route
2603:1111:2222:2e::/64 dev bond1 proto kernel metric 256 pref medium
fda5:8888:9999:310::/64 dev bond0 proto kernel metric 256 pref medium
fda5:8888:9999:311:2:: dev flannel-v6.1 proto kernel metric 256 pref medium
fda5:8888:9999:311:2::/80 dev cni0 proto kernel metric 256 linkdown pref medium
fda5:8888:9999::/48 via fda5:8888:9999:310::1 dev bond0 metric 1024 pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev bond1 proto kernel metric 256 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
fe80::/64 dev cni0 proto kernel metric 256 linkdown pref medium
default via 2603:1111:2222:2e::1 dev bond1 metric 1024 onlink pref medium

@manuelbuil
Contributor

Thanks for the output! One thing that stands out is that both nodes are k3s servers, but I don't see how they are connected together to form an HA control plane. How did you deploy both nodes?
Please refer to our docs on how to create an HA control plane.
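
For reference, the documented embedded-etcd bootstrap looks roughly like this (a sketch; the token is a placeholder and the dual-stack flags shown earlier would be appended to both commands):

TOKEN=my-shared-token                 # placeholder
SERVER1=fda5:8888:9999:310::10        # node 1's --node-ip from above

# The first server initializes the embedded etcd cluster:
k3s server --cluster-init --token "$TOKEN"                       # ...plus the dual-stack flags
# Each additional server joins through the first one:
k3s server --server "https://[$SERVER1]:6443" --token "$TOKEN"   # ...plus the same flags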

@kyrofa
Author

kyrofa commented Nov 9, 2023

I have an ansible playbook that does it, using an approach inspired by this. One node runs k3s server --cluster-init --token my-token (with those other params discussed above), and the other nodes run k3s server --server=<node 1> --token my-token (also with the params above). Once the cluster is happy, that bootstrapping service is ripped down and replaced with systemd unit files that don't include the --cluster-init or --server args. The docs to which you refer say those are ignored once the cluster is configured anyway. This strategy appears to work for ipv4; it's just ipv6 that seems unhappy, but maybe I'm missing something there?

@manuelbuil
Contributor

manuelbuil commented Nov 10, 2023

I have an ansible playbook that does it, using an approach inspired by this. One node runs k3s server --cluster-init --token my-token (with those other params discussed above), and the other nodes run k3s server --server=<node 1> --token my-token (also with the params above). Once the cluster is happy, that bootstrapping service is ripped down and replaced with systemd unit files that don't include the --cluster-init or --server args.

When installing k3s you get a systemd service running with the configured parameters. If I understand correctly, you stop that service, change the config parameters and restart it or create a different systemd service, right? Why don't you stay with the created systemd service and the original config parameters?

The docs to which you refer say those are ignored once the cluster is configured anyway. This strategy appears to work for ipv4, it's just ipv6 that seems unhappy, but maybe I'm missing something there?

I have asked internally: if etcd db files exist on disk, those parameters are indeed ignored. Do you see any flannel logs that provide extra information? It seems the flannel instance on the node is not aware of being part of a cluster.

@kyrofa
Author

kyrofa commented Nov 10, 2023

@manuelbuil right, it's at the end of that ha-embedded doc:

If an etcd datastore is found on disk either because that node has either initialized or joined a cluster already, the datastore arguments (--cluster-init, --server, --datastore-endpoint, etc) are ignored.

As for your question:

Why don't you stay with the created systemd service and the original config parameters?

Because I don't like to treat any of my nodes as special: they all end up with exactly the same systemd unit. Leaving bootstrap-only options like --cluster-init and --server in place will just cause confusion down the road for my colleagues less familiar with k3s (or me, when I've forgotten this stuff!). If you fast-forward into the future, can you imagine if node 1 (the one where I ran --cluster-init) went down and was replaced by one that had --server <node 2>, but nodes 2 and 3 still had --server <node 1>, which no longer exists? It's just asking for misunderstanding and trouble. I personally would like to see k3s grow bootstrapping commands instead of following this pattern, but I digress.

Note that kubectl shows all the nodes:

$ kubectl get nodes
NAME   STATUS   ROLES                       AGE     VERSION
s1     Ready    control-plane,etcd,master   3d21h   v1.27.6+k3s1
s2     Ready    control-plane,etcd,master   3d21h   v1.27.6+k3s1
s3     Ready    control-plane,etcd,master   3d21h   v1.27.6+k3s1

And again, if I swap the order of ipv6,ipv4 in these params (such that ipv4 becomes the primary), this works just fine (although ipv6 still doesn't work, to be clear 😛 ). I'm happy to try adding the bootstrapping options back though, if you feel like it will change anything.

Do you see any flannel log that provides extra information? It seems the flannel instance of the node is not aware of being part of a cluster.

The only place I know to look is the journal for my systemd unit; can you confirm? I enabled debug mode, and when I fire it up on node 1 I see this:

Nov 06 17:06:18 s1 k3s[20078]: time="2023-11-06T17:06:18-08:00" level=debug msg="Creating the flannel configuration for backend vxlan in file /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json"
Nov 06 17:06:18 s1 k3s[20078]: time="2023-11-06T17:06:18-08:00" level=debug msg="The flannel configuration is {\n\t\"Network\": \"10.3.128.0/17\",\n\t\"EnableIPv6\": true,\n\t\"EnableIPv4\": true,\n\t\"IPv6Network\": \"fda5:8888:9999:311::/64\",\n\t\"Backend\": {\n\t\"Type\": \"vxlan\"\n}\n}\n"
Nov 06 17:06:20 s1 k3s[20078]: time="2023-11-06T17:06:20-08:00" level=info msg="Starting flannel with backend vxlan"
Nov 06 17:06:20 s1 k3s[20078]: time="2023-11-06T17:06:20-08:00" level=debug msg="The interface bond0 will be used by flannel"
Nov 06 17:06:20 s1 k3s[20078]: time="2023-11-06T17:06:20-08:00" level=info msg="The interface bond0 with ipv4 address 10.3.1.10 will be used by flannel"
Nov 06 17:06:20 s1 k3s[20078]: time="2023-11-06T17:06:20-08:00" level=info msg="Using dual-stack mode. The ipv6 address fda5:8888:9999:310::10 will be used by flannel"
Nov 06 17:06:21 s1 k3s[20078]: time="2023-11-06T17:06:21-08:00" level=info msg="Wrote flannel subnet file to /run/flannel/subnet.env"
Nov 06 17:06:21 s1 k3s[20078]: time="2023-11-06T17:06:21-08:00" level=info msg="Running flannel backend."

It doesn't appear to be unhappy. Is there a way to get more information from flannel?

@manuelbuil
Contributor

I have been trying to reproduce the issue but have been unsuccessful so far. In my env I always see the route you are missing in ipv6, even when I stop k3s, remove the cluster-init and server parameters, and restart it again.

BTW, I'd recommend moving to v1.27.7+k3s1 because we added several dual-stack improvements, especially for your case where IPv6 comes first in the config (e.g. you'll see the IPv6 address when listing the pods). Among others:

@kyrofa
Author

kyrofa commented Nov 10, 2023

Okay I upgraded to v1.27.7+k3s2, but I'm afraid I must report no change. From node 1 I can ping node 2's flannel.1 address, but not node 2's flannel-v6.1 address (that traffic is still going to the gateway). Here are the routes:

node 1:

$ ip -6 route
2603:1111:2222:2e::/64 dev bond1 proto kernel metric 256 pref medium
fda5:8888:9999:310::/64 dev bond0 proto kernel metric 256 pref medium
fda5:8888:9999:311:: dev flannel-v6.1 proto kernel metric 256 pref medium
fda5:8888:9999:311::/80 dev cni0 proto kernel metric 256 pref medium
fda5:8888:9999::/48 via fda5:8888:9999:310::1 dev bond0 metric 1024 pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev bond1 proto kernel metric 256 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
fe80::/64 dev cni0 proto kernel metric 256 pref medium
fe80::/64 dev vethff1ef2a7 proto kernel metric 256 pref medium
fe80::/64 dev vethb9fbaef2 proto kernel metric 256 pref medium
fe80::/64 dev veth3c9a8168 proto kernel metric 256 pref medium
default via 2603:1111:2222:2e::1 dev bond1 metric 1024 onlink pref medium

node 2:

$ ip -6 route
2603:1111:2222:2e::/64 dev bond1 proto kernel metric 256 pref medium
fda5:8888:9999:310::/64 dev bond0 proto kernel metric 256 pref medium
fda5:8888:9999:311:2:: dev flannel-v6.1 proto kernel metric 256 pref medium
fda5:8888:9999::/48 via fda5:8888:9999:310::1 dev bond0 metric 1024 pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev bond1 proto kernel metric 256 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
default via 2603:1111:2222:2e::1 dev bond1 metric 1024 onlink pref medium

@kyrofa
Author

kyrofa commented Nov 10, 2023

I wonder if this could have something to do with the routes we add to our private interface. Our public interface is the one with the default gateway, so we add static routes on the private interface instead. Could flannel be picking up on those and deciding not to install a more specific route?
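
One quick way to check (a sketch, reusing the addresses above) is to ask the kernel which existing route would serve the other node's flannel-v6.1 address before flannel installs anything:

# If this resolves via the fda5:8888:9999::/48 route on bond0 rather than via
# flannel-v6.1, the broader static route is covering the pod prefix.
ip -6 route get fda5:8888:9999:311:2::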

Here are my network configs, in case it's useful:

bond0 (private interface)

auto bond0
iface bond0 inet static
  address 10.3.1.10/24
  bond-slaves eno5 eno6
  bond-mode active-backup
  bond-primary eno5
  bond-miimon 100

  # The public network will have a gateway, so we can't add one here.
  # Use routes instead.
  up ip route add 10.0.0.0/8 via 10.3.1.1
  down ip route del 10.0.0.0/8 via 10.3.1.1

iface bond0 inet6 static
  address fda5:8888:9999:310::10/64
  up ip -6 route add fda5:8888:9999::/48 via fda5:8888:9999:310::1
  down ip -6 route del fda5:8888:9999::/48 via fda5:8888:9999:310::1

bond1 (public interface)

auto bond1
iface bond1 inet static
  address 50.100.200.228/28
  gateway 50.100.200.225
  bond-slaves eno1 eno2
  bond-mode active-backup
  bond-primary eno1
  bond-miimon 100

iface bond1 inet6 static
  address 2603:1111:2222:2e:10::10/64
  gateway 2603:1111:2222:2e::1

@manuelbuil
Contributor

manuelbuil commented Nov 10, 2023

The flannel IP routes for multinode communication are not there. You should see something like this:

fda5:8888:9999:311::/80 via fda5:8888:9999:311:: dev flannel-v6.1 metric 1024 onlink pref medium

Something weird is happening when executing https://github.com/flannel-io/flannel/blob/master/pkg/backend/vxlan/vxlan_network.go#L141.

Could you check dmesg and see if there is anything indicating that adding an ip route failed, please?

What OS are you running? I can't reproduce the issue with openSUSE or Ubuntu 22.

@manuelbuil
Contributor

Could you also try adding the route manually? Let's see if we get more information.

In node1:

sudo ip route add fda5:8888:9999:311:2::/80 via fda5:8888:9999:311:2:: dev flannel-v6.1 onlink

In node2:

sudo ip route add fda5:8888:9999:311::/80 via fda5:8888:9999:311:: dev flannel-v6.1 onlink
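
If both routes are accepted, a quick check from node 1 might look like this (a sketch; addresses as above):

# Confirm the kernel now picks flannel-v6.1 for node 2's pod prefix, then ping it:
ip -6 route get fda5:8888:9999:311:2::
ping -c 2 fda5:8888:9999:311:2::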

@kyrofa
Author

kyrofa commented Nov 10, 2023

Could you check dmesg and see if there is something that tells if adding an ip route failed please?

Not that I see. I'm not sure what to grep for, though. "route" gave me no results. Neither did "flannel". Here's some stuff about cni that doesn't look unhappy:

[   28.062783] cni0: port 1(vethb11291f0) entered blocking state
[   28.062795] cni0: port 1(vethb11291f0) entered disabled state
[   28.062956] device vethb11291f0 entered promiscuous mode
[   28.063082] cni0: port 1(vethb11291f0) entered blocking state
[   28.063088] cni0: port 1(vethb11291f0) entered forwarding state
[   28.063579] cni0: port 1(vethb11291f0) entered disabled state
[   28.073198] IPv6: ADDRCONF(NETDEV_CHANGE): vethb11291f0: link becomes ready
[   28.073286] cni0: port 1(vethb11291f0) entered blocking state
[   28.073293] cni0: port 1(vethb11291f0) entered forwarding state
[   28.097689] cni0: port 2(veth2d7ce291) entered blocking state
[   28.097704] cni0: port 2(veth2d7ce291) entered disabled state
[   28.098023] device veth2d7ce291 entered promiscuous mode
[   28.098139] cni0: port 2(veth2d7ce291) entered blocking state
[   28.098148] cni0: port 2(veth2d7ce291) entered forwarding state
[   28.117362] IPv6: ADDRCONF(NETDEV_CHANGE): veth2d7ce291: link becomes ready
[   28.118535] cni0: port 3(veth6d8bcf01) entered blocking state
[   28.118548] cni0: port 3(veth6d8bcf01) entered disabled state
[   28.118632] device veth6d8bcf01 entered promiscuous mode
[   28.118664] cni0: port 3(veth6d8bcf01) entered blocking state
[   28.118667] cni0: port 3(veth6d8bcf01) entered forwarding state
[   28.125404] IPv6: ADDRCONF(NETDEV_CHANGE): veth6d8bcf01: link becomes ready

What OS are you running? I can't reproduce the issue with opensuse or ubuntu 22.

Sorry about that, I should have included it in the OP (I've updated it now): Debian 12 (Bookworm).

Could you also try adding the route manually?

Actually no, this is interesting. Here's the result of trying to do that on node 1:

$ sudo ip route add fda5:8888:9999:311:2::/80 via fda5:8888:9999:311:2:: dev flannel-v6.1 onlink
Error: Nexthop has invalid gateway or device mismatch.

That's coming from here in the kernel. I don't quite know what it means; I'm researching.
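
To take k3s and flannel out of the picture, the same conflict can be staged in a throwaway network namespace (a sketch; the namespace and interface names are made up, the prefixes are copied from above, and on a kernel with this check the last route add should fail with the same error):

sudo ip netns add route-repro
sudo ip -n route-repro link add uplink type dummy     # stands in for bond0
sudo ip -n route-repro link add overlay type dummy    # stands in for flannel-v6.1
sudo ip -n route-repro link set uplink up
sudo ip -n route-repro link set overlay up
sudo ip -n route-repro addr add fda5:8888:9999:310::10/64 dev uplink nodad
# The broad /48 static route that bond0 carries on the real nodes:
sudo ip -n route-repro -6 route add fda5:8888:9999::/48 via fda5:8888:9999:310::1 dev uplink
# The more-specific route flannel would add: its gateway falls inside that /48 but
# points at a different device, which is what the kernel appears to object to.
sudo ip -n route-repro -6 route add fda5:8888:9999:311:2::/80 via fda5:8888:9999:311:2:: dev overlay onlink
sudo ip netns del route-repro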

@kyrofa
Author

kyrofa commented Nov 11, 2023

Okay, I still can't claim full understanding of what's happening here, but I made some good progress today. That line in the kernel had two important commits behind it whose commit messages added some good context (I love a good commit message). They really made me start wondering about that up ip -6 route add fda5:8888:9999::/48 via fda5:8888:9999:310::1 route going out bond0, since manually adding the flannel route was trying to send a subset of that same network through flannel-v6.1 instead. The kernel seemed upset by that somehow, so on a whim I tried using a completely separate network for the cluster and service CIDRs. Using fda5:8888:eeee instead of fda5:8888:9999, this immediately started working. Flannel was able to add the routes you would expect, I could ping as expected, and I could generally start using the cluster as I would expect (almost: see #8809, but I suspect that's unrelated).

Node 1

$ ip -6 route
2603:1111:2222:2e::/64 dev bond1 proto kernel metric 256 pref medium
fda5:8888:9999:310::/64 dev bond0 proto kernel metric 256 pref medium
fda5:8888:9999::/48 via fda5:8888:9999:310::1 dev bond0 metric 1024 pref medium
fda5:8888:eeee:311:: dev flannel-v6.1 proto kernel metric 256 pref medium
fda5:8888:eeee:311::/80 dev cni0 proto kernel metric 256 pref medium
fda5:8888:eeee:311:1::/80 via fda5:8888:eeee:311:1:: dev flannel-v6.1 metric 1024 onlink pref medium
fda5:8888:eeee:311:2::/80 via fda5:8888:eeee:311:2:: dev flannel-v6.1 metric 1024 onlink pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev bond1 proto kernel metric 256 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
fe80::/64 dev cni0 proto kernel metric 256 pref medium
fe80::/64 dev vethc7624cb9 proto kernel metric 256 pref medium
default via 2603:1111:2222:2e::1 dev bond1 metric 1024 onlink pref medium

Node 2

$ ip -6 route
2603:1111:2222:2e::/64 dev bond1 proto kernel metric 256 pref medium
fda5:8888:9999:310::/64 dev bond0 proto kernel metric 256 pref medium
fda5:8888:9999::/48 via fda5:8888:9999:310::1 dev bond0 metric 1024 pref medium
fda5:8888:eeee:311::/80 via fda5:8888:eeee:311:: dev flannel-v6.1 metric 1024 onlink pref medium
fda5:8888:eeee:311:1::/80 via fda5:8888:eeee:311:1:: dev flannel-v6.1 metric 1024 onlink pref medium
fda5:8888:eeee:311:2:: dev flannel-v6.1 proto kernel metric 256 pref medium
fda5:8888:eeee:311:2::/80 dev cni0 proto kernel metric 256 pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev bond1 proto kernel metric 256 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
fe80::/64 dev vethdbcd52ef proto kernel metric 256 pref medium
fe80::/64 dev cni0 proto kernel metric 256 pref medium
fe80::/64 dev veth4fe6feba proto kernel metric 256 pref medium
fe80::/64 dev veth6527d910 proto kernel metric 256 pref medium
fe80::/64 dev veth6c83e6b1 proto kernel metric 256 pref medium
fe80::/64 dev veth704298f9 proto kernel metric 256 pref medium
default via 2603:1111:2222:2e::1 dev bond1 metric 1024 onlink pref medium

So what is the deal here? Why can't I use a subset of my private network for the cluster/service networks? Assuming there's some technical limitation that I don't understand, is there something flannel could do to better communicate this issue to me, instead of just not adding routes?

@manuelbuil
Contributor

Thanks for the investigation and sharing it here :)

So what is the deal, here? Why can't I use a subset of my private network for the cluster/service networks? Assuming there's some technical limitation that I don't understand, is there something flannel could do to better communicate this issue to me, instead of just not adding routes?

I already discussed with a colleague that we should improve flannel logs, because when strange kernel stuff happens we are blind.

@github-actions
Contributor

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@kyrofa
Copy link
Author

kyrofa commented Jan 20, 2024

This is still an issue.

@github-actions
Contributor

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@kyrofa
Copy link
Author

kyrofa commented Mar 11, 2024

@manuelbuil this is still an issue, but I have a workaround so I won't stand in your way if you want to ignore it.

@github-actions
Contributor

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 11, 2024