Ubuntu 21.04 - vxlan failing to route #4188
Comments
Can you try disabling IP checksum offload on your nodes? It's possible that the tx checksum offload bug hasn't been fixed in the kernel that Ubuntu is shipping in 21.04.
sudo ethtool -K cni0 tx-checksum-ip-generic off
sudo ethtool -K flannel.1 tx-checksum-ip-generic off |
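The two commands in this comment can be applied to a list of interfaces in one pass. This is only a minimal sketch: the interface names `cni0` and `flannel.1` come from the comment, while the `disable_tx_offload` helper and the `DRY_RUN` variable are illustrative names, not part of k3s or flannel.

```shell
# Hypothetical helper: turn off generic TX checksum offload on each interface given.
# With DRY_RUN set to a non-empty value, the commands are printed instead of executed.
disable_tx_offload() {
    for dev in "$@"; do
        ${DRY_RUN:+echo} sudo ethtool -K "$dev" tx-checksum-ip-generic off
    done
}

# Preview the commands without touching the node:
DRY_RUN=1 disable_tx_offload cni0 flannel.1
```

Afterwards, `ethtool -k flannel.1 | grep tx-checksum-ip-generic` shows the current setting of the feature.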
Nope same result.
|
Hmm, it looks like I'm seeing the same thing on 21.10 as well. I see the responses come back to the originating host on the wire but they're dropped for some reason. Are you able to use host-gw or some other flannel backend instead, until this can be tracked down? |
|
host-gw only works if all your nodes are on the same subnet, which from looking at your node addresses it appears they are not. The wireguard backend might be another good option for you, although you'd need to manually install the wireguard package on your nodes before using it. I've just confirmed that it works fine on my nodes, so it does appear to be something specific to vxlan. |
I could reproduce this. The problem is that the mac addresses of flannel.1 interfaces are wrong in the bridge tables. In my case:
However, in the bridge forwarding table of host1, the mac address of flannel.1_mac_host2 is As a consequence, I can see the traffic from pod_host1 to pod_host2 encapsulated and reaching eth0 on host2. But when it decapsulates, it searches for the wrong mac and the packet gets dropped. I need to dig more but I'd say the bug is in flannel or bridge CNI binary (note that flannel binary uses the bridge binary for almost everything) |
@manuelbuil can you confirm whether the MAC is correct or not on the node annotations? I'm wondering if we have a race or something that is failing to update the annotation properly |
I can confirm that the error is in flanneld, not in the binaries. Good point... I have just restarted the flanneld daemonset and it managed to write the correct macs, so it indeed might be a race |
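One way to check for the mismatch discussed here is to compare the MAC that flannel published in the node annotation with the MAC the `flannel.1` interface actually has. The sketch below assumes the annotation is named `flannel.alpha.coreos.com/backend-data` and holds JSON with a `VtepMAC` field; the helper names and the `NODE` placeholder are illustrative.

```shell
# Extract the VtepMAC value from the backend-data annotation JSON (assumed format).
vtep_mac_from_annotation() {
    printf '%s\n' "$1" |
        sed -n 's/.*"VtepMAC"[[:space:]]*:[[:space:]]*"\([0-9a-fA-F:]*\)".*/\1/p'
}

# Succeed only if both MAC addresses are non-empty and equal, ignoring case.
macs_match() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# On a node (NODE is a placeholder for the node name):
#   annotation=$(kubectl get node "$NODE" \
#     -o jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/backend-data}')
#   actual=$(cat /sys/class/net/flannel.1/address)
#   if macs_match "$(vtep_mac_from_annotation "$annotation")" "$actual"; then
#     echo "annotation matches interface"
#   else
#     echo "annotation and interface MAC differ"
#   fi
```

On the other hosts, `bridge fdb show dev flannel.1` lists the forwarding entries that should carry the same MAC.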
Is this by any chance a mac address randomization issue? Sounds kind of like systemd/systemd#13642 - this is for wifi interfaces but I wonder if for some reason it's doing the same thing to the vxlan interface. Perhaps a more relevant link: |
I can confirm that's the problem |
systemd 242+ assigns MAC addresses for all virtual devices which don't have the address assigned already. That resulted in systemd overriding MAC addresses of flannel.* interfaces. The fix which prevents systemd from setting the address is to define the concrete MAC address when creating the link. Fixes: flannel-io#1155 Ref: k3s-io/k3s#4188 Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
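The actual fix lives in flannel's Go code. The idea it describes (choose the MAC up front and pass it when the link is created) can be sketched with plain shell and iproute2. The `gen_mac` helper, the interface name `vxlan-demo`, the VNI and the parent device `eth0` are all assumptions made for illustration.

```shell
# Generate a random locally administered, unicast MAC address.
gen_mac() {
    # Read 6 random bytes as two-digit hex values.
    set -- $(od -An -N6 -tx1 /dev/urandom)
    # Set the locally administered bit (0x02) and clear the multicast bit (0x01).
    first=$(printf '%02x' $(( (0x$1 | 0x02) & 0xfe )))
    printf '%s:%s:%s:%s:%s:%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

gen_mac

# Creating a vxlan link with the address already set (requires root; names are placeholders):
#   sudo ip link add vxlan-demo address "$(gen_mac)" type vxlan id 42 dstport 8472 dev eth0 nolearning
```

Because the link already has an explicit address when it appears, systemd has no unassigned address to fill in, which is the behaviour the commit message relies on.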
According to what @brandond and @manuelbuil wrote above, I consider that a flannel bug. Here is the attempt to fix it: |
systemd 242+ assigns MAC addresses for all virtual devices which don't have the address assigned already. That resulted in systemd overriding MAC addresses of flannel.* interfaces. The fix which prevents systemd from setting the address is to define the concrete MAC address when creating the link. Fixes: flannel-io#1155 Ref: k3s-io/k3s#4188 Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Because #3863 linked to this issue: On Debian Bullseye
|
Strange, I thought Debian 11 would be already using nftables. Can you confirm it does not? |
As I said, Debian 11 uses nftables and I had to set it to legacy to work properly in my k3s clusters |
This is kind of documented here: https://rancher.com/docs/k3s/latest/en/advanced/#enabling-legacy-iptables-on-raspbian-buster I'd argue that this recommendation should be moved from "Advanced Options and Configurations" to "FAQ" or "Known Issues", and expanded to say that it applies to all modern versions of "Raspbian", "Ubuntu" and "Debian". |
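To see which backend a node is using before changing anything, the suffix printed by `iptables --version` can be inspected. This is a sketch under the assumption that the node runs iptables 1.8 or newer, which prints `(nf_tables)` or `(legacy)` after the version; the `iptables_backend` helper is an illustrative name.

```shell
# Classify the output of `iptables --version` as nf_tables, legacy, or unknown.
iptables_backend() {
    case "$1" in
        *"(nf_tables)"*) echo nf_tables ;;
        *"(legacy)"*)    echo legacy ;;
        *)               echo unknown ;;
    esac
}

# On a node:
#   iptables_backend "$(iptables --version)"
#
# Switching to legacy mode, along the lines of the linked documentation section
# (check that page for the authoritative steps):
#   sudo iptables -F
#   sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
#   sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
#   sudo reboot
```

Older iptables releases print no suffix at all, which this sketch reports as `unknown`.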
I'm also getting hit by this when trying to deploy a workload that tries to reach a public registry. In this specific case, this is the error I'm encountering when trying to deploy:
failed to resolve reference "docker.io/pihole/pihole:2021.10.1": failed to do request: Head "https://registry-1.docker.io/v2/pihole/pihole/manifests/2021.10.1": dial tcp: lookup registry-1.docker.io: Try again
It also breaks DNS across the cluster. Here's output from one of the nodes before/after the deployment:
64 bytes from wd-in-f138.1e100.net (172.253.120.138): icmp_seq=10 ttl=107 time=23.8 ms
64 bytes from wd-in-f138.1e100.net (172.253.120.138): icmp_seq=11 ttl=107 time=23.3 ms
64 bytes from wd-in-f138.1e100.net (172.253.120.138): icmp_seq=12 ttl=107 time=23.6 ms
64 bytes from wd-in-f138.1e100.net (172.253.120.138): icmp_seq=13 ttl=107 time=33.4 ms
64 bytes from wd-in-f138.1e100.net (172.253.120.138): icmp_seq=14 ttl=107 time=24.1 ms
64 bytes from wd-in-f138.1e100.net (172.253.120.138): icmp_seq=15 ttl=107 time=24.3 ms
64 bytes from 172.253.120.138: icmp_seq=16 ttl=107 time=31.5 ms
64 bytes from 172.253.120.138: icmp_seq=17 ttl=107 time=25.8 ms
64 bytes from 172.253.120.138: icmp_seq=18 ttl=107 time=26.9 ms
64 bytes from 172.253.120.138: icmp_seq=19 ttl=107 time=24.3 ms
64 bytes from 172.253.120.138: icmp_seq=20 ttl=107 time=26.0 ms
If I stop that and retry pinging Google, name resolution fails. As soon as I remove that deployment, DNS resolution starts working again. Please let me know if I can provide any more output that could help with solving this. Thank you. |
So, even though Debian 11 uses nftables as default, k3s does not work properly with nftables and thus you must change iptables to legacy in order for k3s to work? If that's the case, could you please open a different issue? Thanks! And sorry for not understanding the issue :( |
Thanks for this |
The workaround with iptables runs pretty well, so this is not an issue worth mentioning (for me). |
Could you open a different issue for this please? |
Validated the fix with k3s master commit: 86c6924 and performed same steps as #4259 (comment) |
Commented on #4259
This is still broken for me.
root@k3s-831b:~# systemd --version
root@k3s-831b:~# cat /etc/os-release
root@k3s-831b:~# uname -a
Then deployed the noted yaml from above.
kube:
clemair:clemenko k3s ( 167.99.124.208:6443 ) $ kubectl get nodes,pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
and the pings
clemair:clemenko k3s ( 167.99.124.208:6443 ) $ kubectl exec -it othertest-deploy-2glbg -- bash
--- 10.42.2.7 ping statistics ---
nginx@othertest-deploy-2glbg:/$ ping -c 1 -t 1 10.42.0.9
--- 10.42.0.9 ping statistics ---
One of the things I noticed is that you are NOT using the upstream kernel. You are using the AWS-compiled one. I wonder if that has a fix. I am on DigitalOcean. This problem still applies to 21.10 as well. |
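The cross-node ping check in this comment can be scripted by reading the packet-loss figure from ping's summary line. This is a rough sketch: the `ping_loss` helper is an illustrative name, and it assumes the summary contains the usual `N% packet loss` text.

```shell
# Print the packet-loss percentage found in ping's summary output (read from stdin).
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage, with the pod name and pod IP quoted in the comment above:
#   kubectl exec othertest-deploy-2glbg -- ping -c 1 10.42.0.9 | ping_loss
```

A result of `100` for a pod on another node, alongside `0` for a pod on the same node, matches the symptom reported in this issue.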
QA marked this as fixed in the version that we're about to release, not the version that you're currently using. Please try again once we actually release the fixed version. |
v1.22.3+k3s1 works! |
Done #4486 Thank you. |
For anyone who stumbles on this issue while using the latest version of ubuntu 21.10, vxlan modules were moved by upstream to a separate package: |
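Whether the running kernel can find a vxlan module at all can be checked before looking for a package. This sketch relies only on `modinfo`, and the `vxlan_module_status` helper is an illustrative name. It does not identify which package provides the module.

```shell
# Report whether modinfo can find a vxlan module for the running kernel.
vxlan_module_status() {
    if modinfo vxlan >/dev/null 2>&1; then
        echo available
    else
        echo missing
    fi
}

vxlan_module_status
```

If this prints `missing`, flannel's vxlan backend cannot create `flannel.1` on that node until the module is installed.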
systemd 242+ assigns MAC addresses for all virtual devices which don't have the address assigned already. That resulted in systemd overriding MAC addresses of flannel.* interfaces. The fix which prevents systemd from setting the address is to define the concrete MAC address when creating the link. Fixes: flannel-io#1155 Ref: k3s-io/k3s#4188 Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> (cherry picked from commit 0198d5d)
* vxlan: Generate MAC address before creating a link
systemd 242+ assigns MAC addresses for all virtual devices which don't have the address assigned already. That resulted in systemd overriding MAC addresses of flannel.* interfaces. The fix which prevents systemd from setting the address is to define the concrete MAC address when creating the link.
Fixes: flannel-io#1155
Ref: k3s-io/k3s#4188
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
(cherry picked from commit 0198d5d)
* Concern only about flannel ip addresses
Currently flannel interface ip addresses are checked on startup when using vxlan and ipip backends. If multiple addresses are found, startup fails fatally. If only one address is found and is not the currently leased one, it will be assumed that it comes from a previous lease and be removed. This criteria seems arbitrary both in how it is done and in its timing. It may cause failures in situations where it might not be strictly necessary, for example if the node is running a dhcp client that is assigning link local addresses to all interfaces. It also might fail at unexpected flannel restarts which are completely unrelated to the external event that caused the unexpected modification in the flannel interface. This patch proposes to check only ip addresses within the flannel network and takes the simple approach of ignoring any other ip addresses, assuming these would pose no problem for flannel operation. A discarded but more aggressive alternative would be to remove all addresses that are not the currently leased one.
Fixes flannel-io#1060
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
(cherry picked from commit 33a2fac)
* Fix flannel hang if lease expired
(cherry picked from commit 78035d0)
* subnets: move forward the cursor to skip illegal subnet
This PR fixes an issue where, when flannel gets an illegal subnet event while watching leases, it doesn't move the etcd cursor forward and gets stuck on the same invalid event forever.
(cherry picked from commit 1a1b6f1)
* fix cherry-pick glitches and test failures
* disable udp backend tests since we don't actually have the udp backend in our fork
Co-authored-by: Michal Rostecki <mrostecki@opensuse.org>
Co-authored-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Co-authored-by: Chun Chen <ramichen@tencent.com>
Co-authored-by: huangxuesen <hxs625job@outlook.com>
Environmental Info:
K3s Version:
1.21.5+k3s2
Node(s) CPU architecture, OS, and Version:
Linux k3s-85fc 5.11.0-18-generic #19-Ubuntu SMP Fri May 7 14:22:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
As per above. 3 nodes, 1 master and 2 workers. This is a fresh install of k3s on Ubuntu 21.04.
Describe the bug:
With a fresh install of k3s on Ubuntu 21.04, vxlan is not working: pods are not able to talk to each other across flannel. I have stopped and disabled ufw.
Steps To Reproduce:
k3sup install --ip $server --user $user --k3s-extra-args '--no-deploy traefik --debug' --cluster --k3s-channel $k3s_channel --local-path ~/.kube/config
Expected behavior:
pods talk / ping across the nodes.
Actual behavior:
no ping
Additional context / logs:
journalctl does not show any logs.
net.ipv4.ip_forward = 1 is enabled. Pings are not working.
root@k3s-8fea:~# ip a list cni0 | grep -w inet
inet 10.42.0.1/24 brd 10.42.0.255 scope global cni0
Updated both servers
Backporting
No. This seems to be tied to Ubuntu 21.04. This works on 20.10.