Flannel fails to communicate between pods after node reboot #1474
Comments
I'm seeing the same issue here. Deleting flannel pods fixes the issue for me but is super annoying.
Using Ubuntu 20.04.3 nodes (VMs) and Kubernetes 1.22.1. I ran Flannel successfully on older Kubernetes versions (<= v1.19) for several quarters and never noticed this behavior; it all started after upgrading Kubernetes and Flannel, so I'm not sure which one is the culprit.
Oh, thanks for reporting! I'm glad I'm not the only one :) The problem appears on my Debian Bullseye machine as well: Kubernetes: v1.21.3
I'm running v1.21.3; no issues on Debian Buster.
Same issue with Debian Buster + backport kernel (5.10.46-4~bpo10+1) and Kubernetes 1.19.4, using the 'extension' backend.
Also having this issue with Fedora CoreOS 34.20210808.3.0. Works fine if I restart all flannel pods, but very troublesome that I have to do this every time I need to take a node offline for maintenance.
I had this issue and got a tip that it might be connected to MACAddressPolicy. The default for Fedora is MACAddressPolicy=persistent. After setting MACAddressPolicy=none for the flannel interface, the connection between nodes works fine after a reboot.
https://www.freedesktop.org/software/systemd/man/systemd.link.html#MACAddressPolicy=
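In case it helps anyone, here is a minimal sketch of that workaround as a systemd .link file. The interface match and file name are assumptions based on a default vxlan setup (the device is usually called flannel.1), not something taken from this thread, so adjust to your environment:

```sh
# Sketch only: stop systemd-udevd from assigning a persistent MAC to flannel's interfaces.
cat <<'EOF' | sudo tee /etc/systemd/network/10-flannel.link
[Match]
OriginalName=flannel*

[Link]
MACAddressPolicy=none
EOF
```

The policy takes effect the next time the interface is created, for example after a reboot or after the flannel pod recreates flannel.1.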
That didn't fix it for me.
Hey @mpartel, would you happen to have specifics about how you implemented this? I'm also having this problem and it is really annoying having to restart flannel all the time.
Sorry, I can't share the code (it'd be tangled with stuff specific to our setup anyway). A bit more detail: I run a daemonset that loops doing
@DrEngi I put this together based on @mpartel's description:
I haven't tested it though.
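Since the original code isn't posted here, the following is only a rough, untested sketch of that kind of automation: it deletes the flannel pod scheduled on the local node so the VXLAN interface gets recreated, which matches the manual workaround people describe earlier in the thread. The namespace, label selector, and node-name handling are assumptions based on the upstream kube-flannel manifest, not @mpartel's actual daemonset.

```sh
#!/bin/sh
# Hypothetical sketch: restart flannel on this node by deleting its pod.
# Run once after boot, or in a loop from a DaemonSet whose service account
# is allowed to delete pods in kube-system.
# Assumes the flannel pods carry the label app=flannel (upstream kube-flannel.yml).
NODE_NAME="${NODE_NAME:-$(hostname)}"

kubectl -n kube-system delete pod \
  -l app=flannel \
  --field-selector "spec.nodeName=${NODE_NAME}"
```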
I am experiencing the same issue after upgrading nodes in a Kubernetes cluster from Debian Buster to Bullseye. The version of the flannel image is v0.13.0-rancher1.
Same issue here.
Problem appears on Debian Bullseye (Debian 11) with its kernel. It seems to be a problem with a
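For anyone trying to confirm they're hitting the MAC change discussed above rather than something else, one rough check (assuming the default vxlan backend and its flannel.1 interface) is to compare the device's current MAC with the VTEP MAC flannel recorded on the node object; a mismatch after a reboot would explain other nodes no longer reaching pods on that node.

```sh
# Current MAC of the VXLAN device on this node.
ip -br link show flannel.1

# VtepMAC that flannel advertised to the rest of the cluster via the node annotation.
kubectl get node "$(hostname)" \
  -o jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/backend-data}'
```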
Looks like this PR may fix this issue: #1485
Yes, that PR should fix it. I have just created a release, v0.15.1. Let's close the issue.
@manuelbuil what's the best way to upgrade to the latest version on an existing cluster?
I'd edit the daemonset and point to the new image.
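For example, something along these lines, assuming the daemonset and container names from the upstream kube-flannel.yml manifest (kube-flannel-ds / kube-flannel); adjust the names and image repository to whatever your cluster actually uses:

```sh
# Point the existing daemonset at the new image and wait for the rollout to finish.
kubectl -n kube-system set image daemonset/kube-flannel-ds \
  kube-flannel=quay.io/coreos/flannel:v0.15.1
kubectl -n kube-system rollout status daemonset/kube-flannel-ds
```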
It seems the images are not yet publicly available; the last version on Quay is flannel:v0.15.0.
@rajatchopra could you please push the v0.15.1 images to the repo?
Hello @manuelbuil, still unable to update Flannel to 0.15.1. It seems the image isn't pushed yet.
@rajatchopra is the person with the permissions to push the image.
I tested out
No inter-pod communication works after nodes are restarted. Requires Docker running on each node to be manually stopped and started.
Expected Behavior
Pods should work fine
Current Behavior
DNS and all other connections time out when trying to reach other pods
Possible Solution
Not sure, that's why I'm here!
Steps to Reproduce (for bugs)
Full steps from a fresh Ubuntu install and details are here: kubernetes/kubernetes#104645, but TL;DR:
1. Run `kubectl exec -i -t dnsutils -- nslookup kubernetes.default`. It works.
2. Reboot the node where the dnsutils pod is running.
3. Run `kubectl exec -i -t dnsutils -- nslookup kubernetes.default` again in the pod on the node that restarted. It fails with `;; connection timed out; no servers could be reached`.
Context
New to Kubernetes, and this was really annoying to figure out. Went down so many wrong paths, and it took ages to work out what was going on. Learned a lot though. I have tried several suggested solutions with no success.
Flannel logs (see line entry I0828 09:00:22.327495):
Your Environment
Backend used: vxlan (it's a word I see in the logs, so guessing that one).
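If you'd rather confirm the backend than guess from the logs, it is normally visible in flannel's net-conf; the ConfigMap name below assumes the upstream kube-flannel.yml manifest:

```sh
# Print flannel's network configuration; the "Backend" block shows the type (e.g. vxlan).
kubectl -n kube-system get configmap kube-flannel-cfg \
  -o jsonpath='{.data.net-conf\.json}'
```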