Pods in different networks are unable to communicate with each other #1824
Comments
After a bit of investigation I am able to shed more light onto this issue. It turns out the problem is indeed in flannel. Even though the node has its private and public IPs correctly reported in Kubernetes, flannel's public-ip annotation on the node contains the node's private IP. After adding an additional overwrite annotation to the node and restarting k3s-agent, I can see in the logs that the right address is picked up.
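A minimal sketch of that workaround, using the `flannel.alpha.coreos.com/public-ip-overwrite` annotation named later in this thread (node name and IP are placeholders):

```sh
# Point flannel at the node's routable public address instead of its private one
kubectl annotate node k3s-online-01 \
  flannel.alpha.coreos.com/public-ip-overwrite=203.0.113.10 --overwrite

# Restart the agent so flannel re-reads the annotation
sudo systemctl restart k3s-agent
```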
After that everything begins to work again. |
For now I've put up a deployable workaround (https://github.com/alekc-go/flannel-fixer) which launches a listener deployment that fixes this annotation on existing nodes and on any new node joining the cluster. I will try to chase/debug it on flannel's side of things.
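In spirit, the fixer does something like the following loop for every node (a rough sketch reading the ExternalIP from node status; the actual fixer reads a node label, per the comment below):

```sh
# Copy each node's ExternalIP into flannel's overwrite annotation
for node in $(kubectl get nodes -o name); do
  ext=$(kubectl get "$node" \
    -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}')
  if [ -n "$ext" ]; then
    kubectl annotate "$node" \
      flannel.alpha.coreos.com/public-ip-overwrite="$ext" --overwrite
  fi
done
```
|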
@alekc did you try setting that address with --node-external-ip? |
Yes, see the first post. That's how the fix is working atm; I am fetching the external IP from the node label and assigning it to the flannel overwrite annotation.
|
Ah OK, I missed that at the end of the command there. Looks like there's an open issue for RKE to do the same thing: rancher/rancher#22190. Honestly I'm surprised that Flannel invented their own annotation when Kubernetes already has an ExternalIP address type that can be set on nodes. This is actually what you're seeing in the node's status.addresses. What do you get from:
|
Yeah, I was surprised as well once I got to the root of the issue. I ran this command on a "fixed" cluster; it should not matter much since my fix involves only the annotations
|
One of your nodes has a public address for its InternalIP and no ExternalIP. If you fix that, does it work without any extra annotations?
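One generic way to check what Kubernetes has recorded for each node (standard kubectl, not output from this cluster):

```sh
# List each node's recorded address types (InternalIP, ExternalIP, Hostname)
kubectl get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.addresses}{"\n"}{end}'
```
|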
It would probably work, but the issue is that the third node is on another network, and the private network range common to the 2 other nodes is not routable for it. |
Right, but if you set all the internal and external IPs correctly, I think it should figure out how to set things up properly without any hacks. That's the whole point of Kubernetes having a field for the external address. The overlay should use those addresses when building the mesh to carry the pod and service network traffic. |
Let me give some more details on the matter and why I can't do it any other way (at least until flannel fixes this behaviour).

2 servers are running on Scaleway; each of those has a single interface with a private IP assigned to it. There is no way (unless you call the equivalent of AWS's 169.254.169.254 metadata endpoint) for those servers to know their public IP, which is why at installation time I am passing --node-external-ip.

The third server is running on the Online.net provider; it has only 1 interface, with a public IP assigned to it. It doesn't have a private IP, and the private IPs from Scaleway are not routable from it.

So, in this situation I cannot really do anything more. Generally speaking, in my opinion, since flannel has the public-ip annotation it should give priority to the public address of the node if it's set and ignore the interface (I believe that currently that's how it does it). This ticket can probably be closed since this issue is not related to k3s but to flannel, and until the underlying behaviour is changed there is not a lot we can do (well, maybe set the flannel public-ip annotation from the node's external IP).
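For context, the Scaleway install flag mentioned above is passed roughly like this (a sketch; server URL, token, and IP are placeholder values):

```sh
# Join a Scaleway node, advertising its 1:1 NAT public address to the cluster
curl -sfL https://get.k3s.io | \
  K3S_URL=https://my-server:6443 K3S_TOKEN=<join-token> \
  sh -s - --node-external-ip 203.0.113.10
```
|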
OK, even if the node doesn't have a separate private IP, couldn't you still set --node-external-ip on it? Also, couldn't you just add the overwrite when the node registers? Setting that only works at registration time, so you can't add it now, but it's an easy thing to do if you're scripting builds. |
Oh, I am adding that already; the fixer deployment only adds the overwrite annotation. I will try to add --node-external-ip to the server which doesn't have one (because its private IP is already public), but I doubt it will change a great deal of things: if nothing is done, flannel sets an unroutable private IP on the other 2 nodes, so the initial handshake cannot be performed. |
I wonder what you would have seen from:

```sh
kubectl get node -o 'jsonpath={range .items[*].metadata.annotations}{.flannel\.alpha\.coreos\.com/public-ip}{"\n"}{end}'
```

I see that you've fixed that up with your deployment, but if that's getting set to private addresses, that would explain why you have to override it. |
That annotation had the private address of the node in it. |
Yeah, that'd do it. I bet setting the external IP on the command line will correct that. |
Nope, this is the whole point of that fixing deployment. Without it the node looks like this
|
I'm on Scaleway too. I have the same problem as @alekc. Adding the annotation "flannel.alpha.coreos.com/public-ip-overwrite" fixes the problem for me too. |
I had something similar to this; although it wasn't really a problem for me, I still wanted it fixed. I'm using DO (DigitalOcean) and have private IPs for the cluster that it can talk over. The problem was that flannel would default to the external address instead of the internal one, which caused it to use the public interface. My workaround was to use a flag pinning flannel to the private interface. A thought I had in mind: could flannel just be defaulting to the first interface it finds, which causes this kind of issue?
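The exact flag was lost in the formatting above; k3s does provide a `--flannel-iface` option, so it may have been something like this (an assumption; server URL, token, and interface name are placeholders):

```sh
# Start the agent with flannel pinned to the private-network interface
k3s agent --server https://my-server:6443 --token <join-token> \
  --flannel-iface eth1
```
|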
Any update on this? Both internal-ip and external-ip are set correctly on all nodes, but flannel communication does not work because the flannel external address (flannel's public-ip param) is set from the node's internal IP here: k3s/pkg/agent/flannel/flannel.go Line 114 in e5244f3
Either using the external IP there or providing an option to override it, like --flannel-public-ip, would suffice.
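A generic way to see the mismatch being described (standard kubectl jsonpath; annotation-key escaping as in the command earlier in this thread):

```sh
# Print each node's name, ExternalIP, and flannel's public-ip annotation side by side
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="ExternalIP")].address}{"\t"}{.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip}{"\n"}{end}'
```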
|
FYI, I had a similar issue, and it had to do with using the latest Ubuntu 21 with k3s: flannel will try to program a virtual interface and so will systemd, and they conflict. Flannel then goes about and publishes ARP tables with an incorrect MAC address and hilarity ensues; see flannel-io/flannel#1155 |
The current issue is specific to the flannel implementation when agents/nodes are behind NAT, as demonstrated in the current code |
This might be related to the discussion on #881, but with some slight differences.
Version:
K3s arguments:
Describe the bug
I am trying to run a cluster which has some nodes hosted on Scaleway VPSes and some in another network/provider.
On Scaleway, nodes are assigned private IPs, but they also have a routable public IP (1:1 NAT), and by passing the flag --node-external-ip I am able to have proper internal and external IPs assigned to the nodes. (Above, the third node is the master and the 2 Scaleway nodes are workers; it was a test to see whether having the master on a public IP solves this issue or not. The normal situation is that the 2 Scaleway nodes are masters and k3s-online-01 is the slave.)
The third node, which is situated on another provider, has only a public IP address.
For test purposes all nodes have been wiped clean; they all run the same OS version (Debian Buster) and no firewall is set up.
The issue is that as long as I remain in the Scaleway realm, everything works as expected. When I add an external node from a different network, it shows up as Ready in the cluster, and I am able to deploy pods to it and exec into a shell.
However, I am unable to reach any of the other network's resources; DNS resolution doesn't work for any address (Kubernetes or the outside world), and I am only able to ping resources on the public internet.
Is there something I am missing? I tried flannel with the default/ipsec/wireguard settings with no success so far.