Feature Request
Talos should mimic the google-agent (watch the metadata for forwardedIPs and add them to the routing table with scope local) in order to support an ILB for at least the Kubernetes API endpoint.
The Problem
The only way in GCP to create a load balancer without a public internet address is to use an Internal Load Balancer (ILB). An ILB does not use a TCP proxy for load balancing, but forwards the packet directly to the backend, without modification, and expects the backend to accept this packet and leverage direct server return (DSR).
This creates a few problems for Talos, or any other VM that has no google-agent running:
The node does not know that it needs to accept packets addressed to the load balancer's IP, and therefore drops them.
Return traffic should always have the load balancer's IP as its source IP; otherwise this breaks stateful firewalls like the one in GCP. If the Talos node responds from its normal interface IP, the stateful firewall cannot match the session and tries to create a new one (which will most likely be blocked, as that source/destination IP/port tuple is normally not allowed by the firewall rules).
The Solution
Source IPs that can be delivered to the Talos node by the ILB are listed in the GCE metadata. Talos could watch this field and add/remove routes with scope local on the interface.
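As a rough illustration of the mechanism (not the Talos implementation), the sketch below reads the forwarded IPs from the GCE metadata server and installs a local route for each one, roughly the equivalent of `ip route add to local <ip>/32 dev eth0`. The metadata response parsing and the eth0 interface name are assumptions:

```go
// Sketch only: fetch the ILB forwarded IPs from the GCE metadata server and
// install a local route for each, so the kernel accepts packets addressed to
// the ILB IP (DSR) and replies with the ILB IP as source.
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

// forwardedIPs reads the forwarded-ips entries for the first network interface.
// The "index value" line format of the recursive text response is an assumption.
func forwardedIPs() ([]string, error) {
	req, _ := http.NewRequest(http.MethodGet,
		"http://169.254.169.254/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips/?recursive=true&alt=text", nil)
	req.Header.Set("Metadata-Flavor", "Google")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	var ips []string

	for _, line := range strings.Split(strings.TrimSpace(string(body)), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}

		ips = append(ips, fields[len(fields)-1]) // last field is the IP
	}

	return ips, nil
}

// addLocalRoute installs the equivalent of `ip route add to local <ip>/32 dev eth0`.
func addLocalRoute(ip string) error {
	link, err := netlink.LinkByName("eth0") // interface name is an assumption
	if err != nil {
		return err
	}

	addr := net.ParseIP(ip)
	if addr == nil {
		return fmt.Errorf("invalid forwarded IP %q", ip)
	}

	return netlink.RouteAdd(&netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       &net.IPNet{IP: addr.To4(), Mask: net.CIDRMask(32, 32)},
		Type:      unix.RTN_LOCAL,
		Scope:     netlink.SCOPE_HOST,
		Table:     unix.RT_TABLE_LOCAL,
	})
}

func main() {
	ips, err := forwardedIPs()
	if err != nil {
		panic(err)
	}

	for _, ip := range ips {
		if err := addLocalRoute(ip); err != nil {
			panic(err)
		}
	}
}
```

Removing a forwarded IP would be the mirror image with netlink.RouteDel, and the metadata server's wait-for-change queries could drive the add/remove loop.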
Conditions
This solution should only be active when:
it is running on a worker node (not needed for Cilium, but other CNIs may need it to support an ILB). This solution does not hinder CNIs, as the google-agent also runs on GKE nodes.
it is a control-plane node and the local kube-apiserver has been successfully bootstrapped (otherwise the node will never bootstrap, because it cannot reach a working Kubernetes API server on the load-balancing IP). @smira suggested starting it after kube-apiserver is started, which satisfies this condition.
Tried workarounds
After discussing this with @rsmitty and @smira in the #support Slack channel, we came up with a workaround using:
This works with non-strict firewall rules. However, it creates asymmetric traffic, as the return traffic has the DHCP address as its source IP instead of the load balancer's. (This currently blocks my whole project from moving forward.)
Using the Google Cloud Controller Manager to create a LoadBalancer Service in Kubernetes, which creates an ILB for the Kubernetes API, only addresses the source IP problem: the CNI (Cilium in my case) takes care of the source IP, and the packets are then accepted by the node. However, since kube-apiserver runs on the host network, return traffic does not flow back through the CNI, and the source IP and source port are not rewritten. This again creates asymmetric traffic, because the source IP and source port (6443 instead of the NodePort used by the GCE Cloud Controller Manager) cannot be matched by the stateful firewall.
Another option would be to create a YAML patch with the route, just like the one the google-agent would create dynamically. However, there is currently no way to create a route with scope local in Talos. (If there were, it would really help me out here.)
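For illustration, such a patch could look roughly like the sketch below. The scope field is hypothetical and does not exist in the Talos machine config today (that is exactly the missing piece), and 10.128.0.50 stands in for the ILB forwarding rule IP:

```yaml
machine:
  network:
    interfaces:
      - interface: eth0
        routes:
          - network: 10.128.0.50/32 # example ILB forwarding rule IP
            scope: local            # hypothetical field, not supported by Talos today
```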
Example code
To prove my dire need for this feature asap, I started writing some code. Although I can write simple Go code, I do not feel at all comfortable creating the hooks where they are needed.
It also needs scope-local route support in the Talos route functions, but it does provide all the metadata and logic parts. See nberlee@d5f6cad.
However, feel free to totally rewrite it.
As an option to solve it: assign the VIP address on all control-plane nodes manually (as an IP alias).
But this can only be done after the etcd join event, because Talos cannot find the other neighbors through the VIP address (the discovery service is off in this case).
To have Talos do this automatically:
at boot time, wait for the etcd join event and assign the VIP address to the lo interface.
And probably run this (not in the cloud case) to disable ARP announcements:
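Presumably this refers to the standard sysctls used when a VIP is held on lo; a sketch of that (not the exact snippet from the comment):

```sh
# Stop the node from answering/announcing ARP for the VIP that lives on lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```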