Extend cluster to more than 50 nodepools #894
Comments
@thomasprade It makes sense. Please have a look at the subnet creation logic in locals.tf. If you can think of a way to solve this, a PR is most welcome; of course it needs to be optional behind a variable flag, so as to stay backward compatible. It's kind of a deep problem because of how the cluster networking works (but it's really the best solution we could find). And I believe you also have a limit of 500 nodes per Hetzner network, so I would imagine you are going to hit that pretty soon too. What you could try is creating a VPN network overlay with Tailscale on top of multiple clusters, see https://docs.k3s.io/networking#multicluster-cidr-experimental. A PR is welcome for that feature too. As a plan C, there's also something called submariner.io that was built by the Rancher team. But the Tailscale solution seems better.
I think there is a limit of 100 nodes per network: https://docs.hetzner.com/cloud/networks/faq#are-there-any-limits-on-how-networks-can-be-used. I am not sure if it is possible to expand it via a request to Hetzner.
@thomasprade Can you explain your use case in a bit more detail? It feels like you are creating a separate node pool for each node. Just wondering why you can't work with increasing the node count of a single nodepool, e.g.:

```hcl
agent_nodepools = [
  {
    name        = "agent-small",
    server_type = "cpx11",
    location    = "fsn1",
    labels      = [],
    taints      = [],
    count       = 10
  }
]
```
@M4t7e That is correct, each nodepool only has a single server in it. I know this is not the intended way to use a tool like this, or even Kubernetes in general, but it allows us to more easily manage both servers and application deployments. So technically my issue is pretty much an edge case, but others may hit this problem as well, so a fix, or at the least a warning/disclaimer, is in order.
I subscribed to this one as I am going to have a similar issue in the future, and for similar reasons. If there was a way to label nodes in groups of three when increasing the count of a nodepool, that would help. Refactoring the monolithic app that runs on it is certainly the ideal solution, but that has to be a longer-term objective.
@mysticaltech While debugging a few attempts at the subnet creation logic, I also found that creating the 42nd nodepool results in an IP address conflict with the default k3s cluster CIDR (`10.42.0.0/16`). I tried setting the cluster CIDR to a non-conflicting range, so I added the corresponding `--cluster-cidr` argument to the k3s server start command in locals.tf:69. This definitely isn't a solution for the whole problem that @maggie44 is also facing, but at least it allows creating nodepools up to the limit of 50 in total (for now).
@thomasprade That's a nice observation! I've found another problem recently with The issue here is that the Hetzner Network Routes for the Pod ranges are still created by K3s (my assumption) and they are totally decoupled from Ciliums network assignments. I still try to debug all the details, but my guess is that routed outbound connections (without SNAT) to/via the Hetzner Network from the Pods can break, because the Hetzner Router does not have the correct Pod routes for each node (still the K3s network routes that are derived from the default In general my expectation was that K3s decides which ranges a Node is allowed to use for the Pods, sets the Hetzner Network Routes accordingly and the CNI should just use what K3s assigned to the node. Here I have not enough experience with K3s and additional CNI network assignments and still trying to figure out what are the best practices here... Whatever we do here, we should try to follow best practices and find a solution that will work for all CNIs in a generic way. |
@M4t7e Good to know that with Cilium the pod IP issue behaves differently. In #902 I added custom cluster and service CIDRs to the k3s start command, which results in k3s reliably using an IP range other than the default `10.42.0.0/16`. A component that is, afaik, also involved in the pod IP scheduling, or rather the addition of the routes you mentioned, is the hcloud controller manager. Right now I'm testing some solutions to omit the creation of subnets for each nodepool altogether, i.e. just one subnet for all agents and one for all control planes.
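For reference, a minimal sketch of what this could look like in kube.tf; the variable names and CIDR values here are illustrative assumptions, not necessarily what #902 implements:

```hcl
# Illustrative only: move the k3s pod and service ranges away from the defaults
# (10.42.0.0/16 and 10.43.0.0/16) so they cannot collide with the per-nodepool
# subnets carved out of 10.0.0.0/8. Variable names are hypothetical.
cluster_ipv4_cidr = "10.224.0.0/16" # passed to k3s as --cluster-cidr
service_ipv4_cidr = "10.225.0.0/16" # passed to k3s as --service-cidr
```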
I'm thinking something like this may be the best solution for me:
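A rough sketch, using hypothetical syntax that kube-hetzner does not currently support (the `label_group_size` key is made up purely to illustrate the idea):

```hcl
# Hypothetical: one nodepool whose count can grow, with a shared label applied to
# every consecutive group of three nodes instead of one nodepool per group.
agent_nodepools = [
  {
    name             = "agent-large",
    server_type      = "cpx31",
    location         = "fsn1",
    taints           = [],
    count            = 9,
    label_group_size = 3 # made-up key: nodes 1-3 get group=1, nodes 4-6 get group=2, ...
  }
]
```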
For @thomasprade, without knowing the specifics of the use case, this might look something like:
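Again only a sketch; the `per_node_labels` key is hypothetical, since kube-hetzner currently applies labels to the whole nodepool:

```hcl
# Hypothetical: per-node labels within a single nodepool, so each customer keeps a
# dedicated, selectable node while sharing one nodepool (and one Hetzner subnet).
agent_nodepools = [
  {
    name        = "agents",
    server_type = "cpx11",
    location    = "fsn1",
    taints      = [],
    count       = 3,
    per_node_labels = {
      "0" = ["customer=customer1"],
      "1" = ["customer=customer2"],
      "2" = ["customer=customer3"]
    }
  }
]
```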
A deployment for customer 1 could then be created, and the deployment could use a NodeSelector based on the label "customer1". Probably also its own namespace. Then the namespace could be deleted upon a customer leaving and the nodes deprovisioned:
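As an example only, expressed with the Terraform kubernetes provider so it sits alongside the rest of the Terraform config; a plain YAML manifest with the same nodeSelector works just as well, and the resource names and image are placeholders:

```hcl
# Per-customer namespace plus a deployment pinned to that customer's node via its label.
resource "kubernetes_namespace" "customer1" {
  metadata {
    name = "customer1"
  }
}

resource "kubernetes_deployment" "customer1_app" {
  metadata {
    name      = "app"
    namespace = kubernetes_namespace.customer1.metadata[0].name
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "customer1-app" }
    }

    template {
      metadata {
        labels = { app = "customer1-app" }
      }

      spec {
        # only schedule onto nodes carrying this customer's label
        node_selector = { customer = "customer1" }

        container {
          name  = "app"
          image = "nginx:1.27" # placeholder image
        }
      }
    }
  }
}
```

Deleting the namespace (or destroying these resources) then removes the customer's workloads, and dropping their node from the nodepool config deprovisions the server.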
Not super confident in this particular syntax being the best, but it works as an example. I think it is safe to say the two of us are looking for particularly niche setups; a better way to articulate this as a feature for the kube-hetzner project would be the ability to label nodes in groups within a single nodepool, instead of needing one nodepool per group. It wouldn't mean we would be able to use 255 node pools as originally thought, but it permits a similar effect.
@M4t7e @thomasprade It is indeed the cloud controller, in our case the hcloud cloud controller, that assigns the routes. You can see them in its pod's logs, and also when inspecting the network routes in the Hetzner console or with the hcloud CLI. In the past with Cilium, I was able to make it work with the Hetzner network directly, without an overlay network (native routing). It was partly working but not super stable; however, recently a user found out that the MTU for Cilium was not set correctly and that is now fixed, so maybe that was the cause.
@thomasprade I submitted your issue to GPT-4 out of curiosity, and you may find it interesting. I think it's really worth considering tweaking your workflow. Here is what it said:

While it's a deviation from the conventional use of Kubernetes, your reasons for such an approach are understood. Let's explore some suggestions and thoughts:
In conclusion, while unconventional, the flexibility of Kubernetes allows for various deployment strategies. What's crucial is aligning these strategies with business needs while being aware of best practices and potential challenges.
@maggie44 Let's open a proper feature request for what you need in the issues section, as it seems related to the node labels we talked about the other day. (In discussions, I may forget about it, but not in the issues.)
@thomasprade Closing this for now, as there is not much we can do. The limitation, as you mentioned above, comes from the Hetzner network. I will adjust the docs to set proper expectations.
@mysticaltech In my PR #902 I already updated/clarified the documentation in the kube.tf file. I will also further investigate a potential solution for the subnet limitation, and if I get to something practical I will open a new PR. Thanks for the help and for the suggestions above 👍
Description
I have an existing cluster with ~40 nodepools. After adding ~15 more nodepools to the agent-nodepool list, `terraform apply` failed with an error from Hetzner.

The documentation in the `kube.tf` file states that the maximum number of nodepools is 255 in total. But since a new subnet is created for every nodepool, the limit on subnets set by Hetzner, which is 50, prevents the creation of that many nodepools. I could not find any option to configure this behaviour, so apparently the actual limit of combined nodepools is 50 due to the Hetzner subnet limitation.
A potential solution would be to add an optional `subnet` parameter to the nodepool configuration, so one can configure multiple nodepools to use the same internal subnet, or to disable the creation of per-nodepool subnets altogether and just add all nodes to the overlying 10.0.0.0/8 network.
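For illustration, a sketch of how the proposed (not yet existing) `subnet` parameter could look in kube.tf:

```hcl
# Hypothetical: two nodepools sharing one subnet, so the Hetzner limit of 50 subnets
# no longer caps the total number of nodepools.
agent_nodepools = [
  {
    name        = "pool-a",
    server_type = "cpx11",
    location    = "fsn1",
    count       = 1,
    subnet      = "10.3.0.0/16" # proposed parameter: place this pool in an existing subnet
  },
  {
    name        = "pool-b",
    server_type = "cpx11",
    location    = "fsn1",
    count       = 1,
    subnet      = "10.3.0.0/16" # same subnet, so no additional Hetzner subnet is created
  }
]
```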
Kube.tf file
Screenshots
No response
Platform
Ubuntu