k3s node fails to rejoin HA cluster when IP changes #11778
Your nodes, and ESPECIALLY your server nodes, must have fixed IP addresses. Whether you use static IPs, or DHCP reservations, or something else, doesn't matter. They just can't have IPs that change randomly or whenever the node restarts. If the IPs do change, you need to delete the node from the cluster and rejoin it. If all members have new IPs, you'd need to do a cluster-reset on the first node, and then rejoin the others.
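[A minimal sketch of that recovery flow, assuming systemd-managed installs and the node names from the logs further down. The db-directory wipe before rejoining is an assumption based on how embedded-etcd members usually rejoin, not something stated in the comment above.]

# Case 1: one node came back with a new IP. On a healthy server,
# drop the stale registration:
kubectl delete node r01-k3s02

# Then on the affected node (assumes its --server setting from install
# is still in its config):
systemctl stop k3s
rm -rf /var/lib/rancher/k3s/server/db   # assumed: clear stale local etcd data
systemctl start k3s

# Case 2: every member's IP changed. On the first server:
systemctl stop k3s
k3s server --cluster-reset
systemctl start k3s
# ...then repeat the Case 1 steps on each remaining server so they rejoin.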
Is this documented anywhere? The only documentation I've found for k8s or k3s says that the server address given to the nodes during init has to be fixed, and that a load balancer is recommended; fixed addresses for the nodes themselves don't seem to be mentioned anywhere. Online discussions all over the place have people saying opposite things, some saying as you did that they must be unique, some saying that they don't need to be. Mentioning this in the setup/config/requirements for the project would really be helpful.
I guess not, but given that it's only come up a handful of times, I guess most folks just tend to have suitable environments? I am not sure why someone would want a server whose address changes randomly. Node names and IPs definitely need to be unique. They should also be static.
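[A hedged example of what "static" can look like on these Ubuntu 22.04 nodes, using netplan. The interface name, address, and gateway below are made-up placeholders, not values from this thread.]

# Hypothetical /etc/netplan/01-k3s-static.yaml; apply with `netplan apply`.
network:
  version: 2
  ethernets:
    eth0:                            # assumed interface name
      dhcp4: false
      addresses: [192.168.115.7/24]
      routes:
        - to: default
          via: 192.168.115.1         # assumed gateway
      nameservers:
        addresses: [192.168.115.1]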
I'll file a doc request then.
Maybe in the issue tracker but seems common enough elsewhere.
If you don't mind me getting a little flippant: Because we've had DNS for over 40 years now, DHCP for over 30 years, and I don't care (or understand why I should) what addresses backend servers have when I only connect to them via their names or the name of the load balancer in front of them. The environment I described in the original post is a good enough reason for me. Every week I spin up clusters from 3 to 9 nodes and usually have 2 or 3 of them going at any one time. Not having DNS "working" here and requiring static IPs means I need to:
That's a fair bit of tedious busy work that would be entirely mitigated if the cluster could just be configured to use names instead of addresses. My DNS is reliable, and if it's not, that seems like my problem to handle.
Of course. Thanks for hearing out the rant at least.
Environmental Info:
K3s Version:
k3s version v1.30.6+k3s1 (1829eaa)
go version go1.22.8
Node(s) CPU architecture, OS, and Version:
Linux r01-k3s01 5.15.0-92-generic #102-Ubuntu SMP Wed Jan 10 09:33:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
3 nodes running on Ubuntu 22.04.2 LTS. Nodes get addresses via DHCP. During install, the first node is initialized with

curl -sfL https://get.k3s.io | sh -s - server --cluster-init --tls-san fqdn.of.haproxy.lb

and the remaining nodes are added via

curl -sfL https://get.k3s.io | sh -s server --server https://fqdn.of.haproxy.lb

fqdn.of.haproxy.lb is an haproxy instance running in a separate VM doing round-robin balancing to all nodes on 6443, using their hostnames.
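[A minimal sketch of the kind of haproxy config described above. The node hostnames are taken from the logs below; the health-check options are assumptions.]

# Hypothetical haproxy.cfg fragment on the fqdn.of.haproxy.lb VM.
frontend k3s-api
    bind *:6443
    mode tcp
    default_backend k3s-servers

backend k3s-servers
    mode tcp
    balance roundrobin
    server r01-k3s01 r01-k3s01:6443 check
    server r01-k3s02 r01-k3s02:6443 check
    server r01-k3s03 r01-k3s03:6443 check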
Describe the bug:
If one of the nodes is shut down and has a different IP when it comes back up, it fails to rejoin the cluster. The journalctl log on the failing node is filled with repeated messages stating:

Failed to test data store connection: this server is a not a member of the etcd cluster. Found [r01-k3s03-6d718e3e=https://192.168.115.8:2380 r01-k3s02-c5f4597e=https://192.168.115.7:2380 r01-k3s01-94ae9d73=https://192.168.115.129:2380], expect: r01-k3s02-c5f4597e=https://192.168.115.9:2380

The node in question here is k3s02, and as can be seen, its IP has changed from 192.168.115.7 to 192.168.115.9.
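[For reference, the member list that k3s is comparing against can be inspected with etcdctl pointed at the embedded etcd. This is a sketch: etcdctl is not shipped with k3s and must be installed separately, and the cert paths below are the usual k3s defaults, not values taken from this issue.]

# Run on any healthy server node; prints etcd's view of member names and peer URLs.
ETCDCTL_API=3 etcdctl member list \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key

A stale peer URL in this output corresponds to the "Found [...], expect: ..." mismatch in the log above.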
Steps To Reproduce:
Install a cluster as described, shut down one node, and bring it up with a different address, for example by changing its DHCP reservation.
Expected behavior:
The node should successfully rejoin the cluster. My understanding is that node IPs can be freely changed in this way and static IPs / DHCP reservations are not needed.
Actual behavior:
The node did not rejoin the cluster.
Additional context / logs:
I routinely bring up and shut down clusters using a set of in-house IaaS scripts built on Terraform, Ansible, etc. This has been working fine for a few years, but I never had occasion to test shutting down and bringing any nodes back up before now.
The number of nodes in each cluster and the number of clusters in total is variable. If this is user error and not an actual bug, I would really like to understand how to make this work without having to set up worst-case-number-of-clusters x worst-case-number-of-nodes-per-cluster DHCP reservations and statically assigned MACs for all of them.
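[For a sense of scale of the workaround being complained about: one DHCP reservation, here in dnsmasq syntax as a hedged example. The MAC address is made up, and one such line would be needed per node, per cluster; other DHCP servers have their own equivalents.]

# Hypothetical dnsmasq reservation: MAC -> hostname -> fixed address.
dhcp-host=52:54:00:aa:bb:07,r01-k3s02,192.168.115.7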