
[Question] Kubernetes won't start after removing first node #2585

Closed
Iliasb opened this issue Jul 27, 2022 · 5 comments

Iliasb commented Jul 27, 2022

Hello everyone,

I recently had to remove the first node I added to our Harvester cluster (hardware failure).
I was able to put the node in maintenance mode before removing it from the dashboard.

After removing the node in the dashboard, the cluster went down and the VIP address is now unavailable.

When I log on to the running nodes, it seems that Kubernetes is also down:

 systemctl status rke2-server.service
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/rke2-server.service.d
             └─override.conf
     Active: activating (auto-restart) (Result: exit-code) since Wed 2022-07-27 15:15:57 UTC; 742ms ago
       Docs: https://github.com/rancher/rke2#readme
    Process: 24598 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited>
    Process: 24607 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 24608 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 24609 ExecStartPre=/usr/sbin/harv-update-rke2-server-url server (code=exited, status=0/SUCCESS)
    Process: 24611 ExecStart=/usr/local/bin/rke2 server (code=exited, status=1/FAILURE)
    Process: 24632 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-server.service | grep -Eo '[0-9]+ (container>
   Main PID: 24611 (code=exited, status=1/FAILURE)
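
The service is stuck in an activating/auto-restart loop, so the actual error should be in the journal. A generic first step with standard systemd tooling (nothing Harvester-specific assumed):

# Tail the recent rke2-server logs; the fatal error usually appears
# just before the service exits.
journalctl -u rke2-server.service --no-pager -n 100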
kubectl is also not working:

kubectl get vm -n harvester-system
W0727 15:15:34.367736   24316 loader.go:221] Config not found: /etc/rancher/rke2/rke2.yaml
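
The loader warning only means kubectl found no kubeconfig at its default lookup path. On a healthy management node it would normally be pointed at the RKE2 admin kubeconfig explicitly (a sketch; this file is only written once rke2-server has started successfully):

# RKE2 writes its admin kubeconfig here after a successful start.
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes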

What is the best way to debug this? I'm a bit stuck here.

Thanks

@ibrokethecloud (Contributor)

@Iliasb how many nodes did you have in your cluster before you removed the first node?

Iliasb commented Jul 29, 2022

@Iliasb how many nodes did you have in your cluster before you removed the first node?

4 Nodes

Iliasb commented Jul 29, 2022

Found the issue:

etcdserver/api/etcdhttp: /health error; no leader (status code 503)
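
The member and leader state can be confirmed directly against the embedded etcd. A sketch, assuming RKE2's default etcd certificate paths and an etcdctl binary available on the node (it is not shipped by default):

# "endpoint status" prints one row per member, including an IS LEADER
# column; with no leader elected, the call itself may return an error.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint status --cluster -w table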

How can I select another node as master?

@FrankYang0529 (Member)

Hi @Iliasb, thanks for filing an issue here. Do you remember whether your cluster had 3 control plane nodes? If yes, you may have hit a known issue, #2191. You can try the workaround in that thread: #2191 (comment). Thank you.

@w13915984028 (Member)

This issue has not been updated recently, and the most likely root cause was identified and fixed. Closing now.

Feel free to reopen, thanks.
