rke2-server isn't starting up after restarting cluster node #6644
Comments
The etcd and kube-apiserver static pods do not appear to be running. Check the logs under /var/log/pods to see why. You might also check kubelet.log.
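A minimal sketch of that log check, assuming the default RKE2 paths (/var/log/pods for static pod logs, kubelet.log under /var/lib/rancher/rke2/agent/logs). The function name and the directory argument are illustrative, not part of RKE2 itself:

```shell
# inspect_static_pod_logs: tail the etcd and kube-apiserver static pod logs.
# The log directory defaults to /var/log/pods (default kubelet location).
inspect_static_pod_logs() {
    logdir=${1:-/var/log/pods}
    if [ ! -d "$logdir" ]; then
        echo "no $logdir on this host"
        return 0
    fi
    # etcd and kube-apiserver run as static pods in the kube-system namespace
    for d in "$logdir"/kube-system_etcd-* "$logdir"/kube-system_kube-apiserver-*; do
        [ -d "$d" ] || continue
        echo "== $d =="
        tail -n 50 "$d"/*/*.log
    done
}

inspect_static_pod_logs
# kubelet.log lives outside /var/log/pods on RKE2 (path assumes the default data dir)
if [ -f /var/lib/rancher/rke2/agent/logs/kubelet.log ]; then
    tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log
fi
```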
Yeah, I didn't detect anything that was very helpful.
Command:
Output:
Command:
Output:
Please just attach the files; a few lines from the tail aren't really enough to figure out what's going on. The messages from etcd suggest that it is running EXTREMELY slowly, though. What are the CPU and IO load on this node? If this is a single-node cluster, there should be no delays in RAFT operations.
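A quick way to answer the CPU/IO question, for what it's worth. `iostat` comes from the sysstat package and may not be installed, so it is guarded here:

```shell
# One-shot CPU / IO load survey for the node.
uptime                       # 1/5/15-minute load averages
if command -v iostat >/dev/null 2>&1; then
    iostat -x 1 3            # per-device utilization, 3 one-second samples
else
    echo "iostat not installed (sysstat package)"
fi
```

Sustained load averages well above the core count, or devices pinned near 100% utilization in `iostat`, would line up with etcd running slowly.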
I was having the same issue today, but I kept looking and found that I had been using the gzipped rke2 binary, which didn't have any suffix attached to it. Run the Linux `file` command on your rke2 binary to verify that you're not doing the same. If it's a gzip file, the solution is to simply extract it and then move the bin/rke2 into place.
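A sketch of that check. It tests for the gzip magic bytes (0x1f 0x8b) directly so it works even where `file` isn't installed; the function name is made up for illustration, and the binary path below assumes a common install location — adjust for yours:

```shell
# check_rke2_binary: report whether the given path is a gzip archive
# (the compressed release artifact) rather than the actual executable.
check_rke2_binary() {
    bin=$1
    if [ ! -e "$bin" ]; then
        echo "$bin: not found"
        return 0
    fi
    # gzip files always start with the two magic bytes 0x1f 0x8b
    magic=$(od -An -tx1 -N2 "$bin" | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "$bin: gzip archive - extract it and install the inner bin/rke2"
    else
        echo "$bin: not a gzip archive"
    fi
}

check_rke2_binary /usr/local/bin/rke2   # assumed path; adjust for your install
```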
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Environmental Info:
RKE2 Version:
rke2 version v1.26.15+rke2r1 (a413a7f)
go version go1.21.8 X:boringcrypto
Node(s) CPU architecture, OS, and Version:
Rocky 8.6
Cluster Configuration:
1 server. 5 agents.
Describe the bug:
The API server seems to be failing to come up on port 6443.
Steps To Reproduce:
sudo systemctl restart rke2-server
Expected behavior:
rke2-server starts up normally and connects to agent nodes.
Actual behavior:
Command:
netstat -ltn
Output:
Shows that etcd is listening on port 2379
tcp 129 0 127.0.0.1:2379 0.0.0.0:* LISTEN
Command:
curl -k https://127.0.0.1:2379/health
Output:
Times out
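Worth noting: RKE2's managed etcd serves TLS with client-certificate authentication, so a bare `curl -k` wouldn't get a useful answer even from a healthy etcd (though a timeout, rather than a TLS rejection, still points at etcd itself). A hedged probe using the client certs RKE2 manages — the TLS directory below is the default RKE2 layout and is an assumption worth verifying on your install:

```shell
# etcd_health: probe etcd's /health endpoint with RKE2's managed client certs.
etcd_health() {
    tlsdir=${1:-/var/lib/rancher/rke2/server/tls/etcd}
    if [ ! -d "$tlsdir" ]; then
        echo "no etcd TLS dir at $tlsdir"
        return 0
    fi
    curl -s --connect-timeout 5 \
        --cacert "$tlsdir/server-ca.crt" \
        --cert   "$tlsdir/server-client.crt" \
        --key    "$tlsdir/server-client.key" \
        https://127.0.0.1:2379/health
}

etcd_health
```

A healthy member returns JSON along the lines of `{"health":"true"}`.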
Command:
journalctl -u rke2-server -f
Output:
Aug 27 16:31:32 OurServer rke2[132272]: {"level":"warn","ts":"2024-08-27T16:31:32.345951-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000d29500/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Aug 27 16:31:32 OurServer rke2[132272]: time="2024-08-27T16:31:32-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Aug 27 16:31:32 OurServer rke2[132272]: time="2024-08-27T16:31:32-04:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Aug 27 16:31:35 OurServer rke2[132272]: time="2024-08-27T16:31:35-04:00" level=error msg="Sending HTTP 503 response to 192.168.1.111:53002: runtime core not ready"
Aug 27 16:31:37 OurServer rke2[132272]: time="2024-08-27T16:31:37-04:00" level=error msg="Sending HTTP 503 response to 192.168.1.115:50932: runtime core not ready"
Aug 27 16:31:37 OurServer rke2[132272]: time="2024-08-27T16:31:37-04:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Aug 27 16:31:40 OurServer rke2[132272]: time="2024-08-27T16:31:40-04:00" level=error msg="Sending HTTP 503 response to 192.168.1.111:53158: runtime core not ready"
Aug 27 16:31:41 OurServer rke2[132272]: {"level":"warn","ts":"2024-08-27T16:31:41.599955-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000d29500/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: i/o timeout\""}
Aug 27 16:31:41 OurServer rke2[132272]: {"level":"info","ts":"2024-08-27T16:31:41.600009-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
Aug 27 16:31:41 OurServer rke2[132272]: time="2024-08-27T16:31:41-04:00" level=info msg="Waiting for etcd server to become available"
Aug 27 16:31:41 OurServer rke2[132272]: time="2024-08-27T16:31:41-04:00" level=info msg="Waiting for API server to become available"
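Since journalctl only shows the supervisor side, it can also help to look at the container runtime directly. A sketch using the crictl binary RKE2 bundles and RKE2's containerd socket — both paths assume the default data dir, and the function name is illustrative:

```shell
# list_runtime_containers: show pods and containers known to RKE2's containerd.
list_runtime_containers() {
    crictl_bin=${1:-/var/lib/rancher/rke2/bin/crictl}
    sock=${2:-unix:///run/k3s/containerd/containerd.sock}
    if [ ! -x "$crictl_bin" ]; then
        echo "crictl not found at $crictl_bin"
        return 0
    fi
    "$crictl_bin" --runtime-endpoint "$sock" pods
    "$crictl_bin" --runtime-endpoint "$sock" ps -a   # -a includes exited containers
}

list_runtime_containers
```

If the etcd container shows as exited or is absent entirely, the container's own logs (via `crictl logs <id>` or /var/log/pods) are the next place to look.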
Additional context / logs:
We have disabled firewalld on all the server and agent nodes.
/etc/rancher/rke2/rke2.yaml
/etc/rancher/rke2/registries.yaml