
k3s startup fails with "starting kubernetes: preparing server: start cluster and https: raft_start(): io: load closed segment 0000000024946269-0000000024946590: found 321 entries (expected 322)" #1403

Closed
bokysan opened this issue Feb 9, 2020 · 3 comments

bokysan commented Feb 9, 2020

Version: k3s version v1.17.2+k3s1 (cdab19b0)

Description:

The k3s master fails to start, with the following in the log: "starting kubernetes: preparing server: start cluster and https: raft_start(): io: load closed segment 0000000024946269-0000000024946590: found 321 entries (expected 322)"

This happened after the machines were forcefully shut down (power loss). There's no information available on how to resolve this error or what to do next.
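As an aside, the "expected 322" in the error appears to come from the segment filename itself: a closed dqlite segment seems to be named after the first and last raft log index it contains. A small sketch of that arithmetic (this is an inference from the error message, not documented behavior):

```python
# Hedged illustration: the closed-segment filename looks like
# "<first index>-<last index>", so the expected entry count would be
# the size of that inclusive range.
segment = "0000000024946269-0000000024946590"
first, last = (int(part) for part in segment.split("-"))
expected_entries = last - first + 1  # inclusive range
print(expected_entries)  # 322 -- matches the "expected 322" in the error
# "found 321" would then mean the file is one entry short,
# consistent with truncation during the power loss.
```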

To Reproduce:

  • install cluster using Ansible scripts on at least two nodes
  • unplug power (I guess?)

Expected behavior:

  • cluster survives power outages, or at least gives a clear path for restoring it manually

Actual behavior:

  • cluster doesn't start up anymore

Additional context

  • k3s is (was) running on a cluster of TWO machines
  • k3s non-master node seems to start up successfully
  • k3s is installed on almost clean Armbian, on Pine64
  • cluster was working fine before the power loss
uname -a
Linux ariana 5.4.7-sunxi64 #19.11.6 SMP Sat Jan 4 19:40:10 CET 2020 aarch64 GNU/Linux


lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster


cat /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target
[Service]
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s server --cluster-init --write-kubeconfig-mode 664
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

/var/log/syslog:

...
Feb  9 00:00:12 ariana systemd[1]: Starting Lightweight Kubernetes...
Feb  9 00:00:12 ariana systemd[1]: Started Lightweight Kubernetes.
Feb  9 00:00:13 ariana k3s[3961]: time="2020-02-09T00:00:13.429349422Z" level=info msg="Starting k3s v1.17.2+k3s1 (cdab19b0)"
Feb  9 00:00:16 ariana k3s[3961]: time="2020-02-09T00:00:16.592512841Z" level=fatal msg="starting kubernetes: preparing server: start cluster and https: raft_start(): io: load closed segment 0000000024946269-0000000024946590: found 321 entries (expected 322)"
Feb  9 00:00:16 ariana systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Feb  9 00:00:16 ariana systemd[1]: k3s.service: Failed with result 'exit-code'.
Feb  9 00:00:21 ariana systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Feb  9 00:00:21 ariana systemd[1]: k3s.service: Scheduled restart job, restart counter is at 5380.
Feb  9 00:00:21 ariana systemd[1]: Stopped Lightweight Kubernetes.
...
Kampe commented Feb 27, 2020

Seeing the same issue. I was purposefully deleting master nodes at various intervals and discovered this on reboot after a couple of times.


brandond commented Feb 27, 2020

This appears to be the upstream dqlite issue: canonical/dqlite#190

dqlite is still experimental; there does not appear to be a way to recover from this at the moment. If you need production-ready HA, you should probably use an external DB.
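For reference, k3s can be pointed at an external datastore via the `--datastore-endpoint` flag. A hedged sketch only — the user, password, host, and database name below are placeholders, and the endpoint string must follow the format documented for your chosen backend:

```shell
# Sketch: run the server against an external MySQL-compatible datastore
# instead of the embedded dqlite store. All connection details here
# (user, pass, db-host, database "k3s") are placeholders.
k3s server \
  --datastore-endpoint="mysql://user:pass@tcp(db-host:3306)/k3s" \
  --write-kubeconfig-mode 664
```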

Also, a two-node dqlite cluster won't meet Raft consensus requirements (no quorum if either node goes down), so this setup probably won't ever work as expected.
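The quorum point above can be made concrete with the standard Raft majority formula — a sketch, not k3s-specific code:

```python
# Raft commits an entry only once a strict majority of voting members
# has it, so availability depends on how many failures leave a majority.
def quorum(n: int) -> int:
    """Smallest strict majority of an n-node cluster."""
    return n // 2 + 1

for n in (1, 2, 3):
    tolerated = n - quorum(n)
    print(f"{n} node(s): quorum={quorum(n)}, failures tolerated={tolerated}")
# 2 node(s): quorum=2, failures tolerated=0 -- losing either node stalls
# the cluster, which is why odd-sized (e.g. 3-node) control planes are
# the usual recommendation.
```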

stale bot commented Jul 30, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 30, 2021
@stale stale bot closed this as completed Aug 14, 2021