Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k3s commands timeout in Raspberry Pi 3 #2822

Closed
felipecruz91 opened this issue Jan 17, 2021 · 13 comments
Closed

k3s commands timeout in Raspberry Pi 3 #2822

felipecruz91 opened this issue Jan 17, 2021 · 13 comments

Comments

@felipecruz91
Copy link

felipecruz91 commented Jan 17, 2021

Environmental Info:
K3s Version:

k3s version v1.20.2+k3s1 (1d4adb0)
go version go1.15.5

Node(s) CPU architecture, OS, and Version:
Linux raspberrypi 5.4.83-v7+ #1379 SMP Mon Dec 14 13:08:57 GMT 2020 armv7l GNU/Linux

Cluster Configuration:
Just one Raspberry Pi 3 Model B acting as master node (no worker nodes).

Describe the bug:
k3s seems to be in a loop crashing (noticed high CPU consumption without having deployed a single container yet).
image

Steps To Reproduce:

  • Installed K3s:
$ sudo curl -sfL https://get.k3s.io | env K3S_KUBECONFIG_MODE="777" INSTALL_K3S_VERSION=v1.20.2+k3s1 sh -s - server --disable=merics-server

Expected behavior:
Just k3s up and running without crashing.

Actual behavior:
Simple commands such as k3s kubectl get pods time out after a long period waiting.

Additional context / logs:

image

SD Benchmark results:

root@raspberrypi:/home/pi# /usr/share/agnostics/sdtest.sh
Run 1
prepare-file;0;0;3894;7
seq-write;0;0;6737;13
rand-4k-write;0;0;2353;588
rand-4k-read;7062;1765;0;0
Sequential write speed 6737 KB/sec (target 10000) - FAIL
Note that sequential write speed declines over time as a card is used - your card may require reformatting
Random write speed 588 IOPS (target 500) - PASS
Random read speed 1765 IOPS (target 1500) - PASS
Run 2
prepare-file;0;0;3984;7
seq-write;0;0;7220;14
rand-4k-write;0;0;2331;582
rand-4k-read;7311;1827;0;0
Sequential write speed 7220 KB/sec (target 10000) - FAIL
Note that sequential write speed declines over time as a card is used - your card may require reformatting
Random write speed 582 IOPS (target 500) - PASS
Random read speed 1827 IOPS (target 1500) - PASS
Run 3
prepare-file;0;0;4180;8
seq-write;0;0;6356;12
rand-4k-write;0;0;2039;509
rand-4k-read;7149;1787;0;0
Sequential write speed 6356 KB/sec (target 10000) - FAIL
Note that sequential write speed declines over time as a card is used - your card may require reformatting
Random write speed 509 IOPS (target 500) - PASS
Random read speed 1787 IOPS (target 1500) - PASS
/usr/share/agnostics/sdtest.sh: line 37: return: can only `return' from a function or sourced script

Result of journalctl --no-pager -u k3s &> k3s.log:

https://gist.github.com/felipecruz91/9f9472a2f74df474e0ccaef8a4a8a134

Result of systemctl status k3s:

● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
   Active: activating (start) since Sun 2021-01-17 10:43:31 GMT; 15s ago
     Docs: https://k3s.io
  Process: 1500 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 1501 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 1502 (k3s-server)
    Tasks: 38
   Memory: 229.8M
   CGroup: /system.slice/k3s.service
           ├─1502 /usr/local/bin/k3s server
           └─1532 containerd

Jan 17 10:43:47 raspberrypi k3s[1502]: I0117 10:43:47.170289    1502 topology_manager.go:187] [topologymanager] Topology Admit Handler
Jan 17 10:43:47 raspberrypi k3s[1502]: W0117 10:43:47.171420    1502 pod_container_deletor.go:79] Container "18f5b5f54fd22652aa97adf448c56466d541db3071ef30ed44e98d8dd7f198bc" not found in pod's containers
Jan 17 10:43:47 raspberrypi k3s[1502]: W0117 10:43:47.171732    1502 pod_container_deletor.go:79] Container "9114e7967a09d6f2ab9a24a28e2eb146f59ab23e90328cd2bc6d0edfd50a01fa" not found in pod's containers
Jan 17 10:43:47 raspberrypi k3s[1502]: W0117 10:43:47.172036    1502 pod_container_deletor.go:79] Container "a1336764975585b8d2c3848cca1c1f45ef9f3d046bd03861fbeed7f3e928b554" not found in pod's containers
Jan 17 10:43:47 raspberrypi k3s[1502]: W0117 10:43:47.172332    1502 pod_container_deletor.go:79] Container "b45af27b5e6a5442557fef3fd99149adc7fbb008b8bc4e5e020d272663f812b1" not found in pod's containers
Jan 17 10:43:47 raspberrypi k3s[1502]: W0117 10:43:47.172605    1502 pod_container_deletor.go:79] Container "cab1c48c58bd1ec89b66cf9a680b0f7abc4c39809e7e2ae99967549833e3fa59" not found in pod's containers
Jan 17 10:43:47 raspberrypi k3s[1502]: W0117 10:43:47.172862    1502 pod_container_deletor.go:79] Container "decd4d63140ee2578c2b4151d2004eaabfaa79d8a9dab74c377d30f0187d1ada" not found in pod's containers
Jan 17 10:43:47 raspberrypi k3s[1502]: W0117 10:43:47.173190    1502 pod_container_deletor.go:79] Container "07d2dba21ce5e532f44137558e5677c8489fb669e946137f4b57779af13e32f6" not found in pod's containers
Jan 17 10:43:47 raspberrypi k3s[1502]: E0117 10:43:47.235033    1502 reflector.go:138] object-"kube-system"/"metrics-server-token-5zklh": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "metrics
Jan 17 10:43:47 raspberrypi k3s[1502]: I0117 10:43:47.289929    1502 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "config-volume" (UniqueName: "kubernetes.io/configmap
@brandond
Copy link
Member

The log file indicates that there were files (database and certs) on this node from an existing install - are these logs going all the way back to the first startup? Can you clean files off this node and then collect logs from an initial clean startup?

@felipecruz91
Copy link
Author

@brandond I have bought a new SD card (SanDisk Extreme Plus) and installed k3s. I've updated the gist's content to reflect the new logs: https://gist.github.com/felipecruz91/9f9472a2f74df474e0ccaef8a4a8a134

@felipecruz91
Copy link
Author

@brandond Do you know any k3s version that doesn't come with this issue?

@brandond
Copy link
Member

brandond commented Jan 20, 2021

I don't believe this is a K3s issue. You're getting slow IO warnings immediately on startup which isn't a good sign.

Jan 18 17:23:02 raspberrypi k3s[545]: Trace[1541358030]: ---"Transaction prepared" 5679ms (17:23:00.952)
Jan 18 17:23:02 raspberrypi k3s[545]: Trace[1541358030]: [5.731692446s] [5.731692446s] END
Jan 18 17:23:02 raspberrypi k3s[545]: I0118 17:23:02.010779     545 trace.go:205] Trace[428925386]: "Patch" url:/api/v1/namespaces/kube-system/pods/helm-install-traefik-trwgn/status,user-agent:k3s/v1.20.2+k3s1 (linux/arm) kubernetes/1d4adb0,client:127.0.0.1 (18-Jan-2021 17:22:56.271) (total time: 5738ms):
Jan 18 17:23:02 raspberrypi k3s[545]: Trace[428925386]: ---"Object stored in database" 5671ms (17:23:00.007)
Jan 18 17:23:02 raspberrypi k3s[545]: Trace[428925386]: [5.738858279s] [5.738858279s] END
Jan 18 17:23:02 raspberrypi k3s[545]: I0118 17:23:02.015616     545 controller.go:609] quota admission added evaluator for: leases.coordination.k8s.io
Jan 18 17:23:02 raspberrypi k3s[545]: I0118 17:23:02.052637     545 trace.go:205] Trace[1064813453]: "GuaranteedUpdate etcd3" type:*core.Node (18-Jan-2021 17:21:49.842) (total time: 13397ms):
Jan 18 17:23:02 raspberrypi k3s[545]: Trace[1064813453]: ---"Transaction prepared" 13276ms (17:23:00.935)
Jan 18 17:23:02 raspberrypi k3s[545]: Trace[1064813453]: [13.397425568s] [13.397425568s] END
Jan 18 17:23:02 raspberrypi k3s[545]: I0118 17:23:02.055762     545 trace.go:205] Trace[1779803018]: "Update" url:/api/v1/nodes/raspberrypi,user-agent:k3s/v1.20.2+k3s1 (linux/arm) kubernetes/1d4adb0,client:127.0.0.1 (18-Jan-2021 17:21:49.831) (total time: 13411ms):
Jan 18 17:23:02 raspberrypi k3s[545]: Trace[1779803018]: ---"Object stored in database" 13407ms (17:23:00.053)
Jan 18 17:23:02 raspberrypi k3s[545]: Trace[1779803018]: [13.411719474s] [13.411719474s] END

I have a Pi3b with a "SanDisk 64GB MicroSDXC Ultra Uhs-1 Class 10" SD card. Here's what I get from sdtest.sh after running k3s for about a week:

root@pi03:~# bash /usr/share/agnostics/sdtest.sh
Run 1
prepare-file;0;0;17940;35
seq-write;0;0;17588;34
rand-4k-write;0;0;2604;651
rand-4k-read;7754;1938;0;0
Sequential write speed 17588 KB/sec (target 10000) - PASS
Random write speed 651 IOPS (target 500) - PASS
Random read speed 1938 IOPS (target 1500) - PASS
Run 2
prepare-file;0;0;17300;33
seq-write;0;0;15910;31
rand-4k-write;0;0;2201;550
rand-4k-read;7651;1912;0;0
Sequential write speed 15910 KB/sec (target 10000) - PASS
Random write speed 550 IOPS (target 500) - PASS
Random read speed 1912 IOPS (target 1500) - PASS
Run 3
prepare-file;0;0;15596;30
seq-write;0;0;17541;34
rand-4k-write;0;0;2475;618
rand-4k-read;7671;1917;0;0
Sequential write speed 17541 KB/sec (target 10000) - PASS
Random write speed 618 IOPS (target 500) - PASS
Random read speed 1917 IOPS (target 1500) - PASS

@felipecruz91
Copy link
Author

@brandond Well, I just bought a new SanDisk Extreme Plus card... and installed k3s on it (nothing more is installed and I have only flashed it once). I find it hard to believe that the SD card is not good enough :/

@felipecruz91
Copy link
Author

@brandond I have run the tests in this new SD card and all of them are passing.

root@raspberrypi:/home/pi# /usr/share/agnostics/sdtest.sh
Run 1
prepare-file;0;0;19704;38
seq-write;0;0;19895;38
rand-4k-write;0;0;3465;866
rand-4k-read;7504;1876;0;0
Sequential write speed 19895 KB/sec (target 10000) - PASS
Random write speed 866 IOPS (target 500) - PASS
Random read speed 1876 IOPS (target 1500) - PASS
/usr/share/agnostics/sdtest.sh: line 34: return: can only `return' from a function or sourced script
Run 2
prepare-file;0;0;20984;40
seq-write;0;0;20951;40
rand-4k-write;0;0;3543;885
rand-4k-read;7501;1875;0;0
Sequential write speed 20951 KB/sec (target 10000) - PASS
Random write speed 885 IOPS (target 500) - PASS
Random read speed 1875 IOPS (target 1500) - PASS
/usr/share/agnostics/sdtest.sh: line 34: return: can only `return' from a function or sourced script
Run 3
prepare-file;0;0;20924;40
seq-write;0;0;20884;40
rand-4k-write;0;0;3714;928
rand-4k-read;7470;1867;0;0
Sequential write speed 20884 KB/sec (target 10000) - PASS
Random write speed 928 IOPS (target 500) - PASS
Random read speed 1867 IOPS (target 1500) - PASS
/usr/share/agnostics/sdtest.sh: line 34: return: can only `return' from a function or sourced script
/usr/share/agnostics/sdtest.sh: line 37: return: can only `return' from a function or sourced script

@brandond
Copy link
Member

That looks much better. Are you still getting multi-second transaction times on that SD? Do you have other things running on this Pi that might be contesting with K3s for disk IO?

@felipecruz91
Copy link
Author

@brandond Still getting errors, yes. I have no other services running in the RPi (just K3s). I found this very strange as I've heard and read everywhere how easy it is to get your Kubernetes cluster up and running with K3s, but in my case, it seems to be impossible.

@brandond
Copy link
Member

brandond commented Jan 22, 2021

This being a Pi, I should ask if you are using a suitable power supply and cooling? The only other thing I can think of is that it's not getting stable power, or is overheating and throttling the CPU. I run mine in a case with a small fan, and official 2.5 amp power supply. Have you overclocked it or otherwise tweaked the configuration from stock? Are you also running an X11 display on the Pi, or is it just booted to a shell?

@felipecruz91
Copy link
Author

Good point, @brandond.

I'm checking if the RPi is under normal voltage with:

root@raspberrypi:/home/pi# vcgencmd get_throttled
throttled=0x50005

which unfortunately means it is throttled. I'll try getting a better power supply (currently using a phone charger). Thank you!

@felipecruz91
Copy link
Author

felipecruz91 commented Jan 26, 2021

@brandond I have bought the official RPi power supply and k3s is still not able to start after the RPi gets booted (the k3s service fails in a loop):

● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
   Active: activating (start) since Tue 2021-01-26 16:54:01 GMT; 11s ago
     Docs: https://k3s.io
  Process: 730 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 731 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 732 (k3s-server)
    Tasks: 35
   Memory: 193.8M
   CGroup: /system.slice/k3s.service
           ├─732 /usr/local/bin/k3s server
           └─758 containerd

Jan 26 16:54:10 raspberrypi k3s[732]: I0126 16:54:10.502135     732 flannel.go:92] Determining IP address of default interface
Jan 26 16:54:10 raspberrypi k3s[732]: I0126 16:54:10.504451     732 flannel.go:105] Using interface with name wlan0 and address 192.168.1.131
Jan 26 16:54:10 raspberrypi k3s[732]: I0126 16:54:10.533830     732 kube.go:117] Waiting 10m0s for node controller to sync
Jan 26 16:54:10 raspberrypi k3s[732]: I0126 16:54:10.534092     732 kube.go:300] Starting kube subnet manager
Jan 26 16:54:10 raspberrypi k3s[732]: time="2021-01-26T16:54:10.575964785Z" level=info msg="labels have already set on node: raspberrypi"
Jan 26 16:54:10 raspberrypi k3s[732]: E0126 16:54:10.790731     732 node.go:161] Failed to retrieve node info: nodes "raspberrypi" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API grou
Jan 26 16:54:10 raspberrypi k3s[732]: I0126 16:54:10.888904     732 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt
Jan 26 16:54:11 raspberrypi k3s[732]: I0126 16:54:11.535150     732 kube.go:124] Node controller sync successful
Jan 26 16:54:11 raspberrypi k3s[732]: I0126 16:54:11.537120     732 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
Jan 26 16:54:11 raspberrypi k3s[732]: E0126 16:54:11.966898     732 node.go:161] Failed to retrieve node info: nodes "raspberrypi" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API grou

...

Jan 26 16:55:28 raspberrypi k3s[940]: I0126 16:55:28.839882     940 flannel.go:105] Using interface with name wlan0 and address 192.168.1.131
Jan 26 16:55:28 raspberrypi k3s[940]: I0126 16:55:28.873231     940 kube.go:117] Waiting 10m0s for node controller to sync
Jan 26 16:55:28 raspberrypi k3s[940]: I0126 16:55:28.874378     940 kube.go:300] Starting kube subnet manager
Jan 26 16:55:28 raspberrypi k3s[940]: time="2021-01-26T16:55:28.928611709Z" level=info msg="labels have already set on node: raspberrypi"
Jan 26 16:55:29 raspberrypi k3s[940]: E0126 16:55:29.170740     940 node.go:161] Failed to retrieve node info: nodes "raspberrypi" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API grou
Jan 26 16:55:29 raspberrypi k3s[940]: I0126 16:55:29.257970     940 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt
Jan 26 16:55:29 raspberrypi k3s[940]: I0126 16:55:29.874663     940 kube.go:124] Node controller sync successful
Jan 26 16:55:29 raspberrypi k3s[940]: I0126 16:55:29.875067     940 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
Jan 26 16:55:30 raspberrypi k3s[940]: E0126 16:55:30.296875     940 node.go:161] Failed to retrieve node info: nodes "raspberrypi" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API grou
Jan 26 16:55:32 raspberrypi k3s[940]: E0126 16:55:32.616775     940 node.go:161] Failed to retrieve node info: nodes "raspberrypi" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API grou

Full logs here: https://gist.github.com/felipecruz91/70aee615f7a2659ac7266ed1c5da90f3

@brandond
Copy link
Member

Still seeing some excessively long timeouts here:

Jan 26 16:58:15 raspberrypi k3s[1359]: Trace[1324985875]: [10.001888637s] [10.001888637s] END

Have you tried formatting the SD card and starting over with a clean install in case some of the voltage dips caused problems? I've killed several cards with poor power supplies over the years.

@stale
Copy link

stale bot commented Jul 30, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 30, 2021
@stale stale bot closed this as completed Aug 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants