
Resource temporarily unavailable: Due to logs filling up /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io #4327

Closed
prashilgupta opened this issue Oct 26, 2021 · 6 comments


@prashilgupta

Environmental Info:
K3s Version:

kubectl version

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1+k3s1", GitCommit:"75dba57f9b1de3ec0403b148c52c348e1dee2a5e", GitTreeState:"clean", BuildDate:"2021-05-21T16:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1+k3s1", GitCommit:"75dba57f9b1de3ec0403b148c52c348e1dee2a5e", GitTreeState:"clean", BuildDate:"2021-05-21T16:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}

Node(s) CPU architecture, OS, and Version:

uname -a

Linux ztna-gateway 4.15.0-132-generic #136-Ubuntu SMP Tue Jan 12 14:58:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 nodes:

kubectl get nodes -o wide

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
8166b995-e5cd-4358-a7c3-3db4a2796a30 Ready control-plane,etcd,master 66d v1.21.1+k3s1 10.224.71.116 Ubuntu 18.04.5 LTS 4.15.0-132-generic containerd://1.4.4-k3s2
e3de9562-ff8a-4ccc-91ea-2183895edf10 Ready control-plane,etcd,master 66d v1.21.1+k3s1 10.224.71.115 Ubuntu 18.04.5 LTS 4.15.0-132-generic containerd://1.4.4-k3s2
f702c5aa-c5ae-4322-8536-ff2f012cfeef Ready control-plane,etcd,master 66d v1.21.1+k3s1 10.224.71.117 Ubuntu 18.04.5 LTS 4.15.0-132-generic containerd://1.4.4-k3s2

Describe the bug:
The /run partition is filling up, causing containerd and Kubernetes to become unusable.

df -ah /run

Filesystem Size Used Avail Use% Mounted on
tmpfs 395M 395M 0 100% /run

kubectl get pods -A

bash: fork: retry: Resource temporarily unavailable
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x404fcf7 m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x404fcf7
stack: frame={sp:0x7fff8dd7b3e8, fp:0x0} stack=[0x7fff8dd5c950,0x7fff8dd7b990)
00007fff8dd7b2e8: 0000000000000037 0000000000000000
00007fff8dd7b2f8: 0000000000000000 0000000000000000
00007fff8dd7b308: 0000000000000001 0000000007906d60
00007fff8dd7b318: 0000000000000037 0000000007c42e20
00007fff8dd7b328: 00007fff8dd7b360 00007fff8dd7b3f0
00007fff8dd7b338: 000000000405281f 0000000000000000
00007fff8dd7b348: 000000000550fc96 00007fff00000000
00007fff8dd7b358: 00007fff8dd7b378 0000003000000010
00007fff8dd7b368: 00007fff8dd7b5c0 00007fff8dd7b4f0
00007fff8dd7b378: 0000000000000000 0000000000000000
00007fff8dd7b388: 0000000000000000 0000000000000000
00007fff8dd7b398: 0000000000000000 5f64616572687470
00007fff8dd7b3a8: 6620657461657263 52203a64656c6961
00007fff8dd7b3b8: 20656372756f7365 7261726f706d6574
00007fff8dd7b3c8: 76616e7520796c69 00656c62616c6961
00007fff8dd7b3d8: 000000000042c825 <runtime.(*pageAlloc).update+1285> 000000000550fc96
00007fff8dd7b3e8: <000000000404fd38 0000000000000000
00007fff8dd7b3f8: 000000000404d1b0 0000000000000000
00007fff8dd7b408: 00007f6bbb506000 00007f6bbb506000
00007fff8dd7b418: 000000000405488d 0000000000000001
00007fff8dd7b428: 000000000405069e 0000000007c42e20
00007fff8dd7b438: 0000000000000000 00007fff8dd7b49f
00007fff8dd7b448: 0000000000000001 0000000000000001
00007fff8dd7b458: 0000000007906d60 000000000000000a
00007fff8dd7b468: 0000000007906dec 00007fff8dd7b690
00007fff8dd7b478: 00007fff8dd7b7a8 0000000007906d60
00007fff8dd7b488: 000000000404b79f 0000000000000000
00007fff8dd7b498: 0a007fff8dd7b7a8 0000000007906d60
00007fff8dd7b4a8: 0000000004050b41 00007fff8dd7b7a8
00007fff8dd7b4b8: 0000000007906d60 000000000550fc96
00007fff8dd7b4c8: 0000000003fa2f8a 0000003000000008
00007fff8dd7b4d8: 00007fff8dd7b5c0 00007fff8dd7b4f0
runtime: unknown pc 0x404fcf7
stack: frame={sp:0x7fff8dd7b3e8, fp:0x0} stack=[0x7fff8dd5c950,0x7fff8dd7b990)
00007fff8dd7b2e8: 0000000000000037 0000000000000000
00007fff8dd7b2f8: 0000000000000000 0000000000000000
00007fff8dd7b308: 0000000000000001 0000000007906d60
00007fff8dd7b318: 0000000000000037 0000000007c42e20
00007fff8dd7b328: 00007fff8dd7b360 00007fff8dd7b3f0
00007fff8dd7b338: 000000000405281f 0000000000000000
00007fff8dd7b348: 000000000550fc96 00007fff00000000
00007fff8dd7b358: 00007fff8dd7b378 0000003000000010
00007fff8dd7b368: 00007fff8dd7b5c0 00007fff8dd7b4f0
00007fff8dd7b378: 0000000000000000 0000000000000000
00007fff8dd7b388: 0000000000000000 0000000000000000
00007fff8dd7b398: 0000000000000000 5f64616572687470
00007fff8dd7b3a8: 6620657461657263 52203a64656c6961
00007fff8dd7b3b8: 20656372756f7365 7261726f706d6574
00007fff8dd7b3c8: 76616e7520796c69 00656c62616c6961
00007fff8dd7b3d8: 000000000042c825 <runtime.(*pageAlloc).update+1285> 000000000550fc96
00007fff8dd7b3e8: <000000000404fd38 0000000000000000
00007fff8dd7b3f8: 000000000404d1b0 0000000000000000
00007fff8dd7b408: 00007f6bbb506000 00007f6bbb506000
00007fff8dd7b418: 000000000405488d 0000000000000001
00007fff8dd7b428: 000000000405069e 0000000007c42e20
00007fff8dd7b438: 0000000000000000 00007fff8dd7b49f
00007fff8dd7b448: 0000000000000001 0000000000000001
00007fff8dd7b458: 0000000007906d60 000000000000000a
00007fff8dd7b468: 0000000007906dec 00007fff8dd7b690
00007fff8dd7b478: 00007fff8dd7b7a8 0000000007906d60
00007fff8dd7b488: 000000000404b79f 0000000000000000
00007fff8dd7b498: 0a007fff8dd7b7a8 0000000007906d60
00007fff8dd7b4a8: 0000000004050b41 00007fff8dd7b7a8
00007fff8dd7b4b8: 0000000007906d60 000000000550fc96
00007fff8dd7b4c8: 0000000003fa2f8a 0000003000000008
00007fff8dd7b4d8: 00007fff8dd7b5c0 00007fff8dd7b4f0

goroutine 1 [runnable, locked to thread]:
text/template/parse.(*lexer).nextItem(...)
/usr/local/go/src/text/template/parse/lex.go:195
text/template/parse.(*Tree).next(...)
/usr/local/go/src/text/template/parse/parse.go:74
text/template/parse.(*Tree).nextNonSpace(...)
/usr/local/go/src/text/template/parse/parse.go:112
text/template/parse.(*Tree).action(0xc0002c59e0, 0xc0002e62a0, 0xa)
/usr/local/go/src/text/template/parse/parse.go:385 +0x672
text/template/parse.(*Tree).textOrAction(0xc0002c59e0, 0x0, 0x0)
/usr/local/go/src/text/template/parse/parse.go:366 +0x319
text/template/parse.(*Tree).itemList(0xc0002c59e0, 0x4bc49d1, 0x5, 0x10)
/usr/local/go/src/text/template/parse/parse.go:346 +0x2cf
text/template/parse.(*Tree).parseControl(0xc0002c59e0, 0xc0006e6c00, 0x4bc49d1, 0x5, 0x0, 0x0, 0xc00076cd80, 0x0, 0x0)
/usr/local/go/src/text/template/parse/parse.go:483 +0xf8
text/template/parse.(*Tree).rangeControl(0xc0002c59e0, 0x54456e8, 0xc0006e6c80)
/usr/local/go/src/text/template/parse/parse.go:525 +0x4c
text/template/parse.(*Tree).action(0xc0002c59e0, 0xc00060c5a0, 0x8)
/usr/local/go/src/text/template/parse/parse.go:395 +0x565
text/template/parse.(*Tree).textOrAction(0xc0002c59e0, 0x0, 0x0)
/usr/local/go/src/text/template/parse/parse.go:366 +0x319
text/template/parse.(*Tree).parse(0xc0002c59e0)
/usr/local/go/src/text/template/parse/parse.go:310 +0x247
text/template/parse.(*Tree).Parse(0xc0002c59e0, 0x4d8938f, 0xf0, 0x0, 0x0, 0x0, 0x0, 0xc00060c2d0, 0xc00074dfd0, 0x2, ...)
/usr/local/go/src/text/template/parse/parse.go:246 +0x23b
text/template/parse.Parse(0x4bc433c, 0x5, 0x4d8938f, 0xf0, 0x0, 0x0, 0x0, 0x0, 0xc00074dfd0, 0x2, ...)
/usr/local/go/src/text/template/parse/parse.go:65 +0x11d
text/template.(*Template).Parse(0xc0006e6bc0, 0x4d8938f, 0xf0, 0x8, 0x416e2c, 0x203000)
/usr/local/go/src/text/template/template.go:201 +0x825
github.com/rancher/k3s/vendor/github.com/opencontainers/runc/libcontainer.init()
/go/src/github.com/rancher/k3s/vendor/github.com/opencontainers/runc/libcontainer/generic_error.go:12 +0xa6

goroutine 6 [chan receive]:
github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).flushDaemon(0x7c0d000)
/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:1164 +0x8b
created by github.com/rancher/k3s/vendor/k8s.io/klog/v2.init.0
/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:418 +0xdf

goroutine 84 [chan send]:
text/template/parse.(*lexer).emit(...)
/usr/local/go/src/text/template/parse/lex.go:157
text/template/parse.lexFieldOrVariable(0xc0007d0d00, 0x9, 0x9)
/usr/local/go/src/text/template/parse/lex.go:516 +0x147
text/template/parse.lexField(0xc0007d0d00, 0x4e03c30)
/usr/local/go/src/text/template/parse/lex.go:481 +0x34
text/template/parse.(*lexer).run(0xc0007d0d00)
/usr/local/go/src/text/template/parse/lex.go:230 +0x37
created by text/template/parse.lex
/usr/local/go/src/text/template/parse/lex.go:223 +0x14b

rax 0x0
rbx 0x0
rcx 0x404fcf7
rdx 0x0
rdi 0x2
rsi 0x7fff8dd7b3f0
rbp 0x7fff8dd7b3f0
rsp 0x7fff8dd7b3e8
r8 0xa
r9 0x55241cf
r10 0x8
r11 0x246
r12 0x550fc96
r13 0x7fff8dd7b690
r14 0x52f3864
r15 0x0
rip 0x404fcf7
rflags 0x246
cs 0x33
fs 0x0
gs 0x0

Steps To Reproduce:

  • K3s has been installed and running for about 65 days.
  • One of the pods (an Alpine-based keepalived pod) has an exec liveness probe configured:
    livenessProbe:
      exec:
        command:
        - pidof
        - keepalived
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
  • We suspect this probe is what is causing the issue (a check is sketched after this list):
    :/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io# ls -lah 7d99079fde994c2c54c695ccc47eead5336305e083afdaf585120686d5b498d5/log.json
    -rw-r--r-- 1 root root 35M Oct 26 08:00 7d99079fde994c2c54c695ccc47eead5336305e083afdaf585120686d5b498d5/log.json
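
A quick way to confirm this suspicion (a sketch on our side, not output from the cluster): watch whether the runc log keeps growing on every probe interval while the exec liveness probe fires.

    # Sketch: watch the largest runc log.json files while the liveness probe runs
    watch -n 10 "du -h /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/*/log.json | sort -h | tail -n 5"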

[Screenshot: 2021-10-26 at 1:31:59 PM]

E1022 18:03:02.434927 13149 remote_runtime.go:394] "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = failed to exec in container: failed to start exec "0e5a8e1539902911f76888152b289f0976dc3d6245f9660952ec
E1022 18:02:42.467601 13149 remote_runtime.go:394] "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = failed to exec in container: failed to start exec "8dd086b3d8cddc215260a5778bb43ce2e00784981911930804f64ccdb846f16a": OCI runtime exec failed: exec failed: write /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7d99079fde994c2c54c695ccc47eead5336305e083afdaf585120686d5b498d5/.8dd086b3d8cddc215260a5778bb43ce2e00784981911930804f64cc

[Screenshot: 2021-10-26 at 1:37:02 PM]

Expected behavior:
The size of the log files under /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io should be limited.

Actual behavior:
The log file under /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io keeps growing, making containerd and k3s unstable.

Additional context / logs:
Attached is a screen capture of the logs.

Backporting
N/A

@brandond
Member

brandond commented Oct 26, 2021

Your /run appears to be a 400 MB tmpfs partition. That's not going to be sufficient to run Kubernetes. You should probably mount a real disk there instead.
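
As a rough sketch of that suggestion (assuming /run is the usual systemd-managed tmpfs; the right long-term fix and how to make it persistent depend on the distro and on why /run was sized this small), the mount can at least be enlarged in place:

    # Check current size and usage of the tmpfs
    df -h /run
    # Enlarge the mount at runtime (takes effect immediately, not persistent across reboots)
    mount -o remount,size=2G /run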

@prashilgupta
Author

@brandond Thanks for the quick reply, as always.

All Kubernetes pod logs go to the location below. However, in some corner cases containerd uses /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io for error logging. Is there a way to set a maximum size for that log file?

df -ah /var/log/pods

Filesystem Size Used Avail Use% Mounted on
/dev/sda1 37G 8.4G 29G 23% /
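
To see which containers are behind the /run usage, something along these lines (a sketch on our side, not output from the cluster) can list the largest runc log.json files under the containerd task directory:

    # List the ten largest runc log.json files under the k3s containerd state directory
    find /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io -name log.json \
      -exec du -h {} + 2>/dev/null | sort -h | tail -n 10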

@brandond
Member

Those are the runc logs, and no, the path cannot be changed. If you're seeing excessive growth in those log files, it usually indicates something is going on with your pod health checks; take a look at what's in those files and see if you can remedy it.
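
A minimal sketch of that kind of check (the container ID, pod name, and namespace below are placeholders): look at what runc is writing into the log, then at the probe definition and recent probe-related events on the pod.

    # Inspect the tail of the growing runc log for a given container ID
    tail -n 20 /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/<container-id>/log.json
    # Check the liveness probe definition and recent events on the suspect pod
    kubectl describe pod <keepalived-pod> -n <namespace> | grep -i -A 5 liveness
    kubectl get events -n <namespace> --field-selector involvedObject.name=<keepalived-pod>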

@prashilgupta
Author

prashilgupta commented Oct 27, 2021

@brandond Is there a default limit on the size of these runc logs? Also, what size should we increase /run to? Would 1.5G be sufficient?
Yes, we are seeing the following errors in the runc logs, but our health check is fine. We wanted to know how to configure a limit on the runc log file size.
Sample error logs from /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7d99079fde994c2c54c695ccc47eead5336305e083afdaf585120686d5b498d5/log.json:

{"level":"error","msg":"failed to decode \"{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-0[31125]: forward stage-1 (31126) and stage-2 (31127) pids to runc{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-1[31126]: signal completion to stage-0\\\"}\" to json: invalid character 'l' after object key:value pair","time":"2021-10-27T05:57:28Z"}
{"level":"error","msg":"failed to decode \"\\\"}\" to json: unexpected end of JSON input","time":"2021-10-27T05:57:28Z"}
{"level":"error","msg":"failed to decode \"{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-1[3408]: request stage-0 to forward stage-2 pid (3409){\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-2[18011]: ~\u003e nsexec stage-2\\\"}\" to json: invalid character 'l' after object key:value pair","time":"2021-10-27T05:59:38Z"}
{"level":"error","msg":"failed to decode \"\\\"}\" to json: unexpected end of JSON input","time":"2021-10-27T05:59:38Z"}

@brandond
Member

I'm honestly not sure. We're deep into containerd/runc territory here; you might try opening an issue over at https://github.com/containerd/containerd

@stale

stale bot commented Apr 25, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Apr 25, 2022
@stale stale bot closed this as completed May 9, 2022