Resource temporarily unavailable: Due to logs filling up /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io #3253
Comments
Does not look like a runc bug; runc does not write any logs per se. |
@kolyshkin below is a sample log (the message comes from runc/libcontainer/nsenter/nsexec.c, line 1056 in 4d17654).
Can you check again? {"level":"error","msg":"failed to decode "{\"level\":\"debug\", \"msg\": \"nsexec-0[24983]: forward stage-1 (24984) and stage-2 (24985) pids to runc{\"level\":\"debug\", \"msg\": \"nsexec-1[24984]: signal completion to stage-0\"}" to json: invalid character 'l' after object key:value pair","time":"2021-10-26T06:11:37Z"} |
@kolyshkin would you consider reopening this issue, given @prashilgupta's last comment? I'm having the exact same problem. |
Yeah, I think this needs to be reopened. I think the root cause is that (unless I'm misreading how the nsenter call is set up) f1b703f changed the log level to DEBUG by default (runc/libcontainer/nsenter/nsexec.c, line 113 in 4d17654).
runc expects all the messages output by nsenter to be parsable as JSON, but the debug messages are not. This causes runc to log a bunch of errors when it fails to parse the messages, which are output every time nsenter is used when debugging is enabled. |
@brandond I don't think your explanation is correct.
I suspect that what's happening is that the JSON-encoded log messages are being interleaved somehow (which would explain why the log messages have interwoven JSON in the error message) -- but this really shouldn't happen (the log messages are all well under 4096 bytes, which means they're basically guaranteed by Linux to be written in one block with no interleaving). I'll reopen this since it does appear to be an issue within runc, though I can't see how it could happen -- maybe the system has a reduced global pipe buffer size (though I just did some testing and even setting |
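For reference, here is a minimal standalone sketch (not runc's actual log-forwarding code) that reproduces the decode error using the interleaved message from the report above: the second {"level":...} object starts inside the first one's "msg" string, so the quote that opens the second message's "level" key terminates the first message's string, and the parser then sees 'l' where it expects ',' or '}'.

// decode_repro.go -- minimal sketch reproducing the "failed to decode ...
// invalid character 'l' after object key:value pair" error. This is not
// runc's actual forwarding code, only an illustration of the failure mode.
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Two nsexec log messages written into the same pipe, with the second
	// one starting in the middle of the first one's "msg" string.
	interleaved := `{"level":"debug", "msg": "nsexec-0[24983]: forward stage-1 (24984) and stage-2 (24985) pids to runc{"level":"debug", "msg": "nsexec-1[24984]: signal completion to stage-0"}`

	var entry struct {
		Level string `json:"level"`
		Msg   string `json:"msg"`
	}
	err := json.Unmarshal([]byte(interleaved), &entry)
	fmt.Println(err) // invalid character 'l' after object key:value pair
}

Each time a blob like this fails to decode, runc records the whole undecodable payload in an error entry, which appears to be what the oversized log.json in this report is filled with.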
Yeah, I did a little more tracing through the code after that comment and it looked like the log level environment variable was always set. Debug seems like an odd choice for a default though.
I think that might be the source of the problem. Just based on the messages included in the screenshots above, it looks like they come from runc/libcontainer/nsenter/nsexec.c, lines 180 to 181 in 4d17654.
I'm new to this code base, so apologies if none of this makes any sense. |
I was trying to explain this above, but maybe it was a bit unclear. The reason this is necessary is that you need to have debug be the default; runc's actual default log level is warn (or info?), not debug.
But I'm not sure how that can happen -- |
This looks very much like a race that we fixed in #3130 (the fix made its way into runc 1.0.2). Since I do not see the runc version anywhere in this bug report, I assume you're using a version prior to 1.0.2. Please upgrade to 1.0.2 or 1.0.3. |
IOW this is a duplicate of #3119 |
We're updating to v1.0.3 for our December release cycle - we'll ask users to test once that's available! |
@brandond What version of k3s will have this fix? |
the ones that come out later this week. |
Description
Describe the bug:
The /run partition is filling up, causing containerd and Kubernetes to become unusable.
df -ah /run
Filesystem Size Used Avail Use% Mounted on
tmpfs 395M 395M 0 100% /run
kubectl get pods -A
bash: fork: retry: Resource temporarily unavailable
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x404fcf7 m=0 sigcode=18446744073709551610
...
...
Steps To Reproduce:
K3s has been installed and running for 65 days.
One of the pods, an Alpine-based keepalived pod, has a liveness probe configured:
livenessProbe:
  exec:
    command:
    - pidof
    - keepalived
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
We suspect this probe is contributing to the issue: each probe run execs into the container, and it is that container's log.json that keeps growing.
:/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io# ls -lah 7d99079fde994c2c54c695ccc47eead5336305e083afdaf585120686d5b498d5/log.json
-rw-r--r-- 1 root root 35M Oct 26 08:00 7d99079fde994c2c54c695ccc47eead5336305e083afdaf585120686d5b498d5/log.json
[Screenshot attached: 2021-10-26 at 1:31 PM]
E1022 18:03:02.434927 13149 remote_runtime.go:394] "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = failed to exec in container: failed to start exec "0e5a8e1539902911f76888152b289f0976dc3d6245f9660952ec
E1022 18:02:42.467601 13149 remote_runtime.go:394] "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = failed to exec in container: failed to start exec "8dd086b3d8cddc215260a5778bb43ce2e00784981911930804f64ccdb846f16a": OCI runtime exec failed: exec failed: write /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7d99079fde994c2c54c695ccc47eead5336305e083afdaf585120686d5b498d5/.8dd086b3d8cddc215260a5778bb43ce2e00784981911930804f64cc
[Screenshot attached: 2021-10-26 at 1:37 PM]
Expected behavior:
The size of the log files under /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io should be limited.
Actual behavior:
The log file under /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io keeps growing, making containerd and k3s unstable.
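As a stopgap until the runc fix lands, size-based rotation of these files could keep /run from filling up. The following is only a sketch, assuming logrotate is installed and triggered periodically (cron or a systemd timer) on the node; the path glob matches the log.json files shown in this report, and copytruncate is used because the shim keeps the file open.

# /etc/logrotate.d/k3s-shim-logs -- illustrative sketch only
/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/*/log.json {
    size 10M
    rotate 1
    copytruncate
    missingok
    notifempty
}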
Additional context / logs:
Attached screencap of logs
Environmental Info:
K3s Version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1+k3s1", GitCommit:"75dba57f9b1de3ec0403b148c52c348e1dee2a5e", GitTreeState:"clean", BuildDate:"2021-05-21T16:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1+k3s1", GitCommit:"75dba57f9b1de3ec0403b148c52c348e1dee2a5e", GitTreeState:"clean", BuildDate:"2021-05-21T16:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Node(s) CPU architecture, OS, and Version:
uname -a
Linux ztna-gateway 4.15.0-132-generic #136-Ubuntu SMP Tue Jan 12 14:58:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
8166b995-e5cd-4358-a7c3-3db4a2796a30 Ready control-plane,etcd,master 66d v1.21.1+k3s1 10.224.71.116 Ubuntu 18.04.5 LTS 4.15.0-132-generic containerd://1.4.4-k3s2
e3de9562-ff8a-4ccc-91ea-2183895edf10 Ready control-plane,etcd,master 66d v1.21.1+k3s1 10.224.71.115 Ubuntu 18.04.5 LTS 4.15.0-132-generic containerd://1.4.4-k3s2
f702c5aa-c5ae-4322-8536-ff2f012cfeef Ready control-plane,etcd,master 66d v1.21.1+k3s1 10.224.71.117 Ubuntu 18.04.5 LTS 4.15.0-132-generic containerd://1.4.4-k3s2
Reference ticket in k3s
k3s-io/k3s#4327