k3s on worker node (agent) gets OOM-killed if the API server is not reachable for a while #11346
Comments
v1.28.9 is several months old, please update.
Please upgrade to a more recent release and confirm if you still see the issue. v1.28 is end of life as of last month.
@brandond Got it, I'll do the upgrade first.
@brandond Bad news: with 1.31.2, the memory leak still exists. It is not drastic, but it is leaking steadily. The picture shows a node that has been running for about 36 hours, with memory usage growing from about 1% to 24%.
@brandond Can we re-open this one?
Environmental Info:
K3s Version: 1.28.9
Node(s) CPU architecture, OS, and Version:
amd64, Ubuntu 22.04
Cluster Configuration:
3 servers, 1 agent
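For reference, a topology like this can be brought up with the standard k3s commands below. This is only an illustrative sketch: the hostnames and the token value are placeholders, not taken from this report.

```sh
# first server: initialize the embedded etcd cluster
k3s server --cluster-init

# remaining two servers: join the first one ("server-1" is a placeholder hostname)
k3s server --server https://server-1:6443 --token <cluster-token>

# agent (worker) node: join the cluster
# (the token can be read from /var/lib/rancher/k3s/server/node-token on a server node)
k3s agent --server https://server-1:6443 --token <cluster-token>
```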
Describe the bug:
When all server nodes go offline, the agent keeps trying to reach the API server and failing. This is expected. However, if the servers do not come back in time, the k3s agent consumes a significant amount of RAM and ends up OOM-killed, leaving the workloads on the node completely non-functional.
Steps To Reproduce:
To make the bug show up more quickly, you can use an agent node with low RAM; a reproduction sketch follows below.
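A minimal reproduction sketch, assuming the three-server/one-agent cluster above and a systemd-managed k3s install; the sampling interval is arbitrary:

```sh
# on each of the three server nodes: take the API server offline
sudo systemctl stop k3s

# on the agent node: sample the k3s process's resident memory once a minute
# and watch it grow until the kernel OOM killer steps in
while true; do
  date
  ps -C k3s -o pid=,rss=,comm=   # RSS is reported in KiB
  sleep 60
done
```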
Expected behavior:
Even if the API server is offline, the agent node should keep running as it is and keep the pods on it running. This matches the upstream Kubernetes design: pods keep running and are restarted if they crash, but the API is unavailable, so nothing new can be scheduled and existing workloads cannot be changed.
Actual behavior:
The k3s process on the agent uses up the system memory and makes the node unstable; k3s itself gets OOM-killed, and the user workload is removed after k3s restarts.
Additional context / logs:
Attached screenshot shows k3s getting killed by the OOM killer.
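In addition to the screenshot, the kernel log and the agent's unit log can confirm the kill. This assumes the agent was installed with the official script, so the systemd unit is named k3s-agent:

```sh
# kernel's record of the OOM kill
journalctl -k --no-pager | grep -iE 'out of memory|oom-kill'

# restart history of the agent service after the kill
journalctl -u k3s-agent --no-pager | tail -n 50
```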