Out of memory OOM node failure when using windows docker image in talos linux #9215
Comments
Why is that a Talos issue? You can configure reservations for system cgroups yourself via the kubelet configuration; anything beyond that is up to your deployment. Make sure your pods have proper resource limits set.
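The kubelet-level reservation mentioned here might be sketched as a Talos machine config patch like the one below. It assumes the `machine.kubelet.extraConfig` field, which passes KubeletConfiguration settings through to the kubelet; the memory and CPU values are illustrative placeholders, not recommendations from this thread.

```yaml
# Hypothetical machine config patch; all values are illustrative assumptions.
machine:
  kubelet:
    extraConfig:
      # Resources set aside for OS-level system daemons.
      systemReserved:
        memory: 1Gi
        cpu: 500m
      # Resources set aside for Kubernetes node components such as the kubelet.
      kubeReserved:
        memory: 512Mi
        cpu: 250m
```

Reserved amounts are subtracted from node capacity when the kubelet computes allocatable resources, so pods are scheduled against a smaller budget.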
Yes, but this happens even with memory limits set. I would not expect the whole node and its services to crash; only the pod should be removed.
Normal Starting 2m39s kube-proxy
Is evictionHard in the machine config still working? I can't find any documentation. I have set memory limits on the pods and the node is still crashing.
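For reference, a hard eviction threshold is a standard KubeletConfiguration setting. A minimal sketch, assuming it is supplied through `machine.kubelet.extraConfig` in the Talos machine config; the threshold value is an illustrative assumption, and whether it helps in this setup is not confirmed in the thread.

```yaml
# Hypothetical machine config patch; the threshold is an illustrative assumption.
machine:
  kubelet:
    extraConfig:
      # Start evicting pods once available node memory falls below this value.
      evictionHard:
        memory.available: "1Gi"
```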
pglazyfreed 0
So I'm not sure what issue we should fix, as the Kubernetes cluster operator should set proper memory limits and reservations. We will consider tuning reservations for Talos components, but kubelet resource usage depends on the workload, so it can't be set once for every cluster. See #7081
The resource limits are properly set in Kubernetes. I also have 31 GB of allocatable memory on the node, and the Windows Docker pod is still killing the services due to OOM.
To me this looks very strange, as it seems more Windows related. You could clone this and apply the kubernetes.yml from
Could this also be caused by Docker and WSL2? I run Talos inside Docker on my Windows 11 host, with a Windows 11 pod inside that.
kern: warning: [2024-08-24T20:31:19.332892508Z]: init invoked oom-killer: gfp_mask=0xdc0(GFP_KERNEL|__GFP_ZERO), order=0,
With 10 GB on Docker this is fixed and there is no crash anymore.
Bug Report
Talos does not reliably evict workloads before memory is exhausted when running a Windows Docker image on Linux.
Description
Running Talos Linux and deploying a Windows Docker image (https://github.com/dockur/windows) on Talos Kubernetes, but with privileged: false and no KVM mapping.
All pods initially come up and run successfully, but after some time all pods and services go down (only the flannel pods survive). The worker node has state: not ready.
The kubelet fails with OOM and can't recover. The whole worker node becomes unreachable and can't recover on its own; it has to be restarted and the pod removed.
Logs
support.zip
Environment