CrashLoopBackOff OOMKilled for kustomize-controller #725
Can you post here the logs from the previous container instance?
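For anyone following along, a sketch of how to pull those previous-instance logs, assuming a default Flux install in the `flux-system` namespace:

```shell
# --previous (-p) prints the logs of the last terminated container,
# which survive the restart in a CrashLoopBackOff.
# Namespace and deployment name assume a stock Flux bootstrap.
kubectl -n flux-system logs deploy/kustomize-controller --previous
```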
Here they are. Thoughts?
I see no panic nor any other errors in the logs you posted; something else must be going on. Can you please post here:
Here you go! Thanks for your quick response.
Hmm, really strange, there are no events for the deployment. Could it be that the Kubernetes node has issues? Please delete the pod; then, after the new one starts, as soon as it fails, post here:
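A rough sketch of that delete-and-recapture sequence, assuming the default `flux-system` namespace and the standard `app=kustomize-controller` pod label:

```shell
# Delete the pod; the Deployment will schedule a replacement.
kubectl -n flux-system delete pod -l app=kustomize-controller

# After the replacement fails, capture its state and recent events.
kubectl -n flux-system describe pod -l app=kustomize-controller
kubectl -n flux-system get events --sort-by=.lastTimestamp
```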
Hi, I'm working with Sofia on this case. What happens with the kustomize-controller pod is that it actually gets OOMKilled. I tried changing its memory limit (from 1 GiB to 2 GiB), and it still runs out of memory. I confirmed the cause is memory ballooning by attaching to the pod during the short time it runs: after about 20 seconds, the kustomize-controller process inside the container goes from 700 MiB to more than 3.2 GiB, and then the container gets killed with exit code 137. Here's a snippet of `kubectl describe pod` for kustomize-controller:
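One hedged way to watch that memory climb from outside the pod (requires metrics-server; namespace and label assume a default Flux install):

```shell
# Sample the pod's memory usage every 2 seconds until it is OOMKilled.
watch -n 2 kubectl -n flux-system top pod -l app=kustomize-controller
```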
@azure-claudiu To fix the memory leak we need to reproduce it first. Can you please create a repo with the YAML files that make the controller behave like this?
Just to clarify, we created these Flux configs using an Azure extension. The steps are basically the ones here: https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/tutorial-use-gitops-flux2 Would you need just the YAML files, or also the Flux configs created by this extension?
This is our file structure. There are three YAML files in our base/mongodb folder.
kustomization.yaml:
mongo-deployment.yaml:
mongo-service.yaml:
Lastly, we created a GitOps configuration with the following command:
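The original command was not captured in this thread; a representative `az k8s-configuration flux create` invocation for an Arc-connected cluster might look like the sketch below. All names and values are placeholders, not the reporter's actual configuration:

```shell
# Illustrative only -- resource names, URL, and paths are hypothetical.
az k8s-configuration flux create \
  --resource-group my-rg \
  --cluster-name my-cluster \
  --cluster-type connectedClusters \
  --name mongodb-config \
  --url https://github.com/example/repo \
  --branch main \
  --kustomization name=mongodb path=./base/mongodb
```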
With those files I can't reproduce the OOM. Can you please swap the image with the upstream one and see if it fails the same way? If it does, then please take a heap dump and share it with me. Set the controller image in its deployment to:
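A sketch of swapping the image in place (the tag below is illustrative; use the upstream release matching your Flux version, and note the container in Flux controller deployments is named `manager`):

```shell
# Point the deployment at the upstream image; tag is a placeholder.
kubectl -n flux-system set image deployment/kustomize-controller \
  manager=ghcr.io/fluxcd/kustomize-controller:v0.27.1
```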
I changed the image; it still leaks memory: https://fluxissues1.blob.core.windows.net/videos/top-102.mp4 Here's a heap dump taken every second (roughly), until it dies: https://fluxissues1.blob.core.windows.net/heaps/heap-b.zip
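For reference, one way to grab such a heap dump, assuming the controller exposes Go's `net/http/pprof` endpoints on its metrics port (8080 here; the actual port and whether pprof is enabled vary by Flux version, so check the Flux debugging docs for your release):

```shell
# Forward the controller's metrics port locally (assumption: pprof on 8080).
kubectl -n flux-system port-forward deploy/kustomize-controller 8080:8080 &

# Snapshot the heap profile; inspect later with `go tool pprof heap.out`.
curl -sS http://localhost:8080/debug/pprof/heap -o heap.out
```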
Can you please create a zip of your repo and share it with me? I also need the GitRepository and all Flux Kustomization manifests from the cluster. If the repo contains sensitive information, you can reach out to me on CNCF Slack and share it privately.
I suspect that one of the repositories used here is very large, and that may cause the OOM, since we need to load the content in memory to verify the checksum. Can you please exec into source-controller and post the output of:
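The exact command asked for was not captured in this thread; a plausible check, assuming source-controller stores fetched artifacts under `/data` (the default in a stock install), would be something like:

```shell
# Hypothetical illustration: report artifact sizes inside source-controller.
kubectl -n flux-system exec deploy/source-controller -- du -h --max-depth=2 /data
```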
I managed to reproduce the OOM with a repo containing over 100 MB of dummy files. I guess we need to reject such an artifact in source-controller and error out, telling people to use .sourceignore to exclude files that are not meant for Flux.
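A minimal sketch of generating such a reproduction repo locally (file names and sizes are arbitrary; the point is only to exceed ~100 MB of content):

```shell
# Create ~100 MiB of dummy files: five 20 MiB zero-filled blobs.
mkdir -p big-repo/blobs
for i in 1 2 3 4 5; do
  head -c 20971520 /dev/zero > "big-repo/blobs/dummy-$i.bin"
done

# Verify total size before pushing the repo and pointing a GitRepository at it.
du -sh big-repo
```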
We definitely have a directory in the source tree that contains large ML model files. Here's the output from our source-controller:
Ok then, mystery solved: add a .sourceignore file to exclude the large files. A better option would be to push the manifests from that repo to ACR and let Flux sync the manifests from there, see https://fluxcd.io/flux/cheatsheets/oci-artifacts/
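A `.sourceignore` at the repo root uses gitignore-style syntax; the patterns below are illustrative placeholders for the large ML model directory mentioned earlier, not the reporter's actual paths:

```
# .sourceignore -- exclude content Flux should never package (paths hypothetical)
models/
*.onnx
*.h5
```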
Thanks for your help, Stefan. That resolved it for us. Closing the issue now!
We created a Flux configuration in our Kubernetes cluster. We keep hitting an issue where the kustomize-controller pod gets stuck in a CrashLoopBackOff state. The logs don't point to any particular root cause for the crash.
Pod status:
Pod events:
Pod logs: