source-controller pod restarting (OOMKilled) #192
Comments
Can you post the kubelet error here? It should be under describe replicaset or pod. Can you also post what interval you are using in the GitRepository? |
The pod description doesn't show any error, just that the pod was Terminated with reason OOMKilled.
And here are the events in gotk-system:
This is how the GitRepository is defined:
|
Are all your HelmReleases coming from HelmRepositories, or do you have charts in GitRepositories? |
All the HelmReleases have a HelmRepository as source (the same repository reference) |
Can you share the sizes of the Also: note that the interval you have set for the |
I'm not sure if this is exactly what you're asking for but here are the sizes on the source-controller pod: data/helmchart: 304K |
A 44MB index would explain the OOM: every minute the index is loaded into memory for each release and parsed, which with the default number of workers means: |
I've just increased the interval on the HelmReleases to |
Increasing the interval from 1m to 3m in the HelmRelease didn't solve the issue. There have been over 70 restarts in 17h. When doing |
Yeah, that's expected; the interval doesn't matter if it's the same for all HRs. You either increase the memory limit or trim down the 44MB index. For reference, the stable Helm repository index is 7MB. |
We are experiencing a similar issue too. The repo that we clone is pretty big
Is the only way to fix this to increase the limit on the source-controller? What did you end up setting your limit to, @avacaru? |
You can change any field of Flux manifests with Kustomize patches without interfering with bootstrap, please read the docs https://toolkit.fluxcd.io/guides/installation/#customize-flux-manifests |
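As a rough sketch of the kind of patch meant here, the `kustomization.yaml` that sits next to `gotk-components.yaml` and `gotk-sync.yaml` could carry an inline strategic merge patch that raises the source-controller memory limit. Assumptions to check against your own `gotk-components.yaml`: Flux is bootstrapped in the `flux-system` namespace, the controller container is named `manager`, and `2Gi` is only an example value (it is the figure one commenter mentions in this thread).

```yaml
# kustomization.yaml (assumed to live beside the bootstrap-generated files)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: source-controller
      namespace: flux-system   # assumption: default bootstrap namespace
    patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: source-controller
        namespace: flux-system
      spec:
        template:
          spec:
            containers:
              - name: manager   # assumption: container name in the stock manifests
                resources:
                  limits:
                    memory: 2Gi # example value only; size it to your indexes
```

Once this is committed and pushed, the bootstrap Kustomization should pick it up on its next reconciliation. Older Kustomize versions may need `patchesStrategicMerge` with a separate patch file instead of the inline `patches` form.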
@brianpham make sure you use .sourceignore and you exclude everything else but the yaml manifests or consider having the manifests in a dedicated branch. |
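A minimal sketch of that exclusion idea, written here as the `spec.ignore` field of the GitRepository so that it stays in one manifest. The same `.gitignore`-style patterns can instead go in a `.sourceignore` file at the repository root. The resource name, URL, branch, interval, and the `/deploy` directory are all hypothetical placeholders, and the `apiVersion` may need to be an older `v1beta` version on older Flux installs.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-manifests                      # hypothetical name
  namespace: flux-system
spec:
  interval: 5m                            # example value
  url: https://example.com/org/repo.git   # placeholder URL
  ref:
    branch: main                          # or a dedicated manifests branch
  ignore: |
    # exclude everything at the repository root
    /*
    # then re-include only the directory that holds the YAML manifests
    !/deploy
```

Note that these rules shape what ends up in the artifact source-controller produces; they are not a guarantee of a smaller clone, so a dedicated branch may still be the better option for a very large repository.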
I believe that I'm encountering this issue as well with source controller ( From the latest OOM kill last night: Interestingly, the memory usage 'spike' coincides with a bunch of errors logged from source-controller, but I'm not certain if the errors are the cause or the symptom of the memory issue:
Is the appropriate remedy to increase the memory limit for source controller? It's currently set to 1Gi. |
@billimek if you have many Helm related resources in your cluster, you may want to try this, as for some operations we need to read e.g. whole repository indexes from memory. |
Thanks @hiddeco, I believe that there are probably a lot in this case. I bumped the limit to 2Gi. Appreciate the super fast response!
/data $ du -hs /data/*
124.0K /data/gitrepository
1.6M /data/helmchart
19.9M /data/helmrepository |
@billimek the files are not as enormous as I would have expected (I have seen indexes of ~50MiB). I have created a PR to enable pprof endpoints on the metrics server so that we can get a better insight into the resource consumption of your controller. |
Should I add the following part, to the kustomization.yaml that is in the same directory alongside with gotk-sync.yaml and gotk-components.yaml ??
|
|
@Ayatallah see my example here and here. |
Thank you! I did almost the same, but the source-controller memory limit is still the same. Do you do anything specific for the Flux instance to sync with these kustomizations, or do you just commit and push them to git and it syncs automatically? |
|
IIRC it was synced automatically. |
Okay, can you let me know if I'm missing anything: I added the following to kustomization.yaml: so it now looks like this: and gotk-patches.yaml content is as follows: so staging directory now contains 4 files: then I committed and pushed to git, but no automatic sync is happening! |
|
@Ayatallah many things look wrong in there, there is a typo in the patch file name, also the namespace is wrong, should be flux-system. Please use code blocks and paste the YAML inside them. |
Should I use flux-system as the namespace even if it's not the namespace I bootstrapped the Flux instance in?
|
|
Is it a must to use namespace=flux-system ?! |
@Ayatallah Flux v2 is not meant to be installed more than once per cluster. See https://github.com/fluxcd/flux2-multi-tenancy on how to do multi-tenancy if that's what you're after. |
|
Or, for Flux v2 multi-tenancy to be applied properly, do I have to restructure my repo? |
No, you can create multiple |
Nothing in my git or helm repositories is particularly large (screenshot); yet, during source-controller pod startup, the pod spikes past 2.5Gb of memory. What are the known implementation decisions that would cause a large memory spike on startup? After a couple of minutes, it comes back down to around 1Gb, where it seems to stay. I do have decently fast So my question is: what is causing this in only one cluster and not elsewhere? I suspect it could be related to the way I have my helm charts configured in this cluster. I have 3 charts being fetched and packaged directly from a GitRepository. I am not doing this in other clusters, so I am guessing that could be the root cause. Are there known performance trade-offs with using GitRepositories in a helm chart? |
Do you have |
Yes some of my charts have a Edit: I should add that ALL of the charts are stored in the same git repo. So there are definitely charts that have a |
Is Documentation at helm.sh about the purpose of this file indicates that What would be really nice is if you could forward If source controller gitrepositories could somehow know that they'll be used for serving a helm chart, and they should honor Right now I think the only other way to accomplish this installation is to fork and add |
@stefanprodan Also facing the same OOM issue with source controller. |
So I made the changes, but it still does not apply the limit or the request that I have set for source-controller. This is my kustomization template file:
My kustomization.tf file, where I apply the above template, looks like this:
Please let me know if I am doing something wrong. |
Hey all. I'm currently experiencing the same issue, along with @ekosov-form3, at work. This thread has been helpful, and increasing the memory limit is our solution for now. However, we'd like to understand more about the source controller's memory requirements so we can understand why it's so high and look at alternative solutions like reducing the index size. We'd like to know
Currently we're seeing memory fluctuate roughly between 600MB and 1200MB with the following resources and a default installation except for the source controller's memory limit.
Thanks |
@matt-woodruff-f3 given you have collected such detailed statistics about your Helm usage, would you be willing to give an image based on #485 a spin? This is getting into a shape that it'll likely end up in a release soon, and will greatly affect the answers to your questions (and should heavily improve performance). If so, please reach out to me on Slack ( |
Release candidate for the above PR has been made available, and instructions are added to the PR for testing purposes. It would be great if some of you could try this out and share results, as simulating real-world Helm setups has proven to be extremely difficult. |
I believe these changes are in source-controller 0.19.0 and Flux 0.24.0, so this issue can be closed out now. (Is that correct?) |
The changes have indeed been released in |
@hiddeco Thanks for the update! We've been running 0.19.0 in 3 of our environments for a few days now and can report no OOM issues. We've even reverted the memory requirements back to default from Max 2Gi to 1Gi. |
Awesome. Thanks for the confirmation @matt-woodruff-f3 – I'll close this now, based on your confirmation! |
I have noticed that the source-controller pod of my gotk deployment restarted a huge number of times over the weekend (148 times -- version 0.1.1). I've re-deployed a newer version (0.2.1) but the restarts keep happening (about 2 every half hour).
This causes the helm-controller to not be able to reconcile HelmReleases:
The source controller manages one GitRepository and two HelmRepositories.
The helm controller takes care of 11 HelmReleases, each with similar configuration:
While writing up this issue the source-controller restarted 3 more times
Logs from the source controller don't indicate any errors: