Memory leak in provider #87
Comments
The Terraform memory leak is a nuisance. Thanks for opening an issue. Looking around upstream, I found suggestions to set requests and limits on the ControllerConfig as a stop-gap solution: crossplane-contrib/provider-upjet-aws#325 (comment). Linked from that same issue, there is another solution called ProviderScheduler: crossplane/upjet#178. I don't know whether we already implement that, but it's definitely worth investigating.
Example implementation of the ProviderScheduler solution: https://github.com/upbound/provider-aws/pull/627/files
We have a few hundred resources and observed that as well. What helped us is the configuration with
to reduce the load. Every change to a resource triggers a reconcile anyway; the poll and sync intervals only matter if someone made manual changes to a resource, which then get reset on the next reconcile. But the whole setup with the Crossplane provider seems very fragile; we often have to do manual cleanups. :-/
To me, it looks like the poll interval doesn't work at all 🤔. My dashboards are refreshed every minute anyway.
Over the course of the last week, I reimplemented parts of this provider using the Grafana Go client instead of Terraform as a proof of concept. Please note that I don't want to advertise my implementation as a replacement, since only a few of the resources are implemented and everything is quite young. Instead, I want to show it to you so you can have a look and decide for yourselves whether it could be an option to replace the current Terraform/upjet-based implementation. For @davidgiga1993 and me, the new implementation solved both the memory leak and the CPU usage (see the screenshots above): before 15:00 the provider from this repository was running, and after that my implementation was running. The source code is here: https://github.com/Argannor/provider-grafana
You can definitely advertise your implementation. The Terraform implementation is suboptimal, but I don't have enough time to maintain a hand-written provider. So, unfortunately, we will keep using upjet regardless of the performance issues.
See #107 for more info!
When running the provider with a medium number of resources (100+ users, 20+ orgs), memory consumption keeps increasing until the process is OOM-killed for exceeding its memory limit. The provider process is the one consuming all the memory in that case. Memory usage in general is also extremely high for what this provider does, especially compared to other providers.
Additionally, we're facing a CPU issue where the Crossplane provider consumes all the CPUs of a node the entire time.
As far as I understand, most of this probably comes from upjet?
Wouldn't it make more sense to build a "proper" provider that doesn't rely on Terraform internally, since Terraform seems to be the root cause of some of these issues?