Memory leak in provider #87

davidgiga1993 · 2024-02-15T06:46:06Z

When running the provider with a moderate number of resources (100+ Users, 20+ Orgs), memory consumption keeps increasing until the provider hits its memory resource limit and gets OOM-killed. The provider process itself is what consumes all the memory in that case.

Also, the memory usage in general is insanely high for what this provider is doing, especially when compared to the others.
Additionally, we're facing the CPU resource issue where the crossplane provider consumes the CPUs of an entire node the whole time.

As far as I understand, most of this probably comes from upjet?
Wouldn't it make more sense to build a "proper" provider and not rely on terraform internally, as it seems to be the root cause of some of those issues?

Duologic · 2024-02-15T08:47:15Z

The terraform memory leakage is a nuisance. Thanks for making an issue.

Looking around upstream, I found suggestions to set resource requests and limits on the ControllerConfig as a stop-gap solution: crossplane-contrib/provider-upjet-aws#325 (comment)
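
As a rough illustration of that stop-gap, a ControllerConfig with requests and limits might look like the sketch below. The object name and all CPU/memory values are assumptions chosen for illustration, not values taken from the linked comment. The Provider object would point at it through `spec.controllerConfigRef`.

```yaml
# Sketch only: metadata.name and every resource value here are illustrative assumptions.
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: provider-grafana-limits
spec:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
```

Note that a limit does not fix the leak itself. It only bounds how much memory the provider pod can take before it is restarted.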

Linked from that same issue, there is another solution called ProviderScheduler: crossplane/upjet#178. I don't know if we already implement that, but it is definitely worth investigating.

Duologic · 2024-02-15T08:56:45Z

Example implementation of the ProviderScheduler solution: https://github.com/upbound/provider-aws/pull/627/files

patst · 2024-02-15T13:00:14Z

We have a few hundred resources and observed that as well.
You should check the queue of reconciles. They probably pile up because the requests are not completed fast enough.

What helped us is the configuration with

    - --poll=12h
    - --sync=12h

to reduce the load.

Every change to the resource will trigger a reconcile anyway. Frequent polling and syncing only help if somebody made manual changes to a resource, which then get reset on the next reconcile.
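
For reference, one way to pass these flags, assuming the provider is configured through a ControllerConfig, is via `spec.args`. This is a minimal sketch: the object name is hypothetical, and only the two flag values come from the comment above.

```yaml
# Sketch only: metadata.name is a hypothetical name; the args are the ones quoted above.
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: provider-grafana-slow-poll
spec:
  args:
    - --poll=12h   # how often each resource is polled
    - --sync=12h   # controller manager sync period
```

The Provider object would then reference this ControllerConfig through `spec.controllerConfigRef`.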

But the whole setup with the crossplane provider seems very fragile, we often have to do manual cleanups. :-/

julienduchesne · 2024-02-15T14:32:01Z

To me, it looks like the poll interval doesn't even work 🤔. I've got dashboards being refreshed every minute anyway.

Argannor · 2024-02-26T15:04:02Z

Over the course of the last week I implemented parts of this provider using the Grafana Go client instead of Terraform as a proof of concept.

Please note that I don't want to advertise my implementation as a replacement, since only a few of the resources are implemented and everything is quite young. Instead, I want to show it to you so you can have a look and decide for yourselves whether this could be an option to replace the current terraform/upjet-based implementation.

For @davidgiga1993 and me, the new implementation solved the leak and the CPU usage (see screenshots above).

Before 15:00 the provider from this repository was in use; after that, my implementation was used.

(If wanted, I can post an update after a longer observation period.)

Here you can find the source code used: https://github.com/Argannor/provider-grafana

julienduchesne · 2024-02-26T15:10:50Z

You can definitely advertise your implementation. The Terraform implementation is sub-optimal, but I also do not have enough time to maintain a manually written provider. So, unfortunately, I can tell you that we will keep using upjet regardless of the performance issues.

julienduchesne · 2024-03-20T18:08:08Z

Fixed in v0.13.0

See #107 for more info!

julienduchesne mentioned this issue Mar 18, 2024

Switch to TerraformPluginSDK framework #107

Closed

julienduchesne closed this as completed Mar 20, 2024
