Keda-Operator OOM problem after upgrade to Keda v2.11.* #4789
Comments
@JorTurFer If I understand correctly, we did not patch this in 2.11.2, is that correct?
I think that we didn't patch this; @zroubalik is going to check.
Hey @andreb89, could you please share a memory map of KEDA before it gets OOM-killed, plus the apiserver request logs for the keda-operator service account? That helped me investigate an OOM issue I had with an earlier version of KEDA (#4687). If you need help with how to enable and collect the memory map, see the debugging section.
I think that enabling memory profiling on demand could be worthwhile for debugging some advanced and complex scenarios. WDYT @tomkerkhove @zroubalik?
I think I can try, since I have already done this kind of thing before.
@yuvalweber this would be great. The more user-friendly the option, the better, so I am leaning towards a web server that is started only when profiling is enabled. Thanks for doing this.
Yes, I agree that the web server option is the friendlier one.
We had a similar issue after upgrading. We solved it by limiting the namespaces to watch (a sketch of this follows below).
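For reference, restricting the operator to specific namespaces can be done through the KEDA Helm chart's `watchNamespace` value; the sketch below assumes that chart, and the namespace name is a made-up example, not the commenter's actual setup:

```yaml
# values.yaml for the kedacore/keda Helm chart (sketch; the namespace name is an example).
# The default empty string watches all namespaces; limiting it shrinks the operator's
# informer caches, which can noticeably reduce memory usage.
watchNamespace: "my-workloads"
```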
We're upgrading to the latest version supported on Kubernetes v1.25 (v2.11.2), and we're having the same OOM issues... Adding the profiler doesn't really solve the issue. EDIT: After commenting out the limits (and raising the requests just in case), the memory usage is back to the same levels.
Report
Hi,
we have an OOM problem with the keda-operator in Kubernetes (AKS 1.26.3) that was introduced with version 2.11.*. We are using Postgres and Prometheus triggers for scaled jobs. For now, we have downgraded to 2.10.1 again, where we do not have this issue.
Grafana metrics for the keda-operator pod with 2.11.1:
After the downgrade to 2.10.1:
I added some keda-operator pod logs, but nothing useful really shows up around the time the OOM happens.
We are using the default resource requests/limits, e.g. for the keda-operator:
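The resource block itself was attached as a snippet in the original report; for context, the Helm chart defaults for the operator are roughly the following (reproduced from memory, so treat the exact numbers and layout as an assumption and check your chart version):

```yaml
# Approximate default keda-operator resources from the kedacore/keda Helm chart.
resources:
  operator:
    limits:
      cpu: "1"
      memory: 1000Mi
    requests:
      cpu: 100m
      memory: 100Mi
```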
We have about 500 ScaledJob instances and 1 ScaledObject instance. Most of the jobs have a Prometheus trigger with a template like the following:
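The exact template from the report is not reproduced here; a representative ScaledJob with a Prometheus trigger might look like the sketch below (name, image, query, and threshold are illustrative assumptions):

```yaml
# Illustrative ScaledJob with a Prometheus trigger; values are placeholders,
# not the reporter's actual template.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: example-scaledjob
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: example/worker:latest
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring.svc:9090
        query: sum(my_queue_depth{queue="example"})
        threshold: "5"
```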
Expected Behavior
Memory consumption should stay the same after the Keda version update.
Actual Behavior
Huge jump in memory consumption after the upgrade.
Steps to Reproduce the Problem
Have a larger cluster with a lot of different scaled jobs and try the KEDA version upgrade from 2.10.* to 2.11.*.
Maybe this will happen for you, too. Honestly unclear.
Logs from KEDA operator
KEDA Version
2.11.1
Kubernetes Version
1.26
Platform
Microsoft Azure
Scaler Details
Prometheus & Postgres
Anything else?
No response