Polling interval causes massive CPU use #799

jocelynthode · 2023-09-28T05:50:20Z

Report

Currently the ScaledObject polling interval is hard-coded to 1 second. This seems to cause massive CPU usage on our end.

We currently have 120+ HTTPScaledObjects, meaning we have 120+ ScaledObjects.

Our keda operator is hovering around 6000m of CPU usage, and the keda operator logs are flooded every second with lines like:

2023-09-28T05:45:25Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"ldap-ui","namespace":"ldap-ui"}, "namespace": "ldap-ui", "name": "ldap-ui", "reconcileID": "07460f4d-23d2-4157-b011-bdab317e09cc"}

As a workaround, I would love to be able to choose the pollingInterval I want. However, I wonder whether there is another way to handle this with a push rather than a pull method, since increasing pollingInterval would reduce CPU usage but make scaling take longer.

Expected Behavior

I would expect http-add-on to not cause such massive CPU usage

Actual Behavior

The CPU usage increases linearly with the number of ScaledObjects, needing ~6 CPUs for 120 HTTPScaledObjects.

Steps to Reproduce the Problem

Create a lot of HTTPScaledObjects
Check CPU Usage of the keda-operator pod

Logs from KEDA HTTP operator

example

HTTP Add-on Version

Other

Kubernetes Version

1.25

Platform

Other

Anything else?

No response


JorTurFer · 2023-10-05T21:12:34Z

Hello,
Do you have any ScaledObjects other than the ones generated from HTTPScaledObjects? I'm not sure this is the root cause (I'm not saying it isn't, just that I'm not sure), because we are implementing load tests in the operator and we handle 1K ScaledObjects (also with pollingInterval: 1) with just a single CPU in ideal conditions.
I know that ideal conditions are not realistic, but the difference is huge. Could the Scaler component be getting throttled and slowing down the operator?
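
To make the size of that gap concrete, here is a minimal sketch that compares the average CPU cost per ScaledObject in the two cases. The inputs are the figures quoted in this thread (6000m for 120 objects, one CPU for 1K objects); the assumption that cost scales linearly with object count, and the helper name `millicores_per_object`, are illustrative only.

```python
# Figures quoted in this thread; the linear per-object model is an assumption.
reported_cpu_millicores = 6000   # keda operator usage from the report
reported_objects = 120           # HTTPScaledObjects in the reporter's cluster

load_test_cpu_millicores = 1000  # "a single CPU" in the operator load tests
load_test_objects = 1000         # "1K ScaledObjects"


def millicores_per_object(cpu_millicores: int, objects: int) -> float:
    """Average CPU cost per ScaledObject, assuming cost scales linearly."""
    return cpu_millicores / objects


reported = millicores_per_object(reported_cpu_millicores, reported_objects)
load_test = millicores_per_object(load_test_cpu_millicores, load_test_objects)

print(f"reported:  {reported:.0f}m per ScaledObject")
print(f"load test: {load_test:.0f}m per ScaledObject")
print(f"ratio:     {reported / load_test:.0f}x")
```

Under these assumptions the reported cluster spends roughly 50 times more CPU per ScaledObject than the load test, which is why the polling interval alone may not explain the usage.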

JorTurFer · 2023-10-05T21:21:15Z

Even so, I guess we can increase the polling interval to 15 seconds in general, because the current approach is already push-based: we use the external-push scaler, not external. It's the scaler that actively pushes when it's activated, so we don't need to evaluate it every second to scale up; that's implicitly done by the external-push scaler.
For scaling itself, the HPA controller requests metrics every 15 seconds, so a 15-second polling interval should be enough to provide fresh metrics on each request.
WDYT @tomkerkhove @t0rr3sp3dr0?

Are you willing to contribute the fix, @jocelynthode (once they have shared their thoughts)?
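
As a rough illustration of the interval change discussed above, the sketch below estimates how many polls the operator performs for 120 ScaledObjects at a 1-second versus a 15-second interval, and how many of those polls fall between two HPA metric requests. It assumes exactly one poll per ScaledObject per interval and a 15-second HPA request period as mentioned in this comment; the function names are hypothetical and this is not how KEDA itself computes anything.

```python
def polls_per_second(scaled_objects: int, polling_interval_s: float) -> float:
    """Polls issued per second, assuming one poll per object per interval."""
    return scaled_objects / polling_interval_s


def polls_per_hpa_request(polling_interval_s: float,
                          hpa_period_s: float = 15.0) -> float:
    """Polls of a single object between two consecutive HPA metric requests."""
    return hpa_period_s / polling_interval_s


objects = 120  # number of ScaledObjects reported in this issue

for interval in (1, 15):
    rate = polls_per_second(objects, interval)
    per_request = polls_per_hpa_request(interval)
    print(f"interval={interval:>2}s: {rate:.0f} polls/s, "
          f"{per_request:.0f} poll(s) per HPA request")
```

With these assumptions, a 1-second interval polls each object 15 times per HPA request, while a 15-second interval polls it once, so the metrics the HPA sees are no staler but the poll volume is 15 times lower.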

jocelynthode · 2023-10-05T23:35:17Z

I would be willing to contribute the fix. I'll be on holiday for two weeks but could take this on after my return :).

All our ScaledObjects are generated from HTTPScaledObjects, as we only use keda in conjunction with http-add-on. (I should probably open a PR to add my company to the list of http-add-on users, as we're currently using it in prod as well.)

I had no idea there were load tests. My analysis of the CPU usage might be wrong. I guessed this was the issue because the CPU usage has increased almost linearly with our growing number of HTTPScaledObjects, and the reconciling lines are the only lines I can see in the logs, where they are getting spammed a lot.

It might also be some misconfiguration on my end. We're virtually only doing scale-to-zero. Our goal with this add-on is to reduce our footprint by not running unused workloads when no one is accessing them, so all our HTTPScaledObjects have a min replica number of 0.

If you have some pointers for me, I would also be willing to investigate the issue further to make sure it's not caused by some configuration on my end.

The http-external-scaler seems to consume some CPU as well. Checking the usage for the past 12 hours, it's a bit lower than last time, but we're around 3.5 CPUs for the operator and 2 CPUs for the external-scaler.

jocelynthode · 2023-10-26T06:27:20Z

Since upgrading to 0.6.0, the problem seems to have disappeared. I'll still submit a PR to align the interval to 15 seconds, but I'll go ahead and close this issue.

JorTurFer · 2023-10-29T16:10:19Z

I think that this is the real fix introduced in v0.6.0 that has reduced the CPU: 8ea0896

jocelynthode · 2023-10-29T16:25:35Z

Ah interesting thanks :)

jocelynthode added the bug label Sep 28, 2023

keda-automation added this to Roadmap - KEDA HTTP Add-On Sep 28, 2023

github-project-automation bot moved this to To Triage in Roadmap - KEDA HTTP Add-On Sep 28, 2023

jocelynthode closed this as completed Oct 26, 2023

github-project-automation bot moved this from To Triage to Done in Roadmap - KEDA HTTP Add-On Oct 26, 2023

jocelynthode mentioned this issue Oct 26, 2023

fix: align polling interval to 15 seconds #829

Merged
