Understanding scaledownPeriod #882
Comments
Hello,
KEDA doesn't work like that either. As I said above, if a single message arrives at a random time and KEDA unluckily doesn't check the queue while the message is still there, the scaler will be considered not active.
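To make the polling point concrete, here is a minimal sketch of how a fixed-interval check can miss a short-lived message. The function name, its parameters, and the assumption that polls happen at t = 0, interval, 2 × interval, ... are all hypothetical and only illustrate the idea; this is not KEDA's actual code.

```python
def poll_sees_message(arrival, processed, polling_interval, horizon):
    """Return True if any poll instant falls while the message is queued.

    Hypothetical model: the message sits in the queue during
    [arrival, processed) and the poller checks at 0, interval, 2*interval, ...
    """
    t = 0
    while t <= horizon:
        if arrival <= t < processed:
            return True   # a poll happened while the message was present
        t += polling_interval
    return False          # every poll ran before arrival or after processing
```

With a 30-second polling interval, a message that arrives at second 5 and is consumed by second 10 is never observed, while one that sits in the queue from second 25 to 35 is.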
The interceptor is a proxy written in Go and, as you have said, it acts as a sort of queue. The point here is that the interceptor also has its own autoscaling, so if the number of requests it's processing grows, the interceptor will scale out. The interceptor acts more like a queue if your backend responds slowly, but if your backend responds fast, the interceptor is just another proxy in the middle, like any other you may already have.
I now understand the reason behind my initial misconception. We utilize two additional scalers, Regarding in-flight messages, I assumed that our rapid message handling results in minimal in-flight messages: multiple customers send similar small requests, so accumulation is minimal, and the consequence is continuous scaling up and down from zero. While I acknowledge that the
This can happen under small loads (I mean a few random requests over time). For those cases, I'd suggest increasing the cooldownPeriod from 300 to 900 or 1800, which mitigates the issue. I understand that this behaviour may not fit every scenario, but just returning a counter of total requests introduces other challenges, like choosing the aggregation window (1 min? 5 min? 30 sec?), which can add extra load to the system and should be configurable. Currently, the metric returns the real count of requests (which IMHO is the most accurate approach in high-load scenarios), but I guess we can discuss how to adapt this for other cases. Maybe we could store and propagate the last request timestamp and report whether the scaler is active based on it 🤷
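The "last request timestamp" idea could look roughly like the sketch below. It is only an illustration of the proposal: the class name, the method names, and the injectable clock are assumptions made for the example, not anything that exists in the add-on.

```python
import time


class LastRequestTracker:
    """Hypothetical helper: remembers when the most recent request arrived."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock        # injectable so the logic can be tested
        self._last_request = None  # None until the first request is seen

    def record_request(self):
        # Called once per proxied request; restarts the cooldown.
        self._last_request = self._clock()

    def is_active(self):
        # Active while the last request is younger than the cooldown.
        if self._last_request is None:
            return False
        return self._clock() - self._last_request < self.cooldown_seconds
```

The trade-off is that activity no longer depends on how many requests are pending at the moment of the check, only on how recently one was seen, so a few sparse requests are enough to keep the workload up.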
I created a scenario that's not uncommon for us with the following PowerShell snippets:
I've run the above load test script, which sends 15 batches of 1000 requests with a 1-millisecond interval between batches. It works well in a closed test environment: the deployment scaled successfully and traffic was handled effectively. Having a timestamp for the last request and a cooldown period based on it would be an ideal setup, at least for us. Regarding the
No no, I didn't mean pollingInterval, I meant
Yeah we could do that, but I still feel like I'm just "pushing the problem" to future me then :)
It gives you some time until we decide how to proceed xD I get your point, and we should probably think about what being active means for HTTP workloads, as it's quite different from queue consumers, but I'd also like to hear @tomkerkhove's and @t0rr3sp3dr0's thoughts about this.
I appreciate the "open-ness" 🥇, for now, I'll template a longer Keep me in the loop and thank you for the work you do 👍🏻
"store and propagate last request timestamp and return if the scaler is active", it sounds like a good solution. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
As a quick update, we have decided to implement some aggregation modes for the metrics instead of following the approach of having the scaler manage IsActive directly.
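The thread does not describe the aggregation modes themselves, so the following is only a generic sketch of what aggregating requests over a time window can mean. The class, its methods, and the window semantics are assumptions for illustration and should not be read as the add-on's implementation.

```python
from collections import deque
import time


class WindowedRequestCounter:
    """Hypothetical sliding-window counter over recent request timestamps."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._timestamps = deque()  # oldest request on the left

    def record_request(self):
        self._timestamps.append(self._clock())

    def _evict_old(self):
        # Drop requests that are at least one full window old.
        cutoff = self._clock() - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def count(self):
        # Number of requests seen inside the current window.
        self._evict_old()
        return len(self._timestamps)

    def rate_per_second(self):
        # Average request rate over the window.
        return self.count() / self.window_seconds
```

A metric built this way stays above zero for a full window after the last request, which is what smooths out the repeated scale-to-zero cycles discussed above, at the cost of having to pick (and store data for) a window size.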
Hi everyone! 👋🏻
We've been utilizing KEDA successfully for a while now, scaling our deployments and enabling scaling to zero.
Recently, I've been delving into implementing the http-add-on. I expected that HTTPScaledObjects would mirror the behavior of ScaledObjects, featuring a target value (trigger result or, in this case, pending requests) and a cooldown period that resets with each trigger result / new request.
However, my initial testing suggests that HTTPScaledObjects doesn't quite work that way. Regardless of the number of requests sent through, it consistently adheres to the scaledownPeriod and scales the deployment down from the moment it scaled up + scaledownPeriod, even if it received a request just a second ago.
Is this the intended behavior? If so, is there a way to configure the resources to match the behavior of ScaledObjects, where the scaledownPeriod resets with every received request?
Additionally, we have customers with burst periods where they will send high volumes of requests per second. I want to ensure that we do not lose requests here via the interceptor.
I assume that if the interceptor is not performing, it will act as a sort of queue, and I'd like to avoid that.
Appreciate any insights or tips you can share! 👍🏻
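For clarity, the two scale-down behaviors contrasted in this post can be written down as a small sketch. Both functions are hypothetical models of what the post describes (times in seconds since an arbitrary start), not the add-on's actual logic.

```python
def scale_down_fixed(scale_up_time, scaledown_period):
    # Behavior observed in the post: the timer starts at scale-up
    # and later requests do not restart it.
    return scale_up_time + scaledown_period


def scale_down_resetting(request_times, scaledown_period):
    # Behavior expected in the post: every request restarts the timer,
    # so only the most recent request matters.
    return max(request_times) + scaledown_period
```

For requests at seconds 0, 120 and 299 with a scaledownPeriod of 300, the first model scales down at second 300 (one second after the last request), while the second waits until second 599.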