Improve concurrent request bucketing in autoscaler #1060
Proposed solution: A possible solution to this issue is to take the maximum actual concurrency for each of the buckets. What does that mean? Today, as noted above, all requests that arrive in a certain timeframe (100ms by default) are measured as concurrent. Code-wise, this is done by draining the queue of decrements (i.e. responses sent) after such a bucket timeframe. We could instead increment/decrement the concurrency counter as requests/responses happen.

Now the sampling rate becomes an issue: what happens if you have many of those super short requests, but at the end of each bucket your concurrency count happens to be 0? You wouldn't scale at all. The proposed solution is therefore to take the maximum value of concurrency per recorded bucket instead of sampling the concurrency value at the end of each bucket.

The maximum value of concurrency is a "realistic" value of how many concurrent requests there were at a given point in time. No arbitrary value is needed, which avoids inaccuracies like those explained above. As everything is still smoothed out by averaging over the buckets, this shouldn't be subject to spiky pathologies.
* Improve concurrent request bucketing in queue-proxy. Instead of counting all requests that arrived in a certain bucket as concurrent, the queue-proxy now reports the actual maximum concurrency that was present inside one bucket. If, for example, 3 requests arrive at once, the maximum concurrency will be 3 for that bucket. If another arrives while those 3 are still open, the maximum concurrency is 4. Closing a request decrements the concurrency immediately (versus draining the outgoing request queue on quantization, which results in the behavior described above). Closes #1060
* Fix edge case and add a specific test.
* Unify channels and add buffer again.
* Update config documentation.
* Adjust channel typing.
* Fix configuration reconcile loop.
* Rename var.

Co-authored-by: Stavros Kontopoulos <st.kontopoulos@gmail.com>
/area autoscale
/kind dev
/assign @markusthoemmes
Expected Behavior
Constant load of a certain CONCURRENCY on an application should eventually result in CONCURRENCY containers being created, regardless of how long the requests take.
Actual Behavior
For very short-running HTTP requests, a bucketing and quantization mechanism in the queue-proxy counts all requests in a specific timeframe (100ms by default) as "concurrent".
If, for example, ten 10ms requests arrive in one of those 100ms buckets, one after the other, the concurrency reported by the queue for this bucket will be 10, even though the "real" concurrency was only 1, since no two requests were ever in flight at the same time.
Steps to Reproduce the Problem
for i in {0..1000}; do curl -H "Host: $SERVICE_HOST" "http://$SERVICE_IP/"; done
Observe kubectl get pods, telemetry, or the autoscaler logs.
Additional Info
Even for that simple bash loop (and with very bad latency of about 30ms), I get 3 containers created, even though one would easily suffice for the task.