Improve concurrent request bucketing in autoscaler #1060

Closed
markusthoemmes opened this issue Jun 5, 2018 · 1 comment
Assignees: markusthoemmes
Labels: area/autoscale, kind/feature (Well-understood/specified features, ready for coding.)

@markusthoemmes (Contributor) commented:

/area autoscale
/kind dev
/assign @markusthoemmes

Expected Behavior

Constant load of a certain CONCURRENCY on an application should eventually result in CONCURRENCY containers being created, regardless of how long the individual requests take.

Actual Behavior

For very short-running HTTP requests, the bucketing and quantization mechanism in the queue_proxy counts all requests that fall into a specific timeframe (100ms by default) as "concurrent".

If, for example, 10 requests of 10ms each arrive strictly one after the other within one of those 100ms buckets, the concurrency reported by the queue for that bucket will be 10, even though the "real" concurrency never exceeded 1.
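
To make the over-counting concrete, here is a minimal Go sketch (not the actual queue-proxy code; all names are illustrative) that simulates 10 strictly sequential 10ms requests inside a single 100ms bucket. The bucket-style count reports 10, while the true in-flight maximum never exceeds 1.

// Minimal sketch (not the actual queue-proxy code; all names are illustrative)
// of how the current bucketing over-reports concurrency: every arrival within
// a 100ms bucket is counted, and the matching decrements are only drained
// when the bucket closes.
package main

import "fmt"

func main() {
	const bucketMs = 100
	const requestMs = 10

	reported := 0  // increments applied on arrival, decrements deferred to bucket end
	inFlight := 0  // what is actually open right now
	actualMax := 0 // the "real" concurrency the bucket ever saw

	// 10 strictly sequential requests, each finishing before the next arrives.
	for t := 0; t < bucketMs; t += requestMs {
		inFlight++
		reported++
		if inFlight > actualMax {
			actualMax = inFlight
		}
		inFlight-- // the request finishes, but its decrement is not seen until bucket end
	}

	fmt.Println("concurrency reported for the bucket:", reported) // 10
	fmt.Println("real maximum concurrency:", actualMax)           // 1
}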

Steps to Reproduce the Problem

  1. Deploy the helloworld sample: knative/serving:sample/helloworld/README.md@master#running
  2. Run a curl loop (one request after the other) against the system for long enough: for i in {0..1000}; do curl -H "Host: $SERVICE_HOST" "http://$SERVICE_IP/"; done
  3. Observe the pod count via kubectl get pods, telemetry or the autoscaler logs.

Additional Info

Even for that simple bash loop (and with a rather bad latency of roughly 30ms per request), I get 3 additional containers created, even though one would easily suffice for the task:

kubectl get pods
NAME                                          READY     STATUS    RESTARTS   AGE
helloworld-00001-deployment-8888987f5-6sx4l   4/4       Running   0          1h
helloworld-00001-deployment-8888987f5-gs8ll   4/4       Running   0          22s
helloworld-00001-deployment-8888987f5-k76tw   4/4       Running   0          18s
helloworld-00001-deployment-8888987f5-vpsk7   4/4       Running   0          12s
google-prow-robot added the area/autoscale and kind/feature (Well-understood/specified features, ready for coding.) labels on Jun 5, 2018
@markusthoemmes (Contributor, Author) commented:

Proposed solution:

A possible solution to this issue is to take the maximum actual concurrency for each of the buckets. What does that mean?

Today, as noted above, all requests that arrive in a certain timeframe (100ms by default) are measured as concurrent. In the code, this is done by draining the queue of decrements (i.e. responses sent) only after such a bucket timeframe has elapsed.

We could instead increment/decrement the concurrency counter as requests arrive and responses are sent. But then the sampling rate becomes an issue: if many of those super-short requests happen to leave the counter at 0 whenever the bucket ends, you wouldn't scale at all.

The proposed solution is therefore to take the maximum value of concurrency per recorded bucket instead of sampling the concurrency value at the end of each bucket. The maximum concurrency is a "realistic" value of how many concurrent requests there actually were at some point in time, and no arbitrary quantization value is needed, which avoids the inaccuracies explained above.

Since everything is still smoothed out by averaging over the buckets, this shouldn't be subject to spiky pathologies.
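
A minimal sketch of that idea, assuming a hypothetical concurrencyReporter type (names and structure are illustrative, not the actual queue-proxy implementation): the counter is incremented when a request starts, decremented when it finishes, and each bucket reports the maximum value observed.

// A minimal sketch of the max-per-bucket idea, assuming a hypothetical
// concurrencyReporter type; names and structure are illustrative, not the
// actual queue-proxy code.
package main

import (
	"sync"
	"time"
)

type concurrencyReporter struct {
	mu      sync.Mutex
	current int32 // requests currently in flight
	maxSeen int32 // maximum in-flight count observed in the current bucket
}

// RequestStarted is called when a request arrives at the proxy.
func (r *concurrencyReporter) RequestStarted() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.current++
	if r.current > r.maxSeen {
		r.maxSeen = r.current
	}
}

// RequestFinished is called when the response has been sent; the counter is
// decremented immediately instead of being drained at bucket boundaries.
func (r *concurrencyReporter) RequestFinished() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.current--
}

// report returns the bucket's maximum and resets it to the current in-flight
// count, so requests spanning multiple buckets keep being observed.
func (r *concurrencyReporter) report() int32 {
	r.mu.Lock()
	defer r.mu.Unlock()
	m := r.maxSeen
	r.maxSeen = r.current
	return m
}

func main() {
	r := &concurrencyReporter{}

	// Every 100ms (the bucket size) the per-bucket maximum would be handed
	// to the autoscaler instead of a sampled or over-counted value.
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	go func() {
		for range ticker.C {
			_ = r.report()
		}
	}()

	// RequestStarted/RequestFinished would wrap the proxied HTTP handler.
	r.RequestStarted()
	time.Sleep(250 * time.Millisecond)
	r.RequestFinished()
}

Resetting the maximum to the current in-flight count, rather than to zero, is one way to keep long-running requests that span multiple buckets visible to the autoscaler.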

google-prow-robot pushed a commit that referenced this issue Jun 13, 2018
* Improve concurrent request bucketing in queue-proxy.

Instead of counting all requests that arrived in a certain bucket as concurrent, the queue proxy now reports the actual maximum concurrency that was observed within each bucket.

If, for example, 3 requests arrive at once, the maximum concurrency for that bucket will be 3. If another arrives while those 3 are still open, the maximum concurrency becomes 4. Closing a request decrements the concurrency immediately (versus draining the outgoing request queue on quantization, which results in the behavior described above).

Closes #1060

* Fix edge case and add a specific test.

* Unify channels and add buffer again.

* Update config documentation.

* Adjust channel typing.
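
The arrival/closing scenario from the commit message can be traced with a tiny standalone snippet; the counters and helper closures below are illustrative, not the queue-proxy's actual variables.

// Tiny standalone walkthrough of the commit's example scenario.
package main

import "fmt"

func main() {
	current, maxSeen := 0, 0
	start := func() { // a request arrives
		current++
		if current > maxSeen {
			maxSeen = current
		}
	}
	finish := func() { current-- } // a response is sent; decrement immediately

	start()
	start()
	start()                       // 3 requests arrive at once
	fmt.Println("max:", maxSeen)  // 3
	start()                       // a 4th arrives while the first 3 are still open
	fmt.Println("max:", maxSeen)  // 4
	finish()                      // closing a request decrements the counter right away
	fmt.Println("open:", current) // 3; the bucket's maximum stays 4
}
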
skonto added a commit to skonto/serving that referenced this issue Feb 6, 2025
* fix configuration reconcile loop

* rename var

Co-authored-by: Stavros Kontopoulos <st.kontopoulos@gmail.com>