Description
Currently, the user must tune an API's CPU request for horizontal pod autoscaling to behave as expected. An approach based on concurrent requests per container may be better (similar to what Knative uses).
This would also make autoscaling behave more predictably for GPU workloads, since a GPU-bound API can be saturated while its CPU utilization stays low.
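As a rough sketch of what the concurrency-based calculation could look like (the function and parameter names below are hypothetical, not an existing Cortex API): each replica is given a target number of in-flight requests, and the desired replica count is derived from the total in-flight requests, similar in spirit to Knative's concurrency autoscaler.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas derives a replica count from total in-flight requests:
// each replica is expected to handle at most targetConcurrency requests at
// once. These names are illustrative, not an existing Cortex API.
func desiredReplicas(inFlight, targetConcurrency float64, minReplicas, maxReplicas int) int {
	replicas := int(math.Ceil(inFlight / targetConcurrency))
	if replicas < minReplicas {
		replicas = minReplicas
	}
	if replicas > maxReplicas {
		replicas = maxReplicas
	}
	return replicas
}

func main() {
	// e.g. 42 concurrent requests across the API with a target of 4 per replica -> 11 replicas
	fmt.Println(desiredReplicas(42, 4, 1, 20))
}
```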
It may make sense to have both request-based and CPU/GPU-based autoscaling active at the same time, i.e. scale up when either threshold is met, and don't scale back down until both metrics have backed off.
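A minimal sketch of that combined decision rule, assuming one concurrency signal and one resource-utilization signal (the struct and field names are illustrative, not part of any existing API): scale up as soon as either signal crosses its threshold, and only allow scale-down once both have dropped back below theirs.

```go
package main

import "fmt"

// Illustrative signals and thresholds for combined autoscaling; these are
// not part of any existing Cortex API.
type signals struct {
	concurrencyPerReplica float64 // in-flight requests per replica
	resourceUtilization   float64 // CPU or GPU utilization, 0.0-1.0
}

type thresholds struct {
	concurrency float64
	resource    float64
}

// shouldScaleUp triggers as soon as either metric crosses its threshold.
func shouldScaleUp(s signals, t thresholds) bool {
	return s.concurrencyPerReplica > t.concurrency || s.resourceUtilization > t.resource
}

// canScaleDown only allows scaling back once both metrics have backed off.
func canScaleDown(s signals, t thresholds) bool {
	return s.concurrencyPerReplica <= t.concurrency && s.resourceUtilization <= t.resource
}

func main() {
	s := signals{concurrencyPerReplica: 2, resourceUtilization: 0.9}
	t := thresholds{concurrency: 4, resource: 0.8}
	fmt.Println(shouldScaleUp(s, t), canScaleDown(s, t)) // true false: GPU still hot, so no scale-down yet
}
```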