Support autoscaler in SeldonDeployment #277
I think this would require some changes to the SeldonDeployment specification as the /scale operation assumes a single location for "replicas" in the definition. At present "replicas" is defined on a per-predictor basis. One option is:
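The YAML snippet that originally followed this comment did not survive the page export. A minimal sketch of what a SeldonDeployment-wide `replicas` field could look like (the exact field placement and API version are assumptions based on the surrounding text, not the merged design):

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  replicas: 3            # hypothetical deployment-wide default; the /scale subresource would target this
  predictors:
  - name: default
    # per-predictor replicas omitted here, so the spec-level value would apply
    graph:
      name: classifier
      type: MODEL
```

With a single well-known `replicas` path at the top of the spec, the CRD's `/scale` subresource has one location to read and write, which is what the Kubernetes HorizontalPodAutoscaler requires.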
This would allow you to specify per-predictor replicas as now but use a SeldonDeployment wide replicas if you wish and allow autoscaling to use this. Separately, people may wish to define replicas on a per PodTemplateSpec level inside each predictor. If we wanted to do this also we could:
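The snippet for this second option was also lost in export. One way to express a per-PodTemplateSpec replica count is via a metadata annotation on the component spec; the annotation key below is purely illustrative:

```yaml
spec:
  predictors:
  - name: default
    componentSpecs:
    - metadata:
        annotations:
          seldon.io/replicas: "2"   # hypothetical per-PodTemplateSpec override
      spec:
        containers:
        - name: classifier
          image: my-model:0.1
```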
The lowest-level replicas setting would take precedence, with the order ending at `spec.predictors[].componentSpecs[].metadata.annotations`. Feedback welcome.
I agree that auto-scaling support would add great value. What I am wondering: now that we have separate deployments, e.g. for the model images and the engine in the case of single-model serving and other basic constructs, would we want to autoscale those together? I admit it would be pretty hard to find a good solution for this (maybe by specifying relative replica ratios for the predictors of the SeldonDeployment?), so as a first step the solution proposed above would be fine.
Btw, could you please provide some background info, or maybe point to a doc, regarding why the Kubernetes deployment structure was changed so that the model container and the engine are now in separate K8s Deployments?
The current latest master versions have the ability to run the service orchestrator internal to the first predictor deployment or as a separate deployment. By default the latest code will use the same deployment as the first podTemplateSpec defined in your SeldonDeployment graph. This should cover most use cases and is best for latency. We need to update docs to add the annotation to allow this configuration option. |
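The annotation mentioned here appears to be the one controlling whether the service orchestrator (engine) runs in its own deployment. A sketch of how it might be set (the annotation key `seldon.io/engine-separate-pod` is my assumption about the configuration option referred to; check the docs once updated):

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: my-model
  annotations:
    seldon.io/engine-separate-pod: "true"   # assumed key: run the orchestrator as a separate deployment
spec:
  predictors:
  - name: default
    graph:
      name: classifier
      type: MODEL
```

Leaving the annotation unset would give the default described above: the orchestrator shares the deployment of the first podTemplateSpec, which avoids an extra network hop and is best for latency.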
PR #437 adds the ability to add HorizontalPodAutoscaler Specs for the defined PodTemplateSpecs. |
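A sketch of what attaching an HPA spec to a defined PodTemplateSpec could look like. The `hpaSpec` field name follows the wording of PR #437, but the exact schema shown (min/max replicas plus a CPU utilization metric, mirroring the Kubernetes `HorizontalPodAutoscalerSpec` of that era) is an assumption:

```yaml
spec:
  predictors:
  - name: default
    componentSpecs:
    - hpaSpec:
        minReplicas: 1
        maxReplicas: 4
        metrics:
        - type: Resource
          resource:
            name: cpu
            targetAverageUtilization: 70    # scale out when average CPU exceeds 70%
      spec:
        containers:
        - name: classifier
          image: my-model:0.1
```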
SeldonDeployment predictor supports replicas. It would be great if it could also support an autoscaler.