Support autoscaler in SeldonDeployment #277

Closed
ChenyuanZ opened this issue Oct 31, 2018 · 5 comments · Fixed by #437
@ChenyuanZ

A SeldonDeployment predictor supports replicas. It would be great if it could also support autoscaling.

@ukclivecox

ukclivecox commented Nov 12, 2018

I think this would require some changes to the SeldonDeployment specification as the /scale operation assumes a single location for "replicas" in the definition. At present "replicas" is defined on a per-predictor basis. One option is:

  • Add a spec.replicas field which provides the number of replicas for any predictor that has not defined a spec.predictors[].replicas

This would let you keep specifying per-predictor replicas as now, or use a SeldonDeployment-wide replicas value instead, which autoscaling could then target.

Separately, people may wish to define replicas on a per PodTemplateSpec level inside each predictor. If we wanted to do this also we could:

  • Allow an annotation in the podTemplateSpec.metadata where you can specify the desired number of replicas.

The most specific replicas setting would take precedence, in this order:

spec.predictors[].componentSpecs[].metadata.annotations
spec.predictors[].replicas
spec.replicas
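
The precedence order above can be sketched as a small lookup. This is only an illustration of the proposal, not the operator's code: the annotation key `example.io/replicas`, the function name, and the default of 1 are all assumptions, since the thread does not name them.

```python
# Hypothetical annotation key; the proposal does not specify one.
REPLICAS_ANNOTATION = "example.io/replicas"


def resolve_replicas(spec: dict, predictor: dict, component: dict, default: int = 1) -> int:
    """Return the replica count for one component, most specific setting first."""
    # 1. Annotation on the component's pod template metadata.
    annotations = (component.get("metadata") or {}).get("annotations") or {}
    if REPLICAS_ANNOTATION in annotations:
        return int(annotations[REPLICAS_ANNOTATION])
    # 2. Per-predictor replicas.
    if predictor.get("replicas") is not None:
        return int(predictor["replicas"])
    # 3. SeldonDeployment-wide replicas.
    if spec.get("replicas") is not None:
        return int(spec["replicas"])
    # Nothing set anywhere: fall back to an assumed default.
    return default
```

With this layout, an autoscaler acting on `spec.replicas` would only affect predictors and components that have not set their own value.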

Feedback welcome.

@sasvaritoni

I strongly agree that auto-scaling support would add great value.
I also think that @cliveseldon's proposal makes sense.

What I am wondering: now that we have separate deployments, e.g. for model images and the engine in the case of single-model serving and other basic constructs, would we want to autoscale those together?
Suppose I have a heavy model that might need up to 10 replicas when auto-scaling under load. Does it make sense to scale the xxx-svc-orch deployments to the same number as well?

I admit that it would be pretty hard to find a good solution for this (maybe by specifying relative replica ratios for the predictors of the SeldonDeployment?), so as a first step the solution proposed above would be fine.

@sasvaritoni

Btw, could you please provide some background info, or point to a doc, on why the (K8s) deployment structure was changed so that the model container and the engine are now in separate K8s deployments?
Thx!

@ukclivecox

The latest master can run the service orchestrator either inside the first predictor deployment or as a separate deployment. By default the latest code will use the same deployment as the first podTemplateSpec defined in your SeldonDeployment graph. This should cover most use cases and is best for latency. We need to update the docs to describe the annotation that controls this configuration option.

@ukclivecox

PR #437 adds the ability to add HorizontalPodAutoscaler specs for the defined PodTemplateSpecs.
This differs from the earlier proposal, which concerned manual use of the /scale endpoint for CRDs; this PR focuses on actual autoscaling.
The WIP uses v2beta1 of the HPA API. We could wait for v2beta2 to be available in the Kubernetes Java client.
Feedback welcome on the WIP.
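
For readers unfamiliar with the v2beta1 HPA API mentioned above, here is a minimal sketch of a plain Kubernetes `autoscaling/v2beta1` HorizontalPodAutoscaler manifest built as a Python dict. It does not show how PR #437 embeds HPA specs in a SeldonDeployment; the function name, the deployment name, and the choice of a CPU utilization metric are assumptions made for illustration.

```python
def build_hpa(deployment_name: str, min_replicas: int, max_replicas: int,
              cpu_target_percent: int) -> dict:
    """Build a standalone autoscaling/v2beta1 HPA manifest targeting a Deployment."""
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2beta1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment_name},
        "spec": {
            # The workload whose replica count the autoscaler adjusts.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # v2beta1 metric shape: resource name plus target average utilization.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "targetAverageUtilization": cpu_target_percent,
                    },
                }
            ],
        },
    }
```

For example, `build_hpa("my-model", 1, 10, 70)` describes an autoscaler that keeps a hypothetical `my-model` deployment between 1 and 10 replicas while aiming for 70% average CPU utilization.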
