Support autoscaling #3
Comments
/milestone v0.0.1 |
/kind feature |
/milestone clear |
/priority important-longterm |
/milestone v0.2.0 |
/assign
If the service controller needs to be integrated with HPA, I am willing to give it a try. Is it related to service.Spec.WorkloadTemplate.Replicas? |
type ElasticConfig struct {
// MinReplicas indicates the minimum number of inference workloads based on the traffic.
// Default to nil means we can scale down the instances to 1.
// If minReplicas set to 0, it requires to install serverless component at first.
// +kubebuilder:default=1
// +optional
MinReplicas *int32 `json:"minReplicas,omitempty"`
// MaxReplicas indicates the maximum number of inference workloads based on the traffic.
// Default to nil means there's no limit for the instance number.
// +optional
MaxReplicas *int32 `json:"maxReplicas,omitempty"`
// Metrics contains the specifications which are used to calculate the
// desired replica count (the maximum replica count across all metrics will
// be used). The desired replica count is calculated with multiplying the
// ratio between the target value and the current value by the current
// number of pods. Ergo, metrics used must decrease as the pod count is
// increased, and vice-versa. See the individual metric source types for
// more information about how each type of metric must respond.
// If not set, the HPA will not be created.
// +optional
Metrics []autoscalingv2.MetricSpec `json:"metrics,omitempty"`
}
@kerthcet |
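To make the Metrics comment above concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes HPA, `desired = ceil(current * currentMetric / targetMetric)`, clamped to the proposed bounds. The names `ElasticBounds`, `DesiredReplicas`, and `clamp` are hypothetical and not part of this project. The sketch assumes a nil MinReplicas means 1 and a nil MaxReplicas means no upper limit, as the field comments say, and it leaves out the tolerance and stabilization behaviour of the real HPA controller.

```go
package main

import (
	"fmt"
	"math"
)

// ElasticBounds is a hypothetical stand-in for the MinReplicas and
// MaxReplicas fields of the proposed ElasticConfig.
type ElasticBounds struct {
	MinReplicas *int32 // nil is treated as 1 (assumption taken from the field comment)
	MaxReplicas *int32 // nil is treated as "no upper limit"
}

// DesiredReplicas applies the documented HPA rule
// desired = ceil(current * currentMetric / targetMetric)
// and then clamps the result to the configured bounds.
func DesiredReplicas(current int32, currentMetric, targetMetric float64, b ElasticBounds) int32 {
	if targetMetric <= 0 {
		// Without a usable target there is nothing to compute; keep the current size.
		return clamp(current, b)
	}
	desired := int32(math.Ceil(float64(current) * currentMetric / targetMetric))
	return clamp(desired, b)
}

// clamp restricts n to [MinReplicas, MaxReplicas] using the defaults above.
func clamp(n int32, b ElasticBounds) int32 {
	lo := int32(1)
	if b.MinReplicas != nil {
		lo = *b.MinReplicas
	}
	if n < lo {
		n = lo
	}
	if b.MaxReplicas != nil && n > *b.MaxReplicas {
		n = *b.MaxReplicas
	}
	return n
}

func main() {
	maxReplicas := int32(5)
	// 4 replicas at 90 against a target of 60 ask for 6, which the cap reduces to 5.
	fmt.Println(DesiredReplicas(4, 90, 60, ElasticBounds{MaxReplicas: &maxReplicas}))
}
```

Because the result grows with `currentMetric / targetMetric`, the metric has to fall as replicas are added, which is the condition the Metrics comment states.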
I will revisit this later, but as I see it, I don't want to copy the fields from HPA into ElasticConfig. I hope it can work with various systems, such as HPA and KEDA, so the fields should be sufficiently abstract. |
Indeed. That is, we only need to abstract the fields, and the controller provides a provider-like interface (e.g. HPAProvides) internally that implements these features, right? |
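One way to picture the provider-like interface discussed here is the sketch below. Everything in it is hypothetical: `ScalePolicy`, `AutoscalerProvider`, and `hpaProvider` are illustrative names, not types from this project, and the HPA-backed provider only records the policy in memory instead of creating a real HorizontalPodAutoscaler object.

```go
package main

import "fmt"

// ScalePolicy is a hypothetical, backend-neutral view of the elastic settings.
type ScalePolicy struct {
	MinReplicas *int32
	MaxReplicas *int32
}

// AutoscalerProvider is a hypothetical interface that each backend
// (HPA, KEDA, ...) would implement inside the controller.
type AutoscalerProvider interface {
	Name() string
	// Reconcile makes the backend match the policy for the named workload.
	Reconcile(workload string, policy ScalePolicy) error
}

// hpaProvider is an illustrative in-memory implementation. A real one would
// create or update a HorizontalPodAutoscaler through the Kubernetes API.
type hpaProvider struct {
	applied map[string]ScalePolicy
}

// NewHPAProvider returns an empty in-memory provider.
func NewHPAProvider() *hpaProvider {
	return &hpaProvider{applied: map[string]ScalePolicy{}}
}

func (p *hpaProvider) Name() string { return "hpa" }

func (p *hpaProvider) Reconcile(workload string, policy ScalePolicy) error {
	// Reject bounds that cannot be satisfied before recording anything.
	if policy.MinReplicas != nil && policy.MaxReplicas != nil &&
		*policy.MinReplicas > *policy.MaxReplicas {
		return fmt.Errorf("%s: minReplicas %d exceeds maxReplicas %d",
			workload, *policy.MinReplicas, *policy.MaxReplicas)
	}
	p.applied[workload] = policy
	return nil
}

func main() {
	// The controller would hold the interface, not the concrete type.
	var provider AutoscalerProvider = NewHPAProvider()
	minReplicas, maxReplicas := int32(1), int32(3)
	err := provider.Reconcile("demo", ScalePolicy{MinReplicas: &minReplicas, MaxReplicas: &maxReplicas})
	fmt.Println(provider.Name(), err)
}
```

With this shape the controller depends only on the interface, so a KEDA-backed implementation could be added later without changing the abstract fields.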
Some related metrics: |
@googs1025 would you like to implement the HPA integration as our first step? I think we have to align with lws right now, which supports only HPA. But let's not use the |
I'll take a look at it over this weekend and put some thoughts together. |
Thanks, it's a really important feature to us. |
/assign
Taking this over as a target for milestone v0.1.0. |
As the service.Spec describes, we have minReplicas and maxReplicas. What we hope to do is adjust the number of replicas based on the traffic, a.k.a. serverless. We could use ray or keda/knative as alternatives, but here we hope to have a simple implementation with no need to depend on other libraries.
For the first step, let's integrate with HPA for autoscaling capabilities.