Support autoscaling #3

Open
kerthcet opened this issue Nov 23, 2023 · 14 comments
Labels
enhancement Categorizes issue or PR as related to a new feature with API changes. feature Categorizes issue or PR as related to a new feature. important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@kerthcet
Member

kerthcet commented Nov 23, 2023

As service.Spec describes, we have minReplicas and maxReplicas. What we hope to do is adjust the replica count based on traffic, i.e. serverless-style scaling. We could use Ray or KEDA/Knative as alternatives, but here we hope to have a simple implementation so that we don't need to depend on other libraries.

As a first step, let's integrate with HPA for autoscaling capabilities.

@kerthcet kerthcet added the enhancement Categorizes issue or PR as related to a new feature with API changes. label Nov 23, 2023
@kerthcet
Member Author

/milestone v0.0.1

@kerthcet
Member Author

/kind feature

@InftyAI-Agent InftyAI-Agent added this to the v0.0.1 milestone Jul 10, 2024
@kerthcet
Member Author

/milestone clear

@InftyAI-Agent InftyAI-Agent added the feature Categorizes issue or PR as related to a new feature. label Jul 10, 2024
@InftyAI-Agent InftyAI-Agent removed this from the v0.0.1 milestone Jul 10, 2024
@kerthcet
Member Author

/priority important-longterm

@InftyAI-Agent InftyAI-Agent added the important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jul 15, 2024
@kerthcet
Member Author

kerthcet commented Aug 5, 2024

/milestone v0.2.0

@googs1025
Contributor

/assign

If the service controller needs to be integrated with HPA, I am willing to give it a try. Is it related to service.Spec.WorkloadTemplate.Replicas?

@googs1025
Contributor

type ElasticConfig struct {
	// MinReplicas indicates the minimum number of inference workloads based on the traffic.
	// Defaults to 1, so the instances are never scaled down below one replica unless set otherwise.
	// Setting minReplicas to 0 requires a serverless component to be installed first.
	// +kubebuilder:default=1
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	// MaxReplicas indicates the maximum number of inference workloads based on the traffic.
	// Defaults to nil, which means there is no limit on the number of instances.
	// +optional
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
	// Metrics contains the specifications which are used to calculate the
	// desired replica count (the maximum replica count across all metrics will
	// be used).  The desired replica count is calculated by multiplying the
	// ratio between the target value and the current value by the current
	// number of pods. Ergo, metrics used must decrease as the pod count is
	// increased, and vice-versa.  See the individual metric source types for
	// more information about how each type of metric must respond.
	// If not set, the HPA will not be created.
	// +optional
	Metrics []autoscalingv2.MetricSpec `json:"metrics,omitempty"`
}

@kerthcet
Should we integrate HPA metrics so that we can also set the required metrics in ElasticConfig?
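
To make the Metrics comment above concrete, here is a minimal sketch of the HPA-style calculation it describes: each metric proposes ceil(currentReplicas * currentValue / targetValue), the largest proposal wins, and the result is clamped to the configured bounds. The names `metricSample`, `desiredReplicas`, and `scale` are hypothetical and not part of this project, and the sketch deliberately ignores the tolerance and stabilization behavior of the real HPA controller.

```go
package main

import (
	"fmt"
	"math"
)

// metricSample is a hypothetical pairing of an observed value and its target.
type metricSample struct {
	Current float64
	Target  float64
}

// desiredReplicas applies the HPA-style rule for a single metric:
// ceil(currentReplicas * currentValue / targetValue).
func desiredReplicas(currentReplicas int32, currentValue, targetValue float64) int32 {
	if targetValue <= 0 {
		// No usable target: leave the replica count unchanged.
		return currentReplicas
	}
	return int32(math.Ceil(float64(currentReplicas) * currentValue / targetValue))
}

// scale takes the maximum proposal across all metrics and clamps it to
// [minReplicas, maxReplicas]. A nil maxReplicas means no upper bound.
func scale(currentReplicas int32, samples []metricSample, minReplicas int32, maxReplicas *int32) int32 {
	if len(samples) == 0 {
		// Without metrics there is nothing to scale on.
		return currentReplicas
	}
	var desired int32
	for _, s := range samples {
		if d := desiredReplicas(currentReplicas, s.Current, s.Target); d > desired {
			desired = d
		}
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if maxReplicas != nil && desired > *maxReplicas {
		desired = *maxReplicas
	}
	return desired
}

func main() {
	limit := int32(5)
	samples := []metricSample{{Current: 150, Target: 100}}
	fmt.Println(scale(4, samples, 1, nil))    // unbounded
	fmt.Println(scale(4, samples, 1, &limit)) // capped by maxReplicas
}
```

Treating a nil `maxReplicas` as "no upper bound" mirrors the field comment in the struct above; a real HPA object still needs a concrete maximum.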

@kerthcet
Member Author

I will revisit this later, but as I picture it, I don't want to simply copy the fields from HPA into ElasticConfig. I hope it can work with various systems, such as HPA and KEDA, so the fields should be sufficiently abstract.

@googs1025
Contributor

Indeed. That is, we only need to abstract the fields, and the controller provides a provider-like interface (e.g. HPAProvides) internally, with these features implemented behind it, right?
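
A rough sketch of what such a provider-like interface might look like, purely for discussion. Every name here (`ScalePolicy`, `Provider`, `hpaProvider`, `providerFor`) is hypothetical, and the `Plan` method only returns a description string rather than creating any Kubernetes object. The one external fact it relies on is that the Kubernetes HPA spec requires `maxReplicas`.

```go
package main

import (
	"errors"
	"fmt"
)

// ScalePolicy is a hypothetical, backend-neutral view of the scaling fields.
type ScalePolicy struct {
	MinReplicas int32
	MaxReplicas *int32 // nil means no explicit upper bound
}

// Provider is a hypothetical interface the controller could use internally so
// that HPA, KEDA, or other backends can sit behind the same abstract fields.
type Provider interface {
	Name() string
	// Plan describes the backend object that would be created for the target.
	Plan(target string, policy ScalePolicy) (string, error)
}

// hpaProvider is a stand-in for an HPA-backed implementation.
type hpaProvider struct{}

func (hpaProvider) Name() string { return "hpa" }

func (hpaProvider) Plan(target string, p ScalePolicy) (string, error) {
	// The HPA spec requires maxReplicas, so an unbounded policy is rejected.
	if p.MaxReplicas == nil {
		return "", errors.New("hpa provider: maxReplicas must be set")
	}
	return fmt.Sprintf("HorizontalPodAutoscaler/%s min=%d max=%d",
		target, p.MinReplicas, *p.MaxReplicas), nil
}

// providerFor selects a provider by name from the ones that are available.
func providerFor(name string, providers ...Provider) (Provider, error) {
	for _, p := range providers {
		if p.Name() == name {
			return p, nil
		}
	}
	return nil, fmt.Errorf("no autoscaling provider named %q", name)
}

func main() {
	limit := int32(3)
	p, err := providerFor("hpa", hpaProvider{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	plan, err := p.Plan("my-service", ScalePolicy{MinReplicas: 1, MaxReplicas: &limit})
	fmt.Println(plan, err)
}
```

With this shape, adding another backend would mean adding another `Provider` implementation without touching the abstract fields.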

@kerthcet
Member Author

@googs1025 would you like to implement HPA support as our first step? I think we have to align with lws right now, which only supports HPA.

But let's not use the autoscalingv2 library directly; let's build our own structure instead. We can split this into two PRs, one for the API and another for the implementation. Let us know if you're interested. Thanks anyway.
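
As a starting point for the API PR, one possible shape for a structure that does not embed autoscalingv2 types is sketched below. It is only an assumption of how this could look: `MetricTarget`, `ElasticConfigSketch`, `Validate`, and `parseConfig` are hypothetical names, and the metric representation (a name plus a target average value) is a placeholder rather than a proposed final design.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MetricTarget is a hypothetical, backend-neutral metric description that a
// provider could later translate into HPA or KEDA terms.
type MetricTarget struct {
	Name               string  `json:"name"`
	TargetAverageValue float64 `json:"targetAverageValue"`
}

// ElasticConfigSketch mirrors the min/max fields discussed in this issue but
// replaces []autoscalingv2.MetricSpec with the project's own metric type.
type ElasticConfigSketch struct {
	MinReplicas *int32         `json:"minReplicas,omitempty"`
	MaxReplicas *int32         `json:"maxReplicas,omitempty"`
	Metrics     []MetricTarget `json:"metrics,omitempty"`
}

// Validate checks the constraints that do not depend on any backend.
func (c ElasticConfigSketch) Validate() error {
	if c.MinReplicas != nil && *c.MinReplicas < 0 {
		return errors.New("minReplicas must not be negative")
	}
	if c.MinReplicas != nil && c.MaxReplicas != nil && *c.MaxReplicas < *c.MinReplicas {
		return errors.New("maxReplicas must not be less than minReplicas")
	}
	for i, m := range c.Metrics {
		if m.Name == "" {
			return fmt.Errorf("metrics[%d]: name is required", i)
		}
		if m.TargetAverageValue <= 0 {
			return fmt.Errorf("metrics[%d]: targetAverageValue must be positive", i)
		}
	}
	return nil
}

// parseConfig decodes a JSON document and validates the result.
func parseConfig(raw []byte) (ElasticConfigSketch, error) {
	var cfg ElasticConfigSketch
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, err
	}
	return cfg, cfg.Validate()
}

func main() {
	raw := []byte(`{"minReplicas":1,"maxReplicas":4,"metrics":[{"name":"requests_per_second","targetAverageValue":10}]}`)
	cfg, err := parseConfig(raw)
	fmt.Println(len(cfg.Metrics), err)
}
```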

@googs1025
Contributor

I'll take a look at it over the weekend and share some thoughts.

@kerthcet
Member Author

Thanks, it's a really important feature for us.

@kerthcet
Member Author

/assign

Taking this over as a target for milestone v0.1.0.
/milestone v0.1.0

3 participants