Support autoscaling #3

kerthcet · 2023-11-23T06:20:12Z

As the service.Spec describes, we have minReplicas and maxReplicas. What we hope to do is adjust the replica count based on traffic, a.k.a. serverless. We could use Ray or KEDA/Knative as alternatives, but here we hope to have a simple implementation so that we don't need to depend on other libraries.

As a first step, let's integrate with HPA for autoscaling capabilities.

kerthcet · 2024-07-10T02:29:49Z

/milestone v0.0.1

kerthcet · 2024-07-10T02:30:01Z

/kind feature

kerthcet · 2024-07-10T02:30:10Z

/milestone clear

kerthcet · 2024-07-15T05:43:42Z

/priority important-longterm

kerthcet · 2024-08-05T03:07:16Z

/milestone v0.2.0

googs1025 · 2024-09-24T05:38:08Z

/assign

If the service controller needs to be integrated with HPA, I am willing to give it a try. Is it related to service.Spec.WorkloadTemplate.Replicas?

googs1025 · 2024-09-26T04:51:12Z

type ElasticConfig struct {
	// MinReplicas indicates the minimum number of inference workloads based on the traffic.
	// Defaults to 1 when unset, which means we can scale the instances down to 1.
	// If minReplicas is set to 0, a serverless component must be installed first.
	// +kubebuilder:default=1
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	// MaxReplicas indicates the maximum number of inference workloads based on the traffic.
	// Default to nil means there's no limit for the instance number.
	// +optional
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
	// Metrics contains the specifications which are used to calculate the
	// desired replica count (the maximum replica count across all metrics will
	// be used).  The desired replica count is calculated with multiplying the
	// ratio between the target value and the current value by the current
	// number of pods. Ergo, metrics used must decrease as the pod count is
	// increased, and vice-versa.  See the individual metric source types for
	// more information about how each type of metric must respond.
	// If not set, the HPA will not be created.
	// +optional
	Metrics []autoscalingv2.MetricSpec `json:"metrics,omitempty"`
}

@kerthcet
Should we integrate the HPA metrics so that the required metrics can also be set in ElasticConfig?
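
For reference, the replica calculation that the Metrics comment above describes can be sketched in a few lines of Go. This is only a simplified illustration of the documented HPA formula, desired = ceil(current * currentMetric / targetMetric), clamped to the min/max bounds. The function name `desiredReplicas` and its parameters are hypothetical, and the sketch ignores the tolerance, stabilization window, and pod-readiness handling that the real HPA controller applies.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical helper, not part of the project or of
// Kubernetes. It scales the current replica count by the ratio of the
// observed metric to the target metric and clamps the result.
func desiredReplicas(current int32, currentMetric, targetMetric float64, minReplicas, maxReplicas *int32) int32 {
	// Without a usable target or a running replica there is nothing to scale from.
	if targetMetric <= 0 || current <= 0 {
		return current
	}
	desired := int32(math.Ceil(float64(current) * currentMetric / targetMetric))

	// Assumption: a nil minReplicas behaves like the default of 1.
	lower := int32(1)
	if minReplicas != nil {
		lower = *minReplicas
	}
	if desired < lower {
		desired = lower
	}
	// A nil maxReplicas means no upper limit, as in the ElasticConfig comment.
	if maxReplicas != nil && desired > *maxReplicas {
		desired = *maxReplicas
	}
	return desired
}

func main() {
	limit := int32(5)
	fmt.Println(desiredReplicas(3, 90, 60, nil, nil))     // 3 * 90 / 60 = 4.5, rounded up to 5
	fmt.Println(desiredReplicas(3, 200, 50, nil, &limit)) // 12, capped at maxReplicas = 5
}
```

Because the result is proportional to the metric ratio, a metric that does not fall as pods are added would keep pushing the count up until maxReplicas is reached, which is why the comment requires metrics that decrease as the pod count grows.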

kerthcet · 2024-09-26T11:00:28Z

I will revisit this later, but as I see it, I don't want to simply copy the fields from HPA into ElasticConfig. I hope it can work with various systems, such as HPA and KEDA, so the fields should be sufficiently abstract.

googs1025 · 2024-09-26T11:57:59Z

Indeed. That is, we only need to abstract the fields. The controller provides a provider-like interface internally (e.g. HPAProvides), and the backend-specific features are implemented behind it, right?
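
To make the provider idea concrete, here is a rough sketch of what such an internal interface could look like. Every name in it (`ScalingPolicy`, `Provider`, `hpaProvider`, `kedaProvider`, `providerFor`) is hypothetical and not the project's actual API. The providers only return a description string instead of creating real Kubernetes objects, so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// ScalingPolicy is a hypothetical backend-agnostic policy carrying only
// the abstract fields discussed in this thread.
type ScalingPolicy struct {
	MinReplicas int32
	MaxReplicas int32
}

// Provider is a hypothetical interface the controller could call without
// knowing which autoscaling backend is in use.
type Provider interface {
	Name() string
	Apply(workload string, policy ScalingPolicy) (string, error)
}

// validate rejects policies whose bounds are inverted.
func validate(policy ScalingPolicy) error {
	if policy.MaxReplicas < policy.MinReplicas {
		return errors.New("maxReplicas must be >= minReplicas")
	}
	return nil
}

type hpaProvider struct{}

func (hpaProvider) Name() string { return "hpa" }

// Apply only describes the object an HPA-backed provider would create.
func (hpaProvider) Apply(workload string, policy ScalingPolicy) (string, error) {
	if err := validate(policy); err != nil {
		return "", err
	}
	return fmt.Sprintf("HorizontalPodAutoscaler for %s: min=%d max=%d",
		workload, policy.MinReplicas, policy.MaxReplicas), nil
}

type kedaProvider struct{}

func (kedaProvider) Name() string { return "keda" }

// Apply only describes the object a KEDA-backed provider would create.
func (kedaProvider) Apply(workload string, policy ScalingPolicy) (string, error) {
	if err := validate(policy); err != nil {
		return "", err
	}
	return fmt.Sprintf("ScaledObject for %s: min=%d max=%d",
		workload, policy.MinReplicas, policy.MaxReplicas), nil
}

// providerFor selects a backend by name.
func providerFor(name string) (Provider, error) {
	switch name {
	case "hpa":
		return hpaProvider{}, nil
	case "keda":
		return kedaProvider{}, nil
	}
	return nil, fmt.Errorf("unknown autoscaling backend %q", name)
}

func main() {
	provider, err := providerFor("hpa")
	if err != nil {
		panic(err)
	}
	out, err := provider.Apply("my-service", ScalingPolicy{MinReplicas: 1, MaxReplicas: 4})
	fmt.Println(out, err)
}
```

With this shape, the API types stay free of backend-specific fields, and supporting another system means adding one more implementation of the interface.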

kerthcet · 2024-10-30T02:38:32Z

Some related metrics:

vllm: [Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes vllm-project/vllm#5041
TGI: [Feature]: Additional metrics to enable better autoscaling / load balancing of TGI servers in Kubernetes huggingface/text-generation-inference#1977

kerthcet · 2024-12-23T09:46:41Z

@googs1025 would you like to implement HPA support as our first step? I think we have to align with lws for now, which only supports HPA.

But let's not use the autoscalingv2 library directly; let's define our own structure instead. We can split the work into two PRs, one for the API and another for the implementation. Let us know if you're interested. Thanks anyway.

googs1025 · 2024-12-24T03:38:02Z

I'll take a look at it over this weekend and share some thoughts.

kerthcet · 2024-12-24T07:40:41Z

Thanks, it's a really important feature for us.

kerthcet · 2025-01-21T07:34:36Z

/assign

Taking this over, targeting milestone v0.1.0.
/milestone v0.1.0

kerthcet added the enhancement Categorizes issue or PR as related to a new feature with API changes. label Nov 23, 2023

InftyAI-Agent added this to the v0.0.1 milestone Jul 10, 2024

InftyAI-Agent added the feature Categorizes issue or PR as related to a new feature. label Jul 10, 2024

InftyAI-Agent removed this from the v0.0.1 milestone Jul 10, 2024

InftyAI-Agent added the important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jul 15, 2024

InftyAI-Agent modified the milestones: v0.1.0, v0.2.0 Aug 5, 2024

kerthcet mentioned this issue Sep 13, 2024

feature: add scalePolicy in lws kubernetes-sigs/lws#213

Closed

3 tasks

InftyAI-Agent modified the milestones: v0.2.0, v0.1.0 Jan 21, 2025

InftyAI-Agent assigned kerthcet Jan 21, 2025

This was referenced Jan 22, 2025

Add scalingPolicy API for elastic scenario #249

Merged

Support Scale in Playground and Service #251

Merged
