
[Feature] Helm charts needs to have minimal resource requirements specified in default values for all containers #643

Open
kad opened this issue Dec 13, 2024 · 9 comments
Labels
feature New feature or request

Comments

@kad

kad commented Dec 13, 2024

Priority

Undecided

OS type

Ubuntu

Hardware type

Xeon-GNR

Running nodes

Multiple Nodes

Description

Currently, all the containers in the helm charts are missing resource requests/limits by default. This makes their Pod QoS class in Kubernetes BestEffort, the lowest-priority workload class on a node, which leads to unstable performance.

Default values for all containers should request some minimal amount of CPU and memory resources.
We don't need to specify limits for now, but minimal requests are mandatory to move the containers into the Burstable Pod QoS class.

Minimal values can easily be collected from any benchmark run during the "steady" state of execution. The values don't need to be precise, but some minimal viable numbers should be added.
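For illustration, a minimal sketch of what such defaults could look like in a chart's `values.yaml` (the exact key layout varies per chart, and the numbers below are placeholders, not measured values):

```yaml
# Hypothetical values.yaml fragment: requests only, no limits,
# so the Pod lands in the Burstable QoS class instead of BestEffort.
resources:
  requests:
    cpu: 100m       # placeholder; replace with a value observed during benchmarks
    memory: 128Mi   # placeholder
  # limits intentionally omitted for now
```

Observed steady-state usage can be read with `kubectl top pod` while a benchmark is running, and rounded down to a minimal viable request.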

@kad kad added the feature New feature or request label Dec 13, 2024
@lianhao
Collaborator

lianhao commented Dec 16, 2024

The minimum resource requests will be very different for different models for those components that directly use LLM models. For the other components that don't directly involve models, that could be done.

@kad
Author

kad commented Dec 16, 2024

Even for different LLM models, we can have some minimal requirements: e.g. 1 vCPU, 100 MB of RAM.
For the model which we consider the default in the helm chart, we can add more realistic minimal requirements.

@eero-t
Contributor

eero-t commented Dec 16, 2024

@kad There's some discussion about how to best handle the problem with (model) limits here: #431

@kad
Author

kad commented Dec 16, 2024

@kad There's some discussion about how to best handle the problem with (model) limits here: #431

that discussion in #431 seems to be stalled, but yes, my point is along the lines that is mentioned in some of the comments:

  • We have the worst Pod QoS now without specifying anything: BestEffort. We need to get all containers to Burstable QoS.
  • Limits are really model-specific and might be trickier to manage. However, they are not needed for this issue currently; only requests need to be specified.
  • Too-big requests can be a problem, but we need to start with some minimal viable values, based on current observations.
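As a sketch of the QoS mechanics being discussed: a container that sets requests but no limits is assigned the Burstable class, which Kubernetes reports in the Pod status (the names below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example            # illustrative name
spec:
  containers:
  - name: app
    image: example:latest  # illustrative image
    resources:
      requests:            # requests set, limits omitted -> Burstable QoS
        cpu: 100m
        memory: 128Mi
# After creation, `kubectl get pod example -o jsonpath='{.status.qosClass}'`
# prints "Burstable".
```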

@yongfengdu
Collaborator

Do we need to set the minimal resource request for all helm charts, or just the compute-intensive ones like TEI/TGI/vLLM?

@eero-t
Contributor

eero-t commented Dec 19, 2024

Do we need to set the minimal resource request for all helm charts, or just the compute-intensive ones like TEI/TGI/vLLM?

All should have them. The application is not much use if backend inferencing is idling because the services in front of it are throttled, or even evicted.

Suitable (minimum) requests should be much easier to specify for those other services, as their resource usage does not depend on which model the user has set in the Helm values.

@yongfengdu
Collaborator

Two different scenarios.
The application (e2e Examples) should have these resource settings based on experience and the HW, models, etc. This is performance tuning, and we can have recommendation profiles for different flavors.
But here for this issue, there is no best setting, as @lianhao said; we are not really optimizing the app, we just set the minimal requests to make it Burstable QoS. (I'm hesitant to add the same settings to dozens of helm charts without an obvious benefit.)
Another option is to leave the common helm charts unchanged, and add the resource settings for the e2e examples.

Do we need to set the minimal resource request for all helm charts, or just the compute-intensive ones like TEI/TGI/vLLM?

All should have them. The application is not much use if backend inferencing is idling because the services in front of it are throttled, or even evicted.

Suitable (minimum) requests should be much easier to specify for those other services, as their resource usage does not depend on which model the user has set in the Helm values.

@eero-t
Contributor

eero-t commented Dec 20, 2024

According to k8s HPA docs: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work

CPU/mem resource requests are also needed for, and affect, the CPU/mem utilization targets for autoscaling:

For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod.
...
Please note that if some of the Pod's containers do not have the relevant resource request set, CPU utilization for the Pod will not be defined and the autoscaler will not take any action for that metric.

I.e. when HPA is used with CPU components, the requests must be fairly accurate, not just some minimum values.

But that is needed anyway for deployments to work reasonably when multiple components are allowed on the same nodes. Scaling just makes the issue more visible.
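To illustrate the quoted behaviour with placeholder numbers: assuming a container whose chart sets a `cpu: 500m` request, an actual usage of 750m yields a utilization of 750m / 500m = 150%, so an 80% target would trigger a scale-up. A sketch of such an HPA (names are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa          # illustrative
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example            # illustrative
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # percentage of the container's CPU *request*
```

If the container has no CPU request set, this metric is undefined and the autoscaler takes no action for it, as quoted above.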

@eero-t
Contributor

eero-t commented Dec 20, 2024

Two different scenarios. The application (e2e Examples) should have these resource settings based on experience and the HW, models, etc. This is performance tuning, and we can have recommendation profiles for different flavors. But here for this issue, there is no best setting, as @lianhao said; we are not really optimizing the app, we just set the minimal requests to make it Burstable QoS. (I'm hesitant to add the same settings to dozens of helm charts without an obvious benefit.)

Some reasonable minimum request values are better than no values. It guarantees that the application can make at least some progress when a node is constrained, instead of being completely throttled or even evicted. Or do you see some downside?

More accurate resource requests would naturally be better, so that deployments get (autoscaled +) scheduled correctly and are guaranteed the correct amount of resources even in constrained situations, but that can be optimized later.


4 participants