
Add minimal resource requests for tgi #662

Open
wants to merge 1 commit into base: main
Conversation

yongfengdu (Collaborator)

Description

This is the first PR for #643. The plan is to get the tgi change reviewed and merged first, then apply the same change to the other charts (to avoid triggering too much CI testing at once).
I tried several request settings with small values and none of them had an obvious impact on actual resource usage, so the following should be good enough:
cpu: 100m
memory: 128Mi

Setting more appropriate resource limits/requests for different scenarios remains a performance-tuning task.
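
For reference, in Helm-chart terms the change amounts to a resources block roughly like the sketch below (the surrounding values.yaml layout is an assumption; the field names follow the standard Kubernetes resources schema):

resources:
  # limits intentionally left unset so actual usage is not capped
  requests:
    cpu: 100m      # 0.1 CPU core reserved by the scheduler
    memory: 128Mi  # 128 MiB reserved by the scheduler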

Issues

#643

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

List any newly introduced 3rd-party dependencies, if any exist.

Tests

Describe the tests that you ran to verify your changes.

Signed-off-by: Dolpher Du <dolpher.du@intel.com>
@yongfengdu requested a review from lianhao as a code owner on December 31, 2024, 07:38
# memory: 128Mi
requests:
  cpu: 100m
  memory: 128Mi
Collaborator

Might these numbers be too small? Have you tested this with a minimal model?

Author (Collaborator)

The requests are just minimal values used by the scheduler; actual usage is not affected by this setting.
kubectl describe pod xxx shows that the pod is Burstable:
QoS Class: Burstable

The actual resources are not affected (CPU usage can reach 6000m when busy, but is just 2m at idle):
NAME                    CPU(cores)   MEMORY(bytes)
tgi-57c674bfb8-k2h4m    1002m        15773Mi
tgi1-55558f6c9-kmqr9    1002m        15779Mi
tgi2-7854bf5569-qm5p2   2m           16232Mi
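
For context on why kubectl reports a Burstable QoS class: Kubernetes derives the class from how requests and limits are set on the containers. A minimal sketch with illustrative values (not taken from this chart):

# requests set, limits unset -> QoS class: Burstable (this PR's case)
resources:
  requests:
    cpu: 100m
    memory: 128Mi

# CPU and memory requests equal to limits for all containers -> QoS class: Guaranteed
resources:
  requests:
    cpu: "1"
    memory: 16Gi
  limits:
    cpu: "1"
    memory: 16Gi

# no requests or limits on any container -> QoS class: BestEffort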

Contributor

For the priority-class improvement, requests can reflect the service's resource usage when it is idling. That is OK for a start.

However, a service that has resources only for idling (when the node is otherwise stressed) is not a very useful one.

For better resource requests, the values should correspond to active service usage. The problem with that is the lack of SLA definitions for the OPEA services: what kind of load are they supposed to handle, on what HW, and with what performance?

Once the load and HW parts are defined, active service usage can be measured and the corresponding resource usage determined for the requests.

(In the TGI case, where resource usage is dictated by the model used and options related to it, the SLA should also define those.)
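
As a purely illustrative example (the numbers are hypothetical, sized from the kubectl top figures quoted earlier, and not a recommendation), requests matching active usage could look roughly like:

resources:
  requests:
    cpu: "1"       # roughly the ~1000m observed per busy tgi pod above
    memory: 16Gi   # roughly the ~16000Mi observed above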
