
Add minimal resource requests for tgi #662

Open
wants to merge 1 commit into base: main
Conversation

yongfengdu (Collaborator)

Description

This is the first PR for #643. The plan is to get the tgi change reviewed and merged first, then apply the same change to the other charts (to avoid triggering too much CI testing at once).
I tried several request settings with small values and none of them had an obvious impact on actual resource usage, so the following should be good enough:
cpu: 100m
memory: 128Mi

Setting more appropriate resource limits/requests for different scenarios remains a performance-tuning task.
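
For reference, in Helm-chart terms the change amounts to a resources block roughly like the sketch below (the surrounding values.yaml layout is an assumption; the field names follow the standard Kubernetes resources schema):

resources:
  # limits intentionally left unset so actual usage is not capped
  requests:
    cpu: 100m      # 0.1 CPU core reserved by the scheduler
    memory: 128Mi  # 128 MiB reserved by the scheduler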

Issues

#643

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

List any newly introduced 3rd-party dependencies, if any exist.

Tests

Describe the tests that you ran to verify your changes.

Signed-off-by: Dolpher Du <dolpher.du@intel.com>
@yongfengdu requested a review from lianhao as a code owner on December 31, 2024, 07:38
# memory: 128Mi
requests:
  cpu: 100m
  memory: 128Mi
Collaborator

Might these numbers be too small? Have you tested this with a minimal model?

Author (Collaborator)

The requests are just minimal values used by the scheduler; actual usage is not affected by this setting.
kubectl describe pod xxx shows that the pod is Burstable:
QoS Class: Burstable

The actual resources are not affected (CPU usage can reach 6000m when busy, but is just 2m at idle):
NAME                    CPU(cores)   MEMORY(bytes)
tgi-57c674bfb8-k2h4m    1002m        15773Mi
tgi1-55558f6c9-kmqr9    1002m        15779Mi
tgi2-7854bf5569-qm5p2   2m           16232Mi
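
For context on why kubectl reports a Burstable QoS class: Kubernetes derives the class from how requests and limits are set on the containers. A minimal sketch with illustrative values (not taken from this chart):

# requests set, limits unset -> QoS class: Burstable (this PR's case)
resources:
  requests:
    cpu: 100m
    memory: 128Mi

# CPU and memory requests equal to limits for all containers -> QoS class: Guaranteed
resources:
  requests:
    cpu: "1"
    memory: 16Gi
  limits:
    cpu: "1"
    memory: 16Gi

# no requests or limits on any container -> QoS class: BestEffort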

Contributor

For the priority-class improvement, requests can reflect the service's resource usage when it is idling. That is OK for a start.

However, a service that has resources only for idling (when the node is otherwise stressed) is not a very useful one.

For better resource requests, the values should correspond to active service usage. The problem with that is the lack of SLA definitions for the OPEA services: what kind of load are they supposed to handle, on what HW, and with what performance?

Once the load and HW parts are defined, active service usage can be measured and the corresponding resource usage determined for the requests.

(In the TGI case, where resource usage is dictated by the model used and options related to it, the SLA should also define those.)
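
As a purely illustrative example (the numbers are hypothetical, sized from the kubectl top figures quoted earlier, and not a recommendation), requests matching active usage could look roughly like:

resources:
  requests:
    cpu: "1"       # roughly the ~1000m observed per busy tgi pod above
    memory: 16Gi   # roughly the ~16000Mi observed above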
