LLM Instance Gateway Proposal #8029
Conversation
Welcome @kfswain!
Hi @kfswain. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: kfswain. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/ok-to-test
> waiting for a new model server to start.
> - Efficient accelerator sharing - Use cases can use less than an accelerator
What's the use case of "use less than an accelerator"?
A model use case or LoRA adapter that doesn't utilize the full batch size of an accelerator. This is meant to capture the value prop around why sharing accelerators is valuable (a single use case will not always get the maximum value out of an accelerator).
Got it. At first glance, I mistook it for some other GPU sharing techniques.
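To illustrate the sharing point above: a rough Go sketch (purely illustrative, not part of the proposal) of a gateway picking a replica that already has the requested LoRA adapter loaded and still has batch headroom, so several use cases can share one accelerator. The `Endpoint` fields and the `pickEndpoint` helper are assumptions, not the proposal's API.

```go
package main

import "fmt"

// Endpoint is a hypothetical view of one model-server replica as the
// gateway might see it. Field names are illustrative only.
type Endpoint struct {
	Address        string
	LoadedAdapters map[string]bool // LoRA adapters currently loaded
	ActiveRequests int             // requests currently in the batch
	MaxBatchSize   int             // batch capacity of the accelerator
}

// pickEndpoint sketches the sharing idea: several LoRA use cases can share
// one accelerator as long as none of them fills the whole batch. Prefer a
// replica that already has the adapter loaded and still has headroom.
func pickEndpoint(endpoints []Endpoint, adapter string) (Endpoint, bool) {
	var best Endpoint
	found := false
	for _, ep := range endpoints {
		if !ep.LoadedAdapters[adapter] || ep.ActiveRequests >= ep.MaxBatchSize {
			continue
		}
		if !found || ep.ActiveRequests < best.ActiveRequests {
			best, found = ep, true
		}
	}
	return best, found
}

func main() {
	eps := []Endpoint{
		{Address: "10.0.0.1:8000", LoadedAdapters: map[string]bool{"sql-lora": true}, ActiveRequests: 12, MaxBatchSize: 64},
		{Address: "10.0.0.2:8000", LoadedAdapters: map[string]bool{"sql-lora": true, "chat-lora": true}, ActiveRequests: 3, MaxBatchSize: 64},
	}
	if ep, ok := pickEndpoint(eps, "sql-lora"); ok {
		fmt.Println("route to", ep.Address)
	}
}
```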
> with this LLM Instance Gateway, the model server would need to implement this protocol.
>
> Key requirements would roughly look like:
Hmm, I'm thinking such protocol adaptation takes time to land in the inference server. Should we consider routing only, instead of LoRA management in the engine?
/cc @varungup90 this is the protocol from the feature list
Agreed, this is a big rock, but I would like to make native support a goal for vLLM. We should cover this in the architecture proposal.
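To make the protocol discussion concrete, here is a rough Go sketch (assumptions only, not the proposal's actual requirements) of the kind of per-server state a model server might expose to the gateway: queue depth, KV cache utilization, and the set of loaded LoRA adapters. The `ServerMetrics` type, its field names, and the `/metrics/llm` path are all hypothetical.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// ServerMetrics is a hypothetical snapshot of what a model server might
// report to the gateway; the real protocol fields would be defined by the
// proposal, not here.
type ServerMetrics struct {
	QueueLength        int      `json:"queue_length"`         // requests waiting to be batched
	KVCacheUtilization float64  `json:"kv_cache_utilization"` // fraction of KV cache in use
	LoadedAdapters     []string `json:"loaded_adapters"`      // LoRA adapters currently resident
}

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	// In a real server these values would come from the engine's scheduler.
	m := ServerMetrics{
		QueueLength:        4,
		KVCacheUtilization: 0.62,
		LoadedAdapters:     []string{"sql-lora", "chat-lora"},
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(m)
}

func main() {
	http.HandleFunc("/metrics/llm", metricsHandler)
	log.Fatal(http.ListenAndServe(":9000", nil))
}
```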
@Jeffwan: GitHub didn't allow me to request PR reviews from the following users: protocol, from, feature, list, varungup90, this, is, the. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Having an efficient and reliable LLM Gateway is essential. It serves as a bridge between the inference engine and LLM management at scale. ByteDance is passionate about supporting innovative ideas like the LLM Gateway and is excited to participate in this project. By doing so, we aim to contribute to the development of the routing algorithm, the Envoy extension, and LoRA integration with the engine, which can bring significant benefits to LLM users on Kubernetes.
Hey all! I'm closing this PR as we now have: kubernetes-sigs/wg-serving#12. Thanks!
Creating a proposal for the LLM Instance Gateway, as discussed in the wg-serving Wednesday meetings