Conversation

@terrytangyuan (Contributor) commented Mar 2, 2025

This is a follow-up to unaddressed comments in #13841, specifically:

  1. Removed the reference to a specific project and switched to referencing the integrations page instead;
  2. Removed the phrase that other OSS projects can "make your deployment even smoother", since this is opinionated;
  3. Removed a guide from a specific project and switched to the official K8s guide instead.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@github-actions (bot) commented Mar 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 2, 2025
@terrytangyuan terrytangyuan changed the title [DOC] More neutral K8s deployment guide [Doc] More neutral K8s deployment guide Mar 2, 2025
@KuntaiDu (Collaborator) left a comment

The official Kubernetes doc linked here gives users an "OK, now you know how to fly, just go fly yourself" vibe and is probably too difficult for a new user to digest. Requesting a change to more user-friendly documentation.

@Hanchenli (Contributor) left a comment

Since users will (or should) visit this page when first trying to deploy their K8s vLLM stack, it is best that we keep some clean references instead of just passing the buck to another link.

The integrations link contains many repositories that are less dedicated to vLLM inference. For example, Llama Stack supports not only vLLM but also other API providers like TGI, Together AI, Groq, and so on. I do not think that should appear when people are solely looking for a K8s deployment of vLLM. On the other hand, I think we should move KubeAI to this page if they become dedicated to vLLM in the future.

I agree with you on the prerequisite change, in that it is better to link to an official K8s page. However, the link you provide is not easy enough for beginners to understand. Can you change it to a more beginner-friendly one?

@terrytangyuan (Contributor, Author) commented Mar 2, 2025

I don't have better alternatives. The official K8s guide provides the most vendor-neutral instructions (e.g., not just for NVIDIA GPUs) and references official vendor-specific instructions.

The current link from production-stack only provides instructions for NVIDIA GPUs and has OS- and vendor-specific prerequisites. It also pins specific versions of Minikube and the GPU Operator that can easily become outdated (instead of being maintained by vendors), and it requires Helm as a dependency.

@Hanchenli (Contributor) commented Mar 2, 2025

I think you have a point about the K8s installation. I totally agree that the current link on the documentation page needs an update. I just worry that users will not be able to install K8s if they follow the link you put in the PR, so they can't even get started.

Could we get something more user-friendly for that part? I could search for a better official tutorial in the meantime as well.

I maintain my request for changes about not moving this section to the external integrations page. Some repos are indeed external integrations, like llama-stack (they are not even affiliated with vLLM! Suppose Meta decides to archive that repo one day, what can we do?)

@terrytangyuan (Contributor, Author) commented
We probably should not require GPUs as a prerequisite at all. I can help add a CPU-only K8s deployment guide as a follow-up PR if that's helpful.
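For reference, a CPU-only deployment of the sort mentioned here could look roughly like the following Kubernetes manifest. This is only an illustrative sketch, not something proposed in this PR: the image name (assuming a locally built CPU image), model, and resource values are all assumptions.

```yaml
# Hypothetical CPU-only vLLM Deployment sketch; image, model, and
# resource values are illustrative assumptions, not part of this PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-cpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-cpu
  template:
    metadata:
      labels:
        app: vllm-cpu
    spec:
      containers:
      - name: vllm
        image: vllm-cpu-env:latest   # assumed locally built CPU image
        args: ["--model", "facebook/opt-125m"]
        ports:
        - containerPort: 8000        # vLLM's OpenAI-compatible server port
        resources:
          requests:
            cpu: "8"
            memory: 16Gi
          limits:
            cpu: "8"
            memory: 16Gi
```

Note there is no `nvidia.com/gpu` resource request here, which is exactly what makes a GPU-free prerequisite list possible.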

@terrytangyuan (Contributor, Author) commented
> I keep my request for change for not moving this section to the external integration. Some repos are indeed external integrations like llama-stack

I think these two pages are currently a bit confusing, so there perhaps needs to be a separate discussion on this:

  1. https://docs.vllm.ai/en/latest/deployment/frameworks/index.html
  2. https://docs.vllm.ai/en/latest/deployment/integrations/index.html

KServe, KubeAI, production-stack, llmaz, LWS, and Helm should probably belong in the same category, since they are all dedicated to deploying on Kubernetes. That reorganization should probably be done separately. I think the integrations page is the best option so far, given that only Llama Stack is irrelevant to K8s. For this PR, I am happy to remove the link to the integrations page if that works for you.

> they are not even affiliated with vLLM! Suppose meta decided to archive that repo one day, what can we do?

We can simply remove the link if that becomes the case.

@Hanchenli (Contributor) commented
Apologies, but I am a little confused about what change you are proposing exactly.

> For this PR, I am happy to remove the link to integrations page if that works for you.

Could you commit again with what you are proposing? (It should be several lines.) I am glad we agree that at least Llama Stack has little to do with deploying vLLM on K8s, but I am not very sure about the rest. I would love to talk about how we should fit KServe, KubeAI, etc. into the documentation.

* [vLLM production-stack](https://github.com/vllm-project/production-stack): Born out of a Berkeley-UChicago collaboration, vLLM production stack is a project that contains latest research and community effort, while still delivering production-level stability and performance. Checkout the [documentation page](https://docs.vllm.ai/en/latest/deployment/integrations/production-stack.html) for more details and examples.

--------
Alternatively, you can deploy vLLM using other open source projects. Checkout the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
@terrytangyuan (Contributor, Author) suggested a change, removing the line:

> Alternatively, you can deploy vLLM using other open source projects. Checkout the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
@terrytangyuan (Contributor, Author) commented:
@Hanchenli Here's my proposed change.

@Hanchenli (Contributor) commented:
I see now. Why don't we just keep adding direct links on this page instead of moving them all to another page? By the way, production-stack is not an integration; it is part of the vllm-project.

@terrytangyuan (Contributor, Author) commented Mar 2, 2025:

That will duplicate the list and introduce a maintenance burden. Ideally, we should reorganize those two lists first (in a separate PR, since redirects need to be handled properly) and then cross-reference the appropriate list here.

@Hanchenli (Contributor) commented:

I see. So this PR depends on the reorganization of the two lists. I would suggest that we first handle the separate PR that sorts out the two lists.

After that, we will have a much better sense of what we want to do on this page and in this PR.

@terrytangyuan (Contributor, Author) commented Mar 2, 2025:

I have updated the PR to include both the integrations page and the frameworks page, just to make it more neutral for all tools before any reorganization.

> I suggest that we should wait for more people's opinion first.

To be fair, this PR focuses on reverting changes in #13841 that were not supposed to be merged, since there were very explicit comments left unaddressed. People's opinions were expressed in #13841, yet they were not addressed, and the PR was merged without waiting for another review.

@Hanchenli (Contributor) commented:

The conclusion of the discussion in #13841 is similar to what I am claiming here: we should put everything on this page instead of removing everything, for the convenience of beginners. Again, we should add more, not remove more, and you are claiming the opposite in this PR.

And here is why I think the discussion in #13841 was resolved, so the PR could be merged (@KuntaiDu can judge whether that is reasonable). To me, the main concern raised was that we need to stay neutral and mention more frameworks that belong to the vllm-project. This is true, and the author explained the future plan to be neutral (by adding more direct links). This was agreed to by two reviewers. Now you are suggesting we take the opposite approach.

Regarding your last comment before the merge: I think it followed a similar logic, that the page should be neutral and include more frameworks. Hence the merge.

Again, the consensus in #13841 was that this page should be neutral and that we should add more links to it. One way to make this page neutral and helpful was proposed and approved. Now you are suggesting another way to be neutral and add more links. That is why I am asking that we wait for more comments.

Hope it makes things clear and I am happy to chat more.

@terrytangyuan (Contributor, Author) commented:

> I agree vLLM on k8s is a general idea, and considering KServe is listed as a separate page already, I think we should continue to list all options. Keeping this doc "vanilla k8s" seems fair

I'll let @mgoin chime in here, but my interpretation of his comment #13841 (review) is that we should keep this doc vanilla K8s only and mention all available options (which is why I added links to both the integrations and frameworks pages).

> This is true and the author explained the future plan to be neutral (by adding more direct links). And this was agreed by two reviewers. Now you are suggesting we should take the opposite approach.

Maybe I missed it somewhere, but where was this agreed to by two reviewers after that comment #13841 (comment) by the author? The author merged the PR after I made my comment #13841 (comment). My previous inline comment #13841 (comment) was also not addressed.

@KuntaiDu (Collaborator) commented Mar 3, 2025:

> I'll let @mgoin to chime in here but my interpretation of his comment #13841 (review) is that we should keep this doc vanilla K8s only and mention all options available (which is why I added both links to integrations and frameworks).

We should keep the K8s page under open governance and fit the needs of users; on that part, I 100% agree with @terrytangyuan. However, the best way to keep the page under open governance is probably to let more people chime in and give users choices, instead of pointing users at official docs and saying "OK, now you know how to fly, just go fly yourself". Infra and K8s are scary to new users, and we should give them more care and love.

Happy to hear more from @mgoin, and let's make the K8s page a better version!

@terrytangyuan (Contributor, Author) commented:

> However, the best way to keep the page under open governance is probably to let more people chime in and give user choices

I agree, which is why I removed the specific link to production-stack and added links to the frameworks and integrations pages, which provide more options for users.

> Infra and K8s is scary to new users, and we should give them more care and love.

In case you missed my previous comments, please see #14084 (comment) for why the new link is more vendor-neutral, easier to maintain, and more lightweight than the current one. Also see #14084 (comment) for my plan to make it even more beginner-friendly with a CPU deployment guide.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@KuntaiDu (Collaborator) commented Mar 8, 2025

Maybe we can still keep the K8s-deployment-related project list there, but use the vendor-neutral K8s documentation instead of the current one. What are your thoughts on this, @terrytangyuan?

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@terrytangyuan (Contributor, Author) commented
Updated

@Hanchenli (Contributor) commented Mar 9, 2025

Yeah, that is what I was talking about. What's the rationale for the order?

@Hanchenli (Contributor) commented
Also, I suggest we should just add vLLM ecosystem projects in this link.

@terrytangyuan (Contributor, Author) commented Mar 9, 2025

Alphabetical.

> Also I suggest we should just add vllm ecosystem projects in this link.

What do you mean? Which projects?

@Jeffwan (Contributor) commented Mar 10, 2025

I'm slightly late. I believe it would be beneficial to list various options for users with different requirements, enabling them to select the appropriate stack to begin with. Thanks for driving these efforts. Given that these options may expand over time, we might consider categorizing them into distinct groups; however, I'm uncertain about the best way to organize this. I see @Hanchenli suggests vLLM community projects; I think we can embrace more projects from other communities. This is an overlapping area, and some cloud-native solutions like KServe/LWS have wide adoption, so it's great to see their continuous efforts on vLLM deployment improvements.

Offering neutral Kubernetes setup guidance is a challenging aspect, since different cloud providers offer diverse toolkits for cluster setup. I've read the current guidance at https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md. It seems suitable for some beginners, but it's not the recommended setup for production scenarios. Most Kubernetes solutions assume that clusters are already created, without delving too deeply into detailed configurations such as IAM roles, whether to use a managed node group, GPU AMIs, and so on. Although overlooking these details might seem reasonable in some contexts, guiding users through the recommended configurations typically lies within the purview of the specific projects listed above. Our role could be to use this page to direct users towards those projects that offer such in-depth guidance.

@terrytangyuan (Contributor, Author) commented
@Jeffwan 100% agreed. We should be more inclusive with the list of projects here.

I originally only added links to the integrations and frameworks pages, since I am not an expert in each tool and don't know for certain whether all of them support deployment to Kubernetes. I have updated this page to incorporate both @KuntaiDu's and @Hanchenli's suggestions and list the projects explicitly.

Are we good to move this forward?

@KuntaiDu (Collaborator) commented
Yeah, I totally agree that we need to put more ecosystem projects into the list. The key is ordering: I guess alphabetical ordering is not fair, as it encourages people to start projects whose names begin with "A". How about we write a GitHub Action to ensure that all projects are displayed in random order?

@terrytangyuan (Contributor, Author) commented Mar 13, 2025

That's well beyond the scope of this PR, and "alphabetical ordering is not fair" seems like a big topic. At least listing all relevant projects is fairer than listing only one.

Could we merge this and propose additional changes separately?

@simon-mo (Collaborator) commented
Here's my recommendation:

  • We can sort and add the GitHub organization prefix so we have fully qualified names such as vllm-project/production-stack, vllm-project/aibrix, kubernetes-sigs/lws, etc.
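This recommendation can be sketched in a few lines: lexicographically sorting fully qualified org/repo names groups projects by GitHub organization automatically, which sidesteps the "names starting with 'A' win" concern about sorting bare project names. The kserve/kserve entry below is an illustrative addition, not named in the comment.

```python
# Sketch of the suggested ordering: fully qualified "org/repo" names,
# sorted lexicographically, naturally cluster repos by organization.
projects = [
    "vllm-project/production-stack",
    "kubernetes-sigs/lws",
    "vllm-project/aibrix",
    "kserve/kserve",  # illustrative extra entry, not from the comment
]

for name in sorted(projects):
    print(name)
```

With the org prefix, all vllm-project repos sort together regardless of the repo name's first letter.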

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@simon-mo simon-mo merged commit 54a8804 into vllm-project:main Mar 14, 2025
11 of 13 checks passed
@terrytangyuan terrytangyuan deleted the update-k8s-guide branch March 15, 2025 00:10
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
