Conversation

@terrytangyuan (Contributor) commented Mar 2, 2025

This is a follow-up to unaddressed comments in #13841, specifically:

  1. Removed the reference to a specific project and switched to referencing the integrations page instead;
  2. Removed the phrase that other OSS projects can "make your deployment even smoother", since this is opinionated;
  3. Removed a guide from a specific project and switched to the official K8s guide instead.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@github-actions (bot) commented Mar 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 2, 2025
@terrytangyuan terrytangyuan changed the title [DOC] More neutral K8s deployment guide [Doc] More neutral K8s deployment guide Mar 2, 2025
@KuntaiDu (Collaborator) left a comment

The official Kubernetes doc linked here gives users an "OK, now you know how to fly, just go fly yourself" vibe and is probably too difficult for a new user to digest. Requesting a change to more user-friendly documentation.

@Hanchenli (Contributor) left a comment

Since users will (or should) visit this page when first trying to deploy their K8s vLLM stack, it is best that we keep some clean references instead of just passing the buck to another link.

The integrations link contains many repositories that are less dedicated to vLLM inference. For example, Llama Stack supports not only vLLM but also other API providers like TGI, Together AI, Groq, and so on. I do not think that should appear when people are solely looking for a K8s deployment of vLLM. On the other hand, I think we should move KubeAI to this page if they become dedicated to vLLM in the future.

I agree with you on the prerequisite change, in that it is better to link to an official K8s page. However, the link you provide is not easy enough for beginners to understand. Can you change it to a more beginner-friendly one?

@terrytangyuan (Contributor, Author) commented Mar 2, 2025

I don't have better alternatives. The official K8s guide provides the most vendor-neutral instructions (e.g., not just for NVIDIA GPUs) and references official vendor-specific instructions.

The current link from production-stack only provides instructions for NVIDIA GPUs and has OS- and vendor-specific prerequisites. It also pins specific versions of Minikube and the GPU Operator that can easily become outdated (instead of being maintained by vendors), and it requires Helm as a dependency.

@Hanchenli (Contributor) commented Mar 2, 2025

I think you have a point about the K8s installation. I totally agree that the current link on the documentation page needs an update. I just worry that users will not be able to install K8s if they follow the link you put in the PR, so they can't even get started.

Could we get something more user-friendly for that part? I could search for a better official tutorial in the meantime as well.

I maintain my request for changes about not moving this section to the external integrations page. Some repos are indeed external integrations, like llama-stack (they are not even affiliated with vLLM! Suppose Meta decides to archive that repo one day, what can we do?)

@terrytangyuan (Contributor, Author) commented
We probably should not require GPUs as a prerequisite at all. I can help add a CPU-only K8s deployment guide as a follow-up PR if that's helpful.
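For reference, a CPU-only deployment of the sort mentioned here could look roughly like the following Kubernetes manifest. This is only an illustrative sketch, not something proposed in this PR: the image name (assuming a locally built CPU image), model, and resource values are all assumptions.

```yaml
# Hypothetical CPU-only vLLM Deployment sketch; image, model, and
# resource values are illustrative assumptions, not part of this PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-cpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-cpu
  template:
    metadata:
      labels:
        app: vllm-cpu
    spec:
      containers:
      - name: vllm
        image: vllm-cpu-env:latest   # assumed locally built CPU image
        args: ["--model", "facebook/opt-125m"]
        ports:
        - containerPort: 8000        # vLLM's OpenAI-compatible server port
        resources:
          requests:
            cpu: "8"
            memory: 16Gi
          limits:
            cpu: "8"
            memory: 16Gi
```

Note there is no `nvidia.com/gpu` resource request here, which is exactly what makes a GPU-free prerequisite list possible.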

@terrytangyuan (Contributor, Author) commented
> I keep my request for change for not moving this section to the external integration. Some repos are indeed external integrations like llama-stack

I think these two pages are currently a bit confusing, so there perhaps needs to be a separate discussion on this:

  1. https://docs.vllm.ai/en/latest/deployment/frameworks/index.html
  2. https://docs.vllm.ai/en/latest/deployment/integrations/index.html

KServe, KubeAI, production-stack, llmaz, LWS, and Helm should probably belong in the same category, since they are all dedicated to deploying on Kubernetes. That reorganization should probably be done separately. I think the integrations page is the best option so far, given that only Llama Stack is irrelevant to K8s. For this PR, I am happy to remove the link to the integrations page if that works for you.

> they are not even affiliated with vLLM! Suppose meta decided to archive that repo one day, what can we do?

We can simply remove the link if that becomes the case.

@Hanchenli (Contributor) commented
Apologies, but I am a little confused about what change you are proposing exactly.

> For this PR, I am happy to remove the link to integrations page if that works for you.

Could you commit again with what you are proposing? (It should be several lines.) I am glad we agree that at least Llama Stack has little to do with deploying vLLM on K8s, but I am not very sure about the rest. I would love to talk about how we should fit KServe, KubeAI, etc. into the documentation.

* [vLLM production-stack](https://github.com/vllm-project/production-stack): Born out of a Berkeley-UChicago collaboration, vLLM production stack is a project that contains latest research and community effort, while still delivering production-level stability and performance. Checkout the [documentation page](https://docs.vllm.ai/en/latest/deployment/integrations/production-stack.html) for more details and examples.

--------
Alternatively, you can deploy vLLM using other open source projects. Checkout the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
@terrytangyuan (Contributor, Author) suggested a change, removing the line:

> Alternatively, you can deploy vLLM using other open source projects. Checkout the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
@terrytangyuan (Contributor, Author) commented:
@Hanchenli Here's my proposed change.

@Hanchenli (Contributor) commented:
I see now. Why don't we just keep adding direct links on this page instead of moving them all to another page? By the way, production-stack is not an integration; it is part of the vllm-project.

@terrytangyuan (Contributor, Author) commented Mar 2, 2025:

That will duplicate the list and introduce a maintenance burden. Ideally, we should reorganize those two lists first (in a separate PR, since redirects need to be handled properly) and then cross-reference the appropriate list here.

@Hanchenli (Contributor) commented:

I see. So this PR depends on the reorganization of the two lists. I would suggest that we first handle the separate PR that sorts out the two lists.

After that, we will have a much better sense of what we want to do on this page and in this PR.

@terrytangyuan (Contributor, Author) commented Mar 2, 2025:

I have updated the PR to include both the integrations page and the frameworks page, just to make it more neutral for all tools before any reorganization.

> I suggest that we should wait for more people's opinion first.

To be fair, this PR focuses on reverting changes in #13841 that were not supposed to be merged, since there were very explicit comments left unaddressed. People's opinions were expressed in #13841, yet they were not addressed, and the PR was merged without waiting for another review.

@Hanchenli (Contributor) commented:

The conclusion of the discussion in #13841 is similar to what I am claiming here: we should put everything on this page instead of removing everything, for the convenience of beginners. Again, we should add more, not remove more, and you are claiming the opposite in this PR.

And here is why I think the discussion in #13841 was resolved, so the PR could be merged (@KuntaiDu can judge whether that is reasonable). To me, the main concern raised was that we need to stay neutral and mention more frameworks that belong to the vllm-project. This is true, and the author explained the future plan to be neutral (by adding more direct links). This was agreed to by two reviewers. Now you are suggesting we take the opposite approach.

Regarding your last comment before the merge: I think it followed a similar logic, that the page should be neutral and include more frameworks. Hence the merge.

Again, the consensus in #13841 was that this page should be neutral and that we should add more links to it. One way to make this page neutral and helpful was proposed and approved. Now you are suggesting another way to be neutral and add more links. That is why I am asking that we wait for more comments.

Hope it makes things clear and I am happy to chat more.

@terrytangyuan (Contributor, Author) commented:

> I agree vLLM on k8s is a general idea, and considering KServe is listed as a separate page already, I think we should continue to list all options. Keeping this doc "vanilla k8s" seems fair

I'll let @mgoin chime in here, but my interpretation of his comment #13841 (review) is that we should keep this doc vanilla K8s only and mention all available options (which is why I added links to both the integrations and frameworks pages).

> This is true and the author explained the future plan to be neutral (by adding more direct links). And this was agreed by two reviewers. Now you are suggesting we should take the opposite approach.

Maybe I missed it somewhere, but where was this agreed to by two reviewers after that comment #13841 (comment) by the author? The author merged the PR after I made my comment #13841 (comment). My previous inline comment #13841 (comment) was also not addressed.

@KuntaiDu (Collaborator) commented Mar 3, 2025:

> I'll let @mgoin to chime in here but my interpretation of his comment #13841 (review) is that we should keep this doc vanilla K8s only and mention all options available (which is why I added both links to integrations and frameworks).

We should keep the K8s page under open governance and fit the needs of users; on that part, I 100% agree with @terrytangyuan. However, the best way to keep the page under open governance is probably to let more people chime in and give users choices, instead of pointing users at official docs and saying "OK, now you know how to fly, just go fly yourself". Infra and K8s are scary to new users, and we should give them more care and love.

Happy to hear more from @mgoin, and let's make the K8s page a better version!

@terrytangyuan (Contributor, Author) commented:

> However, the best way to keep the page under open governance is probably to let more people chime in and give user choices

I agree, which is why I removed the specific link to production-stack and added links to the frameworks and integrations pages, which provide more options for users.

> Infra and K8s is scary to new users, and we should give them more care and love.

In case you missed my previous comments, please see #14084 (comment) for why the new link is more vendor-neutral, easier to maintain, and more lightweight than the current one. Also see #14084 (comment) for my plan to make it even more beginner-friendly with a CPU deployment guide.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@KuntaiDu (Collaborator) commented Mar 8, 2025

Maybe we can still keep the K8s-deployment-related project list there, but use the vendor-neutral K8s documentation instead of the current one. What are your thoughts on this, @terrytangyuan?

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@terrytangyuan (Contributor, Author) commented
Updated

@Hanchenli (Contributor) commented Mar 9, 2025

Yeah, that is what I was talking about. What's the rationale for the order?

@Hanchenli (Contributor) commented
Also, I suggest we should just add vLLM ecosystem projects in this link.

@terrytangyuan (Contributor, Author) commented Mar 9, 2025

Alphabetical.

> Also I suggest we should just add vllm ecosystem projects in this link.

What do you mean? Which projects?

@Jeffwan (Contributor) commented Mar 10, 2025

I'm slightly late. I believe it would be beneficial to list various options for users with different requirements, enabling them to select the appropriate stack to begin with. Thanks for driving these efforts. Given that these options may expand over time, we might consider categorizing them into distinct groups; however, I'm uncertain about the best way to organize this. I see @Hanchenli suggests vLLM community projects; I think we can embrace more projects from other communities. This is an overlapping area, and some cloud-native solutions like KServe/LWS have wide adoption, so it's great to see their continuous efforts on vLLM deployment improvements.

Offering neutral Kubernetes setup guidance is a challenging aspect, since different cloud providers offer diverse toolkits for cluster setup. I've read the current guidance at https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md. It seems suitable for some beginners, but it's not the recommended setup for production scenarios. Most Kubernetes solutions assume that clusters are already created, without delving too deeply into detailed configurations such as IAM roles, whether to use a managed node group, GPU AMIs, and so on. Although overlooking these details might seem reasonable in some contexts, guiding users through the recommended configurations typically lies within the purview of the specific projects listed above. Our role could be to use this page to direct users towards those projects that offer such in-depth guidance.

@terrytangyuan (Contributor, Author) commented
@Jeffwan 100% agreed. We should be more inclusive with the list of projects here.

I originally only added links to the integrations and frameworks pages, since I am not an expert in each tool and don't know for certain whether all of them support deployment to Kubernetes. I have updated this page to incorporate both @KuntaiDu's and @Hanchenli's suggestions and list the projects explicitly.

Are we good to move this forward?

@KuntaiDu (Collaborator) commented
Yeah, I totally agree that we need to put more ecosystem projects into the list. The key is ordering: I guess alphabetical ordering is not fair, as it encourages people to start projects whose names begin with "A". How about we write a GitHub Action to ensure that all projects are displayed in random order?

@terrytangyuan (Contributor, Author) commented Mar 13, 2025

That's well beyond the scope of this PR, and "alphabetical ordering is not fair" seems like a big topic. At least listing all relevant projects is fairer than listing only one.

Could we merge this and propose additional changes separately?

@simon-mo (Collaborator) commented
Here's my recommendation:

  • We can sort and add the GitHub organization prefix so we have fully qualified names such as vllm-project/production-stack, vllm-project/aibrix, kubernetes-sigs/lws, etc.
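This recommendation can be sketched in a few lines: lexicographically sorting fully qualified org/repo names groups projects by GitHub organization automatically, which sidesteps the "names starting with 'A' win" concern about sorting bare project names. The kserve/kserve entry below is an illustrative addition, not named in the comment.

```python
# Sketch of the suggested ordering: fully qualified "org/repo" names,
# sorted lexicographically, naturally cluster repos by organization.
projects = [
    "vllm-project/production-stack",
    "kubernetes-sigs/lws",
    "vllm-project/aibrix",
    "kserve/kserve",  # illustrative extra entry, not from the comment
]

for name in sorted(projects):
    print(name)
```

With the org prefix, all vllm-project repos sort together regardless of the repo name's first letter.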

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@simon-mo simon-mo merged commit 54a8804 into vllm-project:main Mar 14, 2025
11 of 13 checks passed
@terrytangyuan terrytangyuan deleted the update-k8s-guide branch March 15, 2025 00:10
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
