
Conversation

@KuntaiDu (Collaborator) commented Feb 25, 2025

This PR adds a deployment guide for Kubernetes, covering both native Kubernetes and the Helm chart provided by vllm-project/production-stack.
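For context, the Helm-chart path mentioned above boils down to a few commands. A minimal sketch, assuming the repo URL and chart name (vllm/vllm-stack) from the production-stack README; verify against that repo before use:

```bash
# Hedged sketch of the production-stack Helm path; the repo URL, chart name,
# and values file are assumptions based on that project's README.
helm repo add vllm https://vllm-project.github.io/production-stack
helm repo update
helm install vllm vllm/vllm-stack -f my-values.yaml  # my-values.yaml is a placeholder
```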

@github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation Improvements or additions to documentation label Feb 25, 2025
@KuntaiDu KuntaiDu changed the title from "[Documentation] Add deployment guide for Kubernetes" to "[Documentation] Add more deployment guide for Kubernetes deployment" Feb 26, 2025
@rafvasq (Contributor) left a comment

Just offering a few suggestions and flagging some typos I caught 😄

@KuntaiDu KuntaiDu requested a review from rafvasq February 27, 2025 19:54
@KuntaiDu (Collaborator, Author) commented

Thank you for your suggestions @rafvasq, just fixed.

@rafvasq (Contributor) left a comment

lgtm! but you'll need a maintainer to approve it of course.

@terrytangyuan (Contributor) left a comment

There are many different ways to deploy vLLM on K8s, including vanilla K8s, KServe, AIBrix, production stack, etc. This PR makes production stack a first-class citizen. I think this doc should remain as neutral as possible.

@russellb (Member) left a comment

This needs to be reconciled with the existing helm instructions for using the helm chart included in the vllm repo: https://docs.vllm.ai/en/latest/deployment/frameworks/helm.html

@mgoin (Member) previously requested changes Feb 28, 2025 and left a comment

Nice doc and resources to make this easier for new users! Would you be open to submitting this as a new page for production-stack specifically? I agree vLLM on k8s is a general idea, and considering KServe is listed as a separate page already, I think we should continue to list all options. Keeping this doc "vanilla k8s" seems fair

mergify bot commented Mar 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @KuntaiDu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 1, 2025
@KuntaiDu (Collaborator, Author) commented Mar 1, 2025

> Nice doc and resources to make this easier for new users! Would you be open to submitting this as a new page for production-stack specifically? I agree vLLM on k8s is a general idea, and considering KServe is listed as a separate page already, I think we should continue to list all options. Keeping this doc "vanilla k8s" seems fair

Thanks Michael. Agreed that we should keep this doc vanilla k8s and add hyperlinks to the other projects at the beginning.

@KuntaiDu KuntaiDu requested review from mgoin and russellb March 1, 2025 00:36
@simon-mo simon-mo dismissed mgoin’s stale review March 1, 2025 03:38

Michael's comment addressed.

@simon-mo (Collaborator) commented Mar 1, 2025

Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine learning models. This guide walks you through deploying vLLM using native Kubernetes.

## Prerequisites
NOTE: Please make sure that there is a running Kubernetes cluster with available GPU resources. If you are new to Kubernetes, here is a [guide](https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md) that helps you prepare the Kubernetes environment.
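To make the quoted guide concrete, here is a minimal sketch of the native-Kubernetes path. The Deployment name, demo model, and image tag are illustrative assumptions, not the PR's exact manifest:

```bash
# Minimal sketch (not the PR's manifest): run the vLLM OpenAI-compatible
# server as a Deployment. All names and the model choice are illustrative.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args: ["--model", "facebook/opt-125m"]   # small, ungated demo model
        ports:
        - containerPort: 8000                    # vLLM's default API port
        resources:
          limits:
            nvidia.com/gpu: 1                    # requires the NVIDIA device plugin
EOF
```

Once the pod is Running, `kubectl port-forward deployment/vllm-server 8000:8000` exposes the OpenAI-compatible API locally.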
Contributor

I don't think we should link to production stack here

@KuntaiDu (Collaborator, Author) Mar 1, 2025

Official k8s docs give people too many choices and can be confusing, so this one is probably better. If you find a better installation guide, feel free to contribute!

Contributor

Maybe link to Kind instead? It's pretty straightforward to set up https://kind.sigs.k8s.io/
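For reference, a throwaway Kind cluster is two commands, though note that a stock Kind cluster does not expose GPUs without extra host-level setup:

```bash
# Sketch: spin up a local test cluster with Kind. The cluster name is
# arbitrary; GPU access would need additional configuration beyond this.
kind create cluster --name vllm-test
kubectl cluster-info --context kind-vllm-test
```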

Contributor

Doesn't this introduce an additional package? I agree with Terry that we could link to another, cleaner page. Maybe we should actually copy one version of the k8s tutorial into the vLLM documentation later.

Comment on lines 13 to 16
Note that these projects are sorted chronologically.

* [vLLM production-stack](https://github.com/vllm-project/production-stack): Originating from UChicago, vLLM production-stack is a project that incorporates the latest research and community efforts while still delivering production-level stability and performance. Check out the [documentation page](https://docs.vllm.ai/en/latest/deployment/integrations/production-stack.html) for more details and examples.
* [AIBrix](https://github.com/vllm-project/aibrix): Originating from ByteDance, AIBrix is a production-level stack that is Kubernetes-friendly and offers rich features (e.g., LoRA management).
Contributor

We should keep this page more neutral as mentioned above

@KuntaiDu (Collaborator, Author)

Yes. But I don't have much expertise in KServe, AIBrix, and others. I will leave those blank here; feel free to create more PRs.

@Hanchenli (Contributor) Mar 1, 2025

Btw, I just heard from a user of production-stack that there is also a KubeAI repository, which also uses K8s. We could potentially add that as well.

Comment on lines +20 to +22
## Prerequisites

Ensure that you have a running Kubernetes environment with GPUs (you can follow [this tutorial](https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md) to install a Kubernetes environment on a bare-metal GPU machine).
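A quick way to sanity-check this prerequisite, assuming NVIDIA GPUs exposed through the standard device plugin (a generic kubectl pattern, not from the guide itself):

```bash
# Lists each node's allocatable NVIDIA GPUs; "<none>" means no GPU is exposed.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```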
Contributor

Similar comment on this one. We could just point to K8s docs directly instead of production stack docs

@KuntaiDu (Collaborator, Author) Mar 1, 2025

Official k8s docs give people too many choices and can be confusing, so this one is probably better. If you find a better installation guide, feel free to contribute!

@Hanchenli (Contributor) commented

Thanks a lot @KuntaiDu! Production-stack greatly appreciates the effort to add our content here. This looks good to us, though we suggest adding the EKS/GKE tutorials and future Terraform deployment links to this documentation! We will submit more PRs for these in the future!

@KuntaiDu (Collaborator, Author) commented Mar 1, 2025

Given that I have only tried production-stack and have limited experience with other frameworks such as KServe, I am open to PRs from people with more expertise that add other projects. Feel free to propose PRs! @terrytangyuan @mgoin @russellb @simon-mo

@terrytangyuan (Contributor) commented

Maybe we can just link to this page instead of mentioning the variety of frameworks/integrations out there? https://docs.vllm.ai/en/latest/deployment/integrations/index.html

@KuntaiDu KuntaiDu enabled auto-merge (squash) March 1, 2025 06:05
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 1, 2025
@KuntaiDu KuntaiDu merged commit 8994dab into vllm-project:main Mar 1, 2025
25 of 30 checks passed
@KuntaiDu KuntaiDu deleted the kuntai-add-k8s-doc branch March 1, 2025 07:01
@terrytangyuan (Contributor) commented

It seems like some comments were not addressed, so I submitted a follow-up PR: #14084

Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Mar 3, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
