[Doc] RayServe Single-Host TPU v6e Example with vLLM #47814
Conversation
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
cc: @andrewsykim
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
doc/source/cluster/kubernetes/examples/tpu-multi-host-rayservice.md
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
I went through and updated this guide for the newer TPU Trillium (v6e) so that it'll be more useful. It no longer requires multi-host, since v6e can fit Llama 70B on a single node. I'll create a separate PR with a guide showcasing serving Llama 405B with multi-host TPUs. cc: @andrewsykim @kevin85421
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@angelinalg, can you help us review this, please?
Hi @ryanaoleary, thanks for the contribution. We just open-sourced our native Ray Serve LLM APIs. Do you mind converting this tutorial to use those instead?
https://docs.ray.io/en/latest/serve/llm/overview.html
Basically, there's no need to reference custom application code hosted in the KubeRay repo. Please use the latest APIs from master.
Another question: is this tutorial part of CI/CD? For example, what happens if vLLM or some Ray API changes?
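For context, a migration to the native Serve LLM API might look roughly like the sketch below. The field names mirror the documented `ray.serve.llm.LLMConfig` arguments, but the model id, accelerator type, and parallelism values are placeholders, not taken from this PR.

```python
# Hypothetical sketch of converting the tutorial to ray.serve.llm.
# The keys mirror documented LLMConfig arguments; the concrete values
# below are illustrative assumptions only.

def make_llm_config(model_id: str, accelerator_type: str, tp_size: int) -> dict:
    """Assemble keyword arguments for ray.serve.llm.LLMConfig."""
    return {
        "model_loading_config": {"model_id": model_id},
        "accelerator_type": accelerator_type,
        "engine_kwargs": {"tensor_parallel_size": tp_size},
    }

cfg = make_llm_config("meta-llama/Meta-Llama-3-70B-Instruct", "TPU-V6E", 8)

# With Ray installed, this dict would be consumed roughly as:
#   from ray import serve
#   from ray.serve.llm import LLMConfig, build_openai_app
#   app = build_openai_app({"llm_configs": [LLMConfig(**cfg)]})
#   serve.run(app)
```

Keeping the config as plain data like this also makes it easy to diff against the existing tutorial's vLLM engine arguments during the conversion.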
@kouroshHakha thanks for letting us know! There are actually some existing docs that use KubeRay + vLLM right now; for example, see https://docs.ray.io/en/latest/cluster/kubernetes/examples/vllm-rayservice.html. Would it be possible to merge Ryan's PR as-is and follow up with a sweeping change that updates all the docs using vLLM for consistency?
Yes, we can merge this since it's self-contained, and the code most likely won't break in the future, which is a plus for this approach. But can we add a note at the very top of both new files in this PR and the existing Kubernetes example (at the same level and visibility as the prerequisites) mentioning that there is a native Serve LLM API, with some pointers, so that the reader doesn't come away with the impression that this is the only application code that works? In fact, the new API is the more recommended path for running LLMs through Ray Serve. The Kubernetes deployment configs remain the same; only the application code changes to use those APIs. If we do this, there's no need for a sweeping change now, as it would be clearer to the reader.
@kouroshHakha does the new LLM API support TPUs?
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Force-pushed from 9799d5d to 466ad03.
Added a note in the vLLM examples in 466ad03 about the new Ray Serve LLM API.
Right now users can call … It would be awesome if you or someone from your team could contribute and extend the current placement group creation within ray.serve.llm to support TPUs as well.
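As a rough illustration of what that extension could involve, here is a sketch of building TPU placement-group bundles. The `"TPU"` resource name matches how Ray advertises TPU chips; the function name and bundle shape are hypothetical, not the actual ray.serve.llm internals.

```python
# Sketch only: not the actual ray.serve.llm placement-group code.
# Each vLLM worker bundle requests its share of TPU chips through
# Ray's "TPU" custom resource.

def make_tpu_bundles(num_workers: int, chips_per_worker: int) -> list:
    """Build one resource bundle per vLLM worker."""
    return [{"CPU": 1, "TPU": chips_per_worker} for _ in range(num_workers)]

bundles = make_tpu_bundles(num_workers=1, chips_per_worker=8)

# With Ray installed, the bundles would be reserved roughly as:
#   from ray.util.placement_group import placement_group
#   pg = placement_group(bundles, strategy="PACK")
```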
LGTM. Thanks. 🚀 The doc build is failing, though. It might be the reference that's messed up. Once it's fixed, we can merge.
Oh oops, I fixed the links. The build looks to be passing now.
cc @angelinalg, can you stamp this PR? It needs docs approval.
stamp
Hi @ryanaoleary, I will revert this PR because my colleague merged it accidentally. We can discuss the next steps for this PR in our next 1:1.
Revert "[Doc] RayServe Single-Host TPU v6e Example with vLLM (ray-project#47814)" This reverts commit e4a448f.
(#51113) This reverts commit e4a448f. Why are these changes needed? See #47814 (comment).
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Abrar Sheikh <abrar@anyscale.com>
Why are these changes needed?
This PR adds a new guide to the Ray docs that details how to serve an LLM with vLLM and single-host TPUs on GKE. It has been tested by running through the steps in the proposed guide and verifying correct output. It uses sample code from https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/tree/main/ai-ml/gke-ray/rayserve/llm/tpu.
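To illustrate the kind of verification step the guide performs, here is a hedged sketch of a request body for the deployed server's OpenAI-compatible completions endpoint. The endpoint path, port, and model id are assumptions based on common vLLM and Serve defaults, not copied from the guide.

```python
import json

# Hypothetical request body for the OpenAI-compatible /v1/completions
# endpoint exposed by the vLLM deployment (all values are placeholders).
payload = {
    "model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "prompt": "What is Ray Serve?",
    "max_tokens": 64,
    "temperature": 0.7,
}
body = json.dumps(payload)

# Against a running RayService, you would POST this to
# http://<service-ip>:8000/v1/completions with a JSON content type,
# e.g. via curl or the requests library.
```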
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.