
[Doc] RayServe Single-Host TPU v6e Example with vLLM #47814

Merged · 19 commits · Mar 5, 2025

Conversation

@ryanaoleary (Contributor) commented Sep 25, 2024

Why are these changes needed?

This PR adds a new guide to the Ray docs that details how to serve an LLM with vLLM and single-host TPUs on GKE. This PR has been tested by running through the steps in the proposed guide and verifying correct output. It uses sample code from https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/tree/main/ai-ml/gke-ray/rayserve/llm/tpu.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary (Contributor, Author) commented:

cc: @andrewsykim

@kevin85421 kevin85421 self-assigned this Sep 25, 2024
@ryanaoleary ryanaoleary changed the title RayServe Multi-Host TPU Example with vLLM [Doc] RayServe Multi-Host TPU Example with vLLM Oct 3, 2024
stale bot commented Feb 1, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Feb 1, 2025
@stale stale bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Feb 4, 2025
@ryanaoleary ryanaoleary changed the title [Doc] RayServe Multi-Host TPU Example with vLLM [Doc] RayServe Single-Host TPU v5e and v6e Example with vLLM Feb 4, 2025
@ryanaoleary ryanaoleary changed the title [Doc] RayServe Single-Host TPU v5e and v6e Example with vLLM [Doc] RayServe Single-Host TPU v6e Example with vLLM Feb 4, 2025
@ryanaoleary (Contributor, Author) commented:

I went through and updated this guide for the newer TPU Trillium (v6e) so that it'll be more useful. It no longer requires multi-host serving, since v6e can fit Llama 70B on a single node. I'll create a separate PR with a guide showcasing serving Llama 405B with multi-host TPUs. cc: @andrewsykim @kevin85421

@kevin85421 kevin85421 added the go add ONLY when ready to merge, run all tests label Feb 13, 2025
@andrewsykim (Contributor) commented:

@angelinalg can you help us review this please?

@angelinalg angelinalg assigned kouroshHakha and unassigned kevin85421 Mar 5, 2025
@kouroshHakha (Contributor) left a comment:

Hi @ryanaoleary, thanks for the contribution. We just open-sourced our native Ray Serve LLM APIs. Do you mind converting this tutorial to use those instead?

https://docs.ray.io/en/latest/serve/llm/overview.html

Basically, there's no need to reference custom application code hosted in the KubeRay repo. Please use the latest APIs from master.

Another question: is this tutorial part of the CI/CD? For example, what happens if vLLM or some API from Ray changes?
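As a rough illustration of what the linked native Serve LLM API looks like in a Serve config file, here is a best-effort sketch; the schema details are assumptions from memory, the model names are placeholders, and the overview page above is the authoritative reference:

```yaml
# Hypothetical Serve config using the native ray.serve.llm application builder.
applications:
  - name: llm-app
    route_prefix: /
    import_path: ray.serve.llm:build_openai_app
    args:
      llm_configs:
        - model_loading_config:
            model_id: my-llama  # the name clients use in OpenAI-style requests
            model_source: meta-llama/Llama-3.1-8B-Instruct  # placeholder model
          engine_kwargs:
            tensor_parallel_size: 1
```

Under this kind of config, only the application code changes; the surrounding Kubernetes deployment manifests would stay the same.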

@andrewsykim (Contributor) commented Mar 5, 2025

@kouroshHakha thanks for letting us know!

There are actually some existing docs that use KubeRay + vLLM right now; for example, see https://docs.ray.io/en/latest/cluster/kubernetes/examples/vllm-rayservice.html. Would it be possible to merge Ryan's PR as-is and follow up with a sweeping change that updates all the docs using vLLM for consistency?

@kouroshHakha (Contributor) commented Mar 5, 2025

Yes, we can merge this as-is since it's self-contained, and the code most likely won't break in the future, which is a plus for this approach. But can we add a note at the very top of both new files in this PR and the existing Kubernetes example (at the same level and visibility as where the prerequisites are listed) mentioning that there's a native Serve LLM API, with some pointers, so that the reader doesn't come away with the impression that this is the only application code that works? In fact, the new API is the more recommended path for running LLMs through Ray Serve. The Kubernetes deployment configs stay the same; only the application code changes to use those APIs. If we do this, there's no need for a sweeping change now, since it would be clearer to the reader.

@andrewsykim (Contributor) commented:

@kouroshHakha does the new LLM API support TPUs?

@ryanaoleary (Contributor, Author) commented:

> Yes, we can merge this as-is since it's self-contained, and the code most likely won't break in the future, which is a plus for this approach. But can we add a note at the very top of both new files in this PR and the existing Kubernetes example (at the same level and visibility as where the prerequisites are listed) mentioning that there's a native Serve LLM API, with some pointers, so that the reader doesn't come away with the impression that this is the only application code that works?

Added a note in the vLLM examples in 466ad03 about the new Ray Serve LLM API.
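For readers following along, a note of that kind in Ray's reStructuredText docs might look roughly like the following sketch; the actual wording in 466ad03 may differ:

```rst
.. note::

   Ray now ships a native Serve LLM API, which is the recommended way to
   serve LLMs with Ray Serve; see
   https://docs.ray.io/en/latest/serve/llm/overview.html. The Kubernetes
   deployment configs in this guide stay the same; only the application
   code changes to use that API.
```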

@kouroshHakha (Contributor) commented:

> does the new LLM API support TPUs?

@andrewsykim

Right now, users can call `config.get_serve_options()` to get the deployment options (including the placement group). That function currently doesn't support TPUs. But you can bypass that by providing your own placement group, similar to how it's currently done in https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/vllm/serve_tpu.py#L102, and calling `.options` on the deployment object. Everything else remains the same as in your example.

It would be awesome if you or someone from your team could contribute and extend the current placement group creation within `ray.serve.llm` to support TPUs as well.
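To make the workaround above concrete, here is a minimal, hypothetical sketch of hand-building placement-group bundles that could then be passed to a deployment via `.options`. The helper name, bundle shapes, and chip counts are illustrative assumptions, not code from `serve_tpu.py`:

```python
def tpu_bundles(num_hosts: int, chips_per_host: int) -> list:
    """Build one placement-group bundle per TPU host.

    Each bundle reserves a CPU for the engine actor plus that host's TPU
    chips, using Ray's "TPU" custom resource name.
    """
    return [{"CPU": 1, "TPU": chips_per_host} for _ in range(num_hosts)]


# A single-host TPU v6e slice with 8 chips yields one bundle.
bundles = tpu_bundles(num_hosts=1, chips_per_host=8)
print(bundles)  # [{'CPU': 1, 'TPU': 8}]
```

With bundles like these, the deployment could be configured along the lines of `deployment.options(placement_group_bundles=bundles, placement_group_strategy="STRICT_PACK")`, assuming your Ray version supports placement-group options on deployments.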

@kouroshHakha (Contributor) left a comment:

LGTM. Thanks. 🚀 The doc build is failing, though. It might be the reference that's messed up. Once fixed, we can merge.

@ryanaoleary (Contributor, Author) commented:

> LGTM. Thanks. 🚀 The doc build is failing, though. It might be the reference that's messed up. Once fixed, we can merge.

Oh oops, I fixed the links. The build looks to be passing now.

@kouroshHakha kouroshHakha enabled auto-merge (squash) March 5, 2025 21:42
@kouroshHakha (Contributor) commented:

cc @angelinalg, can you stamp this PR? It needs docs approval.

@angelinalg (Contributor) left a comment:

stamp

@kouroshHakha kouroshHakha merged commit e4a448f into ray-project:master Mar 5, 2025
6 checks passed
@kevin85421 (Member) commented:

Hi @ryanaoleary, I will revert this PR because my colleague merged it accidentally. We can discuss the next steps for this PR in our next 1:1.

kevin85421 added a commit to kevin85421/ray that referenced this pull request Mar 6, 2025
angelinalg pushed a commit that referenced this pull request Mar 6, 2025
Revert "[Doc] RayServe Single-Host TPU v6e Example with vLLM (#47814)" (#51113)

This reverts commit e4a448f.

## Why are these changes needed?

#47814 (comment)
abrarsheikh pushed a commit that referenced this pull request Mar 8, 2025
elimelt pushed a commit to elimelt/ray that referenced this pull request Mar 9, 2025
elimelt pushed a commit to elimelt/ray that referenced this pull request Mar 9, 2025