[Doc] RayServe Single-Host TPU v6e Example with vLLM #47814
Conversation
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
cc: @andrewsykim
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
doc/source/cluster/kubernetes/examples/tpu-multi-host-rayservice.md
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
I went through and updated this guide for the newer TPU Trillium (v6e) so that it'll be more useful. It no longer requires multi-host, since v6e can fit Llama 70B on a single node. I'll create a separate PR with a guide showcasing serving Llama 405B with multi-host TPUs. cc: @andrewsykim @kevin85421
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@angelinalg, can you help us review this, please?
Hi @ryanaoleary, thanks for the contribution. We just open-sourced our native Ray Serve LLM APIs. Do you mind converting this tutorial to use those instead?
https://docs.ray.io/en/latest/serve/llm/overview.html
Basically, there's no need to reference custom application code hosted in the KubeRay repo. Please use the latest APIs from master.
Another question: is this tutorial part of CI/CD? For example, what happens if vLLM or some Ray API changes?
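For context, a migration to the native Serve LLM API might look roughly like the sketch below. The field names mirror the documented `ray.serve.llm.LLMConfig` arguments, but the model id, accelerator type, and parallelism values are placeholders, not taken from this PR.

```python
# Hypothetical sketch of converting the tutorial to ray.serve.llm.
# The keys mirror documented LLMConfig arguments; the concrete values
# below are illustrative assumptions only.

def make_llm_config(model_id: str, accelerator_type: str, tp_size: int) -> dict:
    """Assemble keyword arguments for ray.serve.llm.LLMConfig."""
    return {
        "model_loading_config": {"model_id": model_id},
        "accelerator_type": accelerator_type,
        "engine_kwargs": {"tensor_parallel_size": tp_size},
    }

cfg = make_llm_config("meta-llama/Meta-Llama-3-70B-Instruct", "TPU-V6E", 8)

# With Ray installed, this dict would be consumed roughly as:
#   from ray import serve
#   from ray.serve.llm import LLMConfig, build_openai_app
#   app = build_openai_app({"llm_configs": [LLMConfig(**cfg)]})
#   serve.run(app)
```

Keeping the config as plain data like this also makes it easy to diff against the existing tutorial's vLLM engine arguments during the conversion.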
@kouroshHakha thanks for letting us know! There are actually some existing docs that use KubeRay + vLLM right now; for example, see https://docs.ray.io/en/latest/cluster/kubernetes/examples/vllm-rayservice.html. Would it be possible to merge Ryan's PR as-is and follow up with a sweeping change that updates all the docs using vLLM for consistency?
Yes, we can merge this since it's self-contained, and the code most likely won't break in the future, which is a plus for this approach. But can we add a note at the very top of both new files in this PR and the existing Kubernetes example (at the same level and visibility as the prerequisites) mentioning that there is a native Serve LLM API, with some pointers, so that the reader doesn't come away with the impression that this is the only application code that works? In fact, the new API is the more recommended path for running LLMs through Ray Serve. The Kubernetes deployment configs remain the same; only the application code changes to use those APIs. If we do this, there's no need for a sweeping change now, as it would be clearer to the reader.
@kouroshHakha does the new LLM API support TPUs?
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Force-pushed from 9799d5d to 466ad03.
Added a note in the vLLM examples in 466ad03 about the new Ray Serve LLM API.
Right now users can call … It would be awesome if you or someone from your team could contribute and extend the current placement group creation within ray.serve.llm to support TPUs as well.
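As a rough illustration of what that extension could involve, here is a sketch of building TPU placement-group bundles. The `"TPU"` resource name matches how Ray advertises TPU chips; the function name and bundle shape are hypothetical, not the actual ray.serve.llm internals.

```python
# Sketch only: not the actual ray.serve.llm placement-group code.
# Each vLLM worker bundle requests its share of TPU chips through
# Ray's "TPU" custom resource.

def make_tpu_bundles(num_workers: int, chips_per_worker: int) -> list:
    """Build one resource bundle per vLLM worker."""
    return [{"CPU": 1, "TPU": chips_per_worker} for _ in range(num_workers)]

bundles = make_tpu_bundles(num_workers=1, chips_per_worker=8)

# With Ray installed, the bundles would be reserved roughly as:
#   from ray.util.placement_group import placement_group
#   pg = placement_group(bundles, strategy="PACK")
```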
LGTM. Thanks. 🚀 The doc build is failing, though. It might be the reference that's messed up. Once it's fixed, we can merge.
Oh oops, I fixed the links. The build looks to be passing now.
cc @angelinalg, can you stamp this PR? It needs docs approval.
stamp
Hi @ryanaoleary, I will revert this PR because my colleague merged it accidentally. We can discuss the next steps for this PR in our next 1:1.
Revert "[Doc] RayServe Single-Host TPU v6e Example with vLLM (ray-project#47814)" This reverts commit e4a448f.
(#51113) This reverts commit e4a448f. Why are these changes needed? See #47814 (comment).
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Abrar Sheikh <abrar@anyscale.com>
Why are these changes needed?
This PR adds a new guide to the Ray docs that details how to serve an LLM with vLLM and single-host TPUs on GKE. It has been tested by running through the steps in the proposed guide and verifying correct output. It uses sample code from https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/tree/main/ai-ml/gke-ray/rayserve/llm/tpu.
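To illustrate the kind of verification step the guide performs, here is a hedged sketch of a request body for the deployed server's OpenAI-compatible completions endpoint. The endpoint path, port, and model id are assumptions based on common vLLM and Serve defaults, not copied from the guide.

```python
import json

# Hypothetical request body for the OpenAI-compatible /v1/completions
# endpoint exposed by the vLLM deployment (all values are placeholders).
payload = {
    "model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "prompt": "What is Ray Serve?",
    "max_tokens": 64,
    "temperature": 0.7,
}
body = json.dumps(payload)

# Against a running RayService, you would POST this to
# http://<service-ip>:8000/v1/completions with a JSON content type,
# e.g. via curl or the requests library.
```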
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.