-
Notifications
You must be signed in to change notification settings - Fork 6.9k
[Serve.llm][Prototype][WIP] Simplify LLMServer and inherit OpenAIServingChat behavior #54189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve.llm][Prototype][WIP] Simplify LLMServer and inherit OpenAIServingChat behavior #54189
Conversation
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
…t testing Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the Serve LLM stack to simplify the LLMServer by inheriting directly from vLLM’s OpenAI‐serving components, removes legacy processing layers, and unifies chat, completion, and embedding flows.
- Consolidates engine APIs to use vLLM’s OpenAIServingChat, Completion, and Embedding endpoints
- Refactors and shrinks
LLMServer, removingResponsePostprocessorand redundant code - Updates mocks and tests to align with new request/response types and metadata fields
Reviewed Changes
Copilot reviewed 35 out of 36 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| release/llm_tests/serve/probes/test_basic.py | Adjusted max‐token length and updated invalid logprobs test cases |
| python/ray/llm/tests/serve/mocks/mock_vllm_engine.py | Overhauled mock engine to yield SSE strings and OpenAI‐style responses |
| python/ray/llm/_internal/serve/deployments/llm/llm_server.py | Simplified server logic; unified request→engine→batching pipeline |
| python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py | Integrated vLLM’s OpenAIServingChat/Completion/Embedding APIs |
| python/ray/llm/tests/serve/utils/testing_utils.py | Added reusable response‐validation helpers |
Comments suppressed due to low confidence (4)
release/llm_tests/serve/probes/test_basic.py:319
- The upper‐bound case for
top_logprobs(> maximum allowed) was removed. To fully validate error handling, reintroduce a test value above the allowed maximum.
invalid_num_logprobs = [-1]
python/ray/llm/tests/serve/mocks/mock_vllm_engine.py:5
- The mock implementation uses
await asyncio.sleep(...)butasynciois not imported. Addimport asyncioat the top.
from typing import AsyncGenerator, Dict, Optional, Any, List, Union
python/ray/llm/tests/serve/conftest.py:17
- Fixture imports
EmbeddingCompletionRequest, but the actual class is namedEmbeddingRequest. Update the import and fixture accordingly.
from ray.llm._internal.serve.configs.openai_api_models import (
release/llm_tests/serve/probes/test_basic.py:163
- [nitpick] Using a raw literal
200000can obscure its meaning. Consider replacing it with a named constant or adding a comment explaining its origin.
length = 200000
|
Here is the tentative plan to break these changes up into smaller PRs:
|
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Addresses #53533
Made ready for review for copilot review not ready for human review yet.
Need to merge vllm changes as well: