feat: support batch /completions #1626
Walkthrough
This update refactors prompt preprocessing to explicitly handle both text and tokenized prompt inputs, adds support for batch token IDs, and extends trait and request implementations to distinguish and extract token-based inputs. It also modifies SSE event filtering for LLM metric annotations.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Client
    participant OpenAIPreprocessor
    participant Request
    participant PreprocessedRequestBuilder
    Client->>OpenAIPreprocessor: preprocess_request(Request)
    OpenAIPreprocessor->>Request: prompt_input_type()
    alt Prompt is Tokens
        OpenAIPreprocessor->>Request: extract_tokens()
        OpenAIPreprocessor->>PreprocessedRequestBuilder: set token_ids or batch_token_ids
    else Prompt is Text
        OpenAIPreprocessor->>Request: get raw or formatted prompt
        OpenAIPreprocessor->>OpenAIPreprocessor: tokenize prompt
        OpenAIPreprocessor->>PreprocessedRequestBuilder: set token_ids
    end
    OpenAIPreprocessor->>PreprocessedRequestBuilder: set sampling_options, annotations
    PreprocessedRequestBuilder->>Client: PreprocessedRequest
```
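The dispatch in the diagram can be sketched in Python (a hedged illustration only: the actual implementation is Rust, the function and field names are taken from the diagram, and everything else is assumed):

```python
def preprocess_request(request, tokenizer):
    """Illustrative sketch of the dispatch shown in the diagram above.

    `request["prompt"]` may be text (a string) or pre-tokenized input
    (a list of ints, or a list of lists of ints for a batch).
    """
    builder = {}
    prompt = request["prompt"]
    if isinstance(prompt, list) and prompt and isinstance(prompt[0], int):
        # Tokens branch: a single IntegerArray becomes token_ids.
        builder["token_ids"] = [int(t) for t in prompt]
    elif isinstance(prompt, list) and prompt and isinstance(prompt[0], list):
        # Tokens branch: an ArrayOfIntegerArray becomes batch_token_ids.
        builder["batch_token_ids"] = [[int(t) for t in row] for row in prompt]
    else:
        # Text branch: take the raw or formatted prompt and tokenize it.
        builder["token_ids"] = tokenizer(prompt)
    # Carry over sampling options and annotations (names from the diagram).
    builder["sampling_options"] = request.get("sampling_options", {})
    builder["annotations"] = request.get("annotations", [])
    return builder
```

Note that the token branches never touch the tokenizer, which is the point of dispatching before tokenization.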
There are some Dockerfile changes here that I'm using to test. I'll remove them before merging this PR; they belong in #1583.
paulhendricks left a comment
It might be nice to pull out some of the internals of `OpenAIPreprocessor` so we can add test coverage in the module for different edge cases (e.g. `""`, `["", ""]`, `[]`, `[[]]`) instead of relying on the e2e curl scripts.
Overall approving, looks good!
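The edge cases called out above could be pinned down with a small classification helper plus unit tests. A hedged Python sketch (the real code is Rust; `prompt_input_type` and the shape names are illustrative):

```python
def prompt_input_type(prompt):
    """Classify an OpenAI /completions prompt into one of the four
    accepted shapes. Illustrative only; the PR does this in Rust.
    """
    if isinstance(prompt, str):
        return "String"
    if isinstance(prompt, list):
        if all(isinstance(p, str) for p in prompt):
            return "StringArray"  # note: [] is ambiguous and lands here
        if all(isinstance(p, int) for p in prompt):
            return "IntegerArray"
        if all(isinstance(p, list) and all(isinstance(t, int) for t in p)
               for p in prompt):
            return "ArrayOfIntegerArray"  # note: [[]] lands here
    raise ValueError("unsupported prompt shape")

# The reviewer's edge cases:
assert prompt_input_type("") == "String"
assert prompt_input_type(["", ""]) == "StringArray"
assert prompt_input_type([]) == "StringArray"            # ambiguous empty batch
assert prompt_input_type([[]]) == "ArrayOfIntegerArray"  # one empty token row
```

The empty-list cases show why in-module tests are valuable: whichever branch `[]` and `[[]]` take should be a deliberate choice, not an accident of match ordering.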
This PR is a revamp of #1565 based on this comment by @rmccorm4.
Description
OpenAI Completions supports the following prompt inputs:
- `String`
- `StringArray`
- `IntegerArray`
- `ArrayOfIntegerArray`
The PR here provides batch-style support for `StringArray` and `ArrayOfIntegerArray`, and provides the same support for `IntegerArray` as for `String` (the default). The `sglang_inc.py` engine has been updated to demonstrate this.
Approach
In order to minimize any performance hit, I first `match` on the input. If we see a tokens-style input, we cast to `u32` and construct the request directly. If not, we move forward with tokenization as expected.
Tests with all 4 input types
Script to tokenize text
Using the following for testing:
- text: `"A large language model is a"`
- tokens: `[32, 3460, 4128, 1614, 374, 264]`
- model: `Qwen/Qwen2.5-7B`
IntegerArray
input
output
```json
{
  "id": "cmpl-3f3f10ca-320d-4740-b4c6-d044436a0655",
  "choices": [
    {
      "text": " machine learning tool conceived by Google last August.\n\nImagine putting Strings into an ocean, taking a dip and then catching one for a meal.\n\n\nReth",
      "index": 0,
      "finish_reason": null
    }
  ],
  "created": 1750819021,
  "model": "Qwen/Qwen2.5-7B",
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 29,
    "total_tokens": 0,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
```
ArrayOfIntegerArray
input
output
```json
{
  "id": "cmpl-50006ebd-b759-41fc-9d5f-335651b1910d",
  "choices": [
    {
      "text": " type of artificial intelligence designed to carry out human-like, one-sided conversations. Influenced by developments in the NLP/NLU “Big Bang”,",
      "index": 0,
      "finish_reason": null
    },
    {
      "text": " computer model. But that a shorthand. Apparently confusing. A large new language model is both big and notoriously vague. But stay tuned.— Casey C",
      "index": 1,
      "finish_reason": null
    }
  ],
  "created": 1750819161,
  "model": "Qwen/Qwen2.5-7B",
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 58,
    "total_tokens": 0,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
```
String (with `"nvext": {"use_raw_prompt": true}`)
input
output
```json
{
  "id": "cmpl-c2f7ef91-dff9-4a19-b3fa-75b41779f5f4",
  "choices": [
    {
      "text": " complex mathematical system that can be used to solve a wide variety of problems. The model consists of a sequence of tasks, each of which is solved",
      "index": 0,
      "finish_reason": null
    }
  ],
  "created": 1750819802,
  "model": "Qwen/Qwen2.5-7B",
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 29,
    "total_tokens": 0,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
```
StringArray
input
output
```json
{
  "id": "cmpl-a51a4d40-9e82-4aec-80c8-140073ecd0fd",
  "choices": [
    {
      "text": " model trained on a diverse dataset consisting of text, images, audio, and natural language processing data. These models allow developers to take feedback from humans",
      "index": 0,
      "finish_reason": null
    },
    {
      "text": " math-informed system which can be used for predicting choices along with processing output for respective ones. Summarizing Deep Learning Books Artificial Intelligence is considered",
      "index": 1,
      "finish_reason": null
    }
  ],
  "created": 1750820088,
  "model": "Qwen/Qwen2.5-7B",
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 58,
    "total_tokens": 0,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
```
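For reference, the four request shapes exercised above can be reconstructed roughly as follows. This is a sketch: only the prompt text, token IDs, and model name come from the tests above; the exact request bodies (and any other fields such as `max_tokens`) are assumptions.

```python
import json

model = "Qwen/Qwen2.5-7B"
text = "A large language model is a"
tokens = [32, 3460, 4128, 1614, 374, 264]  # `text` under the Qwen2.5-7B tokenizer

# One hypothetical /v1/completions body per supported prompt shape.
payloads = {
    "String": {"model": model, "prompt": text},
    "StringArray": {"model": model, "prompt": [text, text]},
    "IntegerArray": {"model": model, "prompt": tokens},
    "ArrayOfIntegerArray": {"model": model, "prompt": [tokens, tokens]},
}
for name, body in payloads.items():
    print(name, "->", json.dumps(body))
```

The batched shapes (`StringArray`, `ArrayOfIntegerArray`) produce one choice per prompt in the response, matching the two-choice outputs shown above.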