
Commit e6c7e55

sjmonson and tlrmchlsmth committed
Configurable max_tokens/max_completion_tokens key (#399)
Makes the `max_tokens` request key configurable through an environment variable per endpoint type. Defaults to `max_tokens` for legacy `completions` and `max_completion_tokens` for `chat/completions`.

Changes:

- Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option, a dict mapping from route name to output tokens key. The default is `{"text_completions": "max_tokens", "chat_completions": "max_completion_tokens"}`.

Related issues:

- Closes #395
- Closes #269
- Related to #210

---

- [x] "I certify that all code in this PR is my own, except as noted below."
- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
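
For context, a minimal usage sketch of the new option; the JSON string form is an assumption based on how pydantic-settings parses dict-valued fields from environment variables, not something stated in this PR:

```python
# Hypothetical override: use "max_tokens" for both endpoint types.
# Assumes dict-valued settings are parsed from a JSON string, as
# pydantic-settings does for complex field types.
import os

os.environ["GUIDELLM__OPENAI__MAX_OUTPUT_KEY"] = (
    '{"text_completions": "max_tokens", "chat_completions": "max_tokens"}'
)
# Set this before guidellm.settings is imported so the override takes effect.
```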
1 parent: 23f8186 · commit: e6c7e55

File tree

2 files changed: +8 −5 lines changed


src/guidellm/backends/openai.py

Lines changed: 4 additions & 5 deletions

@@ -30,6 +30,7 @@
     GenerationResponse,
 )
 from guidellm.scheduler import HistoryT, ScheduledRequestInfo
+from guidellm.settings import settings
 
 __all__ = ["OpenAIHTTPBackend", "UsageStats"]
 
@@ -628,12 +629,10 @@ def _get_body(
         # Handle token limits
         max_tokens = max_output_tokens or self.max_output_tokens
         if max_tokens is not None:
-            body.update(
-                {
-                    "max_tokens": max_tokens,
-                    "max_completion_tokens": max_tokens,
-                }
+            max_output_key = settings.openai.max_output_key.get(
+                endpoint_type, "max_tokens"
             )
+            body[max_output_key] = max_output_tokens
         # Set stop conditions only for request-level limits
         if max_output_tokens:
             body.update({"stop": None, "ignore_eos": True})

src/guidellm/settings.py

Lines changed: 4 additions & 0 deletions

@@ -89,6 +89,10 @@ class OpenAISettings(BaseModel):
     base_url: str = "http://localhost:8000"
     max_output_tokens: int = 16384
     verify: bool = True
+    max_output_key: dict[str, str] = {
+        "text_completions": "max_tokens",
+        "chat_completions": "max_completion_tokens",
+    }
 
 
 class ReportGenerationSettings(BaseModel):
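
As a standalone illustration of how a dict-valued setting like this behaves, the sketch below uses pydantic-settings directly; the model and prefix names are invented for the demo and are not guidellm's actual wiring:

```python
# Demo model (illustrative, not guidellm's settings): pydantic-settings
# parses dict-valued fields from JSON strings in environment variables.
import os

from pydantic_settings import BaseSettings, SettingsConfigDict


class DemoOpenAISettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="DEMO_OPENAI_")

    max_output_key: dict[str, str] = {
        "text_completions": "max_tokens",
        "chat_completions": "max_completion_tokens",
    }


os.environ["DEMO_OPENAI_MAX_OUTPUT_KEY"] = '{"chat_completions": "max_tokens"}'
# The env value replaces the whole dict rather than merging into the default.
print(DemoOpenAISettings().max_output_key)  # {'chat_completions': 'max_tokens'}
```

Note that an override replaces the entire mapping, so any default entries still needed must be restated in the new value.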
