
Conversation

@gmarinho2
Contributor

@gmarinho2 gmarinho2 commented May 22, 2025

FIX: vllm-project/vllm-spyre#148

Currently the default max tokens for the completions API is set to max_model_len - prompt_len. The changes in this PR make it so that when a platform needs a different value for default_max_tokens, it can be changed simply by overriding the maybe_update_max_tokens method in the Platform class. When no override is needed, the method returns the current default. Edit: typo in the commit message: class Plataform should read class Platform.
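For illustration, a minimal sketch of the hook described above, assuming a signature along these lines (the exact name and parameters in the merged code may differ; SpyreLikePlatform and its token budget are hypothetical):

```python
class Platform:
    """Base class: keeps the current default of max_model_len - prompt_len."""

    @classmethod
    def maybe_update_max_tokens(cls, max_model_len: int, prompt_len: int,
                                default_max_tokens: int) -> int:
        # No platform-specific override: return the value computed by the
        # serving layer unchanged.
        return default_max_tokens


class SpyreLikePlatform(Platform):
    """Hypothetical platform that needs a smaller default."""

    @classmethod
    def maybe_update_max_tokens(cls, max_model_len: int, prompt_len: int,
                                default_max_tokens: int) -> int:
        # Illustrative only: cap the default at a platform-specific budget.
        platform_budget = 1024
        return min(default_max_tokens, platform_budget)
```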

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@maxdebayser maxdebayser left a comment


LGTM

@joerunde
Collaborator

Is this something that can be handled by --generation-config?

--generation-config
The folder path to the generation config. Defaults to "auto", in which case the generation config is loaded from the model path. If set to "vllm", no generation config is loaded and vLLM defaults are used. If set to a folder path, the generation config is loaded from that folder. If max_new_tokens is specified in the generation config, it sets a server-wide limit on the number of output tokens for all requests.

Default: 'auto'

Should the chat API respect a max_new_tokens override from the generation config instead of setting the default to max_model_len - prompt_len? That would allow a default override to be set regardless of platform.

That said, I do like having a code hook where you can write whatever logic you want too...
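For illustration, a rough sketch of that alternative, assuming hypothetical names (not the actual vLLM internals): a max_new_tokens read from the generation config would act as a server-wide cap on the computed default.

```python
from typing import Optional


def default_max_tokens(max_model_len: int, prompt_len: int,
                       gen_config_max_new_tokens: Optional[int]) -> int:
    """Default output-token budget when the user did not set max_tokens."""
    context_budget = max_model_len - prompt_len
    if gen_config_max_new_tokens is not None:
        # A max_new_tokens in the generation config caps every request.
        return min(context_budget, gen_config_max_new_tokens)
    # Otherwise fall back to the current behaviour.
    return context_budget
```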

@maxdebayser
Contributor

Is this something that can be handled by --generation-config?

Unfortunately not, because in vllm-spyre the permissible max_new_tokens depends on the request.

Collaborator

@NickLucche NickLucche left a comment


Thanks for the contribution!

Shouldn't we update serving_completion too?

@gmarinho2
Contributor Author

Thanks for the contribution!

Shouldn't we update serving_completion too?

Done. Since CompletionRequest has 16 as its default, that value will probably be selected most of the time, because the effective default is the minimum of the context window, the user request, and the server limit: REF1, REF2, REF3, REF4.
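For clarity, a minimal sketch of that selection, with illustrative names (not the exact serving_completion code):

```python
from typing import Optional


def effective_max_tokens(max_model_len: int, prompt_len: int,
                         requested_max_tokens: Optional[int],
                         server_limit: Optional[int] = None,
                         request_default: int = 16) -> int:
    """Return the smallest applicable output-token limit for a request."""
    # Tokens left in the context window after the prompt.
    context_budget = max_model_len - prompt_len
    # The user's value, or CompletionRequest's default of 16 when unset.
    requested = (requested_max_tokens
                 if requested_max_tokens is not None else request_default)
    limits = [context_budget, requested]
    if server_limit is not None:
        limits.append(server_limit)
    return min(limits)
```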

@gmarinho2 gmarinho2 requested a review from NickLucche May 30, 2025 18:53
@joerunde
Collaborator

joerunde commented Jun 9, 2025

@youkaichao any thoughts on getting this merged?

Collaborator

@NickLucche NickLucche left a comment


lgtm! Apologies for the delay.

@joerunde joerunde added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jun 12, 2025
@joerunde
Collaborator

Let's get the full CI running then and see if we can get a maintainer to get this merged 👍

@joerunde
Collaborator

@njhill can you hit the big green merge button for us?

@gmarinho2 gmarinho2 requested a review from aarnphm as a code owner June 17, 2025 20:45
Collaborator

@aarnphm aarnphm left a comment


One formatting comment, otherwise lgtm

@aarnphm aarnphm changed the title Add custom default max tokens for different plataforms [Platform] Add custom default max tokens Jun 19, 2025
@aarnphm aarnphm enabled auto-merge (squash) June 19, 2025 05:21
auto-merge was automatically disabled June 26, 2025 21:41

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 removed the qwen (Related to Qwen models) label Jun 27, 2025
@DarkLight1337
Member

PTAL at the CI failures

Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
gmarinho2 added 3 commits July 1, 2025 15:02
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
@mergify

mergify bot commented Jul 3, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @gmarinho2.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 3, 2025
gmarinho2 and others added 2 commits July 3, 2025 11:58
@mergify mergify bot removed the needs-rebase label Jul 3, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
@maxdebayser
Contributor

@DarkLight1337 , all tests are passing now.

@DarkLight1337 DarkLight1337 merged commit a4113b0 into vllm-project:main Jul 4, 2025
67 checks passed
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Incorrect default max_completion_tokens being set

8 participants