
Conversation

@gmarinho2
Contributor

@gmarinho2 gmarinho2 commented May 22, 2025

FIX: vllm-project/vllm-spyre#148

Currently the default max tokens for the completions API is set to max_model_len - prompt_len. The changes in this PR make it so that when a platform needs a different value for default_max_tokens, it can be changed simply by overriding the maybe_update_max_tokens method in the Platform class. When no override is needed, the method returns the current default. Edit: typo in the commit message: class Plataform should read class Platform.
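For illustration, a minimal sketch of the hook described above, assuming a signature along these lines (the exact name and parameters in the merged code may differ; SpyreLikePlatform and its token budget are hypothetical):

```python
class Platform:
    """Base class: keeps the current default of max_model_len - prompt_len."""

    @classmethod
    def maybe_update_max_tokens(cls, max_model_len: int, prompt_len: int,
                                default_max_tokens: int) -> int:
        # No platform-specific override: return the value computed by the
        # serving layer unchanged.
        return default_max_tokens


class SpyreLikePlatform(Platform):
    """Hypothetical platform that needs a smaller default."""

    @classmethod
    def maybe_update_max_tokens(cls, max_model_len: int, prompt_len: int,
                                default_max_tokens: int) -> int:
        # Illustrative only: cap the default at a platform-specific budget.
        platform_budget = 1024
        return min(default_max_tokens, platform_budget)
```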

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@maxdebayser maxdebayser left a comment


LGTM

@joerunde
Collaborator

Is this something that can be handled by --generation-config?

--generation-config
The folder path to the generation config. Defaults to "auto", in which case the generation config is loaded from the model path. If set to "vllm", no generation config is loaded and vLLM defaults are used. If set to a folder path, the generation config is loaded from that folder. If max_new_tokens is specified in the generation config, it sets a server-wide limit on the number of output tokens for all requests.

Default: 'auto'

Should the chat API respect a max_new_tokens override from the generation config instead of setting the default to max_model_len - prompt_len? That would allow a default override to be set regardless of platform.

That said, I do like having a code hook where you can write whatever logic you want too...
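For illustration, a rough sketch of that alternative, assuming hypothetical names (not the actual vLLM internals): a max_new_tokens read from the generation config would act as a server-wide cap on the computed default.

```python
from typing import Optional


def default_max_tokens(max_model_len: int, prompt_len: int,
                       gen_config_max_new_tokens: Optional[int]) -> int:
    """Default output-token budget when the user did not set max_tokens."""
    context_budget = max_model_len - prompt_len
    if gen_config_max_new_tokens is not None:
        # A max_new_tokens in the generation config caps every request.
        return min(context_budget, gen_config_max_new_tokens)
    # Otherwise fall back to the current behaviour.
    return context_budget
```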

@maxdebayser
Contributor

Is this something that can be handled by --generation-config?

Unfortunately not, because in vllm-spyre the permissible max_new_tokens depends on the request.

Collaborator

@NickLucche NickLucche left a comment


Thanks for the contribution!

Shouldn't we update serving_completion too?

@gmarinho2
Contributor Author

Thanks for the contribution!

Shouldn't we update serving_completion too?

Done. Since CompletionRequest has 16 as its default, that value will probably be selected most of the time, because the effective default is the minimum of the context window, the user request, and the server limit: REF1, REF2, REF3, REF4.
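For clarity, a minimal sketch of that selection, with illustrative names (not the exact serving_completion code):

```python
from typing import Optional


def effective_max_tokens(max_model_len: int, prompt_len: int,
                         requested_max_tokens: Optional[int],
                         server_limit: Optional[int] = None,
                         request_default: int = 16) -> int:
    """Return the smallest applicable output-token limit for a request."""
    # Tokens left in the context window after the prompt.
    context_budget = max_model_len - prompt_len
    # The user's value, or CompletionRequest's default of 16 when unset.
    requested = (requested_max_tokens
                 if requested_max_tokens is not None else request_default)
    limits = [context_budget, requested]
    if server_limit is not None:
        limits.append(server_limit)
    return min(limits)
```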

@gmarinho2 gmarinho2 requested a review from NickLucche May 30, 2025 18:53
@joerunde
Collaborator

joerunde commented Jun 9, 2025

@youkaichao any thoughts on getting this merged?

Collaborator

@NickLucche NickLucche left a comment


lgtm! Apologies for the delay.

@joerunde joerunde added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jun 12, 2025
@joerunde
Collaborator

Let's get the full CI running then and see if we can get a maintainer to get this merged 👍

@joerunde
Collaborator

@njhill can you hit the big green merge button for us?

@gmarinho2 gmarinho2 requested a review from aarnphm as a code owner June 17, 2025 20:45
Collaborator

@aarnphm aarnphm left a comment


One formatting comment, otherwise lgtm

@aarnphm aarnphm changed the title Add custom default max tokens for different plataforms [Platform] Add custom default max tokens Jun 19, 2025
@aarnphm aarnphm enabled auto-merge (squash) June 19, 2025 05:21
auto-merge was automatically disabled June 26, 2025 21:41

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 removed the qwen (Related to Qwen models) label Jun 27, 2025
@DarkLight1337
Member

PTAL at the CI failures

Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
gmarinho2 added 3 commits July 1, 2025 15:02
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
@mergify

mergify bot commented Jul 3, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @gmarinho2.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 3, 2025
gmarinho2 and others added 2 commits July 3, 2025 11:58
@mergify mergify bot removed the needs-rebase label Jul 3, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
@maxdebayser
Contributor

@DarkLight1337 , all tests are passing now.

@DarkLight1337 DarkLight1337 merged commit a4113b0 into vllm-project:main Jul 4, 2025
67 checks passed
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Incorrect default max_completion_tokens being set

8 participants