
Conversation

@yannicks1 (Owner) commented on Sep 24, 2025

This PR enables one last decode step when the context length equals the max model length.
It is a follow-up (the second part) to this PR, which (re)enabled token generation at the max model length for prefill.

Note that Hugging Face Transformers recently enabled the same behavior (see this PR).
This PR therefore restores consistent behavior between vLLM and HF.
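
For illustration, the boundary condition being relaxed can be sketched as follows. This is not the plugin's actual code path; `can_schedule_decode`, `context_len`, and `max_model_len` are hypothetical names. The point is only that a sequence whose context length already equals the max model length still gets one final decode step instead of being cut off.

```python
def can_schedule_decode(context_len: int, max_model_len: int) -> bool:
    """Hypothetical helper, for illustration only.

    Previously a sequence whose context length had reached max_model_len was
    no longer scheduled (strict '<'); with this change it is still granted
    one last decode step ('<=').
    """
    return context_len <= max_model_len


# With max_model_len = 8, a context of exactly 8 tokens still decodes once more.
assert can_schedule_decode(8, 8)
assert not can_schedule_decode(9, 8)
```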

tasks to do:

  • Assert that the HF Transformers warning "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length" (see source code here) is not emitted during HF text generation (see the sketch below).
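
One possible shape for that check, sketched below and not part of this PR: attach a temporary handler to the `transformers` logger around the generation call and assert the reminder text never appears. `run_hf_generation` is a hypothetical placeholder for whatever drives HF text generation in the test suite; if the library emits the reminder through a warn-once path, the check is most reliable in a fresh process.

```python
import io
import logging

# Prefix of the Transformers reminder that must not appear.
FRIENDLY_REMINDER = (
    "This is a friendly reminder - the current text generation call will "
    "exceed the model's predefined maximum length"
)


def test_no_max_length_warning_during_hf_generation() -> None:
    # Attach a temporary handler directly to the "transformers" logger so the
    # check does not depend on the library's log-propagation settings.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    hf_logger = logging.getLogger("transformers")
    hf_logger.addHandler(handler)
    hf_logger.setLevel(logging.WARNING)
    try:
        run_hf_generation()  # hypothetical helper wrapping the HF generate() call
    finally:
        hf_logger.removeHandler(handler)

    assert FRIENDLY_REMINDER not in stream.getvalue()
```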

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
@yannicks1 deleted the branch enable-prefill-of-max-model-len on October 3, 2025 at 12:25
@yannicks1 closed this on Oct 3, 2025
@yannicks1 deleted the enable-decode-of-max-model-len branch on October 3, 2025 at 12:25
