Conversation

@simondanielsson (Contributor) commented Sep 17, 2025

Purpose

Closes #25071.

Test Plan

  1. When using whisper:
vllm serve openai/whisper-large-v3

Logs should no longer mention "Chunked prefill is enabled with ...":

(APIServer pid=3140911) INFO 09-17 12:37:08 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=3140911) INFO 09-17 12:37:10 [__init__.py:2790] Encoder-decoder models do not support chunked prefill nor prefix caching; disabling both.

The expected output is simply:

(APIServer pid=3140911) INFO 09-17 12:37:10 [__init__.py:2790] Encoder-decoder models do not support chunked prefill nor prefix caching; disabling both.
  2. The change should leave the resulting SchedulerConfig and VllmConfig unchanged. Verify with new tests (see the illustrative sketch below).
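
For illustration, a minimal test along these lines could check that building the engine config for Whisper leaves both features disabled. This is only a sketch: the exact tests added in this PR may differ, and the names used (EngineArgs.create_engine_config, enable_chunked_prefill, enable_prefix_caching) are assumptions based on the discussion here.

```python
# Sketch only -- not necessarily the exact test added in this PR.
from vllm.engine.arg_utils import EngineArgs


def test_encoder_decoder_disables_chunked_prefill_and_prefix_caching():
    # Build the full engine config for an encoder-decoder (Whisper) model.
    config = EngineArgs(model="openai/whisper-large-v3").create_engine_config()
    # Both features should come out disabled, with no intermediate state change.
    assert not config.scheduler_config.enable_chunked_prefill
    assert not config.cache_config.enable_prefix_caching
```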

Test Result

  1. Command:
  • Tested on GPU: L4.
  • Output from "test" command:
(vllm) danielssonsimon@XXXXXX:~/code/vllm$ vllm serve openai/whisper-large-v3
INFO 09-17 18:43:30 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=49917) INFO 09-17 18:43:33 [api_server.py:1813] vLLM API server version 0.10.2rc3.dev169+ge3db5ebb6.d20250917
(APIServer pid=49917) INFO 09-17 18:43:33 [utils.py:328] non-default args: {'model_tag': 'openai/whisper-large-v3', 'model': 'openai/whisper-large-v3'}
(APIServer pid=49917) INFO 09-17 18:43:42 [__init__.py:707] Resolved architecture: WhisperForConditionalGeneration
(APIServer pid=49917) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=49917) INFO 09-17 18:43:42 [__init__.py:1762] Using max model len 448
(APIServer pid=49917) INFO 09-17 18:43:43 [scheduler.py:197] Encoder-decoder models do not support chunked prefill nor prefix caching; disabling both.
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 11915.64it/s]
  2. New tests pass locally.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@simondanielsson simondanielsson changed the title [Bug]: Clean up chunked prefill logging when using whisper [Bugfix]: Clean up chunked prefill logging when using whisper Sep 17, 2025

mergify bot commented Sep 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @simondanielsson.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment on lines 96 to 102
is_encoder_decoder: bool = False
"""True if the model is an encoder-decoder model."""

Member

If this already exists in ModelConfig, why duplicate it here?

Contributor Author

True, we likely don't want to store it here as well.

Would an InitVar be sufficient here?

@hmellor (Member) commented Sep 18, 2025

The InitVar solution works.

However, in other cases like this (where two sibling configs interact) I've tended to perform those interactions in the parent's __post_init__, VllmConfig in this case. Would that work in this case?

Member

That's where I had it before this change, but then we end up with a confusing log message about these features being enabled, coming from SchedulerConfig's __post_init__ before VllmConfig's __post_init__ fixes it and disables them.

Contributor Author

Another option would be to emit the "Chunked prefill is enabled..." log from VllmConfig instead, but I'm not sure it makes sense to put it there.

Member

Ah I see, thank you for explaining. Let's stick with the InitVar.
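
For reference, a minimal sketch of the InitVar approach agreed on above (field names are illustrative, not the exact definitions in vllm/config): the flag is consumed in __post_init__ and never stored, so SchedulerConfig does not end up duplicating ModelConfig.is_encoder_decoder.

```python
# Minimal sketch of the InitVar approach; names are illustrative, not the
# exact fields in vllm/config.
from dataclasses import InitVar, dataclass


@dataclass
class SchedulerConfig:
    max_num_batched_tokens: int = 8192
    enable_chunked_prefill: bool = True
    # Provided by the parent config at construction time, but not stored as a field.
    is_encoder_decoder: InitVar[bool] = False

    def __post_init__(self, is_encoder_decoder: bool) -> None:
        if is_encoder_decoder:
            # Encoder-decoder models do not support chunked prefill, so disable it
            # up front and skip the "Chunked prefill is enabled ..." log entirely.
            self.enable_chunked_prefill = False
```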

@simondanielsson simondanielsson force-pushed the feature/clean-up-prefill-logging branch from fefc7ab to 4a48dc5 Compare September 18, 2025 13:03
@simondanielsson simondanielsson force-pushed the feature/clean-up-prefill-logging branch 3 times, most recently from 2abc703 to b721f6c Compare September 29, 2025 18:32
@simondanielsson simondanielsson force-pushed the feature/clean-up-prefill-logging branch from b721f6c to 46594df Compare September 30, 2025 06:34
@simondanielsson (Contributor Author)

@russelb conflicts fixed now - should be good to go after CI. Thanks!

@hmellor hmellor enabled auto-merge (squash) September 30, 2025 07:30
@hmellor hmellor merged commit e23cacd into vllm-project:main Sep 30, 2025
45 checks passed
@simondanielsson simondanielsson deleted the feature/clean-up-prefill-logging branch September 30, 2025 08:36
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
…roject#25075)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
…roject#25075)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…roject#25075)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…roject#25075)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…roject#25075)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Oct 24, 2025
### What this PR does / why we need it?
This is the step 1 of refactoring code to adapt with vllm main, and this
pr aligned with
vllm-project/vllm@17c540a

1. refactor deepseek to the latest code arch as of
vllm-project/vllm@17c540a
 
2. bunches of fixes due to vllm changes
- Fix `AscendScheduler` `__post_init__`, caused by
vllm-project/vllm#25075
- Fix `AscendScheduler` init got an unexpected arg `block_size`, caused
by vllm-project/vllm#26296
- Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by
vllm-project/vllm#23485
- Fix `MLAAttention` import, caused by
vllm-project/vllm#25103
- Fix `SharedFusedMoE` import, caused by
vllm-project/vllm#26145
- Fix `LazyLoader` import, caused by
vllm-project/vllm#27022
- Fix `vllm.utils.swap_dict_values` import, caused by
vllm-project/vllm#26990
- Fix `Backend` enum import, caused by
vllm-project/vllm#25893
- Fix `CompilationLevel` renaming to `CompilationMode` issue introduced
by vllm-project/vllm#26355
- Fix fused_moe ops, caused by
vllm-project/vllm#24097
- Fix bert model because of `inputs_embeds`, caused by
vllm-project/vllm#25922
- Fix MRope because of `get_input_positions_tensor` to
`get_mrope_input_positions`, caused by
vllm-project/vllm#24172
- Fix `splitting_ops` changes introduced by
vllm-project/vllm#25845
- Fix multi-modality changes introduced by
vllm-project/vllm#16229
- Fix lora bias dropping issue introduced by
vllm-project/vllm#25807
- Fix structured output break introduced by
vllm-project/vllm#26737

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…roject#25075)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Clean up chunked prefill logging when using whisper

3 participants