[v1] Re-init input batch for multiple kv cache groups #18654

heheda12345 · 2025-05-24T09:02:26Z

Ideally, we should initialize the input batch inside initialize_kv_cache, based on the kv cache config. However, as noted in #18298, for reasons that are still unknown, the input batch has to be initialized before load_model; otherwise quantization + weight offloading fails.

As a temporary solution, we initialize the input batch in GPUModelRunner.__init__, and re-initialize it in initialize_kv_cache only when necessary.

To avoid the error in weight offloading + quantization, we only allow re-initialization when weight offloading is disabled. This is fine because there is only one kv cache group for now. After the hybrid allocator lands, we can disable it when weight offloading is enabled, which ensures there is still only one kv cache group in that case.
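
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: ModelRunnerSketch, InputBatch, KVCacheGroup, and the weight_offloading_enabled flag are hypothetical stand-ins, and treating "necessary" as "the per-group block sizes differ from the single default block size" is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class KVCacheGroup:
    """Hypothetical stand-in for one kv cache group (only block size matters here)."""
    block_size: int


@dataclass
class InputBatch:
    """Hypothetical stand-in for the input batch, keyed by per-group block sizes."""
    block_sizes: list[int]


class ModelRunnerSketch:
    """Illustrative runner: builds the input batch early, rebuilds it only if needed."""

    def __init__(self, default_block_size: int, weight_offloading_enabled: bool):
        self.default_block_size = default_block_size
        self.weight_offloading_enabled = weight_offloading_enabled
        # Built in __init__, i.e. before any model loading would happen.
        self.input_batch = InputBatch(block_sizes=[default_block_size])

    def initialize_kv_cache(self, groups: list[KVCacheGroup]) -> None:
        block_sizes = [g.block_size for g in groups]
        # Assumption: re-initialization is "necessary" only when the kv cache
        # config differs from what the early input batch was built with.
        if block_sizes == [self.default_block_size]:
            return
        # Re-initialization is only allowed when weight offloading is disabled.
        if self.weight_offloading_enabled:
            raise ValueError(
                "re-initializing the input batch is not supported "
                "with weight offloading")
        self.input_batch = InputBatch(block_sizes=block_sizes)
```

With a single group that matches the default block size, the batch created in the constructor is kept as-is; with two groups it is rebuilt, and with offloading enabled the sketch raises instead of rebuilding.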

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

github-actions · 2025-05-24T09:02:35Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mergify · 2025-05-30T10:05:08Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

WoosukKwon

LGTM

### What this PR does / why we need it?

- Re-enable sleep mode test
- Fix nightly performance benchmark workflow
- Fix model-runner-v1 bug for upstream [change](vllm-project/vllm#18654)

Signed-off-by: wangli <wangli858794774@gmail.com>

re-init input batch

9970daf

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

heheda12345 requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners May 24, 2025 09:02

mergify bot added the v1 and tpu (Related to Google TPUs) labels May 24, 2025

Merge branch 'main' of github.com:vllm-project/vllm into reinit_input…

646f961

…_batch

mergify bot added the needs-rebase label May 30, 2025

Merge branch 'main' of github.com:vllm-project/vllm into reinit_input…

f8be307

…_batch Signed-off-by: Chen Zhang <zhangch99@outlook.com>

mergify bot removed the needs-rebase label Jun 2, 2025

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 3, 2025

WoosukKwon approved these changes Jun 3, 2025

WoosukKwon enabled auto-merge (squash) June 3, 2025 16:01

WoosukKwon merged commit 6cac54f into vllm-project:main Jun 3, 2025
71 checks passed

Potabk mentioned this pull request Jun 4, 2025

[CI] Re-enable sleep mode test and skip failure breaking CI vllm-project/vllm-ascend#990

Merged

kzjeef mentioned this pull request Jul 10, 2025

[Model] Add reason parser for Hunyuan A13B Model. #20625

Merged
