@heheda12345
Collaborator

Ideally, we should initialize the input batch inside initialize_kv_cache, based on the KV cache config. However, as noted in #18298, the input batch currently has to be initialized before load_model for reasons that are not yet understood; otherwise, quantization combined with weight offloading fails.

As a temporary solution, we initialize the input batch in GPUModelRunner.__init__ and re-initialize it in initialize_kv_cache only when necessary.

To avoid the weight offloading + quantization error, we allow re-initialization only when weight offloading is disabled. This is fine because there is only one KV cache group today. Once the hybrid allocator lands, we can disable it when weight offloading is enabled, which ensures there is still only one KV cache group in that case.
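
A minimal sketch of the pattern described above, not the code in this PR. `InputBatch`, `KVCacheConfig`, `ModelRunner`, `group_block_sizes`, and `cpu_offload_gb` are simplified, hypothetical stand-ins, and the sketch assumes that "when necessary" means the block sizes in the KV cache config differ from the ones used in `__init__`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InputBatch:
    """Hypothetical stand-in for the persistent batch; only block sizes matter here."""
    block_sizes: list[int]


@dataclass
class KVCacheConfig:
    """Hypothetical stand-in: one block size per KV cache group."""
    group_block_sizes: list[int]


class ModelRunner:
    def __init__(self, default_block_size: int, cpu_offload_gb: float = 0.0):
        self.cpu_offload_gb = cpu_offload_gb
        # Built eagerly, before any model loading, from the default block size.
        self.input_batch = InputBatch(block_sizes=[default_block_size])

    def initialize_kv_cache(self, kv_cache_config: KVCacheConfig) -> None:
        wanted = list(kv_cache_config.group_block_sizes)
        if wanted == self.input_batch.block_sizes:
            # The eagerly built batch already matches the config; keep it.
            return
        if self.cpu_offload_gb > 0:
            # Assumption: rebuilding the batch after model loading is unsafe
            # when weight offloading is enabled, so refuse instead.
            raise RuntimeError(
                "Cannot re-initialize the input batch with weight offloading enabled"
            )
        self.input_batch = InputBatch(block_sizes=wanted)
```

With a single KV cache group whose block size matches the default, `initialize_kv_cache` leaves the original batch untouched, so the weight offloading path never reaches the re-initialization branch.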

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@mergify mergify bot added v1 tpu Related to Google TPUs labels May 24, 2025
@mergify

mergify bot commented May 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 30, 2025
…_batch

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@mergify mergify bot removed the needs-rebase label Jun 2, 2025
@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 3, 2025
Collaborator

@WoosukKwon WoosukKwon left a comment

LGTM

@WoosukKwon WoosukKwon enabled auto-merge (squash) June 3, 2025 16:01
@WoosukKwon WoosukKwon merged commit 6cac54f into vllm-project:main Jun 3, 2025
71 checks passed
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jun 4, 2025
### What this PR does / why we need it?

- Re-enable sleep mode test
- Fix nightly performance benchmark workflow
- Fix model-runner-v1 bug for upstream [change](vllm-project/vllm#18654)
---------

Signed-off-by: wangli <wangli858794774@gmail.com>
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Jun 4, 2025
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 4, 2025
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 5, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025