Conversation

@Potabk Potabk (Collaborator) commented May 11, 2025

What this PR does / why we need it?

Fix the V1 error found by nightly_ci, broken by [v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders #17483, by making the InputBatch parameters consistent with vllm.

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI passed

Potabk added 2 commits May 11, 2025 16:34
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
@Potabk Potabk (Collaborator, Author) commented May 11, 2025

@wangxiyuan

Signed-off-by: wangli <wangli858794774@gmail.com>
Potabk added 3 commits May 11, 2025 17:20
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
@Yikun Yikun (Collaborator) commented May 11, 2025

diff --git a/vllm_ascend/worker/model_runner_v1.py b/vllm_ascend/worker/model_runner_v1.py
index 76d3ea4..11355f8 100644
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -55,6 +55,7 @@ from vllm.v1.worker.gpu_input_batch import CachedRequestState, InputBatch
 from vllm_ascend.attention.attention import AttentionMaskBuilder
 from vllm_ascend.attention.attention_v1 import AscendAttentionState
 from vllm_ascend.platform import NPUPlatform
+from vllm_ascend.utils import vllm_version_is

 if TYPE_CHECKING:
     import xgrammar as xgr  # type: ignore[import-untyped]
@@ -186,14 +187,26 @@ class NPUModelRunner:
         # Request states.
         self.requests: Dict[str, CachedRequestState] = {}
         # Persistent batch.
-        self.input_batch = InputBatch(
-            max_num_reqs=self.max_num_reqs,
-            max_model_len=self.model_config.max_model_len,
-            max_num_blocks_per_req=self.max_num_blocks_per_req,
-            device=self.device,
-            pin_memory=True,
-            vocab_size=self.model_config.get_vocab_size(),
-        )
+        # Remove this when we drop 0.8.5 support
+        if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
+            self.input_batch = InputBatch(
+                max_num_reqs=self.max_num_reqs,
+                max_model_len=self.model_config.max_model_len,
+                max_num_blocks_per_req=self.max_num_blocks_per_req,
+                device=self.device,
+                pin_memory=True,
+                vocab_size=self.model_config.get_vocab_size(),
+            )
+        else:
+            self.input_batch = InputBatch(
+                max_num_reqs=self.max_num_reqs,
+                max_model_len=self.model_config.max_model_len,
+                max_num_blocks_per_req=self.max_num_blocks_per_req,
+                max_num_batched_tokens=self.max_num_tokens,
+                device=self.device,
+                pin_memory=True,
+                vocab_size=self.model_config.get_vocab_size(),
+            )

I suggest keeping it simple; we can drop the obsolete branch once we no longer support v0.8.5.
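
For context, here is a minimal sketch of what a version-gate helper like `vllm_ascend.utils.vllm_version_is` could look like; this sketch is an assumption for illustration, not the actual implementation:

```python
# Sketch (assumption): an exact-match check against the installed vllm
# version, used to gate compatibility branches like the one above.
import vllm


def vllm_version_is(target_version: str) -> bool:
    """Return True if the installed vllm version matches target_version."""
    return vllm.__version__ == target_version
```

With a helper like this, the 0.8.5 branch above can be deleted in a single place once support for that version is dropped.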

Yikun added 2 commits May 11, 2025 17:53
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Yikun Yikun (Collaborator) left a comment

@Potabk Thanks for fixing it; will merge this after CI passes.

@Yikun Yikun changed the title [Bugfix] Fix model_runner_v1 InputBatch parameter [Bugfix] Fix model_runner_v1 InputBatch parameter to make main CI pass May 11, 2025
@Yikun Yikun changed the title [Bugfix] Fix model_runner_v1 InputBatch parameter to make main CI pass [Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass May 11, 2025
@Potabk Potabk (Collaborator, Author) commented May 11, 2025

@Yikun Some errors occurred in CI; I think it's a problem with the Python version. Python 3.10, which we are using, does not support importing NotRequired from typing; on 3.10 the import has to be from typing_extensions import TypedDict, NotRequired.

@Potabk Potabk (Collaborator, Author) commented May 11, 2025

Found it! This commit broke it; this issue should be fixed after #17962 is merged.
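
For reference, a minimal sketch of the version-compatible import pattern described above; the `ExampleRequest` TypedDict and its fields are hypothetical, added only for illustration (assumes `typing_extensions` is installed on Python 3.10):

```python
import sys

# NotRequired was only added to typing in Python 3.11 (PEP 655), so on
# Python 3.10 it has to come from typing_extensions instead.
if sys.version_info >= (3, 11):
    from typing import NotRequired, TypedDict
else:
    from typing_extensions import NotRequired, TypedDict


class ExampleRequest(TypedDict):  # hypothetical, for illustration only
    prompt: str
    max_tokens: NotRequired[int]  # key may be omitted thanks to NotRequired
```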

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Yikun Yikun (Collaborator) commented May 11, 2025

@Potabk Thanks for the investigation; let's skip it and recover the main CI first.

@Yikun Yikun merged commit cdece86 into vllm-project:main May 11, 2025
14 checks passed
@Yikun Yikun (Collaborator) commented May 11, 2025

CI passed; merging it to recover the main CI.

@Potabk Potabk deleted the bugfix branch May 12, 2025 01:31
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025 (vllm-project#806)

### What this PR does / why we need it?

1. Fix the V1 error found by [nightly_ci](https://github.com/vllm-project/vllm-ascend/actions/runs/14950004754/job/41998136610), broken by [[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders #17483](vllm-project/vllm#17483), by making `InputBatch` parameters consistent with vllm.
2. Disable the benchmark and fix it upstream.

### Does this PR introduce _any_ user-facing change?

No


### How was this patch tested?

CI passed

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025 (vllm-project#806)