[Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass #806
Conversation
Signed-off-by: wangli <wangli858794774@gmail.com>
diff --git a/vllm_ascend/worker/model_runner_v1.py b/vllm_ascend/worker/model_runner_v1.py
index 76d3ea4..11355f8 100644
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -55,6 +55,7 @@ from vllm.v1.worker.gpu_input_batch import CachedRequestState, InputBatch
from vllm_ascend.attention.attention import AttentionMaskBuilder
from vllm_ascend.attention.attention_v1 import AscendAttentionState
from vllm_ascend.platform import NPUPlatform
+from vllm_ascend.utils import vllm_version_is
if TYPE_CHECKING:
import xgrammar as xgr # type: ignore[import-untyped]
@@ -186,14 +187,26 @@ class NPUModelRunner:
         # Request states.
         self.requests: Dict[str, CachedRequestState] = {}
         # Persistent batch.
-        self.input_batch = InputBatch(
-            max_num_reqs=self.max_num_reqs,
-            max_model_len=self.model_config.max_model_len,
-            max_num_blocks_per_req=self.max_num_blocks_per_req,
-            device=self.device,
-            pin_memory=True,
-            vocab_size=self.model_config.get_vocab_size(),
-        )
+        # Remove this once we drop 0.8.5 support
+        if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
+            self.input_batch = InputBatch(
+                max_num_reqs=self.max_num_reqs,
+                max_model_len=self.model_config.max_model_len,
+                max_num_blocks_per_req=self.max_num_blocks_per_req,
+                device=self.device,
+                pin_memory=True,
+                vocab_size=self.model_config.get_vocab_size(),
+            )
+        else:
+            self.input_batch = InputBatch(
+                max_num_reqs=self.max_num_reqs,
+                max_model_len=self.model_config.max_model_len,
+                max_num_blocks_per_req=self.max_num_blocks_per_req,
+                max_num_batched_tokens=self.max_num_tokens,
+                device=self.device,
+                pin_memory=True,
+                vocab_size=self.model_config.get_vocab_size(),
+            )

I suggest keeping it simple; we will drop the useless branch once we no longer support v0.8.5.
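For context, a minimal sketch of what a version gate like `vllm_version_is` could look like, assuming it simply compares the installed `vllm.__version__` string against the target version; the actual helper in `vllm_ascend.utils` may be implemented differently:

```python
# Hypothetical sketch only; the real vllm_ascend.utils.vllm_version_is
# may normalize or parse versions rather than compare raw strings.
import vllm


def vllm_version_is(target_version: str) -> bool:
    """Return True if the installed vllm exactly matches target_version."""
    return vllm.__version__ == target_version
```

An exact-match check like this is why both "0.8.5" and "0.8.5.post1" have to be tested separately in the gate above.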
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Potabk Thanks for fixing it, will merge this after CI passes
(Title changed: fix model_runner_v1 InputBatch parameter to make main CI pass)
Found it! This commit broke it; the issue should be fixed after #17962 is merged.
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Potabk Thanks for the investigation, let's skip it and recover main CI first.
CI passed, merging it to recover main CI.
What this PR does / why we need it?
1. Fix a V1 error found by nightly_ci, broken by [v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders #17483, by making the `InputBatch` parameters consistent with vllm.
2. Disable the benchmark and fix it in upstream.

Does this PR introduce any user-facing change?
No
How was this patch tested?
CI passed