[Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass #806
Conversation
Signed-off-by: wangli <wangli858794774@gmail.com>
diff --git a/vllm_ascend/worker/model_runner_v1.py b/vllm_ascend/worker/model_runner_v1.py
index 76d3ea4..11355f8 100644
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -55,6 +55,7 @@ from vllm.v1.worker.gpu_input_batch import CachedRequestState, InputBatch
from vllm_ascend.attention.attention import AttentionMaskBuilder
from vllm_ascend.attention.attention_v1 import AscendAttentionState
from vllm_ascend.platform import NPUPlatform
+from vllm_ascend.utils import vllm_version_is
if TYPE_CHECKING:
import xgrammar as xgr # type: ignore[import-untyped]
@@ -186,14 +187,26 @@ class NPUModelRunner:
         # Request states.
         self.requests: Dict[str, CachedRequestState] = {}
         # Persistent batch.
-        self.input_batch = InputBatch(
-            max_num_reqs=self.max_num_reqs,
-            max_model_len=self.model_config.max_model_len,
-            max_num_blocks_per_req=self.max_num_blocks_per_req,
-            device=self.device,
-            pin_memory=True,
-            vocab_size=self.model_config.get_vocab_size(),
-        )
+        # Remove this once we drop 0.8.5 support
+        if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
+            self.input_batch = InputBatch(
+                max_num_reqs=self.max_num_reqs,
+                max_model_len=self.model_config.max_model_len,
+                max_num_blocks_per_req=self.max_num_blocks_per_req,
+                device=self.device,
+                pin_memory=True,
+                vocab_size=self.model_config.get_vocab_size(),
+            )
+        else:
+            self.input_batch = InputBatch(
+                max_num_reqs=self.max_num_reqs,
+                max_model_len=self.model_config.max_model_len,
+                max_num_blocks_per_req=self.max_num_blocks_per_req,
+                max_num_batched_tokens=self.max_num_tokens,
+                device=self.device,
+                pin_memory=True,
+                vocab_size=self.model_config.get_vocab_size(),
+            )

I suggest keeping it simple; we will drop the useless branch once we no longer support v0.8.5.
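For context, a minimal sketch of what a version gate like `vllm_version_is` could look like, assuming it simply compares the installed `vllm.__version__` string against the target version; the actual helper in `vllm_ascend.utils` may be implemented differently:

```python
# Hypothetical sketch only; the real vllm_ascend.utils.vllm_version_is
# may normalize or parse versions rather than compare raw strings.
import vllm


def vllm_version_is(target_version: str) -> bool:
    """Return True if the installed vllm exactly matches target_version."""
    return vllm.__version__ == target_version
```

An exact-match check like this is why both "0.8.5" and "0.8.5.post1" have to be tested separately in the gate above.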
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Potabk Thanks for fixing it, will merge this after CI passes
(Title changed: fix model_runner_v1 InputBatch parameter to make main CI pass)
Found it! This commit broke it; the issue should be fixed after #17962 is merged.
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Potabk Thanks for the investigation, let's skip it and recover main CI first.
CI passed, merging it to recover main CI.
What this PR does / why we need it?
1. Fix a V1 error found by nightly_ci, broken by [v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders #17483, by making the `InputBatch` parameters consistent with vllm.
2. Disable the benchmark and fix it in upstream.

Does this PR introduce any user-facing change?
No
How was this patch tested?
CI passed