[Perf] Delete redundant operations in model_runner and forward_context #3677
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the codebase by moving some checks into utility functions and removing redundant operations. The changes in model_runner_v1.py correctly fix a bug where stale data was being used and remove redundant code, improving correctness and clarity. However, the new utility function has_layer_idx in utils.py introduces a critical bug due to a flawed caching mechanism. I've provided a comment with a suggested fix for this issue.
```python
def has_layer_idx(model_instance: torch.nn.Module) -> bool:
    global _HAS_LAYER_IDX
    if _HAS_LAYER_IDX is None:
        _HAS_LAYER_IDX = model_instance is not None and \
            hasattr(model_instance, "model") and \
            hasattr(model_instance.model, "start_layer")
    return _HAS_LAYER_IDX
```
The current implementation of has_layer_idx uses a global variable _HAS_LAYER_IDX to cache its result. This caching mechanism is flawed because the function's outcome depends on the model_instance argument, which is not guaranteed to be the same across all calls.
For instance, set_ascend_forward_context can be invoked with model_instance=None (e.g., from kv_connector_no_forward), which would cause _HAS_LAYER_IDX to be cached as False. Any subsequent calls, even with a valid model_instance, would then incorrectly return False, preventing features that rely on layer_idx from being enabled.
Since this check is inexpensive, I recommend removing the caching mechanism to fix this bug. The global variable _HAS_LAYER_IDX should also be removed.
```python
def has_layer_idx(model_instance: torch.nn.Module) -> bool:
    return (model_instance is not None and hasattr(model_instance, "model") and
            hasattr(model_instance.model, "start_layer"))
```
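To make the failure mode concrete, here is a minimal, self-contained sketch contrasting the cached variant from the diff with the uncached suggestion. The `_Toy` and `_Inner` modules are hypothetical stand-ins for a model exposing `model.start_layer`, not actual vLLM Ascend classes:

```python
import torch

_HAS_LAYER_IDX = None  # module-level cache, as in the patch under review


def has_layer_idx_cached(model_instance: torch.nn.Module) -> bool:
    """Cached variant from the diff: the first result sticks forever."""
    global _HAS_LAYER_IDX
    if _HAS_LAYER_IDX is None:
        _HAS_LAYER_IDX = model_instance is not None and \
            hasattr(model_instance, "model") and \
            hasattr(model_instance.model, "start_layer")
    return _HAS_LAYER_IDX


def has_layer_idx(model_instance: torch.nn.Module) -> bool:
    """Uncached variant from the suggestion: re-evaluated on every call."""
    return (model_instance is not None and hasattr(model_instance, "model") and
            hasattr(model_instance.model, "start_layer"))


class _Inner(torch.nn.Module):  # hypothetical stand-in for model.model
    def __init__(self):
        super().__init__()
        self.start_layer = 0


class _Toy(torch.nn.Module):  # hypothetical stand-in for the full model
    def __init__(self):
        super().__init__()
        self.model = _Inner()


# A call path such as kv_connector_no_forward may pass model_instance=None first.
print(has_layer_idx_cached(None))    # False, and now cached globally
print(has_layer_idx_cached(_Toy()))  # still False -> stale cache, wrong answer
print(has_layer_idx(None))           # False
print(has_layer_idx(_Toy()))         # True
```

With the cache in place, a single early call that passes `None` pins the result to `False` for the rest of the process, while the uncached version answers correctly on every call.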
This pull request has conflicts, please resolve those before we can evaluate the pull request.
…rd_context (#3775)

cherry pick #3677

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
What this PR does / why we need it?
Remove redundant operations from `model_runner` and `forward_context`. This optimization can significantly reduce the idle time (bubble) before decoding when running models with small parameter counts (e.g., Qwen/Qwen2.5-0.5B). Testing on 800I A2, the bubble is reduced from 3.8 ms to 2.8 ms:

Before
After

Does this PR introduce any user-facing change?
No
How was this patch tested?