
Conversation

@realliujiaxu (Contributor) commented Oct 23, 2025

What this PR does / why we need it?

Remove redundant operations from model_runner and forward_context. This optimization can significantly reduce the idle time (bubble) before decoding when running models with small parameter counts (e.g., Qwen/Qwen2.5-0.5B).

Testing on 800I A2, the bubble is reduced from 3.8 ms to 2.8 ms:

Before: [profiler screenshot]

After: [profiler screenshot]
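For anyone who wants to reproduce this kind of measurement, a minimal sketch follows. It assumes torch_npu's profiler (which mirrors the torch.profiler interface) and vLLM with the Ascend plugin; the model name and trace path are placeholders, and this is an illustrative setup, not the exact one used for the numbers above.

# Minimal sketch: capture a device timeline to inspect the pre-decode
# idle time (bubble). Assumes torch_npu and an available Ascend NPU.
import torch_npu
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B")
sampling = SamplingParams(max_tokens=64)

with torch_npu.profiler.profile(
        activities=[torch_npu.profiler.ProfilerActivity.CPU,
                    torch_npu.profiler.ProfilerActivity.NPU],
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./trace")):
    llm.generate(["Hello, world!"], sampling)

# In the exported trace, the idle gap on the device stream between prefill
# and the first decode kernel is the bubble measured above.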

Does this PR introduce any user-facing change?

No

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the codebase by moving some checks into utility functions and removing redundant operations. The changes in model_runner_v1.py correctly fix a bug where stale data was being used and remove redundant code, improving correctness and clarity. However, the new utility function has_layer_idx in utils.py introduces a critical bug due to a flawed caching mechanism. I've provided a comment with a suggested fix for this issue.

Comment on lines 772 to 808
def has_layer_idx(model_instance: torch.nn.Module) -> bool:
    global _HAS_LAYER_IDX
    if _HAS_LAYER_IDX is None:
        _HAS_LAYER_IDX = model_instance is not None and \
            hasattr(model_instance, "model") and \
            hasattr(model_instance.model, "start_layer")
    return _HAS_LAYER_IDX

Severity: critical

The current implementation of has_layer_idx uses a global variable _HAS_LAYER_IDX to cache its result. This caching mechanism is flawed because the function's outcome depends on the model_instance argument, which is not guaranteed to be the same across all calls.

For instance, set_ascend_forward_context can be invoked with model_instance=None (e.g., from kv_connector_no_forward), which would cause _HAS_LAYER_IDX to be cached as False. Any subsequent calls, even with a valid model_instance, would then incorrectly return False, preventing features that rely on layer_idx from being enabled.

Since this check is inexpensive, I recommend removing the caching mechanism to fix this bug. The global variable _HAS_LAYER_IDX should also be removed.

def has_layer_idx(model_instance: torch.nn.Module) -> bool:
    return (model_instance is not None and hasattr(model_instance, "model") and
            hasattr(model_instance.model, "start_layer"))
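To make the failure mode concrete, here is a small self-contained repro of the cached version; _Backbone and _DummyModel are hypothetical stand-ins for a real vLLM model:

import torch

_HAS_LAYER_IDX = None  # module-level cache, as in the original implementation

def has_layer_idx_cached(model_instance: torch.nn.Module) -> bool:
    """Buggy version: the first result is cached for all later calls."""
    global _HAS_LAYER_IDX
    if _HAS_LAYER_IDX is None:
        _HAS_LAYER_IDX = model_instance is not None and \
            hasattr(model_instance, "model") and \
            hasattr(model_instance.model, "start_layer")
    return _HAS_LAYER_IDX

class _Backbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.start_layer = 0  # models with per-layer indexing expose this

class _DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = _Backbone()

# A first call with model_instance=None (e.g. via kv_connector_no_forward)
# caches False...
assert has_layer_idx_cached(None) is False
# ...so a later call with a valid model instance also returns False.
assert has_layer_idx_cached(_DummyModel()) is False  # bug: should be True

With the uncached version suggested above, the second call returns True as expected.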

@realliujiaxu realliujiaxu changed the title [Feat] Delete redundant operations in model_runner and forward_context [Refactor] Delete redundant operations in model_runner and forward_context Oct 24, 2025
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@realliujiaxu realliujiaxu changed the title [Refactor] Delete redundant operations in model_runner and forward_context [Perf] Delete redundant operations in model_runner and forward_context Oct 24, 2025
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@yiz-liu added and removed the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 27, 2025
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
yiz-liu pushed a commit that referenced this pull request Oct 29, 2025
…rd_context (#3775)


cherry pick #3677

### What this PR does / why we need it?

Remove redundant operations from `model_runner` and `forward_context`.
This optimization can significantly reduce the idle time (bubble) before
decoding when running models with small parameter counts (e.g.,
Qwen/Qwen2.5-0.5B).

Testing on 800I A2, the bubble is reduced from 3.8 ms to 2.8 ms:
Before
![before](https://github.com/user-attachments/assets/d7608e52-2438-46dd-8fc9-391fd6274495)

After
![after](https://github.com/user-attachments/assets/56daf081-2dba-4d2e-99d4-e055187d9806)

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@yiz-liu yiz-liu merged commit 7419186 into vllm-project:main Oct 29, 2025
24 checks passed
