
Conversation

@zouyida2052 (Contributor) commented Oct 31, 2025

What this PR does / why we need it?

Fix a bug in the calculation of max_num_tokens so it is set to the correct value.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request fixes a critical bug in the calculation of max_num_tokens within the NPUTorchairModelRunner. The original code incorrectly used the tensor parallel size, which resulted in a wrong capacity calculation for mc2_tokens_capacity. This could lead to incorrect behavior or performance issues, especially for MoE models. The proposed change correctly computes max_num_tokens using the maximum number of requests and the decode query length, aligning with the intended logic and fixing the bug. The change is correct and necessary.

# NOTE: To be clear, we need to make sure that during graph capture, the number of
# tokens is less than or equal to mc2_tokens_capacity. According to _set_cudagraph_sizes,
# the max number of tokens in graph is min(max_num_seqs * uniform_decode_query_len, 512).
max_num_tokens = self.parallel_config.tensor_parallel_size

critical

The calculation for max_num_tokens was incorrect. It was set to the tensor parallel size, which is not related to the number of tokens. This leads to an incorrect mc2_tokens_capacity, which can cause issues with MoE communication strategy selection and other capacity-dependent logic. The fix correctly calculates it based on the maximum number of requests and the uniform decode query length.

Suggested change:
- max_num_tokens = self.parallel_config.tensor_parallel_size
+ max_num_tokens = self.max_num_reqs * self.uniform_decode_query_len
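
For illustration only, a minimal sketch of the bound this value is meant to satisfy. The helper name and plain-integer inputs below are hypothetical, not the real NPUTorchairModelRunner attributes; the sketch just mirrors the rule quoted in the NOTE above, min(max_num_seqs * uniform_decode_query_len, 512).

# Hypothetical sketch, not the actual runner code: it only illustrates that
# mc2_tokens_capacity must cover the largest token count a captured graph can
# see, i.e. min(max_num_reqs * uniform_decode_query_len, 512).
def mc2_tokens_capacity_sketch(max_num_reqs: int,
                               uniform_decode_query_len: int,
                               graph_capture_cap: int = 512) -> int:
    # Correct bound: scales with the number of requests and the decode query
    # length, not with tensor_parallel_size.
    max_num_tokens = max_num_reqs * uniform_decode_query_len
    return min(max_num_tokens, graph_capture_cap)

# Example: 128 decode requests at one token each -> capacity 128. Using a
# value like tensor_parallel_size (e.g. 8) instead would leave graphs that
# capture up to 128 tokens without enough capacity.
print(mc2_tokens_capacity_sketch(max_num_reqs=128, uniform_decode_query_len=1))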

@wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 31, 2025
@MengqingCao merged commit ec98320 into vllm-project:main Nov 3, 2025
50 of 51 checks passed