[Fix] Fix update_aclgraph_sizes when running MoE models #913
Conversation
Force-pushed from c9a9b21 to 45d1cc8.
```python
if (additional_config
        and "expert_tensor_parallel_size" in additional_config
        and not parallel_config.enable_expert_parallel):
    parallel_config.expert_tensor_parallel_size = int(
        additional_config["expert_tensor_parallel_size"])
```
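For readers following along, this attribute is driven by vLLM's `additional_config` passthrough. A hedged usage sketch (the model name and the exact argument spelling are illustrative and may differ across vLLM/vllm-ascend versions):

```python
# Hypothetical usage: requesting ETP=4 via vLLM's additional_config passthrough.
# Model name and argument spelling are assumptions, not taken from this PR.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",   # example MoE model
    tensor_parallel_size=8,
    additional_config={"expert_tensor_parallel_size": 4},
)
```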
It seems there is no `expert_tensor_parallel_size` in `ParallelConfig`? The same goes for `parallel_config.expert_parallel_size`.
vLLM does not support tensor parallelism for experts’ weights, so I added this attribute. This change primarily addresses scenarios where the number of devices significantly exceeds the number of experts.
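To make that concrete, here is a small worked example (a sketch only; the formula mirrors the diff above, and the device/expert counts are hypothetical). With far more devices than experts, pure expert parallelism leaves ranks without a whole expert to own, while a non-trivial ETP size shards each expert's weights and brings the EP size back down to the expert count:

```python
# Sketch: why expert_tensor_parallel_size (ETP) helps when devices >> experts.
# Values are hypothetical; the formula mirrors the diff above.
world_size = 32    # total devices
num_experts = 8    # experts in the MoE layer

# Pure expert parallelism: EP size = world size, i.e. 32 ranks for 8 experts,
# so experts would have to be replicated or some ranks would sit idle.
ep_size_pure = world_size

# With ETP, each expert's weights are sharded across etp ranks:
expert_tensor_parallel_size = 4
expert_parallel_size = world_size // expert_tensor_parallel_size  # 32 // 4 = 8
assert expert_parallel_size == num_experts  # one expert per EP group
```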
```python
# Calculate expert parallel size based on world size
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
```
QQ: Does this mean the ETP size is not equal to 1 only when EP is disabled and ETP is enabled? If so, why do we set the EP size here when EP is disabled?
As I mentioned above, this is more of an enhancement.
```python
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
```
Sorry, I'm still confused. Could I understand it like this: when EP is disabled and ETP is enabled, the expected ETP size is set, but an unexpected EP size is set as well? Or should the EP size be set only when EP is enabled?
```diff
- parallel_config.expert_parallel_size = (
-     parallel_config.world_size //
-     parallel_config.expert_tensor_parallel_size)
+ if parallel_config.enable_expert_parallel:
+     parallel_config.expert_parallel_size = (
+         parallel_config.world_size //
+         parallel_config.expert_tensor_parallel_size)
```
We can discuss this in detail later. You are right; perhaps we need to give the configuration logic here careful consideration to avoid any unwanted confusion.
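For reference, one possible way to disentangle the two cases (a sketch only, based on the review discussion above; the branch structure is an assumption, not the merged code):

```python
# Sketch of one way to keep EP and ETP configuration mutually exclusive.
# This is an assumption based on the review discussion, not the merged code.
if parallel_config.enable_expert_parallel:
    # EP enabled: experts are distributed across ranks, each kept whole (ETP = 1).
    parallel_config.expert_tensor_parallel_size = 1
    parallel_config.expert_parallel_size = parallel_config.world_size
elif additional_config and "expert_tensor_parallel_size" in additional_config:
    # EP disabled but ETP requested: shard each expert's weights across
    # etp ranks, and derive the EP size from what remains of the world.
    parallel_config.expert_tensor_parallel_size = int(
        additional_config["expert_tensor_parallel_size"])
    parallel_config.expert_parallel_size = (
        parallel_config.world_size //
        parallel_config.expert_tensor_parallel_size)
```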
Force-pushed from 64ea69c to 06be3d2.
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
…or_parallel_size Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
…#913)

### What this PR does / why we need it?
Fix update_aclgraph_sizes when running MoE models.

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: wangxiaoxin (A) <w00664509@china.huawei.com>
### What this PR does / why we need it?
1. [PR913](#913) introduced an error that caused V0's spec decode function to fail. [PR1109](#1109) attempted to fix this problem, but unfortunately the fix broke the ngram function. I fixed the ngram function in this PR.
   **PS**: Q: Why was the problem not found when PR1109 was merged? A: The newly introduced problem only appears when tp>1, and the test cases on CI all use tp=1.
2. In versions after 0.7.3, vllm-ascend deleted some spec decode UTs to keep CI times down, including the eagle speculative UTs, which left CI unable to cover the eagle function. I added it (`test_eagle_correctness.py`) back in this PR.
3. Because of the gap mentioned in 2, the current version of Eagle has a problem. I located and fixed it: vllm's `draft_model_runner.py` was changed and vllm-ascend was not synchronized in time.
4. Currently, the UTs of v0 and v1 are mixed in the spec_decode directory. I split them into two directories: spec_decode_v0 and spec_decode_v1.
5. I found that `vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor` and `vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace` have changed in vllm, so I removed their patches in this PR.
6. The v1 mtp UT failed (https://github.com/vllm-project/vllm-ascend/actions/runs/15782006176/job/44489813330?pr=1323), so I commented it out. @XWFAlone @JC-ut0

### Does this PR introduce _any_ user-facing change?
This PR fixes the ngram and eagle spec decode functions in the v0 engine.

### How was this patch tested?
ngram and eagle were tested locally on an 800I A2 machine, using real weights instead of the random small weights used by the UTs, and in a scenario with tp>1. The rest was tested by CI.

Signed-off-by: mengwei805 <mengwei25@huawei.com>
### What this PR does / why we need it?
Fix update_aclgraph_sizes when running MoE models.
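For context, a minimal sketch of the kind of adjustment this fix concerns (the budget constant, the cost formula, and the function signature below are illustrative assumptions, not the actual vllm-ascend implementation): the number of ACL graphs that can be captured is bounded by per-device stream resources, and MoE models with expert-parallel communication consume more streams per captured graph, so the list of capture sizes has to be trimmed according to the model and parallel configuration.

```python
# Hypothetical sketch of trimming ACL graph capture sizes for MoE models.
# The stream budget and per-graph cost formula are illustrative assumptions.
def update_aclgraph_sizes_sketch(capture_sizes: list[int],
                                 num_hidden_layers: int,
                                 expert_parallel_size: int) -> list[int]:
    stream_budget = 1800                       # assumed per-device stream limit
    streams_per_graph = num_hidden_layers + 1  # assumed cost of one captured graph
    # Expert-parallel communication multiplies the per-graph stream cost.
    parallel_factor = max(1, expert_parallel_size)
    max_graphs = max(1, stream_budget // (streams_per_graph * parallel_factor))
    if len(capture_sizes) <= max_graphs:
        return capture_sizes
    # Keep an evenly spaced subset so small and large batches stay covered.
    step = len(capture_sizes) / max_graphs
    return [capture_sizes[int(i * step)] for i in range(max_graphs)]
```

Under these assumed numbers, a 64-layer MoE model with EP size 8 would have a budget of 1800 // (65 * 8) = 3 captured graphs, so most capture sizes get dropped; a dense model (parallel factor 1) would keep far more.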
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?