
Conversation

@yiz-liu (Collaborator) commented May 21, 2025

What this PR does / why we need it?

Fix update_aclgraph_sizes when running MoE models.

Does this PR introduce any user-facing change?

How was this patch tested?

@yiz-liu force-pushed the fix-graph branch 2 times, most recently from c9a9b21 to 45d1cc8 on May 21, 2025 08:21
if (additional_config
        and "expert_tensor_parallel_size" in additional_config
        and not parallel_config.enable_expert_parallel):
    parallel_config.expert_tensor_parallel_size = int(
        additional_config["expert_tensor_parallel_size"])
Collaborator
It seems there is no expert_tensor_parallel_size in ParallelConfig? The same goes for parallel_config.expert_parallel_size.

Collaborator Author

vLLM does not support tensor parallelism for experts’ weights, so I added this attribute. This change primarily addresses scenarios where the number of devices significantly exceeds the number of experts.
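
For context, a minimal usage sketch of where that attribute would come from. The model name and sizes below are placeholders, and it assumes `additional_config` is accepted as an engine argument, as the diff above suggests; this illustrates the option, it is not the PR's test setup.

```python
# Minimal sketch (placeholders, not from the PR): supplying the new
# option through additional_config so the snippet above can read it.
from vllm import LLM

llm = LLM(
    model="some/moe-model",                 # placeholder model name
    tensor_parallel_size=16,
    enable_expert_parallel=False,           # the ETP path applies when EP is off
    additional_config={"expert_tensor_parallel_size": 4},
)
```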

# Calculate expert parallel size based on world size
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
Collaborator
QQ: Does this mean the ETP size is only not equal to 1 when EP is disabled and ETP is enabled? If so, why do we set the EP size here when EP is disabled?

Collaborator Author

As I mentioned above, this is more of an enhancement.
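
To make the arithmetic concrete, here is a small sketch with assumed numbers; the world size, expert count, and ETP degree are illustrative, not taken from the PR.

```python
# Illustrative numbers only: when devices far outnumber experts,
# ETP shards each expert's weights so no rank is left without work.
world_size = 32                       # total NPUs
num_experts = 8                       # routed experts in the MoE layer
expert_tensor_parallel_size = 4       # shard each expert's weights over 4 ranks

expert_parallel_size = world_size // expert_tensor_parallel_size
print(expert_parallel_size)           # 8 -> one expert per EP group
```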

Comment on lines +150 to +141
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
Collaborator
Sorry, I'm still confused. Is this the right way to understand it: when EP is disabled and ETP is enabled, an expected ETP size is set, and an unexpected EP size is also set?

Or should the EP size be set only when EP is enabled?

Suggested change
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
if parallel_config.enable_expert_parallel:
    parallel_config.expert_parallel_size = (
        parallel_config.world_size //
        parallel_config.expert_tensor_parallel_size)

Collaborator Author
We can discuss this in detail later. You are right; we probably need to think carefully about the configuration logic here to avoid unwanted confusion.
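
For reference, one way the two pieces could fit together, folding the suggestion above into the original diff. This is only a sketch of the discussion, not the code that was merged, and it assumes `expert_tensor_parallel_size` defaults to 1 elsewhere.

```python
# Sketch only: the original diff plus the suggested guard.
if (additional_config
        and "expert_tensor_parallel_size" in additional_config
        and not parallel_config.enable_expert_parallel):
    parallel_config.expert_tensor_parallel_size = int(
        additional_config["expert_tensor_parallel_size"])

# Derive the EP size only when expert parallelism is actually enabled,
# as suggested in the review above (ETP is assumed to default to 1).
if parallel_config.enable_expert_parallel:
    parallel_config.expert_parallel_size = (
        parallel_config.world_size //
        parallel_config.expert_tensor_parallel_size)
```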

@yiz-liu force-pushed the fix-graph branch 4 times, most recently from 64ea69c to 06be3d2 on May 26, 2025 03:10
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
…or_parallel_size

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
@wangxiyuan merged commit 5a1689f into vllm-project:main on May 30, 2025
21 of 26 checks passed
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 3, 2025
…#913)

### What this PR does / why we need it?
Fix update_aclgraph_sizes when running MoE models.

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: wangxiaoxin (A) <w00664509@china.huawei.com>
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 3, 2025
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 3, 2025
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Jun 3, 2025
David9857 pushed a commit to David9857/vllm-ascend that referenced this pull request Jun 3, 2025
@yiz-liu deleted the fix-graph branch on June 10, 2025 09:42
ganyi1996ppo pushed a commit that referenced this pull request Jun 21, 2025
### What this PR does / why we need it?
1. [PR913](#913) introduced an error that caused V0's spec decode function to fail.
[PR1109](#1109) tried to fix this problem, but unfortunately the fix broke the ngram
function. I fixed the ngram function in this PR. **PS**: Q: Why was the ngram problem
not caught when PR1109 was merged? A: The newly introduced problem only appears when
tp>1, and the test cases on CI all use tp=1.
2. In versions after 0.7.3, vllm-ascend removed some spec decode UTs to keep CI time
down, including the eagle speculative UTs, which left CI unable to cover the eagle
function. I added them (`test_eagle_correctness.py`) back in this PR.
3. Because of the gap described in 2, the current version of Eagle has a problem.
I located and fixed it: vllm's `draft_model_runner.py` was changed and vllm-ascend
was not synchronized in time.
4. Currently, the UTs of v0 and v1 are mixed in the spec_decode
directory. I split them into two directories: spec_decode_v0 and
spec_decode_v1.
5. I found that
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor`
and
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace`
have changed in vllm, so I removed their patches in this PR.
6. The v1 MTP UT
failed (https://github.com/vllm-project/vllm-ascend/actions/runs/15782006176/job/44489813330?pr=1323),
so I commented it out. @XWFAlone @JC-ut0

### Does this PR introduce _any_ user-facing change?
This PR fixes the ngram and eagle spec decode functions in the v0 engine.

### How was this patch tested?
Ngram and eagle were tested locally on an 800I A2 machine, using real weights
instead of the random small weights used by the UTs, and using a scenario test
with tp>1. The rest was tested by CI.

Signed-off-by: mengwei805 <mengwei25@huawei.com>
wangxiyuan pushed a commit that referenced this pull request Jun 23, 2025
### What this PR does / why we need it?
1. [PR913](#913) introduced an error that caused V0's spec decode function to fail.
[PR1109](#1109) tried to fix this problem, but unfortunately the fix broke the ngram
function. I fixed the ngram function in this PR. **PS**: Q: Why was the ngram problem
not caught when PR1109 was merged? A: The newly introduced problem only appears when
tp>1, and the test cases on CI all use tp=1.
2. In versions after 0.7.3, vllm-ascend removed some spec decode UTs to keep CI time
down, including the eagle speculative UTs, which left CI unable to cover the eagle
function. I added them (`test_eagle_correctness.py`) back in this PR.
3. Because of the gap described in 2, the current version of Eagle has a problem.
I located and fixed it: vllm's `draft_model_runner.py` was changed and vllm-ascend
was not synchronized in time.
4. Currently, the UTs of v0 and v1 are mixed in the spec_decode
directory. I split them into two directories: spec_decode_v0 and
spec_decode_v1.
5. I found that
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor`
and
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace`
have changed in vllm, so I removed them in this PR.

### Does this PR introduce _any_ user-facing change?
This PR fixes the ngram and eagle spec decode functions in the v0 engine.

### How was this patch tested?
tested by CI

Signed-off-by: mengwei805 <mengwei25@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025