Skip to content

Conversation

@Angazenn
Copy link
Collaborator

@Angazenn Angazenn commented May 19, 2025

What this PR does / why we need it?

This PR fixes two accuracy bugs incurred by PR #819 when running deepseekv3 series models:

  1. [BugFix]add all2all when dp_size > 1 && downgrade npu_dequant_swiglu_quant #819 adds all_to_all communication in quantized cases, but all_gather && reduce_scatter are removed in both of quantized and unquantized cases. When running unquantized deepseekv3 models with ep_size == world_size, the moe modules fail to communicate. Therefore, this PR adds all_to_all communication on unquantized situation to solve this accuracy issue.
  2. Use ep_size rather than dp_size to decide whether to use all_to_all in moe.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with new added/existing test.

Signed-off-by: angazenn <zengyanjia@huawei.com>
@MengqingCao
Copy link
Collaborator

Could you post the inference output of deepseek-v2-lite-chat after this pr? I come up with a precision issue with deepseek-v2-lite-chat now

@Angazenn Angazenn changed the title [WIP] Fix accuracy bugs for unquantized deepseekv2/v3 models [WIP] Fix accuracy bugs for unquantized deepseekv3 models May 20, 2025
@wangxiyuan
Copy link
Collaborator

still work in progress? or ready for review.

@Angazenn Angazenn changed the title [WIP] Fix accuracy bugs for unquantized deepseekv3 models [BugFix] Fix accuracy bugs for unquantized deepseekv3 models May 20, 2025
@Angazenn Angazenn force-pushed the fix_acc branch 2 times, most recently from bd9fc37 to 9b23577 Compare May 20, 2025 09:10
@Angazenn
Copy link
Collaborator Author

still work in progress? or ready for review.

It is ready now.

@wangxiyuan wangxiyuan added the ready read for review label May 21, 2025

ep_group = get_ep_group()
self.ep_size = ep_group.world_size
self.ep_group = get_ep_group()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not change the code if its not necessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some redundant codes in fused_moe before. Now we maintain self.ep_group and use self.ep_group.world_size to get ep_size.

@Angazenn Angazenn force-pushed the fix_acc branch 2 times, most recently from fa81306 to 605fe9a Compare May 23, 2025 07:31
Signed-off-by: angazenn <zengyanjia@huawei.com>
@ganyi1996ppo ganyi1996ppo merged commit 1f9fb86 into vllm-project:main May 24, 2025
15 checks passed
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request May 30, 2025
…oject#897)

### What this PR does / why we need it?
This PR fixes two accuracy bugs incurred by PR vllm-project#819 when running
deepseekv3 series models:
1. vllm-project#819 adds `all_to_all` communication in quantized cases, but
`all_gather` && `reduce_scatter` are removed in both of quantized and
unquantized cases. When running unquantized deepseekv3 models with
`ep_size == world_size`, the moe modules fail to communicate. Therefore,
this PR adds `all_to_all` communication on unquantized situation to
solve this accuracy issue.
2. Use `ep_size` rather than `dp_size` to decide whether to use
`all_to_all` in moe.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: wangxiaoxin (A) <w00664509@china.huawei.com>
zxdukki pushed a commit to zxdukki/vllm-ascend that referenced this pull request Jun 3, 2025
…oject#897)

### What this PR does / why we need it?
This PR fixes two accuracy bugs incurred by PR vllm-project#819 when running
deepseekv3 series models:
1. vllm-project#819 adds `all_to_all` communication in quantized cases, but
`all_gather` && `reduce_scatter` are removed in both of quantized and
unquantized cases. When running unquantized deepseekv3 models with
`ep_size == world_size`, the moe modules fail to communicate. Therefore,
this PR adds `all_to_all` communication on unquantized situation to
solve this accuracy issue.
2. Use `ep_size` rather than `dp_size` to decide whether to use
`all_to_all` in moe.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
…oject#897)

### What this PR does / why we need it?
This PR fixes two accuracy bugs incurred by PR vllm-project#819 when running
deepseekv3 series models:
1. vllm-project#819 adds `all_to_all` communication in quantized cases, but
`all_gather` && `reduce_scatter` are removed in both of quantized and
unquantized cases. When running unquantized deepseekv3 models with
`ep_size == world_size`, the moe modules fail to communicate. Therefore,
this PR adds `all_to_all` communication on unquantized situation to
solve this accuracy issue.
2. Use `ep_size` rather than `dp_size` to decide whether to use
`all_to_all` in moe.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Angazenn added a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…oject#897)

### What this PR does / why we need it?
This PR fixes two accuracy bugs incurred by PR vllm-project#819 when running
deepseekv3 series models:
1. vllm-project#819 adds `all_to_all` communication in quantized cases, but
`all_gather` && `reduce_scatter` are removed in both of quantized and
unquantized cases. When running unquantized deepseekv3 models with
`ep_size == world_size`, the moe modules fail to communicate. Therefore,
this PR adds `all_to_all` communication on unquantized situation to
solve this accuracy issue.
2. Use `ep_size` rather than `dp_size` to decide whether to use
`all_to_all` in moe.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants