[BugFix] Fix accuracy bugs for unquantized deepseekv3 models #897

Angazenn · 2025-05-19T08:47:59Z

What this PR does / why we need it?

This PR fixes two accuracy bugs introduced by PR #819 when running deepseekv3 series models:

1. #819 ("[BugFix]add all2all when dp_size > 1 && downgrade npu_dequant_swiglu_quant") adds `all_to_all` communication for the quantized case, but removes `all_gather` and `reduce_scatter` in both the quantized and unquantized cases. When running unquantized deepseekv3 models with `ep_size == world_size`, the MoE modules therefore fail to communicate. This PR adds `all_to_all` communication to the unquantized path to fix this accuracy issue.
2. Use `ep_size` rather than `dp_size` to decide whether to use `all_to_all` in MoE.
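The two fixes can be illustrated with a minimal sketch of the dispatch decision. This is not the code in `vllm_ascend/ops/fused_moe.py`: the `ParallelConfig` class, both function names, and the `> 1` threshold on `ep_size` are assumptions made for illustration, based only on the description above and on the title of #819.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical container for the parallel sizes named in the PR text."""
    dp_size: int  # data-parallel size
    ep_size: int  # expert-parallel size


def use_all_to_all_before(cfg: ParallelConfig, quantized: bool) -> bool:
    # Assumed reading of the behaviour after #819: all_to_all is only used
    # on the quantized path, and the decision is keyed on dp_size.
    return quantized and cfg.dp_size > 1


def use_all_to_all_after(cfg: ParallelConfig) -> bool:
    # Assumed reading of this PR: the decision is keyed on ep_size and
    # applies to both the quantized and the unquantized path.
    return cfg.ep_size > 1


if __name__ == "__main__":
    # The problematic case: unquantized model, dp == 1, ep > 1.
    cfg = ParallelConfig(dp_size=1, ep_size=4)
    print(use_all_to_all_before(cfg, quantized=False))
    print(use_all_to_all_after(cfg))
```

Under these assumptions, the old predicate selects no expert-parallel communication for an unquantized run with `dp_size == 1` and `ep_size > 1`, which matches the failure mode described in item 1, while the new predicate does select `all_to_all`.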

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with newly added and existing tests.

Signed-off-by: angazenn <zengyanjia@huawei.com>

MengqingCao · 2025-05-19T12:25:10Z

Could you post the inference output of deepseek-v2-lite-chat after this PR? I am currently running into a precision issue with deepseek-v2-lite-chat.

wangxiyuan · 2025-05-20T06:38:21Z

Is this still a work in progress, or is it ready for review?

Angazenn · 2025-05-20T11:13:58Z

> Is this still a work in progress, or is it ready for review?

It is ready now.

ganyi1996ppo · 2025-05-23T02:40:13Z

vllm_ascend/ops/fused_moe.py


-        ep_group = get_ep_group()
-        self.ep_size = ep_group.world_size
+        self.ep_group = get_ep_group()


Do not change the code if it's not necessary.

There was some redundant code in fused_moe before. We now keep `self.ep_group` and use `self.ep_group.world_size` to get `ep_size`.
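As a rough illustration of that reply, the sketch below keeps only the group handle and derives the size from it. `FakeGroup`, `MoELayerSketch`, and the `ep_size` property are hypothetical names invented for this example; only the `world_size` attribute and the `self.ep_group` field come from the diff above.

```python
class FakeGroup:
    """Hypothetical stand-in for the object returned by get_ep_group()."""

    def __init__(self, world_size: int) -> None:
        self.world_size = world_size


class MoELayerSketch:
    """Keeps the group itself instead of caching its size separately."""

    def __init__(self, ep_group: FakeGroup) -> None:
        self.ep_group = ep_group

    @property
    def ep_size(self) -> int:
        # Derived on demand, so there is a single source of truth.
        return self.ep_group.world_size


if __name__ == "__main__":
    layer = MoELayerSketch(FakeGroup(world_size=8))
    print(layer.ep_size)
```

Storing the group rather than a copied integer means the layer can also reach other attributes of the group later without a second lookup; whether the real module exposes a property like this is not stated in the thread.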


add all2all ep for unquantized moe

b838a87

Signed-off-by: angazenn <zengyanjia@huawei.com>

github-actions bot added module:ops module:quantization labels May 19, 2025

Angazenn force-pushed the fix_acc branch from 63e4fee to 6cfbd04 Compare May 19, 2025 09:49

Angazenn changed the title ~~[WIP] Fix accuracy bugs for unquantized deepseekv2/v3 models~~ [WIP] Fix accuracy bugs for unquantized deepseekv3 models May 20, 2025

Angazenn changed the title ~~[WIP] Fix accuracy bugs for unquantized deepseekv3 models~~ [BugFix] Fix accuracy bugs for unquantized deepseekv3 models May 20, 2025

Angazenn force-pushed the fix_acc branch 2 times, most recently from bd9fc37 to 9b23577 Compare May 20, 2025 09:10

wangxiyuan approved these changes May 21, 2025

View reviewed changes

wangxiyuan added the ready read for review label May 21, 2025

ganyi1996ppo reviewed May 23, 2025

View reviewed changes

Angazenn force-pushed the fix_acc branch 2 times, most recently from fa81306 to 605fe9a Compare May 23, 2025 07:31

fix accuracy bug when dp == 1 && ep > 1

767f846

Signed-off-by: angazenn <zengyanjia@huawei.com>

Angazenn force-pushed the fix_acc branch from 605fe9a to 767f846 Compare May 23, 2025 07:55

ganyi1996ppo merged commit 1f9fb86 into vllm-project:main May 24, 2025
15 checks passed

4 participants
