[BugFix] Fix accuracy bugs for unquantized deepseekv3 models #897
Signed-off-by: angazenn <zengyanjia@huawei.com>
Could you post the inference output of deepseek-v2-lite-chat after this PR? I have run into a precision issue with deepseek-v2-lite-chat.
Is this still a work in progress, or is it ready for review?
bd9fc37 to 9b23577
It is ready now.
vllm_ascend/ops/fused_moe.py (Outdated)
    ep_group = get_ep_group()
    self.ep_size = ep_group.world_size
    self.ep_group = get_ep_group()
Do not change the code if it's not necessary.
There was some redundant code in fused_moe before. Now we maintain self.ep_group and use self.ep_group.world_size to get ep_size.
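To make the reply above concrete, here is a minimal sketch of keeping a single group handle and deriving `ep_size` from it. `FakeGroup`, the stubbed `get_ep_group`, the fixed world size of 4, `MoELayerSketch`, and the use of a property are all assumptions made for illustration. Only the names `ep_group`, `ep_size`, `world_size`, and `get_ep_group` come from the review thread, and this is not the actual `fused_moe.py` code.

```python
from dataclasses import dataclass


@dataclass
class FakeGroup:
    """Hypothetical stand-in for whatever get_ep_group() returns."""
    world_size: int


def get_ep_group() -> FakeGroup:
    # Stub for illustration only: pretend the expert-parallel group has 4 ranks.
    return FakeGroup(world_size=4)


class MoELayerSketch:
    """Hypothetical layer that stores the group once and derives ep_size from it."""

    def __init__(self) -> None:
        # Single source of truth: keep the group handle itself.
        self.ep_group = get_ep_group()

    @property
    def ep_size(self) -> int:
        # Derived on demand, so it cannot drift from the stored group.
        return self.ep_group.world_size


layer = MoELayerSketch()
print(layer.ep_size)
```

Deriving the size from the stored group avoids holding two attributes that have to be kept in sync by hand.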
fa81306 to 605fe9a
…oject#897)

### What this PR does / why we need it?
This PR fixes two accuracy bugs incurred by PR vllm-project#819 when running deepseekv3 series models:
1. vllm-project#819 adds `all_to_all` communication in quantized cases, but `all_gather` && `reduce_scatter` are removed in both of quantized and unquantized cases. When running unquantized deepseekv3 models with `ep_size == world_size`, the moe modules fail to communicate. Therefore, this PR adds `all_to_all` communication on unquantized situation to solve this accuracy issue.
2. Use `ep_size` rather than `dp_size` to decide whether to use `all_to_all` in moe.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: wangxiaoxin (A) <w00664509@china.huawei.com>
What this PR does / why we need it?
This PR fixes two accuracy bugs introduced by PR #819 when running deepseekv3 series models:
1. #819 adds `all_to_all` communication in quantized cases, but removes `all_gather` and `reduce_scatter` in both quantized and unquantized cases. When running unquantized deepseekv3 models with `ep_size == world_size`, the MoE modules fail to communicate. Therefore, this PR adds `all_to_all` communication in the unquantized case to fix this accuracy issue.
2. Use `ep_size` rather than `dp_size` to decide whether to use `all_to_all` in MoE.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI passed with newly added and existing tests.
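The two fixes in the description can be illustrated with a small, framework-free sketch. Everything here is hypothetical: `all_to_all` is a pure-Python model of the collective's data movement (not a real communication call), and the `> 1` thresholds in the two predicate functions are assumptions, since the PR text does not state the exact condition used in the code.

```python
from typing import List, TypeVar

T = TypeVar("T")


def all_to_all(send: List[List[T]]) -> List[List[T]]:
    """Pure-Python model of all_to_all: send[src][dst] ends up at recv[dst][src]."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def use_all_to_all_by_dp(dp_size: int) -> bool:
    # Assumed shape of the earlier check: keyed on data-parallel size.
    return dp_size > 1


def use_all_to_all_by_ep(ep_size: int) -> bool:
    # Assumed shape of the fixed check: keyed on expert-parallel size.
    return ep_size > 1


# Example configuration: no data parallelism, but experts spread over 4 ranks.
dp_size, ep_size = 1, 4
print(use_all_to_all_by_dp(dp_size))  # the dp-based check skips communication
print(use_all_to_all_by_ep(ep_size))  # the ep-based check enables it

# Two ranks, each holding one chunk destined for every rank.
send = [["r0->r0", "r0->r1"], ["r1->r0", "r1->r1"]]
print(all_to_all(send))
```

In the example configuration, a check based on `dp_size` would skip the exchange even though experts live on different ranks, which is the kind of situation the second fix addresses.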