[Bug] Fix DeepSeek-V2.5-1210-FP8 issue #27267
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request addresses an issue with the DeepSeek-V2.5-1210-FP8 model in vLLM, specifically in how the cutlass_moe_fp8 function handles quantization parameters. It adds a check that the dimensions of w1_scale are compatible with w1_q when per-out-channel quantization is enabled, and a separate check for the case where it is disabled.
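As a rough illustration of the kind of shape check described above, here is a minimal sketch in plain Python. It is not the code from this PR: the helper name check_fp8_scale_shape, the (num_experts, out_channels, in_features) weight layout, and the expected scale sizes (one scale per output channel per expert when per-out-channel quantization is on, one scale per expert otherwise) are all assumptions made for the example.

```python
from math import prod


def check_fp8_scale_shape(weight_shape, scale_shape, per_out_ch_quant):
    """Hypothetical helper: validate a scale shape against a quantized weight shape.

    Assumption (not taken from the PR): weight_shape is laid out as
    (num_experts, out_channels, in_features).
    """
    num_experts, out_channels, _ = weight_shape
    scale_numel = prod(scale_shape)

    if per_out_ch_quant:
        # Assumed: one scale per output channel of every expert.
        expected = num_experts * out_channels
    else:
        # Assumed: one scale per expert (per-tensor quantization).
        expected = num_experts

    if scale_numel != expected:
        raise ValueError(
            f"scale shape {scale_shape} has {scale_numel} elements, "
            f"expected {expected} for per_out_ch_quant={per_out_ch_quant}"
        )
    return True


# Shapes below are made up for illustration.
check_fp8_scale_shape((8, 256, 128), (8, 256, 1), per_out_ch_quant=True)
check_fp8_scale_shape((8, 256, 128), (8, 1, 1), per_out_ch_quant=False)
```

Failing early on a mismatched shape gives a readable error at configuration time rather than a harder-to-diagnose failure inside the kernel.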
Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
Fixes #27254
Test
Origin:
Now:
(APIServer pid=655331) INFO: Started server process [655331]
(APIServer pid=655331) INFO: Waiting for application startup.
(APIServer pid=655331) INFO: Application startup complete.