[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) #25268
Conversation
…fig like qwen3moe Signed-off-by: JartX <sagformas@epdcenter.es>
Code Review
This pull request introduces a fix to enable GPTQ quantization for the gate layer in Qwen3 Next MoE models when using AutoRound. The change modifies _maybe_ignore_quant_config to check for the autoround_version attribute on the GPTQConfig or GPTQMarlinConfig. This ensures that the gate's quantization configuration is ignored only for standard GPTQ/AutoGPTQ and applied for AutoRound-quantized models, which is the intended behavior. The implementation is correct and effectively resolves the compatibility issue.
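For reference, here is a minimal sketch of the check the review describes, assuming vLLM's current locations for the GPTQ config classes; the real change lives in the Qwen3 Next model file and may differ in detail:

```python
from typing import Optional

from vllm.model_executor.layers.quantization.base_config import (
    QuantizationConfig)
from vllm.model_executor.layers.quantization.gptq import GPTQConfig
from vllm.model_executor.layers.quantization.gptq_marlin import GPTQMarlinConfig


def _maybe_ignore_quant_config(
        quant_config: QuantizationConfig) -> Optional[QuantizationConfig]:
    # AutoGPTQ checkpoints leave the MoE gate unquantized, so the gate must
    # drop its quant config. AutoRound-exported GPTQ checkpoints set an
    # `autoround_version` attribute on the config and DO quantize the gate,
    # so the config is kept for them.
    if isinstance(quant_config, (GPTQConfig, GPTQMarlinConfig)) and \
            not hasattr(quant_config, "autoround_version"):
        return None
    return quant_config
```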
@Isotr0py @vadiklyutiy
jeejeelee left a comment
LGTM
Hi everyone! This PR fixes the same issue as PR #23994, but for Qwen3 Next only: we need to check whether the model has been quantized to the auto_gptq format with AutoRound (https://github.com/intel/auto-round).
Command:
auto_round --model Qwen/Qwen3-Next-80B-A3B-Instruct --bits 8 --format "auto_gptq" --output_dir /workspace/outputs
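For anyone who prefers the Python API over the CLI, here is a rough equivalent based on the intel/auto-round README; the argument names and defaults are assumptions, so double-check them against the AutoRound version you have installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Mirrors the CLI flags above: 8-bit weights, exported in AutoGPTQ format.
autoround = AutoRound(model, tokenizer, bits=8)
autoround.quantize()
autoround.save_quantized("/workspace/outputs", format="auto_gptq")
```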
@Isotr0py, it simply rechecks the key extracted from the quant_config. I've verified that it works. Could you try the following model? https://huggingface.co/Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound
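A quick smoke test for that checkpoint using vLLM's offline API; the tensor_parallel_size value is illustrative (an 80B model will not fit on a single typical GPU even at int4), so size it to your hardware:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound",
    tensor_parallel_size=4,  # illustrative; adjust to your GPUs
)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```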
IMPORTANT NOTE:
Platform: ROCm
In order to run Qwen3-Next-80B-A3B-Instruct-w4g128 (AutoRound-GPTQ), I also had to merge PR #24486, because the attention block size is 272.
EDIT: PR #25105 solved this.