[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) #25268
Conversation
…fig like qwen3moe Signed-off-by: JartX <sagformas@epdcenter.es>
Code Review
This pull request introduces a fix to enable GPTQ quantization for the gate layer in Qwen3 Next MoE models when using AutoRound. The change modifies _maybe_ignore_quant_config to check for the autoround_version attribute on the GPTQConfig or GPTQMarlinConfig. This ensures that the gate's quantization configuration is ignored only for standard GPTQ/AutoGPTQ and applied for AutoRound-quantized models, which is the intended behavior. The implementation is correct and effectively resolves the compatibility issue.
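For reference, here is a minimal sketch of the check the review describes, assuming vLLM's current locations for the GPTQ config classes; the real change lives in the Qwen3 Next model file and may differ in detail:

```python
from typing import Optional

from vllm.model_executor.layers.quantization.base_config import (
    QuantizationConfig)
from vllm.model_executor.layers.quantization.gptq import GPTQConfig
from vllm.model_executor.layers.quantization.gptq_marlin import GPTQMarlinConfig


def _maybe_ignore_quant_config(
        quant_config: QuantizationConfig) -> Optional[QuantizationConfig]:
    # AutoGPTQ checkpoints leave the MoE gate unquantized, so the gate must
    # drop its quant config. AutoRound-exported GPTQ checkpoints set an
    # `autoround_version` attribute on the config and DO quantize the gate,
    # so the config is kept for them.
    if isinstance(quant_config, (GPTQConfig, GPTQMarlinConfig)) and \
            not hasattr(quant_config, "autoround_version"):
        return None
    return quant_config
```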
@Isotr0py @vadiklyutiy
jeejeelee left a comment
LGTM
Hi everyone! This PR fixes the same issue as PR #23994, but for Qwen3 Next only: we need to check whether the model has been quantized to the auto_gptq format with AutoRound (https://github.com/intel/auto-round).
Command:
auto_round --model Qwen/Qwen3-Next-80B-A3B-Instruct --bits 8 --format "auto_gptq" --output_dir /workspace/outputs
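For anyone who prefers the Python API over the CLI, here is a rough equivalent based on the intel/auto-round README; the argument names and defaults are assumptions, so double-check them against the AutoRound version you have installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Mirrors the CLI flags above: 8-bit weights, exported in AutoGPTQ format.
autoround = AutoRound(model, tokenizer, bits=8)
autoround.quantize()
autoround.save_quantized("/workspace/outputs", format="auto_gptq")
```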
@Isotr0py, it simply rechecks the key extracted from the quant_config. I've verified that it works. Could you try the following model? https://huggingface.co/Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound
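A quick smoke test for that checkpoint using vLLM's offline API; the tensor_parallel_size value is illustrative (an 80B model will not fit on a single typical GPU even at int4), so size it to your hardware:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound",
    tensor_parallel_size=4,  # illustrative; adjust to your GPUs
)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```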
IMPORTANT NOTE:
Platform: ROCm
In order to run Qwen3-Next-80B-A3B-Instruct-w4g128 (AutoRound-GPTQ), I also had to merge PR #24486, because the attention block size is 272.
EDIT: PR #25105 solved this.