Fix GGUF loader for Qwen3 MoE. #22785
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
Code Review
This pull request correctly adds support for Qwen2 and Qwen3 MoE models in the GGUF loader and fixes an embed_tokens layer initialization issue. The changes appear to be correct and well-tested. My main feedback is regarding code duplication in the GGUF loader, which could be refactored to improve long-term maintainability.
```python
if model_type in ("qwen2_moe", "qwen3_moe"):
    model_type = model_type.replace("_", "")
    # GGUF layer map assumes that we will have merged expert weights,
    # so we need to map them manually
    for idx in range(config.num_hidden_layers):
        gguf_to_hf_name_map[f"blk.{idx}.ffn_down_exps.weight"] = \
            f"model.layers.{idx}.mlp.experts.0.down_proj.weight"
        gguf_to_hf_name_map[f"blk.{idx}.ffn_gate_exps.weight"] = \
            f"model.layers.{idx}.mlp.experts.0.gate_proj.weight"
        gguf_to_hf_name_map[f"blk.{idx}.ffn_up_exps.weight"] = \
            f"model.layers.{idx}.mlp.experts.0.up_proj.weight"
```
This block for handling Qwen MoE models introduces significant code duplication with the existing logic for DeepSeek MoE models just above. While functionally correct, this pattern makes the code harder to maintain. If a change is needed for expert weight mapping, it would have to be applied in multiple places. Please consider refactoring to share the common logic between these model types.
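One possible shape for such a refactor (a sketch only; the helper and dict names below are hypothetical, not part of the PR):

```python
# Hypothetical helper shared by the DeepSeek and Qwen MoE branches of the
# GGUF loader; the names below are illustrative, not from the PR.
_MERGED_EXPERT_PROJS = {
    "ffn_down_exps": "down_proj",
    "ffn_gate_exps": "gate_proj",
    "ffn_up_exps": "up_proj",
}

def _add_merged_expert_mappings(gguf_to_hf_name_map: dict[str, str],
                                num_hidden_layers: int) -> None:
    """Map GGUF merged-expert tensors to the first expert's HF names."""
    for idx in range(num_hidden_layers):
        for gguf_suffix, hf_proj in _MERGED_EXPERT_PROJS.items():
            gguf_to_hf_name_map[f"blk.{idx}.{gguf_suffix}.weight"] = (
                f"model.layers.{idx}.mlp.experts.0.{hf_proj}.weight")
```

Both the DeepSeek and Qwen branches could then call the helper with their respective layer counts, keeping the name pattern in one place.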
Isotr0py left a comment
Thanks!
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
May I ask which version you are using? We tried but could not load it properly.
Try using the nightly wheel; this PR hasn't been included in the newest release yet.
We were using the latest main-branch code (pip install -e .). Just now I tried "pip uninstall vllm" and then "pip install -U vllm", but still no luck loading Qwen3 MoE GGUF. Both the 30B and the 480B versions terminate with this error, which appears at the very beginning: "ERROR 08-18 05:01:32 [config.py:133] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/dev/shm/Qwen3-30B-A3B-Q4_K_M.gguf'. Use"
Perhaps you forgot to pull the main branch? This bug was fixed by #21579 three weeks ago...
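For anyone hitting the error above, a minimal loading sketch, assuming an up-to-date main-branch install that includes this PR and #21579 (the tokenizer repo below is an assumption, since GGUF files do not bundle a Hugging Face tokenizer):

```python
from vllm import LLM, SamplingParams

# Sketch only: the local path is the one from the error message above, and the
# tokenizer repo (Qwen/Qwen3-30B-A3B) is an assumption; the GGUF file itself
# ships no Hugging Face tokenizer, so the original repo is passed explicitly.
llm = LLM(
    model="/dev/shm/Qwen3-30B-A3B-Q4_K_M.gguf",
    tokenizer="Qwen/Qwen3-30B-A3B",
)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)
```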
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
One more question: I also got the GGUF format running, but the inference output is garbled. What could be the cause? Is your inference output normal? I have tried many GGUF versions and it still does not work.
The output is garbled, just meaningless content, or it keeps printing iiiiiiiiiiiiiii without stopping. The server side does not exit with an error.
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
Purpose
Despite upstream repositories (gguf-py, transformers) having added support for Qwen3 MoE GGUF quantization, vLLM GGUF loading is still broken.
This PR aims to fix the GGUF loader and a mismatch in the Qwen3 MoE model's embed_tokens layer.
Test Plan
Test Result
Successfully loaded the GGUF model.
References