
Conversation

@leejnau (Contributor) commented Sep 25, 2025

Purpose

Fix a bug where the wrong dictionary key was used to look up the quantization layers to be ignored (non-quantized layers) in the Hugging Face config.json. In the legacy hf_quant_config.json file the key is "exclude_modules"; in the more modern in-place config.json the key is "ignore".
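
For reference, here is a rough sketch of where the two keys live in each file. This is an illustrative subset only; the field values and the "lm_head" entry are assumptions for the example, not taken from any specific checkpoint:

# Legacy hf_quant_config.json: settings are nested under "quantization",
# and non-quantized layers are listed under "exclude_modules".
legacy_hf_quant_config = {
    "quantization": {
        "quant_algo": "FP8",
        "kv_cache_quant_algo": "FP8",
        "exclude_modules": ["lm_head"],  # assumed example entry
    }
}

# Modern config.json: settings are nested under "quantization_config",
# and non-quantized layers are listed under "ignore".
modern_config = {
    "quantization_config": {
        "quant_algo": "FP8",
        "kv_cache_quant_algo": "FP8",
        "ignore": ["lm_head"],  # assumed example entry
    }
}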

Test Plan

Tested the following models with the prompt '<s> The capital of France is Paris. The capital of the United States is Washington, D.C.' (a minimal invocation sketch follows the list):

nvidia/Qwen3-30B-A3B-FP4
nvidia/Phi-4-reasoning-plus-FP4
nvidia/Llama-3.1-8B-Instruct-FP8
nvidia/Phi-4-multimodal-instruct-FP8
RedHatAI/phi-4-FP8-dynamic
RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic
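
The PR description does not include the exact command used; the following is a minimal offline-inference sketch of how such a smoke test could be run with vLLM. The model choice and sampling settings are placeholders, not taken from the PR:

from vllm import LLM, SamplingParams

# Any of the checkpoints listed above could be substituted here.
llm = LLM(model="nvidia/Llama-3.1-8B-Instruct-FP8")

prompt = "<s> The capital of France is Paris. The capital of the United States is Washington, D.C."
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate([prompt], params)
print(outputs[0].prompt + outputs[0].outputs[0].text)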

Test Result

All models loaded and ran successfully, producing reasonable output:

nvidia/Qwen3-30B-A3B-FP4 : <s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Brazil is Brasília. The capital of Canada is Ottawa. The capital of Germany is Berlin. The capital of Italy is Rome. The capital of Japan is Tokyo. The capital of South Korea is Seoul. The capital of the United Kingdom

nvidia/Phi-4-reasoning-plus-FP4 : <s> The capital of France is Paris. The capital of the United States is Washington, D.C. I. I. .... ........................................

nvidia/Llama-3.1-8B-Instruct-FP8 : <s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of the United Kingdom is London. The capital of China is Beijing. The capital of Japan is Tokyo. The capital of India is New Delhi. The capital of Brazil is Brasília. The capital of Russia is Moscow. The capital of Canada

nvidia/Phi-4-multimodal-instruct-FP8 : <s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Japan is Tokyo. The capital of Australia is Canberra. The capital of Brazil is Brasília. The capital of India is New Delhi. The capital of Canada is Ottawa. The capital of Germany is Berlin. The capital of Italy is Rome.

RedHatAI/phi-4-FP8-dynamic : <s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Japan is Tokyo. The capital of Brazil is Brasilia. The capital of Australia is Canberra. The capital of Canada is Ottawa. The capital of India is New Delhi. The capital of China is Beijing. The capital of Russia is Moscow

RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic : <s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Canada is Ottawa. The capital of Australia is Canberra. The capital of Brazil is Brasília. The capital of Mexico is Mexico City. The capital of Germany is Berlin. The capital of the United Kingdom is London. The capital of Italy


Signed-off-by: Lee Nau <lnau@nvidia.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request correctly fixes a bug in parsing quantization configurations by using the ignore key instead of exclude_modules for modern Hugging Face config.json files. The changes are applied to both ModelOptFp8Config and ModelOptNvFp4Config. My review focuses on improving the robustness of this configuration parsing. I've suggested falling back to the legacy key if the new key is not present to prevent silent failures and improve user experience.

  kv_cache_quant_method = config.get("kv_cache_quant_algo")
- exclude_modules = config.get("exclude_modules")
+ # "ignore" is the key in config.json
+ exclude_modules = config.get("ignore")
Contributor


Severity: high

For better robustness and to avoid silent failures, it's a good practice to support both the new key (ignore) and the legacy key (exclude_modules), with the new key taking precedence. This prevents issues if a user provides a modern config but mistakenly uses the old key.

Suggested change
- exclude_modules = config.get("ignore")
+ exclude_modules = config.get("ignore", config.get("exclude_modules"))

Contributor


Let's adopt this suggestion.

Contributor Author


Ok, are there conditions under which the old key would be used in the new file? Maybe it would be better to enforce only the new key in the new file?

@Edwardf0t1 (Contributor) commented Sep 26, 2025


We need to handle the fallback case for hf_quant_config.json. Is it not handled here?

Contributor Author


The fallback case is actually handled by the initial if-statement. The presence of the key "quantization" is the condition in that if-statement, and that key only exists in the legacy hf_quant_config.json file. For instance:
https://huggingface.co/nvidia/Qwen3-30B-A3B-FP4/blob/main/hf_quant_config.json#L6

The key for quantization in the config.json file is "quantization_config". For instance: https://huggingface.co/nvidia/Qwen3-30B-A3B-FP4/blob/main/config.json#L38

So the existing logic here is entirely based upon the differing key names in those two files (hf_quant_config.json and config.json).

I tried to indicate this with the comments I left in the code above the key checks.
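
A condensed sketch of the branching described above (the helper name is hypothetical and the real vLLM code handles additional fields; this only illustrates which key is read on which path):

def _get_exclude_modules(config: dict) -> list[str]:
    if "quantization" in config:
        # Legacy hf_quant_config.json: settings are nested under "quantization"
        # and the non-quantized layers use the "exclude_modules" key.
        return config["quantization"].get("exclude_modules", [])
    # Modern config.json path: the non-quantized layers use the "ignore" key.
    return config.get("ignore", [])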


exclude_modules = config.get("exclude_modules", [])
# "ignore" is the key in config.json
exclude_modules = config.get("ignore", [])
Contributor


Severity: high

To improve robustness and prevent silent configuration errors, consider supporting both the new ignore key and the legacy exclude_modules key. Prioritizing ignore while falling back to exclude_modules ensures that user configurations with the old key in a modern format are still handled correctly.

Suggested change
- exclude_modules = config.get("ignore", [])
+ exclude_modules = config.get("ignore", config.get("exclude_modules", []))

Contributor


Let's adopt this suggestion.

@pavanimajety (Collaborator) left a comment


LGTM, thanks!
If you have the logs for the results, please add them to the PR description.

  kv_cache_quant_method = config.get("kv_cache_quant_algo")
- exclude_modules = config.get("exclude_modules")
+ # "ignore" is the key in config.json
+ exclude_modules = config.get("ignore")
Contributor


Let's adopt this suggestion.


- exclude_modules = config.get("exclude_modules", [])
+ # "ignore" is the key in config.json
+ exclude_modules = config.get("ignore", [])
Contributor


Let's adopt this suggestion.

@DarkLight1337 requested review from Isotr0py and hmellor and removed the request for mgoin on September 26, 2025 07:23
Comment on lines +728 to 729
# "exclude_modules" is the key in the legacy hf_quant_config.json
exclude_modules = quant_config.get("exclude_modules", [])

@Isotr0py (Member) commented Sep 26, 2025


Suggested change
  # "exclude_modules" is the key in the legacy hf_quant_config.json
- exclude_modules = quant_config.get("exclude_modules", [])
+ exclude_modules = quant_config.get("ignore", config.get("exclude_modules", []))

I think we should also modify this line.

@cjackal (Contributor) commented Sep 29, 2025

This PR would fix most FP8 checkpoints including meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8.

@Isotr0py added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Sep 29, 2025

@mgoin (Member) commented Sep 29, 2025

@leejnau @cjackal I don't understand why this would affect non-modelopt checkpoints? The meta-llama and RedHatAI fp8 checkpoints use https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py

@mgoin merged commit d5ab285 into vllm-project:main on Sep 29, 2025
61 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
