[Bugfix] Use correct key "ignore" for config.json non-quantized layers #25706
Conversation
Signed-off-by: Lee Nau <lnau@nvidia.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request correctly fixes a bug in parsing quantization configurations by using the ignore key instead of exclude_modules for modern Hugging Face config.json files. The changes are applied to both ModelOptFp8Config and ModelOptNvFp4Config. My review focuses on improving the robustness of this configuration parsing. I've suggested falling back to the legacy key if the new key is not present to prevent silent failures and improve user experience.
```diff
  kv_cache_quant_method = config.get("kv_cache_quant_algo")
- exclude_modules = config.get("exclude_modules")
+ # "ignore" is the key in config.json
+ exclude_modules = config.get("ignore")
```
For better robustness and to avoid silent failures, it's a good practice to support both the new key (ignore) and the legacy key (exclude_modules), with the new key taking precedence. This prevents issues if a user provides a modern config but mistakenly uses the old key.
```diff
- exclude_modules = config.get("ignore")
+ exclude_modules = config.get("ignore", config.get("exclude_modules"))
```
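As a quick sanity check, the precedence behavior of that one-liner can be sketched with a hypothetical helper (`get_exclude_modules` is not a vLLM function; the dicts below are made-up examples):

```python
def get_exclude_modules(config: dict):
    # Prefer the modern "ignore" key; fall back to the legacy "exclude_modules".
    return config.get("ignore", config.get("exclude_modules"))

# Modern config.json style: "ignore" takes precedence when both keys exist.
modern = {"ignore": ["lm_head"], "exclude_modules": ["embed_tokens"]}
# Legacy style: only "exclude_modules" is present, so the fallback is used.
legacy = {"exclude_modules": ["lm_head"]}

print(get_exclude_modules(modern))  # ['lm_head']
print(get_exclude_modules(legacy))  # ['lm_head']
```

Note that `dict.get` evaluates the fallback expression eagerly, but for plain key lookups that cost is negligible here.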
Let's adopt this suggestion.
Ok, are there conditions under which the old key would be used in the new file? Maybe it would be better to enforce only the new key in the new file?
We need to handle the fallback case for hf_quant_config.json. Is it not handled here?
The fallback case is actually in the initial if-statement. The presence of the key name "quantization" is the condition in that initial if-statement, and that key only exists in the legacy hf_quant_config.json file. For instance:
https://huggingface.co/nvidia/Qwen3-30B-A3B-FP4/blob/main/hf_quant_config.json#L6
The key for quantization in the config.json file is "quantization_config". For instance: https://huggingface.co/nvidia/Qwen3-30B-A3B-FP4/blob/main/config.json#L38
So the existing logic here is entirely based upon the differing key names in those two files (hf_quant_config.json and config.json).
I tried to indicate this with the comments I left in the code above the key checks.
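The file-format detection described above can be sketched as follows (hypothetical helper and example dicts, not the actual vLLM parsing code):

```python
def quant_section(hf_config: dict) -> dict:
    # Legacy hf_quant_config.json nests everything under a top-level
    # "quantization" key and lists non-quantized layers as "exclude_modules".
    if "quantization" in hf_config:
        return hf_config["quantization"]
    # A modern config.json instead embeds the section under
    # "quantization_config" and uses "ignore" for non-quantized layers.
    return hf_config.get("quantization_config", {})

legacy_file = {"quantization": {"quant_algo": "NVFP4", "exclude_modules": ["lm_head"]}}
modern_file = {"quantization_config": {"quant_algo": "NVFP4", "ignore": ["lm_head"]}}

print(quant_section(legacy_file)["exclude_modules"])  # ['lm_head']
print(quant_section(modern_file)["ignore"])           # ['lm_head']
```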
```diff
- exclude_modules = config.get("exclude_modules", [])
+ # "ignore" is the key in config.json
+ exclude_modules = config.get("ignore", [])
```
To improve robustness and prevent silent configuration errors, consider supporting both the new ignore key and the legacy exclude_modules key. Prioritizing ignore while falling back to exclude_modules ensures that user configurations with the old key in a modern format are still handled correctly.
```diff
- exclude_modules = config.get("ignore", [])
+ exclude_modules = config.get("ignore", config.get("exclude_modules", []))
```
Let's adopt this suggestion.
pavanimajety left a comment:
LGTM, thanks!
If you have the logs for the results, please add them to the PR description.
```diff
  kv_cache_quant_method = config.get("kv_cache_quant_algo")
- exclude_modules = config.get("exclude_modules")
+ # "ignore" is the key in config.json
+ exclude_modules = config.get("ignore")
```
Let's adopt this suggestion.
```diff
- exclude_modules = config.get("exclude_modules", [])
+ # "ignore" is the key in config.json
+ exclude_modules = config.get("ignore", [])
```
Let's adopt this suggestion.
```python
# "exclude_modules" is the key in the legacy hf_quant_config.json
exclude_modules = quant_config.get("exclude_modules", [])
```
```diff
  # "exclude_modules" is the key in the legacy hf_quant_config.json
- exclude_modules = quant_config.get("exclude_modules", [])
+ exclude_modules = quant_config.get("ignore", quant_config.get("exclude_modules", []))
```
I think we should also modify this line.
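For illustration, the behavior of that suggested fallback (reading both keys from the same `quant_config` dict; helper name hypothetical) would be:

```python
def legacy_exclude_modules(quant_config: dict) -> list:
    # "exclude_modules" is the legacy hf_quant_config.json key; also accept
    # "ignore" in case a modern-style section reaches this code path.
    return quant_config.get("ignore", quant_config.get("exclude_modules", []))

print(legacy_exclude_modules({"exclude_modules": ["lm_head"]}))  # ['lm_head']
print(legacy_exclude_modules({"ignore": ["lm_head"]}))           # ['lm_head']
print(legacy_exclude_modules({}))                                # []
```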
This PR would fix most FP8 checkpoints, including meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8.
@leejnau @cjackal I don't understand why this would affect non-modelopt checkpoints. The meta-llama and RedHatAI fp8 checkpoints use https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py
Purpose
Fix a bug wherein the dictionary key was incorrect for the Hugging Face `config.json` list of quantization layers to be ignored (non-quantized layers). For the legacy `hf_quant_config.json` file, the key is "exclude_modules". For the more modern in-place `config.json`, the key is "ignore".

Test Plan
Tested with the following models with the prompt `<s> The capital of France is Paris. The capital of the United States is Washington, D.C.`:

- nvidia/Qwen3-30B-A3B-FP4
- nvidia/Phi-4-reasoning-plus-FP4
- nvidia/Llama-3.1-8B-Instruct-FP8
- nvidia/Phi-4-multimodal-instruct-FP8
- RedHatAI/phi-4-FP8-dynamic
- RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic

Test Result
All models loaded and ran successfully, producing reasonable output:

- nvidia/Qwen3-30B-A3B-FP4: `<s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Brazil is Brasília. The capital of Canada is Ottawa. The capital of Germany is Berlin. The capital of Italy is Rome. The capital of Japan is Tokyo. The capital of South Korea is Seoul. The capital of the United Kingdom`
- nvidia/Phi-4-reasoning-plus-FP4: `<s> The capital of France is Paris. The capital of the United States is Washington, D.C. I. I. .... ........................................`
- nvidia/Llama-3.1-8B-Instruct-FP8: `<s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of the United Kingdom is London. The capital of China is Beijing. The capital of Japan is Tokyo. The capital of India is New Delhi. The capital of Brazil is Brasília. The capital of Russia is Moscow. The capital of Canada`
- nvidia/Phi-4-multimodal-instruct-FP8: `<s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Japan is Tokyo. The capital of Australia is Canberra. The capital of Brazil is Brasília. The capital of India is New Delhi. The capital of Canada is Ottawa. The capital of Germany is Berlin. The capital of Italy is Rome.`
- RedHatAI/phi-4-FP8-dynamic: `<s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Japan is Tokyo. The capital of Brazil is Brasilia. The capital of Australia is Canberra. The capital of Canada is Ottawa. The capital of India is New Delhi. The capital of China is Beijing. The capital of Russia is Moscow`
- RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic: `<s> The capital of France is Paris. The capital of the United States is Washington, D.C. The capital of Canada is Ottawa. The capital of Australia is Canberra. The capital of Brazil is Brasília. The capital of Mexico is Mexico City. The capital of Germany is Berlin. The capital of the United Kingdom is London. The capital of Italy`