Your current environment
Tested under vLLM == 0.7.3 and vLLM == 0.8.2
🐛 Describe the bug
I am using this model (after quantizing it to 4 bits):
nvidia/Llama-3_3-Nemotron-Super-49B-v1
https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1
and according to the supported-models documentation here:
https://docs.vllm.ai/en/latest/models/supported_models.html
The relevant section of that page states:
"To determine whether a given model is natively supported, you can check the config.json file inside the HF repository. If the "architectures" field contains a model architecture listed below, then it should be natively supported."
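For completeness, this is how I checked the "architectures" field (a minimal sketch using the transformers library, nothing vLLM-specific):

```python
# Inspect the "architectures" field of the HF repo without downloading weights.
# Assumes the `transformers` library is installed.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "nvidia/Llama-3_3-Nemotron-Super-49B-v1",
    trust_remote_code=True,  # the repo ships custom DeciLM config/model code
)
print(config.architectures)  # prints: ['DeciLMForCausalLM']
```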
According to its config.json on the HF Hub, the architecture of nvidia/Llama-3_3-Nemotron-Super-49B-v1 is DeciLMForCausalLM.
Since DeciLMForCausalLM is listed among the supported architectures in the documentation above, the model should be natively supported. However, loading it produces the following issues (a minimal load sketch follows the list below):
1. Under vLLM == 0.7.3 I get this error:
'DeciLMConfig' object has no attribute 'num_key_value_heads_per_layer'
2. Under vLLM == 0.8.2 I get an OOM together with the same error.
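For reference, this is roughly how I load the quantized checkpoint (a minimal sketch; the model path and the parallelism/memory settings are placeholders for my local 4-bit setup):

```python
# Minimal repro sketch. The path below stands in for my local 4-bit
# checkpoint; adjust the parallelism/memory settings to your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Llama-3_3-Nemotron-Super-49B-v1",  # or the local 4-bit checkpoint path
    trust_remote_code=True,        # the repo ships custom DeciLM code
    tensor_parallel_size=2,        # placeholder: split across 2 GPUs
    max_model_len=4096,            # placeholder: cap KV-cache size
    gpu_memory_utilization=0.90,   # placeholder: leave some headroom
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```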
Can someone explain whether I am doing something wrong, or whether I am misinterpreting the documentation?
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
