
[Bug]: Disagreement and misalignment between supported models in documentation and actual testing #15779

@manitadayon

Description


Your current environment

Tested under vLLM 0.7.3 and 0.8.2.

🐛 Describe the bug

I am using this model (after quantizing it to 4 bits):
nvidia/Llama-3_3-Nemotron-Super-49B-v1
https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1

According to the supported-models documentation:
https://docs.vllm.ai/en/latest/models/supported_models.html
the relevant section says, in summary: to determine whether a given model is natively supported, check the config.json file inside the HF repository; if the "architectures" field contains a model architecture listed on that page, the model should be natively supported.
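A quick way to inspect that field directly (a minimal sketch added for illustration, not part of the original report; it assumes huggingface_hub is installed and the repository is accessible from the current Hugging Face account):

```python
import json
from huggingface_hub import hf_hub_download

# Download only config.json from the HF repo and read the "architectures" field.
path = hf_hub_download("nvidia/Llama-3_3-Nemotron-Super-49B-v1", "config.json")
with open(path) as f:
    config = json.load(f)

print(config["architectures"])  # expected per the repo: ["DeciLMForCausalLM"]
```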
The architecture of nvidia/Llama-3_3-Nemotron-Super-49B-v1 on the HF repository is indeed DeciLMForCausalLM, and DeciLMForCausalLM is listed as one of the supported architectures in the documentation above, so the model should be natively supported. However, loading it produces the following issues:

[Screenshot attached to the original issue showing the error output]

1. Under vLLM 0.7.3 I get this error:
   DeciLMConfig object has no attribute 'num_key_value_heads_per_layer'
2. Under vLLM 0.8.2 I get an out-of-memory (OOM) error together with the error above.

Can someone explain whether I am doing something wrong, or whether I am misinterpreting the documentation?
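For reference, a minimal sketch of the loading path in question (the checkpoint path and quantization settings below are placeholders/assumptions; the report does not state the exact invocation used):

```python
from vllm import LLM, SamplingParams

# Hypothetical repro: load the 4-bit-quantized checkpoint with vLLM's offline LLM API.
# The local path and quantization method ("awq") are placeholders; the original
# report does not specify which quantization scheme was used.
llm = LLM(
    model="path/to/Llama-3_3-Nemotron-Super-49B-v1-4bit",  # locally quantized checkpoint (placeholder)
    quantization="awq",       # assumed 4-bit quantization method
    trust_remote_code=True,   # the HF repo ships a custom DeciLM-style config
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```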

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Assignees: no one assigned

Labels: bug (Something isn't working)
