[Bug]: Got KeyError 'layers.0.self_attn.qkv_proj.weight' when loading a partially quantized model. #11790
Labels: bug
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I used llmcompressor to create a partially quantized model in which the MLP layers are quantized but the attention layers are not. When I try to load this model with vLLM, I get `KeyError: 'layers.0.self_attn.qkv_proj.weight'`, as shown below.
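For context, here is a minimal sketch of how such a checkpoint can be produced with llmcompressor; the model name, calibration dataset, scheme, and ignore patterns below are illustrative assumptions, not my exact recipe:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Quantize the Linear (MLP) layers but skip every self-attention projection.
# The W8A8 scheme and the "re:.*self_attn.*" ignore pattern are placeholders.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["lm_head", "re:.*self_attn.*"],
)

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
    dataset="open_platypus",                   # placeholder calibration set
    recipe=recipe,
    output_dir="./llama-mlp-only-w8a8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```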
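Loading the resulting checkpoint then fails during weight loading. A minimal repro (the path is a placeholder for the saved checkpoint):

```python
from vllm import LLM

# Raises during weight loading:
# KeyError: 'layers.0.self_attn.qkv_proj.weight'
llm = LLM(model="./llama-mlp-only-w8a8")
```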