Int4(llama-2-chat-7b) converted model generates response in German language #1683

@bhardwaj-nakul

Description

Describe the bug

  1. Install all the pip dependencies for the latest 254-llmchatbot notebook.
  2. Follow the steps to convert the "llama-2-chat-7b" model to INT4 format with the default configuration.
  3. Select "CPU" as the device.
  4. Select "INT4" as the model to run.
  5. Run the step to load and compile the model.
  6. Set max_new_tokens=500 and run ov_model.generate with the prompt "Describe Intel in 100 words or less".
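One thing worth checking alongside the steps above: chat-tuned llama-2 checkpoints expect the Llama-2 chat prompt template, and passing a bare prompt to generate can produce erratic completions (occasionally in the wrong language). A minimal sketch of that template in pure Python (the default system message here is an assumption, not the notebook's exact code):

```python
def build_llama2_chat_prompt(user_message: str,
                             system_message: str = "You are a helpful assistant.") -> str:
    """Wrap a user message in the Llama-2 chat prompt template.

    Chat-tuned llama-2 models are trained on this [INST]/<<SYS>> framing;
    a raw, unwrapped prompt is out-of-distribution for them.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

# The wrapped prompt would then be tokenized and passed to ov_model.generate.
prompt = build_llama2_chat_prompt("Describe Intel in 100 words or less")
```

This does not rule out a quantization problem, but it separates prompt-formatting effects from INT4-compression effects.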

Expected behavior

Output should be produced in English. However, the output is generated in German.

  1. Is there any issue with converting the llama-2-chat-7b model to INT4 format with OpenVINO?
  2. Is the issue caused by the latest openvino==2023.3.0 or nncf==2.9.0.dev0+84b46f58?

Screenshots
Screenshots 1–3 (images not reproduced here)

Additional context
I tried varying model_compression_params, but it did not resolve the issue. The following four variants of the "llama-2-chat-7b" entry were tried (one at a time; listed together here for comparison):

"llama-2-chat-7b": {
    "mode": nncf.CompressWeightsMode.INT4_SYM,
    "group_size": 128,
    "ratio": 0.8,
},
"llama-2-chat-7b": {
    "mode": nncf.CompressWeightsMode.INT4_ASYM,
    "group_size": 128,
    "ratio": 0.8,
},
"llama-2-chat-7b": {
    "mode": nncf.CompressWeightsMode.INT4_SYM,
    "group_size": 64,
    "ratio": 0.8,
},
"llama-2-chat-7b": {
    "mode": nncf.CompressWeightsMode.INT4_SYM,
    "group_size": 64,
    "ratio": 0.6,
},
