[Badcase]: Qwen2.5 outputs nothing but exclamation marks for some prompts #1128

Open · 4 tasks done

Ivy233 opened this issue Dec 12, 2024 · 1 comment
Ivy233 commented Dec 12, 2024

Model Series

Qwen2.5

What are the models used?

Qwen2.5-1.5B-Instruct-AWQ

What is the scenario where the problem happened?

Qwen2.5 produces output consisting entirely of exclamation marks for some prompts.

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

OS: Ubuntu 22.04.1
Python: Python 3.11.10
GPUs: 2x 4080S 32G
NVIDIA driver: 550.107.02
CUDA compiler: 12.1
PyTorch: 2.3.1+cu121
AutoAWQ: 0.2.6
Transformers: 4.46.3

Description

Steps to reproduce

This problem appeared while quantizing Qwen2.5 with AWQ. The reproduction steps are as follows:

  1. Generate the AWQ model with the following code.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '.../Qwen2.5-1.5B-Instruct/'
quant_path = '.../Qwen2.5-1.5B-Instruct-AWQ'

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, use_cache=False, device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
# NOTE: the caveat from the AutoAWQ example about unpacked FP16 weights
# ("you cannot use this model in AutoAWQ after quantizing") applies only
# when export_compatible=True is passed; it is commented out below, so
# this run produces packed INT4 AWQ weights loadable by AutoAWQ.
model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
    # export_compatible=True,
)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')
  2. Run the benchmark from the README, with a slight modification: load the model with AutoAWQForCausalLM.from_quantized and set fuse_layers to True; everything else follows the autoawq defaults. (A minimal loading sketch follows the prompt list below.)

Inputs used (kept verbatim, since the failure is prompt-specific):

  1. Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun
  2. 月光如流水一般,静静地泻在这一片叶子和花上。薄薄的青雾浮起在荷塘里。叶子和花仿佛在牛乳中洗过一样;又像笼着轻纱的梦。虽然是满月,天上却有一层淡淡的云,所以不能朗照;但我以为这恰是到了好处——酣眠固不可少,小睡也别有风味的。
  3. 介绍一下Large Language Model。
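
For concreteness, here is a minimal sketch of the loading-and-generation step described above, assuming AutoAWQ 0.2.6's from_quantized API and the chat template shipped with Qwen2.5-Instruct; the placeholder path, prompt list, and generation settings are illustrative and not taken from the report:

import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = '.../Qwen2.5-1.5B-Instruct-AWQ'  # placeholder path, as in the report

# Load the quantized model with layer fusion enabled, as in step 2.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

prompts = [
    "Once upon a time, there existed a little girl who liked to have adventures.",
    "介绍一下Large Language Model。",
]

for prompt in prompts:
    # Wrap the raw prompt in the Qwen2.5-Instruct chat template.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))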

Expected results

At the very least, the output should not consist entirely of exclamation marks.

If this is a bug, it is very likely related to fuse_layers; please check how that option affects the output.
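
If fuse_layers is indeed the culprit, one way to isolate it is to generate with fusion on and off and compare the outputs. A diagnostic sketch under the same AutoAWQ 0.2.6 assumption, not code from the report; the prompt and generation length are arbitrary:

import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = '.../Qwen2.5-1.5B-Instruct-AWQ'  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

def run_once(fuse_layers, prompt):
    # Reload the model each time so the only difference is the fusion flag.
    model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=fuse_layers)
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=64)
    result = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Free the model before the next load to avoid holding two copies on GPU.
    del model
    torch.cuda.empty_cache()
    return result

prompt = "介绍一下Large Language Model。"
print("fuse_layers=True :", run_once(True, prompt))
print("fuse_layers=False:", run_once(False, prompt))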

Anything Else

In addition, for Qwen2-1.5B with AWQ + fuse_layers, only the first of the prompts below produces output that is all exclamation marks (all three ask for an introduction to LLMs, phrased slightly differently). This also looks like a bug.

  1. 介绍一下LLM。
  2. 介绍一下大模型。
  3. 介绍一下Large Language Model。
jklj077 (Collaborator) commented Dec 17, 2024

I don't think your reproduction code can be run as posted, since no calibration dataset is provided.
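
For reference, AutoAWQ's quantize tries to fetch a default calibration dataset when none is given, and in 0.2.x you can pass your own texts through its calib_data argument. A minimal sketch under that assumption, with purely illustrative calibration sentences:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '.../Qwen2.5-1.5B-Instruct/'  # placeholder path, as in the report

model = AutoAWQForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, use_cache=False, device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Purely illustrative calibration texts; a real calibration set should be
# much larger and representative of the model's target domain.
calib_data = [
    "Large language models generate text by predicting the next token.",
    "Quantization reduces the precision of model weights to save memory.",
]

model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
    calib_data=calib_data,  # assumption: AutoAWQ 0.2.x accepts a list of strings here
)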

In addition, the model at https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-AWQ does work with your cases.

[Screenshot attached in the original issue.]
