from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '.../Qwen2.5-1.5B-Instruct/'
quant_path = '.../Qwen2.5-1.5B-Instruct-AWQ'

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, use_cache=False, device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
# NOTE: with export_compatible=True the weights are not packed, so the model
# cannot be loaded by AutoAWQ afterwards; the saved model stays FP16 with the
# AWQ scales applied. The flag is left commented out here, so weights are packed.
model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
    # export_compatible=True,
)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')
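For completeness, here is a minimal sketch of how the quantized checkpoint is then loaded back and run with fused layers, which is the configuration under which the problem appears. This is an illustrative reconstruction, not the original report's code; the prompt is a placeholder for the actual inputs listed under "Steps to reproduce".

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Load the packed AWQ checkpoint with fused layers enabled.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

# Placeholder prompt; substitute the actual prompts from the report.
messages = [{"role": "user", "content": "你好"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))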
Model Series
Qwen2.5
What are the models used?
Qwen2.5-1.5B-Instruct-AWQ
What is the scenario where the problem happened?
For some prompts, Qwen2.5 outputs nothing but exclamation marks.
Is this bad case known and can it be solved using available techniques?
Information about environment
OS: Ubuntu 22.04.1
Python: Python 3.11.10
GPUs: 2x 4080S 32G
NVIDIA driver: 550.107.02
CUDA compiler: 12.1
PyTorch: 2.3.1+cu121
AutoAWQ: 0.2.6
Transformers: 4.46.3
Description
Steps to reproduce
This problem appears during AWQ quantization of Qwen2.5. The reproduction flow is as follows:
Input used:
Expected results
At the very least, the output should not be all exclamation marks.
If this is a bug, it is most likely a problem with fuse_layers; please check how it affects the output.
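One way to isolate this (a suggested diagnostic, not part of the original report) is to decode the same input with fuse_layers toggled off: if the unfused path produces sensible text while the fused path produces exclamation marks, fuse_layers is implicated. This sketch reuses quant_path, tokenizer, and input_ids from the loading sketch above.

import torch
from awq import AutoAWQForCausalLM

for fuse in (True, False):
    m = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=fuse)
    out = m.generate(input_ids, max_new_tokens=32)
    print(f"fuse_layers={fuse}:",
          tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
    # Free GPU memory before loading the second copy of the model.
    del m
    torch.cuda.empty_cache()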
Anything Else
In addition, for Qwen2-1.5B with AWQ + fuse_layers, only the first of the prompts below yields output that is all exclamation marks. This also looks like a bug, as the check sketched after this paragraph illustrates.
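To check whether this depends on the specific prompt or on generation order, a hypothetical sketch (the prompt strings are placeholders for the ones originally listed) runs the prompts in both orders, with a fresh fused model per run:

import torch
from awq import AutoAWQForCausalLM

# Placeholders for the actual prompts from the report.
prompts = ["prompt 1", "prompt 2", "prompt 3"]

for order in (prompts, list(reversed(prompts))):
    # Fresh fused model per order, so each run starts from clean state.
    model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
    for p in order:
        ids = tokenizer.apply_chat_template(
            [{"role": "user", "content": p}],
            add_generation_prompt=True, return_tensors="pt",
        ).cuda()
        out = model.generate(ids, max_new_tokens=32)
        print(repr(p), "->",
              tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
    del model
    torch.cuda.empty_cache()

If the same prompt is corrupted in both orders, the prompt itself triggers the failure; if whichever prompt runs first is corrupted, that points at state inside the fused modules instead.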