[BUG] Qwen-14B-Chat-Int4: problem with temperature #358
Comments
You're fast! Same problem here.
I don't seem to run into this problem when loading with fastchat.
Same question.
I quantized the model with nf4 myself and also tried this int4 release; both show the same problem.
The 7B int4 model had the same report earlier. If you need more deterministic output, we suggest tuning top_p instead; temperature is prone to this kind of overflow, especially with int4 models. We haven't found a good fix yet.
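As a rough illustration of that suggestion, a minimal sketch (the top_p value is an arbitrary example, not a tested recommendation):

```python
# Workaround sketch: leave temperature at its default and lower top_p
# instead to make sampling more deterministic. The value 0.1 here is
# an illustrative assumption, not a tuned setting.
response, history = model.chat(tokenizer, prompt,
                               history=history,
                               top_p=0.1)
```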
There was an earlier issue dedicated to this: too low a temperature makes the softmax overflow. On an A6000 I see no problem with bf16, but fp16 errors out once temperature drops below 0.5.
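A minimal sketch of that overflow mechanism, assuming a logit outlier near the fp16 maximum (the value 60000 is invented for illustration; this is not the actual Qwen sampling code):

```python
import torch

# Quantized models can emit logit outliers near the fp16 maximum (~65504).
logits = torch.tensor([60000.0, 5.0, 1.0], dtype=torch.float16)

for temperature in (1.0, 0.5):
    scaled = logits / temperature  # 60000 / 0.5 = 120000 overflows fp16 to inf
    # Cast to float32 only so the demo runs on CPU; the overflow has
    # already happened in fp16. With an inf input, softmax computes
    # inf - inf = nan and the whole probability row becomes nan.
    probs = torch.softmax(scaled.float(), dim=-1)
    print(temperature, probs)

# torch.multinomial then rejects the nan row, which is the
# "probability tensor contains either inf, nan or element < 0" error.
```

bf16 has a much wider exponent range than fp16, which would explain why the same temperature works under bf16 but fails under fp16.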
Same problem, except I hadn't realized the temperature was the cause and was still digging for the reason...
@JustinLin610 @deathxlent @OnlookerWong I am using Qwen-14B-Chat-Int4 and during inference I also get: "RuntimeError: probability tensor contains either inf, nan or element < 0". I have tried different temperature and top_p values, as well as do_sample, but with the same result. Has this been fixed, and how? Many thanks
Can fastchat deploy the int4 model directly?
Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
qwen/Qwen-14B-Chat-Int4 has a problem with temperature.
temperature must be set to at least 0.51; at 0.5 or below it raises: RuntimeError: probability tensor contains either inf, nan or element < 0
The 7B model does not show this problem.
I have tried three different servers with various library versions; in every case 7B works and 14B-Int4 fails with this error. I only have a 3090, so I could not test whether the full-precision 14B model is affected.
Expected Behavior
Generation should also work with temperature=0.01.
Steps To Reproduce
Editing generation_config.json or setting the values directly in code has the same effect: temperatures of 0.5 and below raise the error (a sketch of the in-code override follows the config below).
```python
response, history = model.chat(tokenizer, prompt,
                               history=history, top_p=1,
                               temperature=0.01)
```
generation_config.json:

```json
{
    "chat_format": "chatml",
    "eos_token_id": 151643,
    "pad_token_id": 151643,
    "max_window_size": 6144,
    "max_new_tokens": 512,
    "do_sample": true,
    "top_k": 0,
    "top_p": 1,
    "temperature": 0.51,
    "transformers_version": "4.31.0"
}
```
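For reference, an in-code override that should be equivalent to editing the file above, as a minimal sketch (assuming the Hugging Face model id "Qwen/Qwen-14B-Chat-Int4" and a model already loaded as `model`):

```python
from transformers import GenerationConfig

# Sketch of the in-code equivalent of editing generation_config.json;
# keyword arguments override the values loaded from the hub.
model.generation_config = GenerationConfig.from_pretrained(
    "Qwen/Qwen-14B-Chat-Int4",
    temperature=0.01,  # reproduces the failure described above
    top_p=1.0,
)
```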
Environment
Anything else?
Python loading code