[BUG] Qwen-14B-Chat-Int4 temperature problem #358

Closed
2 tasks done
deathxlent opened this issue Sep 26, 2023 · 9 comments

Comments

@deathxlent commented Sep 26, 2023

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

qwen/Qwen-14B-Chat-Int4 has a problem with temperature:
it only works with temperature >= 0.51; at 0.5 or below it raises: RuntimeError: probability tensor contains either inf, nan or element < 0

The 7B model does not have this problem.
I have tried three different servers with various software versions: 7B works fine everywhere, while 14B-Int4 always fails with this error. I only have a 3090, so I could not test whether the unquantized 14B is affected.

Expected Behavior

It should also work normally with temperature=0.01.

Steps To Reproduce

Editing generation_config.json and setting the values directly in code have the same effect: 0.5 or below raises the error.

response, history = model.chat(tokenizer, prompt,
                               history=history, top_p=1,
                               temperature=0.01)

generation_config.json:
{
"chat_format": "chatml",
"eos_token_id": 151643,
"pad_token_id": 151643,
"max_window_size": 6144,
"max_new_tokens": 512,
"do_sample": true,
"top_k": 0,
"top_p": 1,
"temperature":0.51,
"transformers_version": "4.31.0"
}
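
For reference, a minimal sketch of the equivalent in-code override, using transformers' GenerationConfig in the same style as the Qwen README (MODEL and MODEL_NAME come from the loading code further below; the values shown are the failing ones):

from transformers import GenerationConfig

# override generation_config.json for this process only; any temperature
# <= 0.5 reproduces the RuntimeError on 14B-Int4
MODEL.generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME, trust_remote_code=True,
    top_p=1.0, temperature=0.01)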

Environment

- OS: Ubuntu 18.04, Ubuntu 20.04.6
- Python: 3.10.12, 3.9.6
- Transformers: 4.33.2, 4.33.1
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.7, 11.8

Anything else?

Python loading code:

import json

import torch
from fastapi import FastAPI, Body
from modelscope import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()
MODEL_NAME = './qwen/Qwen-14B-Chat-Int4/'
MODEL = None
TOKENIZER = None
inited = False


def load_module():
    global MODEL
    global TOKENIZER
    global inited
    if not inited:
        model_name = MODEL_NAME
        print("Loading model... ")
        TOKENIZER = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        MODEL = AutoModelForCausalLM.from_pretrained(model_name,
                                                     trust_remote_code=True,
                                                     device_map="auto")
        MODEL = MODEL.eval()
        inited = True
    return MODEL, TOKENIZER


def model_generate(prompt, history, model, tokenizer):
    # response: the model's reply; history: records the query/response pairs
    response, history = model.chat(tokenizer, prompt,
                                   history=history, top_p=1,
                                   # temperature=0.02,
                                   )
    return response, history

@deathxlent changed the title [BUG] <title> [BUG] Qwen-14B-Chat-Int4 temperature problem Sep 26, 2023

@gaodianzhuo

You were quick. Same problem here.

@tianjiqx

I don't seem to hit this problem when loading with fastchat.

@bsd20107

Same issue here.

@WSR-wsr commented Sep 27, 2023

I quantized with NF4 myself and also used this Int4 release; both show the same problem.
My current workaround is to use streaming output.
The error is only raised when the end-of-sequence token is reached, so stream the output and add exception handling for when the error fires (see the sketch below).
That is my workaround for now; it works with my NF4 quantization, and it is worth trying with the GPTQ quantization as well.
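
A minimal sketch of that workaround, assuming the chat_stream method exposed by Qwen's remote code (model and tokenizer loaded as in the issue body):

MODEL, TOKENIZER = load_module()   # loader from the issue body
history, response = [], ""
prompt = "你好"
try:
    # chat_stream yields the cumulative response text generated so far
    for chunk in MODEL.chat_stream(TOKENIZER, prompt, history=history):
        response = chunk
except RuntimeError as e:
    # the "probability tensor contains either inf, nan or element < 0"
    # error fires near the end token; swallow it and keep the partial text
    print(f"generation aborted early: {e}")
history.append((prompt, response))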

@JustinLin610
Member

This was reported for 7B Int4 as well. If you want more deterministic output, I would suggest tuning top_p instead; temperature easily triggers this kind of overflow, especially with Int4 models. We have not found a good fix yet.
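
For example, a sketch of that suggestion using the same chat API as the issue body (the 0.1 value is illustrative, not an official recommendation):

MODEL, TOKENIZER = load_module()   # loader from the issue body
response, history = MODEL.chat(TOKENIZER, "你好", history=None,
                               top_p=0.1)  # small top_p ~= near-greedy sampling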

@rufeng-h

An earlier issue discussed this in detail: a temperature that is too low makes the softmax overflow. On an A6000 I have no problem with bf16, but with fp16 the error appears once temperature drops below 0.5.
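
A standalone illustration of that overflow mechanism with synthetic logits (the values are made up, not taken from the model): fp16's largest finite value is about 65504, so dividing by a small temperature can produce inf; softmax then yields nan, and multinomial raises exactly the error from this issue. bf16 shares fp32's exponent range, which is why it does not overflow here.

import torch

# mimic the sampling step inside generate(): scale by temperature,
# softmax, then sample
logits = torch.tensor([700.0, 10.0, 5.0], dtype=torch.float16)
scaled = logits / 0.01                         # 700 / 0.01 = 70000 -> inf in fp16
probs = torch.softmax(scaled.float(), dim=-1)  # inf - inf = nan inside softmax
torch.multinomial(probs, num_samples=1)
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0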

@OnlookerWong

Same problem here, except I had not figured out that it was the temperature and was still digging blindly for the cause...

@mikeleatila

@JustinLin610 @deathxlent @OnlookerWong I am using Qwen-14B-Chat-Int4 and during inference I am also getting "RuntimeError: probability tensor contains either inf, nan or element < 0". I have tried different temperature and top_p values, in addition to do_sample, but with the same result. Has this been fixed, and how? Many thanks.

@Mr-IT007

> I don't seem to hit this problem when loading with fastchat.

Can fastchat deploy the Int4 model directly?
