[BUG] Qwen-14B-Chat-Int4 temperature problem #358

Closed
2 tasks done
deathxlent opened this issue Sep 26, 2023 · 9 comments

Comments

@deathxlent commented Sep 26, 2023

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

qwen/Qwen-14B-Chat-Int4 has a problem with temperature:
it only works with temperature >= 0.51; at 0.5 or below it raises: RuntimeError: probability tensor contains either inf, nan or element < 0

The 7B model does not have this problem.
I have tried three different servers with various software versions: 7B works fine everywhere, while 14B-Int4 always fails with this error. I only have a 3090, so I could not test whether the unquantized 14B is affected.

Expected Behavior

It should also work normally with temperature=0.01.

Steps To Reproduce

Editing generation_config.json and setting the values directly in code have the same effect: 0.5 or below raises the error.

response, history = model.chat(tokenizer, prompt,
                               history=history, top_p=1,
                               temperature=0.01)

generation_config.json:
{
"chat_format": "chatml",
"eos_token_id": 151643,
"pad_token_id": 151643,
"max_window_size": 6144,
"max_new_tokens": 512,
"do_sample": true,
"top_k": 0,
"top_p": 1,
"temperature":0.51,
"transformers_version": "4.31.0"
}
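
For reference, a minimal sketch of the equivalent in-code override, using transformers' GenerationConfig in the same style as the Qwen README (MODEL and MODEL_NAME come from the loading code further below; the values shown are the failing ones):

from transformers import GenerationConfig

# override generation_config.json for this process only; any temperature
# <= 0.5 reproduces the RuntimeError on 14B-Int4
MODEL.generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME, trust_remote_code=True,
    top_p=1.0, temperature=0.01)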

Environment

- OS: Ubuntu 18.04, Ubuntu 20.04.6
- Python: 3.10.12, 3.9.6
- Transformers: 4.33.2, 4.33.1
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.7, 11.8

Anything else?

Python loading code:

import json

import torch
from fastapi import FastAPI, Body
from modelscope import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()
MODEL_NAME = './qwen/Qwen-14B-Chat-Int4/'
MODEL = None
TOKENIZER = None
inited = False


def load_module():
    global MODEL
    global TOKENIZER
    global inited
    if not inited:
        model_name = MODEL_NAME
        print("Loading model... ")
        TOKENIZER = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        MODEL = AutoModelForCausalLM.from_pretrained(model_name,
                                                     trust_remote_code=True,
                                                     device_map="auto")
        MODEL = MODEL.eval()
        inited = True
    return MODEL, TOKENIZER


def model_generate(prompt, history, model, tokenizer):
    # response: the model's reply; history: records the query/response pairs
    response, history = model.chat(tokenizer, prompt,
                                   history=history, top_p=1,
                                   # temperature=0.02,
                                   )
    return response, history

@deathxlent changed the title [BUG] <title> [BUG] Qwen-14B-Chat-Int4 temperature problem Sep 26, 2023

@gaodianzhuo

You were quick. Same problem here.

@tianjiqx

I don't seem to hit this problem when loading with fastchat.

@bsd20107

Same issue here.

@WSR-wsr commented Sep 27, 2023

I quantized with NF4 myself and also used this Int4 release; both show the same problem.
My current workaround is to use streaming output.
The error is only raised when the end-of-sequence token is reached, so stream the output and add exception handling for when the error fires (see the sketch below).
That is my workaround for now; it works with my NF4 quantization, and it is worth trying with the GPTQ quantization as well.
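
A minimal sketch of that workaround, assuming the chat_stream method exposed by Qwen's remote code (model and tokenizer loaded as in the issue body):

MODEL, TOKENIZER = load_module()   # loader from the issue body
history, response = [], ""
prompt = "你好"
try:
    # chat_stream yields the cumulative response text generated so far
    for chunk in MODEL.chat_stream(TOKENIZER, prompt, history=history):
        response = chunk
except RuntimeError as e:
    # the "probability tensor contains either inf, nan or element < 0"
    # error fires near the end token; swallow it and keep the partial text
    print(f"generation aborted early: {e}")
history.append((prompt, response))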

@JustinLin610
Member

This was reported for 7B Int4 as well. If you want more deterministic output, I would suggest tuning top_p instead; temperature easily triggers this kind of overflow, especially with Int4 models. We have not found a good fix yet.
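
For example, a sketch of that suggestion using the same chat API as the issue body (the 0.1 value is illustrative, not an official recommendation):

MODEL, TOKENIZER = load_module()   # loader from the issue body
response, history = MODEL.chat(TOKENIZER, "你好", history=None,
                               top_p=0.1)  # small top_p ~= near-greedy sampling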

@rufeng-h

An earlier issue discussed this in detail: a temperature that is too low makes the softmax overflow. On an A6000 I have no problem with bf16, but with fp16 the error appears once temperature drops below 0.5.
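
A standalone illustration of that overflow mechanism with synthetic logits (the values are made up, not taken from the model): fp16's largest finite value is about 65504, so dividing by a small temperature can produce inf; softmax then yields nan, and multinomial raises exactly the error from this issue. bf16 shares fp32's exponent range, which is why it does not overflow here.

import torch

# mimic the sampling step inside generate(): scale by temperature,
# softmax, then sample
logits = torch.tensor([700.0, 10.0, 5.0], dtype=torch.float16)
scaled = logits / 0.01                         # 700 / 0.01 = 70000 -> inf in fp16
probs = torch.softmax(scaled.float(), dim=-1)  # inf - inf = nan inside softmax
torch.multinomial(probs, num_samples=1)
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0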

@OnlookerWong

Same problem here, except I had not figured out that it was the temperature and was still digging blindly for the cause...

@mikeleatila

@JustinLin610 @deathxlent @OnlookerWong I am using Qwen-14B-Chat-Int4 and during inference I am also getting "RuntimeError: probability tensor contains either inf, nan or element < 0". I have tried different temperature and top_p values, in addition to do_sample, but with the same result. Has this been fixed, and how? Many thanks.

@Mr-IT007

> I don't seem to hit this problem when loading with fastchat.

Can fastchat deploy the Int4 model directly?
