Passing a temperature parameter to the chat function at inference time makes the model error out #153

Closed
2 tasks done
zhihao-chen opened this issue Aug 10, 2023 · 10 comments

@zhihao-chen

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

The error occurs whenever the temperature parameter is specified; removing it makes the error go away.

File "/home/aiteam/work2/chenzhihao/kefu_dialogue/examples/qwen_interact.py", line 70, in chatbot
response, history = model.chat(tokenizer, query, history=history, system=SYSTEM_PROMPT, top_p=0.75, temperature=0.3)
File "/home/aiteam/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1010, in chat
outputs = self.generate(
File "/home/aiteam/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1120, in generate
return super().generate(
File "/home/aiteam/anaconda3/envs/llm-py10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/aiteam/work2/chenzhihao/transformers/src/transformers/generation/utils.py", line 1615, in generate
return self.sample(
File "/home/aiteam/work2/chenzhihao/transformers/src/transformers/generation/utils.py", line 2773, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

aleimu added a commit to aleimu/langchain-ChatGLM that referenced this issue Aug 10, 2023
@fyabc
Contributor

fyabc commented Aug 10, 2023

Hi @zhihao-chen, we could not reproduce this issue on our side. Could you provide more detailed reproduction code?

@zhihao-chen
Author

from transformers.generation import GenerationConfig
from transformers import AutoModelForCausalLM,AutoTokenizer
import torch
import os
import platform

os_name = platform.system()
clear_command = 'cls' if os_name == 'Windows' else 'clear'
stop_stream = False

def signal_handler():
    global stop_stream
    stop_stream = True

device = torch.device("cuda:6")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat",
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             low_cpu_mem_usage=True, device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", use_fast=False, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model.eval()

history = []
while True:
    query = input("用户:")
    if not query:
        continue
    if query == 'stop':
        break
    if query == "clear":
        history = []
        os.system(clear_command)
        continue
    response, history = model.chat(tokenizer, query, history=history, system="",
                                   top_p=0.75, temperature=0.3)
    print("AI:", response)

@fyabc
Contributor

fyabc commented Aug 10, 2023

Hi, could you provide the prompt that triggered the error?

@zhihao-chen
Author

I just entered "你好" ("Hello") and it errored.

@zhihao-chen
Author

I am not using flash-attn; this is on a V100.

@fyabc
Contributor

fyabc commented Aug 10, 2023

Could you share your Python, PyTorch, transformers, and CUDA versions?

Also, per this issue, does explicitly passing do_sample=False to model.chat() resolve the problem?
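
For reference, a minimal sketch of that workaround, assuming the same model and tokenizer objects as in the reproduction script above (an illustration, not the only possible fix):

# Hedged workaround sketch: do_sample=False switches generation to greedy
# decoding, so torch.multinomial is never called and temperature/top_p are
# effectively ignored.
response, history = model.chat(tokenizer, query, history=history, system="",
                               do_sample=False)
print("AI:", response)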

@zhihao-chen
Author

do_sample=False does work around it, but that is just greedy search, so the temperature parameter becomes pointless anyway.
python = 3.10
pytorch = 2.0.1
transformers = 4.32.0.dev0
cuda = 11.7
GPU: V100

@fyabc
Contributor

fyabc commented Aug 11, 2023

We still cannot reproduce this issue on our side. Could you try changing the transformers version to 4.31.0 and testing again?

@pei55

pei55 commented Aug 21, 2023

I hit the same error, on a 3090 GPU with transformers 4.31.0.

@JustinLin610
Member

There are already quite a few related issues about this. If you are running a quantized model, we recommend not adjusting the temperature and controlling determinism via top_p instead. If you are running a non-quantized model, we recommend using bf16 rather than fp16, and avoiding very low temperature values, which easily cause precision overflow problems, especially with fp16.
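
A minimal sketch of that advice, assuming a GPU with bf16 support and using the standard Hugging Face loading arguments (torch_dtype=torch.bfloat16) rather than any Qwen-specific flags:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Load in bf16 instead of fp16 to reduce the risk of numerical overflow when sampling.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    torch_dtype=torch.bfloat16,  # assumes the GPU supports bf16 (e.g. Ampere or newer)
    trust_remote_code=True,
    device_map="auto",
).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# Leave temperature at its default and steer determinism with top_p instead.
response, history = model.chat(tokenizer, "你好", history=None, top_p=0.5)
print(response)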
