
[BUG] 72B-Int4 raises "probability tensor contains either inf, nan or element < 0" #857

Closed
taochangda opened this issue Dec 24, 2023 · 10 comments

@taochangda
Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

72B-Int4 raises the error below. How can I fix it?

tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-72B-Chat-Int4", revision='master', trust_remote_code=True)
2023-12-25 01:33:38,197 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2023-12-25 01:33:38,197 - modelscope - INFO - Use user-specified model revision: master
response, history = model.chat(tokenizer, "HI", history=None)
Traceback (most recent call last):
File "", line 1, in
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-72B-Chat-Int4/modeling_qwen.py", line 1139, in chat
outputs = self.generate(
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-72B-Chat-Int4/modeling_qwen.py", line 1261, in generate
return super().generate(
File "/usr/local/python3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/python3/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
return self.sample(
File "/usr/local/python3/lib/python3.10/site-packages/transformers/generation/utils.py", line 2897, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
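For context on the error itself: `torch.multinomial` rejects any probability vector containing `inf`, `nan`, or negative entries. A `nan` typically enters when the softmax over the logits is fed an `inf` (for example from a half-precision overflow during sharded multi-GPU inference, as later comments suggest). A minimal stdlib sketch of how a single overflowed logit poisons the whole distribution:

```python
import math

def softmax(logits):
    """Numerically stabilized softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # inf - inf = nan when m is inf
    total = sum(exps)
    return [e / total for e in exps]

# Healthy logits produce a proper probability distribution.
print(softmax([1.0, 2.0, 3.0]))

# One overflowed logit (think: a float16 activation blow-up) turns every
# entry into nan -- exactly what torch.multinomial then rejects.
print(softmax([1.0, float("inf"), 3.0]))
```

Because `nan` propagates through both the exponentials and the normalizing sum, one bad logit corrupts the entire distribution, not just one entry.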

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS: CentOS 7.6
- Python: 3.10.2
- Transformers: transformers>=4.32.0
- PyTorch: 2.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8

Anything else?

No response

@JingyiChang

Same problem here. Have you tried single-GPU inference? The Int4 model fits on a single 48 GB GPU. For me the error only appears with multi-GPU inference; single-GPU works fine.

@taochangda
Author

> Same problem here. Have you tried single-GPU inference? The Int4 model fits on a single 48 GB GPU. For me the error only appears with multi-GPU inference; single-GPU works fine.

I haven't tried single-GPU.

@onionknightdd

Same problem here.

@jklj077
Contributor

jklj077 commented Dec 29, 2023

@taochangda Please first check that auto-gptq is installed correctly. For CUDA 11.8 you cannot just `pip install auto-gptq`; see the official auto-gptq installation instructions.
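The issue jklj077 is pointing at is that auto-gptq ships CUDA-specific kernel builds, so the wheel's CUDA tag must match the CUDA version PyTorch was built against (`cu118` here). The tag convention can be sketched in plain Python (the helper names below are illustrative, not part of any library):

```python
def cuda_tag(cuda_version: str) -> str:
    """Map a CUDA version string like '11.8' to the wheel-tag form 'cu118'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"

def wheel_matches(torch_cuda: str, wheel_tag: str) -> bool:
    # Kernels built for one CUDA release can misbehave or produce garbage
    # when loaded against a different local toolkit.
    return cuda_tag(torch_cuda) == wheel_tag

print(cuda_tag("11.8"))                # cu118
print(wheel_matches("11.8", "cu118"))  # True
print(wheel_matches("11.8", "cu121"))  # False
```

In short: confirm what `torch.version.cuda` reports, then install the auto-gptq build that targets that same CUDA release.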

@jklj077
Contributor

jklj077 commented Dec 29, 2023

@onionknightdd @taochangda Could you both share your GPU model and GPU count? If convenient, could you also say whether this is your own server or one rented from a cloud platform?

@chopin1998

chopin1998 commented Jan 2, 2024

Seems to be the same problem. Single-GPU inference currently works for me, but with multiple GPUs, whether in a Docker environment or a native one, the model loads fine and then fails as soon as I actually chat.

4x RTX 3090

@xfcoms

xfcoms commented Jan 4, 2024

Same problem. Running 72B on my own server with six V100S cards gives the same error as the OP, with garbled output before the crash. With the same software environment on a single 4090 plus plenty of system RAM, a one-sentence chat succeeds. I downloaded from both ModelScope and Hugging Face and the file checksums verified, so the files are fine; the result is the same either way. Could multi-GPU support be broken for older cards? I have now also tried Qwen-14B on the six V100S: with device_map="cuda:0" it runs normally, but with device_map="auto" the error reproduces.
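The observation that device_map="cuda:0" works while device_map="auto" fails points at the layer-sharding path: "auto" splits the transformer blocks across GPUs, so every forward pass crosses device boundaries, while "cuda:0" keeps the whole model on one device. Conceptually (a rough stdlib sketch, not accelerate's actual placement algorithm):

```python
def shard_layers(num_layers: int, num_gpus: int) -> dict:
    """Roughly what device_map='auto' does: assign contiguous blocks of
    transformer layers to successive GPUs."""
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    return {f"transformer.h.{i}": f"cuda:{i // per_gpu}"
            for i in range(num_layers)}

# With device_map="cuda:0" everything lives on one device; with "auto" the
# activations must hop between GPUs, which is where the bad values appear
# in the reports above.
plan = shard_layers(num_layers=8, num_gpus=2)
print(plan["transformer.h.0"])  # cuda:0
print(plan["transformer.h.7"])  # cuda:1
```

This is only an illustration of the split; the module names and the even-split policy are assumptions for the sketch.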

@jklj077
Contributor

jklj077 commented Jan 4, 2024

@xfcoms For cases where single-GPU works but multi-GPU fails, please see #848.

@yuunnn-w

You can try setting the temperature to 0 and tuning only top_p. A high temperature triggers this problem.
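This workaround constrains the distribution before sampling. Top-p (nucleus) filtering keeps only the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalizes, so the long low-probability tail is never sampled from. A stdlib sketch of the idea (not Qwen's implementation):

```python
def top_p_filter(probs, p=0.8):
    """Keep the smallest prefix of tokens (by descending probability) whose
    cumulative probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Tokens 0 and 1 already cover p=0.8, so tokens 2 and 3 are dropped
# and the survivors are renormalized to sum to 1.
print(top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.8))
```

Lowering the temperature (sharpening the distribution) has a similar effect of concentrating mass on the top tokens, which is presumably why the commenter suggests both knobs.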

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 3, 2024
7 participants