
RuntimeError: probability tensor contains either inf, nan or element < 0 #56

Open
pagyyuan opened this issue Sep 8, 2023 · 7 comments


@pagyyuan

pagyyuan commented Sep 8, 2023

Deploying the baichuan-13B-chat model throws this error.
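For reference, a minimal sketch (the model id, prompt, and generation settings are assumptions, not taken from the original report) of the kind of call that raises "probability tensor contains either inf, nan or element < 0": the error comes from torch.multinomial inside model.generate() when do_sample=True and the logits contain NaN or Inf.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "baichuan-inc/Baichuan-13B-Chat"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
# Sampling path: NaN/Inf in the softmax-ed logits triggers the RuntimeError here.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```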

@ywancit

ywancit commented Sep 8, 2023

The original model works fine for me, but 8-bit quantization throws the same error as yours: https://github.com/baichuan-inc/Baichuan2/issues/48#issue-1885592066.
Let's notify each other if either of us finds a solution.

@bihui9968

I ran into the same problem. Is there a solution yet?

@ChiQiuHong

@pagyyuan @ywancit @bihui9968
model = AutoModelForCausalLM.from_pretrained(quant8_saved_dir, load_in_8bit=True, device_map="auto", trust_remote_code=True)
The problem most likely comes from device_map="auto". My machine has limited RAM and GPU memory, so the model gets split between the CPU and CUDA during loading, and when the model is saved afterwards not all of the weights are written out (the saved 'pytorch_model' is only about 8 GB, which is not a normal size).
I loaded the whole model onto the CPU instead with device_map="cpu"; the saved 'pytorch_model' is 13.9 GB, and inference runs without any error.
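A small sketch of the size check described above (the directory name is a placeholder, not from the thread): add up the weight shards on disk to confirm the save was not truncated by a CPU/GPU-split device_map.

```python
import os

saved_dir = "quant8_saved_dir"  # placeholder for the local checkpoint directory

# Sum the sizes of the pytorch_model shard files; they should add up to the
# full checkpoint size, not the ~8 GB partial dump mentioned above.
shard_bytes = sum(
    os.path.getsize(os.path.join(saved_dir, name))
    for name in os.listdir(saved_dir)
    if name.startswith("pytorch_model") and name.endswith(".bin")
)
print(f"saved weight shards: {shard_bytes / 1e9:.1f} GB")
```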

@pagyyuan
Author

@ChiQiuHong After changing device_map from "auto" to "cpu" I get a new error: ValueError: If passing a string for device_map, please choose 'auto', 'balanced', 'balanced_low_0' or 'sequential'.
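A hedged workaround sketch for this ValueError: in the transformers versions that reject the string "cpu", device_map can still be passed as a dict mapping module names to devices, where an empty key ("") means the whole model. The directory name below is a placeholder.

```python
from transformers import AutoModelForCausalLM

model_dir = "quant8_saved_dir"  # placeholder for the directory used above

# Whole model on the CPU (only sensible for a non-quantized checkpoint, since
# the bitsandbytes 8-bit kernels need a GPU):
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map={"": "cpu"},
    trust_remote_code=True,
)

# Or, for the 8-bit load, pin the whole model to a single GPU instead of
# letting "auto" spread it across CPU and GPU:
# model = AutoModelForCausalLM.from_pretrained(
#     model_dir,
#     load_in_8bit=True,
#     device_map={"": 0},
#     trust_remote_code=True,
# )
```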

@pagyyuan
Author

I added a print at the point where the error is raised; one of the tensors looks like this: tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0', dtype=torch.float16)
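For reference, a small debugging sketch (not from the thread; the helper name is made up): it runs a single forward pass and reports whether the next-token logits contain NaN or Inf, which is what makes torch.multinomial fail inside generate().

```python
import torch

@torch.no_grad()
def inspect_next_token_logits(model, input_ids):
    """Report NaN/Inf in the next-token logits before sampling."""
    logits = model(input_ids=input_ids).logits[:, -1, :]
    print("contains nan:", torch.isnan(logits).any().item())
    print("contains inf:", torch.isinf(logits).any().item())
    print("dtype / device:", logits.dtype, logits.device)
    return logits
```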

@baolixiong

Has this been resolved?

@qiu404

qiu404 commented Nov 24, 2023

Take a look at #291 and see whether it solves your problem.
