Support --gpu-layers #45
Comments
Same issue: #55

Resolved by merging into llama.cpp.

Hi, how did you do the merge? When I use llama.cpp to run Qwen inference, the output always has problems, and I can't figure out how to use multiple GPUs with this repo.

@cl886699 What problem exactly? You normally only need to pass a gpu-layers value when VRAM is insufficient; the option only matters on GPU setups, and it requires building llama.cpp as a GPU-enabled library.
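For reference, a minimal sketch of the GPU-enabled route described above, assuming a CUDA machine and a llama.cpp checkout from that era; the model path and layer count are placeholders, not values from this thread:

```sh
# Build llama.cpp with cuBLAS so GPU offload is available
# (Makefile flag used by llama.cpp releases around b1954; newer trees use cmake -DGGML_CUDA=ON)
make clean
make LLAMA_CUBLAS=1

# Offload 20 transformer layers to the GPU; raise or lower -ngl
# (--n-gpu-layers) until the model fits in available VRAM
./main -m ./models/qwen-7b.gguf -ngl 20 -p "你好"
```

With a CPU-only build, `-ngl` is silently ignored, which matches the advice above that the flag only helps once llama.cpp is compiled with GPU support.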
I'm on llama.cpp build 1954, with the model converted by convert-hf-to-gguf.py. Inference hits this problem, while the Yi model runs fine.
Just download a model that has already been converted on HF. Your pipeline has too many intermediate steps to pin down the problem; my guess is that you built llama.cpp without GPU support enabled.
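A sketch of the download route suggested here, assuming the huggingface_hub CLI is installed; the repo and file names below are illustrative examples, not ones named in this thread:

```sh
# Fetch a GGUF that was already converted upstream, skipping local conversion
pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen1.5-7B-Chat-GGUF \
    qwen1_5-7b-chat-q4_k_m.gguf --local-dir ./models

# Run it with partial GPU offload, as in the earlier example
./main -m ./models/qwen1_5-7b-chat-q4_k_m.gguf -ngl 20 -p "你好"
```

Starting from a known-good GGUF removes the conversion script as a variable, which is the point of the suggestion above.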
Issue: allow setting a --gpu-layers value the same way llama.cpp does, so that the model can run on devices without enough VRAM.