This repository has been archived by the owner on Dec 6, 2024. It is now read-only.
Support --gpu-layers #45
Comments
Same issue: #55

This was resolved by merging into llama.cpp.

Hi, how did you do the merge? When I run Qwen inference with llama.cpp the output always goes wrong, and I also can't figure out how to use multiple GPUs with this repository.

@cl886699 What problem exactly? You normally only need to pass in a gpu-layers value when VRAM is insufficient; the parameter only matters on GPU setups, and llama.cpp has to be compiled as a GPU-enabled library (see the sketch after this thread).

I'm using llama.cpp build 1954 with a model converted by convert-hf-to-gguf.py; this problem shows up at inference time, while the Yi model runs fine.

Just download an already-converted model from HF instead. Your pipeline has too many intermediate steps to pin down the fault; my guess is that you didn't enable it when compiling llama.cpp.
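A minimal sketch of what that advice amounts to in code, assuming llama.cpp's C API (llama.h): the n_gpu_layers field of llama_model_params controls how many layers are offloaded, and it only has an effect when llama.cpp was built with GPU support (e.g. cmake -DLLAMA_CUBLAS=ON on builds of that era). Function signatures shift between llama.cpp versions, so treat this as illustrative, not definitive:

```cpp
#include "llama.h"
#include <cstdio>
#include <cstdlib>

int main(int argc, char ** argv) {
    if (argc < 3) {
        std::fprintf(stderr, "usage: %s model.gguf n_gpu_layers\n", argv[0]);
        return 1;
    }

    // On builds around b1954 this took a NUMA bool; newer builds take no argument.
    llama_backend_init(false);

    llama_model_params mparams = llama_model_default_params();
    // Number of transformer layers to offload to the GPU. 0 keeps everything
    // on the CPU; the value has no effect unless llama.cpp was compiled with
    // GPU support (e.g. cuBLAS/CUDA).
    mparams.n_gpu_layers = std::atoi(argv[2]);

    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (model == nullptr) {
        std::fprintf(stderr, "failed to load model: %s\n", argv[1]);
        return 1;
    }

    // ... create a llama_context and run inference here ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

If the build lacks GPU support, llama.cpp silently keeps all layers on the CPU, which matches the advice above: the flag is only meaningful against a GPU-enabled library.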
Original issue:

Support setting a --gpu-layers value the way llama.cpp does, so the program can run on devices with insufficient VRAM.
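What the request amounts to, as a hypothetical sketch (this parsing code is illustrative only, not taken from the repository): accept a --gpu-layers argument and forward it to llama.cpp's n_gpu_layers model parameter, as in the earlier sketch.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: scan argv for "--gpu-layers N" and return N.
// The flag name mirrors llama.cpp's -ngl/--n-gpu-layers option.
int parse_gpu_layers(int argc, char ** argv) {
    int n_gpu_layers = 0; // default: no offload, safe for CPU-only builds
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--gpu-layers") == 0) {
            n_gpu_layers = std::atoi(argv[i + 1]); // layer count to put in VRAM
        }
    }
    return n_gpu_layers;
}
```

Any layers not offloaded stay in system RAM, which is exactly what lets the model start on devices whose VRAM cannot hold all of it.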