Support --gpu-layers #45

Closed
lindeer opened this issue Nov 22, 2023 · 7 comments
Comments

@lindeer

lindeer commented Nov 22, 2023

Problem:

build/bin/main -m /app/ecr/models/qwen-7b-ggml/qwen7b-ggml.bin --tiktoken /app/ecr/models/qwen-7b-ggml/qwen.tiktoken -t 6 -i
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1660, compute capability 7.5
Welcome to Qwen.cpp! Ask whatever you want. Type 'clear' to clear context. Type 'stop' to exit.

Prompt > 三国演义都有哪些人物? (Which characters appear in Romance of the Three Kingdoms?)

CUDA error 2 at /home/ecr/projects/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7310: out of memory
current device: 0

It would help to be able to set a --gpu-layers value, as llama.cpp allows, so that the model can run on devices that don't have enough VRAM.
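For context, llama.cpp's GPU-layers option (`-ngl` / `--n-gpu-layers`) offloads only N of the model's layers to the GPU and keeps the rest on the CPU. A card that cannot hold the whole model can therefore still take part of it. The rough Python sketch below estimates how many layers might fit. It assumes every layer needs the same amount of memory and that a fixed amount of VRAM stays reserved for the KV cache and scratch buffers. The helper name and the per-layer and reserve sizes are hypothetical illustration values, not measurements of qwen7b-ggml.bin. Only the 6 GiB of the GTX 1660 and the 32 layers of Qwen-7B are real figures.

```python
def max_gpu_layers(vram_mib: int, n_layers: int, layer_mib: int, reserve_mib: int) -> int:
    """Rough estimate of how many equally sized layers fit in VRAM.

    Hypothetical helper: assumes every layer needs `layer_mib` MiB and that
    `reserve_mib` MiB must stay free for the KV cache and scratch buffers.
    """
    if layer_mib <= 0:
        raise ValueError("layer_mib must be positive")
    budget = vram_mib - reserve_mib             # VRAM left over for layer weights
    if budget <= 0:
        return 0                                # nothing fits: run entirely on the CPU
    return min(n_layers, budget // layer_mib)   # never more layers than the model has


# GTX 1660 (6 GiB) and Qwen-7B's 32 layers; 200 MiB/layer and a 1 GiB reserve are made-up values.
print(max_gpu_layers(vram_mib=6 * 1024, n_layers=32, layer_mib=200, reserve_mib=1024))  # 25
```

In practice it is usually simpler to start with a low value and raise it until the out-of-memory error above appears. The estimate only suggests a starting point.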

@fann1993814

@lindeer
I'm not sure whether this PR will be useful to you, since it has only been tested on Apple Metal.
But maybe you could give it a try? #41

@lindeer
Author

lindeer commented Jan 3, 2024

Same issue: #55

@lindeer
Author

lindeer commented Jan 3, 2024

Merged into llama.cpp, so this is resolved.

lindeer closed this as completed Jan 3, 2024
@cl886699

Merged into llama.cpp, so this is resolved.

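Hi, how was it merged? When I use llama.cpp for Qwen inference, the output always goes wrong, and I don't know how to use multiple GPUs with this repository.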

@lindeer
Author

lindeer commented Jan 26, 2024

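@cl886699 What problem are you seeing? Normally you only need to pass in a gpu-layers value when there isn't enough VRAM. This parameter only matters when running on a GPU, and llama.cpp has to be built as a GPU-enabled library for it to take effect.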

@cl886699

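@cl886699 What problem are you seeing? Normally you only need to pass in a gpu-layers value when there isn't enough VRAM. This parameter only matters when running on a GPU, and llama.cpp has to be built as a GPU-enabled library for it to take effect.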

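I'm using llama.cpp version 1954 and converted the model with convert-hf-to-gguf.py. I get the problem below during inference (screenshot attached), whereas the Yi model runs inference normally.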

@lindeer
Author

lindeer commented Feb 2, 2024

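Just download an already-converted model from Hugging Face. Your process has too many intermediate steps to tell what the problem is. At a glance, my guess is that you built llama.cpp without LLAMA_CUBLAS=on. Only with this build option enabled does the build produce a binary that can run on the GPU.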
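One quick way to check whether a binary was built with GPU support is to look at its startup output. This assumes the binary announces CUDA the way the qwen.cpp build in the original report does, with the `ggml_init_cublas: found 1 CUDA devices:` line. The Python helper below is a hypothetical sketch of that check. Its name is made up, and the exact log wording may differ between llama.cpp versions.

```python
import re

# Matches the startup line printed by cuBLAS-enabled ggml builds, e.g.
# "ggml_init_cublas: found 1 CUDA devices:" (wording taken from the log in this issue).
_CUBLAS_LINE = re.compile(r"^ggml_init_cublas: found (\d+) CUDA devices?:", re.MULTILINE)


def cuda_device_count(startup_log: str) -> int:
    """Return the CUDA device count reported at startup, or 0 if no cuBLAS line is found.

    Hypothetical helper: a result of 0 suggests the binary was built without
    GPU support (e.g. without LLAMA_CUBLAS) or found no usable GPU.
    """
    match = _CUBLAS_LINE.search(startup_log)
    return int(match.group(1)) if match else 0


log = """ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1660, compute capability 7.5
"""
print(cuda_device_count(log))            # 1
print(cuda_device_count("main: build"))  # 0
```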
