This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory #55

Open
youngallien opened this issue Dec 8, 2023 · 0 comments


root@cs:/home# ./qwen.cpp/build/bin/main -m qwen72b-ggml.bin --tiktoken qwen-72b-raw/qwen.tiktoken -i
ggml_init_cublas: found 2 CUDA devices:
Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0
Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0

CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory
current device: 0


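The above is the error output. I am running the quantized 72B model, whose model file is under 40 GB. A single 80 GB card was not enough, so I switched to two cards, but the error is raised before the second card is used at all (note "current device: 0" in the log).
Could anyone point me in the right direction?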
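
For anyone trying to reproduce this, it may help to confirm how much memory is actually free on each device just before launching, since another process holding memory on device 0 would produce the same error. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `free_mib` are hypothetical and are not part of qwen.cpp or ggml.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows such as '0, 1234, 81920' into (index, used_mib, total_mib)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def free_mib(rows):
    """Map each device index to its free memory in MiB."""
    return {index: total - used for index, used, total in rows}


if __name__ == "__main__":
    # Requires an NVIDIA driver; raises CalledProcessError if nvidia-smi fails.
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for index, free in free_mib(parse_gpu_memory(out)).items():
        print(f"device {index}: {free} MiB free")
```

If device 0 reports far less than 80 GB free before `main` starts, the out-of-memory error would not be specific to the model itself; if both devices are essentially empty, the problem more likely lies in how the allocation is made on device 0.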
