CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory #55

youngallien · 2023-12-08T13:35:32Z

root@cs:/home# ./qwen.cpp/build/bin/main -m qwen72b-ggml.bin --tiktoken qwen-72b-raw/qwen.tiktoken -i
ggml_init_cublas: found 2 CUDA devices:
Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0
Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0

CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory
current device: 0

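Above is the error output. I'm running the quantized 72B model, and the model file is under 40 GB. One 80 GB card wasn't enough, so I tried two cards, but it errored out before the second card was even used.
Can anyone offer some guidance?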

lindeer mentioned this issue Jan 3, 2024
Support --gpu-layers #45 (Closed)
