llama-server not using GPU #1826

RakshitAralimatti · 2024-11-09T04:23:37Z

After installing the llama-cpp-python server with CUDA support, I run:
python3 -m llama_cpp.server --model starcoderbase-3b/starcoderbase-3b.Q4_K_M.gguf --n_gpu_layers 10
The GPU is not being used; the model runs on the CPU.
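
One way to narrow this down is to check how many layers the startup log reports as offloaded. The sketch below is a minimal, hedged example: it assumes the log contains a line shaped like "offloaded 10/37 layers to GPU", which is how llama.cpp has reported offloading in builds I have seen, but the exact wording and prefix can differ between versions. The helper name `parse_offload` and the sample line are hypothetical and are not taken from this issue.

```python
import re
from typing import Optional, Tuple

# Assumed log shape: "... offloaded 10/37 layers to GPU".
# The prefix before "offloaded" varies between llama.cpp versions,
# so only the stable-looking part of the line is matched.
_OFFLOAD_RE = re.compile(r"offloaded\s+(\d+)/(\d+)\s+layers\s+to\s+GPU")


def parse_offload(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (offloaded, total) from the first matching log line, or None."""
    match = _OFFLOAD_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Illustrative line only; paste the real server startup log here.
    sample = "llm_load_tensors: offloaded 10/37 layers to GPU"
    result = parse_offload(sample)
    if result is None:
        print("No offload line found; the build may lack GPU support.")
    else:
        offloaded, total = result
        print(f"{offloaded} of {total} layers reported as offloaded")
```

If no such line appears, or it reports 0 layers, the installed wheel was probably built without a GPU backend. If it reports the requested number of layers, the problem is more likely elsewhere.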


pepijndevos · 2025-01-08T06:13:35Z

I'm seeing the same problem with Vulkan.

I see all layers of the model actually getting loaded on the GPU, and nvtop shows significant memory use.
But then htop shows it's using 100% CPU and only a small blip of GPU.

llama.cpp itself works fine on the same hardware.

pepijndevos · 2025-01-08T09:36:33Z

Wait, maybe it's actually using the GPU but is just insanely bottlenecked on Python CPU performance?
