After I install llama-cpp-python-server with CUDA support and run `python3 -m llama_cpp.server --model starcoderbase-3b/starcoderbase-3b.Q4_K_M.gguf --n_gpu_layers 10`,
the GPU is not getting used; it's running on the CPU.
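A common cause of this symptom is that pip installed a prebuilt CPU-only wheel, so the CUDA backend was never compiled in. A sketch of the usual fix, assuming a working CUDA toolchain is available at build time (the exact CMake flag name has changed between llama-cpp-python versions):

```shell
# Assumption: the installed wheel was built without CUDA, so llama.cpp
# silently falls back to CPU inference. Rebuilding from source with the
# CUDA CMake flag enabled forces a GPU-capable build.

# Newer llama-cpp-python releases use the GGML_CUDA flag:
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir 'llama-cpp-python[server]'

# Older releases used the cuBLAS flag instead:
# CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir 'llama-cpp-python[server]'

# Then start the server again, offloading layers to the GPU:
python3 -m llama_cpp.server \
  --model starcoderbase-3b/starcoderbase-3b.Q4_K_M.gguf \
  --n_gpu_layers 10
```

If the rebuild picked up CUDA, the server's startup log should report layers being offloaded to the GPU, and `nvidia-smi` should show the process holding GPU memory while generating.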
I see all layers of the model actually getting loaded on the GPU, and nvtop shows significant memory use.
But then htop shows 100% CPU usage during generation, with only a small blip of GPU activity.