Bug: llama.cpp server arg LLAMA_ARG_N_GPU_LAYERS doesn't follow the same convention as llama-cpp-python n_gpu_layers #9556
Labels
bug-unconfirmed
low severity
What happened?
When creating a Llama model in Python code, you can specify n_gpu_layers=-1 so that all layers are offloaded to the GPU (see the example below). When starting the llama.cpp server using the Docker image, setting LLAMA_ARG_N_GPU_LAYERS: -1 does not have the same effect.
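For reference, a minimal sketch of the Python-side usage (the model path here is a placeholder):

```python
from llama_cpp import Llama

# In llama-cpp-python, n_gpu_layers=-1 offloads all model layers to the GPU.
llm = Llama(
    model_path="/models/model.gguf",  # placeholder path
    n_gpu_layers=-1,
)
```

On the server side, the equivalent setting is the LLAMA_ARG_N_GPU_LAYERS environment variable (or the -ngl / --n-gpu-layers flag), where -1 does not behave the same way; passing a value at least as large as the model's layer count (e.g. 99) is the usual way to offload every layer there.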
Name and Version
From the prebuilt docker image ghcr.io/ggerganov/llama.cpp:server-cuda@sha256:fe887bd3debd1a55ddd95f067435a38166f15a058bf50fee173517b9831081c8
version: 0 (unknown)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output