Bug: [SYCL] Error loading models larger than Q4 #9472
Labels
bug-unconfirmed
medium severity
stale
What happened?
After building the SYCL server image, loading any model quantized above Q4 on my Arc A770 fails with a memory error.
Anything at Q4 or below runs, which appears to be because those models keep the "llm_load_tensors: SYCL0 buffer size" under ~4200 MiB.
The Arc A770 has 16 GB of VRAM, so it should be perfectly capable of holding much larger buffers.
Looking for information on this. Thanks!
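For context, one plausible explanation (my assumption, not confirmed anywhere in this thread): Intel's compute runtime historically caps a single device allocation at 4 GiB (`CL_DEVICE_MAX_MEM_ALLOC_SIZE`), independent of total VRAM, which would explain why buffers up to roughly 4096 MiB load while the ~4200 MiB Q5 buffer does not, and why `NEOReadDebugKeys=1` / `OverrideGpuAddressSpace=48` appear in the run command. A minimal sketch of that arithmetic, with the 4 GiB cap as an assumed value:

```python
# Sketch under an ASSUMED 4 GiB single-allocation cap (CL_DEVICE_MAX_MEM_ALLOC_SIZE
# on Intel GPUs); total VRAM (16 GiB on the A770) would not be the binding limit.
MAX_SINGLE_ALLOC_MIB = 4 * 1024  # assumed per-allocation cap, in MiB

def fits_single_alloc(buffer_mib: int) -> bool:
    """True if one contiguous buffer of buffer_mib MiB fits under the cap."""
    return buffer_mib <= MAX_SINGLE_ALLOC_MIB

# Q4 and below: SYCL0 buffer stays under the cap and loads.
print(fits_single_alloc(4090))  # True
# Q5_K_L 8B: SYCL0 buffer is ~4200 MiB, just over the cap, and the load fails.
print(fits_single_alloc(4200))  # False
```

If this is the right diagnosis, the question becomes whether the debug-key overrides are actually taking effect inside the container.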
Name and Version
Relevant docker run command used:

```shell
docker run -it --rm -p 11434:11434 \
  -v /mnt/user/models/model-files:/app/models \
  --device /dev/dri/renderD128:/dev/dri/renderD128 \
  --device /dev/dri/card0:/dev/dri/card0 \
  -e OverrideGpuAddressSpace=48 -e NEOReadDebugKeys=1 \
  llama-server-cpp-intel \
  -m /app/models/Meta-Llama-3.1-8B-Instruct-Q5_K_L.gguf \
  -n 2048 -e -ngl 33 --port 11434
```
What operating system are you seeing the problem on?
No response
Relevant log output