CUDA acceleration doesn't seem to work #1445

megupta · 2023-05-14T04:30:16Z

I compiled the latest code in this repo with cuBLAS support, as described in the README.

It doesn't seem to be using my GTX 1070, although main does show up as a process in nvidia-smi:

llama_model_load_internal: [cublas] offloading 0 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 0 MB

What am I missing here?


FSSRepo · 2023-05-14T04:35:20Z

Add the option -ngl 10 to offload 10 layers to your GPU memory. Without it, no layers are offloaded, which is why the log reports 0 layers and 0 MB of VRAM.
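For illustration, here is a minimal sketch of what such an invocation might look like. The binary location and the model path are assumptions (hypothetical placeholders, not taken from this thread); only the -ngl 10 value comes from the suggestion above, and -m and -p are the usual model and prompt options of main.

```shell
# Hypothetical paths: adjust both to your own build and model file.
MAIN_BIN="./main"
MODEL="./models/7B/ggml-model-q4_0.bin"

# Number of layers to offload to the GPU (the value suggested above).
N_GPU_LAYERS=10

# Print the command so it can be checked before running it.
CMD="$MAIN_BIN -m $MODEL -ngl $N_GPU_LAYERS -p \"Hello\""
echo "$CMD"

# Run only if the binary actually exists at the assumed location.
if [ -x "$MAIN_BIN" ]; then
    "$MAIN_BIN" -m "$MODEL" -ngl "$N_GPU_LAYERS" -p "Hello"
fi
```

If the offload takes effect, the [cublas] lines in the load log should presumably report a nonzero layer count and VRAM figure instead of 0. How many layers fit depends on the model size and the free memory on the card, so 10 is a starting point rather than a recommendation.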

Green-Sky closed this as completed May 14, 2023

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
