
How to implement CLBLAST? #1433

Closed
LeLaboDuGame opened this issue May 13, 2023 · 11 comments


LeLaboDuGame commented May 13, 2023

Hey! I want to set up CLBlast so I can use llama.cpp with my AMD GPU, but I don't know how to do it!
Can you explain how to use it with llama-cpp-python?
PS: I'm on Windows... my Linux skills are not great...
Thanks in advance!
Labo


SlyEcho commented May 13, 2023

It should be possible.

llama-cpp-python needs to have a file called llama.dll somewhere in its package. This is not built for the normal Windows releases, but it can be enabled with the CMake variable BUILD_SHARED_LIBS=ON.

You can see the commands in build.yml.
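
As a rough sketch of what that looks like on Windows (the exact flags and output paths used in build.yml may differ):

cmake -B build -DBUILD_SHARED_LIBS=ON
cmake --build build --config Release
# llama.dll should then appear somewhere under build\bin\ (the exact subfolder depends on the generator)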

Sorry, it may be a little hard to understand this stuff if you are not a developer.


LeLaboDuGame commented May 15, 2023

Thanks! What is llama.dll? Also, BUILD_SHARED_LIBS=ON doesn't work for me (llama.dll doesn't appear).


SlyEcho commented May 15, 2023

llama-cpp-python needs a library form of llama.cpp, which on Windows would be a file called llama.dll or maybe libllama.dll. It must exist somewhere in the directory structure of where you installed llama-cpp-python. When you build llama.cpp on Windows with CMake, you can give it the option -DBUILD_SHARED_LIBS=ON and this file will be built; if you also add -DLLAMA_CLBLAST=ON, it will be built with CLBlast support. Then overwrite the old .dll with the new one and add the clblast.dll file too. Something like that.
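
A rough sketch of those steps in PowerShell (this assumes CLBlast is installed where CMake can find it; the site-packages path is only an example and depends on your Python setup):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_CLBLAST=ON
cmake --build build --config Release
# then copy the freshly built llama.dll (plus clblast.dll) over the one that ships
# inside your llama-cpp-python install, e.g. somewhere under ...\site-packages\llama_cpp\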

But you could also ask the llama-cpp-python maintainers to do this.


LeLaboDuGame commented May 16, 2023

Thanks!
Okay, and how do I update llama-cpp-python? Do I have to rebuild the project with -DBUILD_SHARED_LIBS=ON -DLLAMA_CLBLAST=ON? And how?
Also, in my package I only have llama.dll and not the llama.cpp directory with the CMakeLists.txt, so I don't know if I have to import them.

@LeLaboDuGame

OK!!! After a long, long time I finally got llama.dll (very hard, lots of errors, it's not simple at all...).
So now I'm stuck here; I get this error:

llama.cpp: loading model from D:\ia\ia\ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  90.75 KB
llama_model_load_internal: mem required  = 11359.05 MB (+ 1608.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: wUDevice: ��(�
OpenCL clCreateContext error -33 at D:\ia\ia\llama.cpp\ggml-opencl.c:213  # The error is here!

What do I have to do?


SlyEcho commented May 21, 2023

Using Platform: w�U Device: ��(�

This doesn't look good.

Recently the OpenCL device selection logic changed; maybe it works better for you now. Just in case, you should also try the version from the Releases page.
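
For context: OpenCL error -33 is CL_INVALID_DEVICE, so the platform/device index being passed is probably invalid. One way to pin the selection is with the environment variables mentioned later in this thread; the values below are only illustrative and should match what your system actually reports:

$env:GGML_OPENCL_PLATFORM="AMD Accelerated Parallel Processing"
$env:GGML_OPENCL_DEVICE="0"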


SlyEcho commented May 26, 2023

There are instructions on llama-cpp-python on how to install it with CUDA or CLBlast: https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast
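
On Windows PowerShell that boils down to roughly the following (variable names as in that README; double-check them there before relying on this):

$env:CMAKE_ARGS="-DLLAMA_CLBLAST=on"
$env:FORCE_CMAKE="1"
pip install llama-cpp-python --force-reinstall --no-cache-dir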


LeLaboDuGame commented May 26, 2023

I know how to install CLBlast, it's okay, thanks 😄
I'm trying to reinstall the GGML file because I have a bug with it.

@kelteseth

Hi, I have built the latest llama.cpp with OpenCL on Windows 11 with my Vega VII. It does say it uses my GPU in the output, but it actually uses my CPU for all calculations:

PS C:\Code\cpp\llama.cpp\build\MSVC_release_clblast\bin> .\main.exe  -m "C:\Users\Eli\Downloads\wizardLM-13B-Uncensored.ggmlv3.q8_0.bin" -p "short introduction for a 4 person d&d session" -n 256 --repeat_penalty 1.0 --color -i -r "User:"
main: build = 638 (32a5f3a)
main: seed  = 1686669891
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx906'
ggml_opencl: device FP16 support: true
llama.cpp: loading model from C:\Users\Eli\Downloads\wizardLM-13B-Uncensored.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 15237.96 MB (+ 1608.00 MB per state)
ggml_opencl: offloading 0 layers to GPU
ggml_opencl: total VRAM used: 0 MB
.
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
sampling: repeat_last_n = 64, repeat_penalty = 1.000000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 short introduction for a 4 person d&d session

my settings:

$env:GGML_OPENCL_PLATFORM="AMD Accelerated Parallel Processing"
$env:GGML_OPENCL_DEVICE="0"

(Task Manager screenshot attached)

any ideas?

@SlyEcho
Copy link
Collaborator

SlyEcho commented Jun 13, 2023

  • You are not loading the model to the GPU (no -ngl flag), so it will generate on the CPU; see the example command below.
  • You are using 16 CPU threads, which may be a little too much.
  • Task Manager is not showing GPU compute; your screenshot only shows the 3D, copy and video engines.
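
For example, something along these lines (the layer count matches the 40 layers of a 13B model; the thread count is just an illustration, tune it for your CPU):

.\main.exe -m "C:\Users\Eli\Downloads\wizardLM-13B-Uncensored.ggmlv3.q8_0.bin" -ngl 40 -t 8 -p "short introduction for a 4 person d&d session" -n 256 --repeat_penalty 1.0 --color -i -r "User:"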

github-actions bot added the stale label Mar 25, 2024
github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
