
How to implement CLBLAST? #1433

Closed
LeLaboDuGame opened this issue May 13, 2023 · 11 comments


LeLaboDuGame commented May 13, 2023

Hey! I want to set up CLBlast so I can use llama.cpp with my AMD GPU, but I don't know how to do it!
Can you explain how to use it with llama-cpp-python?
PS: I'm on Windows... my Linux skills are not great...
Thanks in advance!
Labo


SlyEcho commented May 13, 2023

It should be possible.

llama-cpp-python needs to have a file called llama.dll somewhere in its package. This is not built for the normal Windows releases, but it can be enabled with the CMake variable BUILD_SHARED_LIBS=ON.

You can see the commands in build.yml.
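
As a rough sketch of what that looks like on Windows (the exact flags and output paths used in build.yml may differ):

cmake -B build -DBUILD_SHARED_LIBS=ON
cmake --build build --config Release
# llama.dll should then appear somewhere under build\bin\ (the exact subfolder depends on the generator)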

Sorry, it may be a little hard to understand this stuff if you are not a developer.


LeLaboDuGame commented May 15, 2023

Thanks! What is llama.dll? Also, BUILD_SHARED_LIBS=ON doesn't work for me (llama.dll doesn't appear).


SlyEcho commented May 15, 2023

llama-cpp-python needs a library form of llama.cpp, which on Windows would be a file called llama.dll or maybe libllama.dll. It must exist somewhere in the directory structure of where you installed llama-cpp-python. When you build llama.cpp on Windows with CMake, you can give it the option -DBUILD_SHARED_LIBS=ON and this file will be built; if you also add -DLLAMA_CLBLAST=ON, it will be built with CLBlast support. Then overwrite the old .dll with the new one and add the clblast.dll file too. Something like that.
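
A rough sketch of those steps in PowerShell (this assumes CLBlast is installed where CMake can find it; the site-packages path is only an example and depends on your Python setup):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_CLBLAST=ON
cmake --build build --config Release
# then copy the freshly built llama.dll (plus clblast.dll) over the one that ships
# inside your llama-cpp-python install, e.g. somewhere under ...\site-packages\llama_cpp\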

But you could also ask the llama-cpp-python maintainers to do this.


LeLaboDuGame commented May 16, 2023

Thanks!
Okay, and how do I update llama-cpp-python? Do I have to rebuild the project with -DBUILD_SHARED_LIBS=ON -DLLAMA_CLBLAST=ON? And how?
Also, in my package I only have llama.dll and not the llama.cpp directory with the CMakeLists.txt, so I don't know if I have to import them.

@LeLaboDuGame

OK!!! After a long, long time I finally got llama.dll (very hard, lots of errors, it's not simple at all...).
So now I'm stuck here; I get this error:

llama.cpp: loading model from D:\ia\ia\ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  90.75 KB
llama_model_load_internal: mem required  = 11359.05 MB (+ 1608.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: wUDevice: ��(�
OpenCL clCreateContext error -33 at D:\ia\ia\llama.cpp\ggml-opencl.c:213  # The error is here!

What do I have to do?


SlyEcho commented May 21, 2023

Using Platform: w�U Device: ��(�

This doesn't look good.

Recently the OpenCL device selection logic changed; maybe it works better for you now. Just in case, you should also try the version from the Releases page.
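
For context: OpenCL error -33 is CL_INVALID_DEVICE, so the platform/device index being passed is probably invalid. One way to pin the selection is with the environment variables mentioned later in this thread; the values below are only illustrative and should match what your system actually reports:

$env:GGML_OPENCL_PLATFORM="AMD Accelerated Parallel Processing"
$env:GGML_OPENCL_DEVICE="0"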


SlyEcho commented May 26, 2023

There are instructions on llama-cpp-python on how to install it with CUDA or CLBlast: https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast
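
On Windows PowerShell that boils down to roughly the following (variable names as in that README; double-check them there before relying on this):

$env:CMAKE_ARGS="-DLLAMA_CLBLAST=on"
$env:FORCE_CMAKE="1"
pip install llama-cpp-python --force-reinstall --no-cache-dir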


LeLaboDuGame commented May 26, 2023

I know how to install CLBlast, it's okay, thanks 😄
I'm trying to reinstall the GGML file because I have a bug with it.

@kelteseth

Hi, I have built the latest llama.cpp with OpenCL on Windows 11 with my Vega VII. It does say it uses my GPU in the output, but it actually uses my CPU for all calculations:

PS C:\Code\cpp\llama.cpp\build\MSVC_release_clblast\bin> .\main.exe  -m "C:\Users\Eli\Downloads\wizardLM-13B-Uncensored.ggmlv3.q8_0.bin" -p "short introduction for a 4 person d&d session" -n 256 --repeat_penalty 1.0 --color -i -r "User:"
main: build = 638 (32a5f3a)
main: seed  = 1686669891
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx906'
ggml_opencl: device FP16 support: true
llama.cpp: loading model from C:\Users\Eli\Downloads\wizardLM-13B-Uncensored.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 15237.96 MB (+ 1608.00 MB per state)
ggml_opencl: offloading 0 layers to GPU
ggml_opencl: total VRAM used: 0 MB
.
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
sampling: repeat_last_n = 64, repeat_penalty = 1.000000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 short introduction for a 4 person d&d session

my settings:

$env:GGML_OPENCL_PLATFORM="AMD Accelerated Parallel Processing"
$env:GGML_OPENCL_DEVICE="0"

(Task Manager screenshot attached)

any ideas?

@SlyEcho
Copy link
Collaborator

SlyEcho commented Jun 13, 2023

  • You are not loading the model to the GPU (no -ngl flag), so it will generate on the CPU; see the example command below.
  • You are using 16 CPU threads, which may be a little too much.
  • Task Manager is not showing GPU compute; your screenshot only shows the 3D, copy and video engines.
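
For example, something along these lines (the layer count matches the 40 layers of a 13B model; the thread count is just an illustration, tune it for your CPU):

.\main.exe -m "C:\Users\Eli\Downloads\wizardLM-13B-Uncensored.ggmlv3.q8_0.bin" -ngl 40 -t 8 -p "short introduction for a 4 person d&d session" -n 256 --repeat_penalty 1.0 --color -i -r "User:"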

github-actions bot added the stale label Mar 25, 2024
github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
