Using CLBlast to call the GPU on an Android device: what is the relationship between the ngl parameter and model output correctness? #6562
Comments
Unfortunately, OpenCL for Android under-performs, and yes, the output can even be incorrect: it is likely a memory alignment/padding issue, and you'll likely see wild results. Related: CLBlast is more of an OpenCL library than an actual backend.
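To illustrate the kind of alignment/padding mismatch suspected above (hypothetical numbers only; nothing here is taken from the actual CLBlast or llama.cpp code): if a GPU kernel assumes buffer sizes are rounded up to some alignment boundary but the host allocates the exact size, reads and writes run past valid data. The usual round-up computation can be sketched as:

```shell
# Round a buffer size up to the next multiple of an alignment boundary.
# Sizes and alignment values are illustrative, not from CLBlast.
align_up() {
  local size=$1 align=$2
  echo $(( (size + align - 1) / align * align ))
}

align_up 1000 64   # a 1000-byte buffer padded to a 64-byte boundary -> 1024
align_up 1024 64   # already aligned -> unchanged, 1024
```

If the host and the kernel disagree on the boundary (say, 32 vs. 64 bytes), the last rows of a tensor end up reading garbage, which matches the "wild results" behavior described here.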
@Jeximo Thank you very much for your answer. Can we take it that llama.cpp's support for GPU calls on SoCs is imperfect at present? Or is it that none of the SoC OpenCL drivers currently support LLM-style inference?
Yes, it's imperfect.
Yes. To put it simply, there is a lot of progress still to be made for LLMs on Android GPUs.
This issue was closed because it has been inactive for 14 days since being marked as stale.
With the help of issue #2169, I successfully used CLBlast on my Qualcomm device (Adreno 740 v2) to make GPU calls.
But I found something interesting when I tried model inference. With the model stories260K.gguf, question-and-answer output was normal, but the GPU was hardly used (a utilization of 1% or even 0%).
With the models llama-2-7b-chat.Q4_K_M.gguf and llama-2-7b-chat.Q5_K_S.gguf, I got output, but the output was incorrect. GPU utilization was about 40%.
With the models llama-2-13b-chat.Q2_K.gguf and llama-2-7b-chat.Q2_K.gguf, I got normal, satisfactory responses when the ngl parameter was set to 2, but when I set ngl close to the total number of offloadable layers (for example, 40 of 41), the output went back to random garbage. GPU utilization was around 50% at that point.
When ngl is 2 or 10 (not very large), the answers are fine;
when ngl is set to 40 (out of 41), the answer is ridiculous.
The command I use to run it is as follows:
GGML_OPENCL_PLATFORM=0 GGML_OPENCL_DEVICE=0 ./bin/main -t 8 -m /data/local/tmp/llama_cpu/llama-2-7b-chat.Q4_K_M.gguf --color -c 2048 -ngl 2 --temp 0.7 -n -1 -i -ins
I didn't change any parameters other than ngl and the model.
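Since low ngl values work and high ones break, one low-effort way to narrow this down is to bisect on -ngl: rerun the same prompt while stepping the layer count and note the first value that produces garbage. Below is a dry-run sketch that only prints the command for each step rather than executing it (the binary and model exist only on the device); it swaps the interactive `-i -ins` flags for a fixed prompt via `-p` and a bounded `-n 64` so runs are comparable — the prompt text, step values, and output file names are arbitrary choices, not from the original command:

```shell
# Dry-run sketch: print one run command per -ngl value to sweep.
# Model path matches the command above; prompt/step choices are arbitrary.
MODEL=/data/local/tmp/llama_cpu/llama-2-7b-chat.Q4_K_M.gguf

gen_cmd() {  # print the run command for a given -ngl value
  local ngl=$1
  echo "GGML_OPENCL_PLATFORM=0 GGML_OPENCL_DEVICE=0 ./bin/main -t 8 -m $MODEL --color -c 2048 -ngl $ngl --temp 0.7 -n 64 -p 'Once upon a time' > out_ngl_$ngl.txt"
}

for NGL in 0 2 10 20 30 40 41; do
  gen_cmd "$NGL"   # pipe this output into sh on the device to run each step
done
```

Comparing `out_ngl_0.txt` (pure CPU, known-good) against each GPU run should show whether the output degrades gradually or breaks at one specific layer count, which would help localize a per-layer buffer problem.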
This looks interesting, and I wonder whether CLBlast is making some kind of error in its GPU calls?
Has anyone else run into this situation? I'd like to know which direction to take to track down this error.