
merged "0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch" to latest llama.cpp FAIL #143

Open
zcxo opened this issue Sep 5, 2024 · 3 comments

Comments


zcxo commented Sep 5, 2024

Dear ARM-software,

I am excited and pleasantly surprised to see that ARM has officially released the KleidiAI solution for inference! I have tried it, and it significantly accelerates both the prompt-processing and generation stages. However, I have run into a problem: the patch you currently provide targets the June 2024 version of llama.cpp. Do you have plans to publish a patch for the latest llama.cpp?
Below are the errors I encountered while trying to adapt it:

0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch

Cmdline: com.algorithm.example
pid: 15209, tid: 15209, name: binder:15122_2 >>> com.algorithm.example <<<
#1 pc 00000000001b4e8c /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (ggml_kai_compute_forward+1968) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
#2 pc 0000000000172b64 /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
#3 pc 00000000001729f4 /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
#4 pc 00000000000c40c8 /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libomp.so (offset 0x321e000) (__kmp_invoke_microtask+152) (BuildId: 420baf65ff745db7ffc43e2d9942756a154fceae)
report fatal app NE crash BR success to write SystemTombstoneOccured to statsd: 15209, com.algorithm.example
buildException pid = 15209 uid = 10157 packageName = com.algorithm.example processName = com.algorithm.example reportConfig = false
buildReporterName pkgName = com.algorithm.example processName = com.algorithm.example
Log base dir: /data/misc/ems_logs/APP_NE@15209_com.algorithm.example_2024-08-22-23-57-22.972/FATAL_2024-08-22-23-57-22
reportFatalNEInner isApp=true, pid=15209, uid=10157, processName=com.algorithm.example, packageName=com.algorithm.example, tombstone=/data/tombstones/tombstone_35
report success reportTombstoneFile path=/data/tombstones/tombstone_35 processname=com.algorithm.example

Hope for your reply, thank you very much!

@zcxo zcxo changed the title merged "0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch" to lastest llama.cpp FAIL merged "0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch" to latest llama.cpp FAIL Sep 5, 2024

zcxo commented Sep 10, 2024

I have adapted your patch to the September 5th llama.cpp version; not many changes were needed. The only remaining issue is a crash in __memcpy_aarch64_simd, which looks like some kind of compatibility problem. I don't know whether it is a build issue or something else:

#00 pc 0000000000054438 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy_aarch64_simd+248) (BuildId: cdb09e5d494726046776ac6d0238c81f)
09-09 18:49:02.670 F/DEBUG (20317): #1 pc 00000000000b57fc /system/lib64/libggml.so (ggml_kai_prepare_const_data+484) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d)
09-09 18:49:02.670 F/DEBUG (20317): #2 pc 000000000003f3d8 /system/lib64/libggml.so (ggml_graph_compute+120) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d)

The crashing code is located in ggml-kleidiai.cpp, at the memcpy call below:

#if defined(GGML_KLEIDIAI_REUSE_MEMORY)
    GGML_ASSERT(reshaped_data_sz <= original_data_size);
    memcpy(cur->data, (void *)reshaped_data, ggml_nbytes(cur));
    free(reshaped_data);
    cur->extra = cur->data;
#else
    g_extra_mem[g_extra_mem_idx++] = reshaped_data;
    cur->extra = reshaped_data;
#endif



kshitij-sisodia-arm (Collaborator) commented

Hi @zcxo ,

Thanks for bringing this to our attention. Glad to know that you have found this useful 👍. This patch was created to demonstrate a possible integration point for KleidiAI in llama.cpp. We will work separately with llama.cpp to provide a proper solution.


zcxo commented Sep 25, 2024

Dear ARM-software and @kshitij-sisodia-arm,

Although the wait was a bit long, I am still glad to receive your reply. I have adapted the patch and made modifications based on my own understanding of the issues I encountered along the way, and I now plan to use them in a production project. However, I am still concerned that there may be other side effects, so I hope you can release an official version as soon as possible to help us developers.
Thank you.
