
merged "0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch" to latest llama.cpp FAIL #143

Open
zcxo opened this issue Sep 5, 2024 · 3 comments

Comments


zcxo commented Sep 5, 2024

Dear ARM-software,

I am excited and pleasantly surprised to see that ARM has officially released the KleidiAI solution for inference! I have tried it, and it significantly accelerates both the prompt-processing and generation stages. However, I have run into a problem: the patch you currently provide targets the June 2024 version of llama.cpp. Do you have plans to publish a patch for the latest llama.cpp?
Below are the errors I encountered while trying to adapt it:

0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch

Cmdline: com.algorithm.example
pid: 15209, tid: 15209, name: binder:15122_2 >>> com.algorithm.example <<<
#1 pc 00000000001b4e8c /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (ggml_kai_compute_forward+1968) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
#2 pc 0000000000172b64 /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
#3 pc 00000000001729f4 /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
#4 pc 00000000000c40c8 /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libomp.so (offset 0x321e000) (__kmp_invoke_microtask+152) (BuildId: 420baf65ff745db7ffc43e2d9942756a154fceae)
report fatal app NE crash BR success to write SystemTombstoneOccured to statsd: 15209, com.algorithm.example
buildException pid = 15209 uid = 10157 packageName = com.algorithm.example processName = com.algorithm.example reportConfig = false
buildReporterName pkgName = com.algorithm.example processName = com.algorithm.example
Log base dir: /data/misc/ems_logs/APP_NE@15209_com.algorithm.example_2024-08-22-23-57-22.972/FATAL_2024-08-22-23-57-22
reportFatalNEInner isApp=true, pid=15209, uid=10157, processName=com.algorithm.example, packageName=com.algorithm.example, tombstone=/data/tombstones/tombstone_35
report success reportTombstoneFile path=/data/tombstones/tombstone_35 processname=com.algorithm.example

Hope for your reply, thank you very much!

@zcxo zcxo changed the title merged "0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch" to lastest llama.cpp FAIL merged "0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch" to latest llama.cpp FAIL Sep 5, 2024

zcxo commented Sep 10, 2024

I have adapted your patch to the September 5th llama.cpp version; not many changes were needed. The only remaining issue is a crash in __memcpy_aarch64_simd, which looks like some kind of compatibility problem. I don't know whether it is a build issue or something else:

#00 pc 0000000000054438 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy_aarch64_simd+248) (BuildId: cdb09e5d494726046776ac6d0238c81f)
09-09 18:49:02.670 F/DEBUG (20317): #1 pc 00000000000b57fc /system/lib64/libggml.so (ggml_kai_prepare_const_data+484) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d)
09-09 18:49:02.670 F/DEBUG (20317): #2 pc 000000000003f3d8 /system/lib64/libggml.so (ggml_graph_compute+120) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d)

The crashing code is located in ggml-kleidiai.cpp, at the memcpy call below:

#if defined(GGML_KLEIDIAI_REUSE_MEMORY)
    GGML_ASSERT(reshaped_data_sz <= original_data_size);
    memcpy(cur->data, (void *)reshaped_data, ggml_nbytes(cur));
    free(reshaped_data);
    cur->extra = cur->data;
#else
    g_extra_mem[g_extra_mem_idx++] = reshaped_data;
    cur->extra = reshaped_data;
#endif



kshitij-sisodia-arm (Collaborator) commented

Hi @zcxo ,

Thanks for bringing this to our attention. Glad to know that you have found this useful 👍. This patch was created to demonstrate a possible integration point for KleidiAI in llama.cpp. We will work separately with llama.cpp to provide a proper solution.


zcxo commented Sep 25, 2024

Dear ARM-software and @kshitij-sisodia-arm,

Although the wait was a bit long, I am still glad to receive your reply. I have adapted the patch and made modifications based on my own understanding of the issues I encountered along the way, and I now plan to use them in a production project. However, I am still concerned that there may be other side effects, so I hope you can release an official version as soon as possible to help us developers.
Thank you.
