
[suggestion] llama.cpp CLBlast support. #37

Closed
MikeLP opened this issue Dec 14, 2023 · 6 comments

Comments

@MikeLP commented Dec 14, 2023

Is it possible to build llama.cpp (I believe it's your binary freechat-server) with CLBlast support?
It's supposed to work great on CPU and give great acceleration on regular Macs!

In case you don't want to change anything, could you please provide instructions on how to build the freechat-server binary, so that I can replace it with my own build of llama.cpp?

@psugihara (Owner)

I build llama.cpp with the LLAMA_NO_ACCELERATE=1 flag because otherwise Apple rejects the app in review with a report of using private APIs. Is that what turns on CLBlast support?

This issue has more details: ggerganov/llama.cpp#3438

If you want to try building locally without that flag, you can pull llama.cpp, run make, then replace freechat/mac/FreeChat/Models/NPC/freechat-server with the produced server binary (I turn that into a universal x86/arm64 binary with lipo, but you can just rename server to freechat-server if you only need one architecture to work). You will also need to copy ggml-metal.metal to the same directory.

Let me know if you see perf gains. For me, I didn't see any change in time to response or tokens/second with the LLAMA_NO_ACCELERATE flag (I'm on an M1 Pro with 64GB RAM).
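The steps above could be sketched roughly as follows. This is a hedged sketch, not the exact release process: it assumes the FreeChat and llama.cpp checkouts sit side by side, that your llama.cpp revision still accepts the LLAMA_CLBLAST=1 make flag for CLBlast, and that `make server` produces a `server` binary in the repo root (all of which may differ between llama.cpp versions).

```shell
# Sketch only: assumes freechat/ and llama.cpp/ are cloned side by side,
# and that this llama.cpp revision supports the LLAMA_CLBLAST=1 make flag.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_CLBLAST=1 make server    # build the server binary with CLBlast enabled

# Single architecture: rename/copy the binary into place,
# along with the Metal shader file.
cp server ../freechat/mac/FreeChat/Models/NPC/freechat-server
cp ggml-metal.metal ../freechat/mac/FreeChat/Models/NPC/

# Universal binary (what the release build does): build once per
# architecture, then glue the two together with lipo, e.g.:
#   lipo -create server-x86_64 server-arm64 -output freechat-server
```

The commented lipo invocation is illustrative; the hypothetical `server-x86_64` and `server-arm64` names stand in for whatever per-architecture binaries you produce.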

@sussyboiiii

> I build llama.cpp with the LLAMA_NO_ACCELERATE=1 flag because otherwise Apple rejects the app in review with a report of using private APIs.

I'm not experienced with this, but couldn't you release the app via both GitHub and the App Store? That way you could ship updates earlier on GitHub and use features Apple doesn't allow.

@psugihara (Owner)

Technically I could, but ideally I don't want to support 2 versions of the app, and I did not see any performance improvement without that flag. Please let me know if your experience differs and I'll re-assess the trade-off. There are good reasons for Apple not to allow access to private APIs (OS patches could break the app).

@sussyboiiii

Fair enough.

@MikeLP (Author) commented Dec 15, 2023

@psugihara I appreciate your response; I will let you know if it works. I was just concerned that the size of your server binary and the size of the binary I've built are quite different.

@psugihara (Owner)

Yep, it should be about half the size if you're just building for one architecture. The lipo command I use just glues the two together.

@MikeLP MikeLP closed this as completed Dec 15, 2023