
[suggestion] llama.cpp CLBlast support. #37

Closed
MikeLP opened this issue Dec 14, 2023 · 6 comments

Comments

@MikeLP commented Dec 14, 2023

Is it possible to build llama.cpp (I believe it's your binary freechat-server) with CLBlast support?
It's supposed to work great on CPU and give great acceleration on regular Macs!

In case you don't want to change anything, could you please provide instructions on how to build the freechat-server binary, so that I can replace it with my own build of llama.cpp?

@psugihara (Owner)

I build llama.cpp with the LLAMA_NO_ACCELERATE=1 flag because otherwise Apple rejects the app in review with a report of using private APIs. Is that what turns on CLBlast support?

This issue has more details: ggerganov/llama.cpp#3438

If you want to try building locally without that flag, you can pull llama.cpp, run make, then replace freechat/mac/FreeChat/Models/NPC/freechat-server with the produced server binary (I turn that into a universal x86/arm64 binary with lipo, but you can just rename server to freechat-server if you only need one architecture to work). You will also need to copy ggml-metal.metal to the same directory.

Let me know if you see perf gains. For me, I didn't see any change in time to response or tokens/second with the LLAMA_NO_ACCELERATE flag (I'm on an M1 Pro with 64GB RAM).
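The steps above could be sketched roughly as follows. This is a hedged sketch, not the exact release process: it assumes the FreeChat and llama.cpp checkouts sit side by side, that your llama.cpp revision still accepts the LLAMA_CLBLAST=1 make flag for CLBlast, and that `make server` produces a `server` binary in the repo root (all of which may differ between llama.cpp versions).

```shell
# Sketch only: assumes freechat/ and llama.cpp/ are cloned side by side,
# and that this llama.cpp revision supports the LLAMA_CLBLAST=1 make flag.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_CLBLAST=1 make server    # build the server binary with CLBlast enabled

# Single architecture: rename/copy the binary into place,
# along with the Metal shader file.
cp server ../freechat/mac/FreeChat/Models/NPC/freechat-server
cp ggml-metal.metal ../freechat/mac/FreeChat/Models/NPC/

# Universal binary (what the release build does): build once per
# architecture, then glue the two together with lipo, e.g.:
#   lipo -create server-x86_64 server-arm64 -output freechat-server
```

The commented lipo invocation is illustrative; the hypothetical `server-x86_64` and `server-arm64` names stand in for whatever per-architecture binaries you produce.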

@sussyboiiii

> I build llama.cpp with the LLAMA_NO_ACCELERATE=1 flag because otherwise Apple rejects the app in review with a report of using private APIs.

I'm not experienced with this, but couldn't you release the app via both GitHub and the App Store? That way you could ship updates earlier on GitHub and use features Apple doesn't allow.

@psugihara (Owner)

Technically I could, but ideally I don't want to support 2 versions of the app, and I did not see any performance improvement without that flag. Please let me know if your experience differs and I'll re-assess the trade-off. There are good reasons for Apple not to allow access to private APIs (OS patches could break the app).

@sussyboiiii

Fair enough.

@MikeLP (Author) commented Dec 15, 2023

@psugihara I appreciate your response; I will let you know if it works. I was just concerned that the size of your server binary and the size of the binary I've built are quite different.

@psugihara (Owner)

Yep, it should be about half the size if you're just building for one architecture. The lipo command I use just glues the two together.

@MikeLP MikeLP closed this as completed Dec 15, 2023