ggml : move LLAMAFILE/tinyBLAS into a backend #10183
Comments
I think I'll have a try at putting it in a new backend... It has neither the standard SGEMM API nor internal thread support.
It may be better to keep it in the CPU backend to avoid the overhead of stopping and starting the threads that happens when switching to a different backend.
I'll have a look, but I don't think the threads have to be started/stopped; they can be left in a thread pool. I created a backend that only computes matmul for an FP8 test and uses OpenMP threads internally, and it was even faster than tinyBLAS. That is also the case for the other BLAS backends. Never mind, I'll have a try to see how hard it is, and if I succeed we can benchmark it.

Update: I have looked at ggml_graph_compute and #1999... I need more time to get a complete view of the threading part. My first impression is that maybe we should move the thread provisioning out of the CPU backend and make it usable by other backends, but I didn't spend much time analyzing it.

Update: On the CPU "backend" (at least), BLAS and AMX only compute part of the graph and have their own thread management.
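For context, the internal-threading approach described above can be sketched roughly like this, assuming a hypothetical `backend_sgemm` kernel (the names and the naive loop are illustrative, not the actual ggml or tinyBLAS API):

```cpp
#include <cassert>

// Hypothetical sketch: a backend matmul kernel that manages its own
// parallelism with OpenMP instead of relying on ggml's thread start/stop.
// A is M x K, B is K x N, C is M x N, all row-major f32.
static void backend_sgemm(const float *A, const float *B, float *C,
                          int M, int N, int K) {
    // OpenMP keeps its worker pool alive between parallel regions, so
    // repeated calls into such a backend do not pay a thread start/stop
    // cost. Without -fopenmp the pragma is ignored and the loop runs
    // serially, producing the same result.
    #pragma omp parallel for
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k) {
                sum += A[i*K + k] * B[k*N + j];
            }
            C[i*N + j] = sum;
        }
    }
}
```

This is only a sketch of the threading model being discussed; a real backend would still need the tinyBLAS tiling and type handling.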
See discussion in #10343 (comment)
Start a Discussion to get a better idea of how to do it |
The LLAMAFILE SGEMM routines are currently called directly from within ggml-cpu.c based on compile-time conditionals: https://github.com/ggerganov/llama.cpp/blob/a9e8a9a0306a8093eef93b0022d9f45510490072/ggml/src/ggml-cpu.c#L7454-L7481

In order to simplify the logic and reduce the coupling of the different BLAS implementations, the LLAMAFILE code should be moved into a ggml backend, similar to the other BLAS implementations. Not sure if it has to be a new backend, or if we can move it into the existing ggml-blas backend - TBD.
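As a rough sketch of what the backend boundary could look like (all names here are illustrative placeholders, not the real ggml backend interface): the backend would advertise which ops it can handle, e.g. matrix multiplication on contiguous f32 data, and the scheduler would route only those nodes to it, replacing the compile-time conditionals in ggml-cpu.c:

```cpp
#include <cassert>

// Illustrative op/tensor model, not the real ggml types.
enum op_type { OP_MUL_MAT, OP_ADD };

struct tensor_desc {
    op_type op;
    bool    contiguous_f32;  // simplified stand-in for ggml's layout/type checks
};

// Analogous in spirit to a backend's supports_op callback: the scheduler
// asks the backend whether it wants a given node, instead of ggml-cpu.c
// deciding via #ifdef at compile time.
static bool tinyblas_supports_op(const tensor_desc &t) {
    return t.op == OP_MUL_MAT && t.contiguous_f32;
}

// Simplified scheduling decision: route a node to the tinyBLAS backend
// if it supports it, otherwise fall back to the generic CPU path.
enum route { ROUTE_TINYBLAS, ROUTE_CPU };

static route schedule_node(const tensor_desc &t) {
    return tinyblas_supports_op(t) ? ROUTE_TINYBLAS : ROUTE_CPU;
}
```

The same `supports_op`-style split is how the existing BLAS and AMX paths end up computing only part of the graph, as noted in the comments above.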