The LLAMAFILE SGEMM routines are currently called directly from within ggml-cpu.c based on compile-time conditionals:
https://github.com/ggerganov/llama.cpp/blob/a9e8a9a0306a8093eef93b0022d9f45510490072/ggml/src/ggml-cpu.c#L7454-L7481
To simplify the logic and reduce coupling between the different BLAS implementations, the LLAMAFILE code should be moved into a ggml backend, as was done for the other BLAS implementations.
Not sure whether it needs to be a new backend, or whether it can be moved into the existing ggml-blas backend - TBD.