
Using BLIS or MKL #322

Open
gemiduck opened this issue Jul 18, 2023 · 3 comments

Comments

@gemiduck

Hi, is it possible to use either BLIS or MKL instead of OpenBLAS? I'm using an AMD EPYC 7543, and per-token prompt processing without OpenBLAS is much faster, so I'm wondering if either of the two would help prompt eval time.

python3 koboldcpp.py --threads 4 --smartcontext ../model_13b.bin

Without OpenBLAS:

Processing Prompt (180 / 180 tokens)
Generating (15 / 110 tokens)
Time Taken - Processing:22.3s (124ms/T), Generation:2.7s (180ms/T), Total:25.0s (0.6T/s)

With BLAS:

Processing Prompt [BLAS] (46 / 46 tokens)
Generating (33 / 110 tokens)
Time Taken - Processing:14.5s (315ms/T), Generation:6.2s (187ms/T), Total:20.6s (1.6T/s)

I'm wondering whether this is a multi-threading issue: the timing I get with a single BLAS thread is comparable to the non-BLAS run, but still slightly higher.
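As a sanity check on the numbers above (my own arithmetic, not koboldcpp output): the ms/T figures multiply out to the reported phase times, and the overall T/s is generated tokens over total wall time. Note the two runs processed different prompt sizes (180 vs 46 tokens), so the totals aren't directly comparable even though the per-token BLAS processing is slower here.

```python
def check(tokens, ms_per_token, reported_s):
    # tokens * ms/T should reproduce the reported phase time (within rounding)
    return abs(tokens * ms_per_token / 1000.0 - reported_s) < 0.1

# Without OpenBLAS: 180 prompt tokens at 124 ms/T, 15 generated at 180 ms/T
assert check(180, 124, 22.3) and check(15, 180, 2.7)
# With BLAS: 46 prompt tokens at 315 ms/T, 33 generated at 187 ms/T
assert check(46, 315, 14.5) and check(33, 187, 6.2)

# Overall T/s = generated tokens / total time, matching the log's 0.6 and 1.6
print(round(15 / 25.0, 1), round(33 / 20.6, 1))
```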

@LostRuins
Owner

I tried getting BLIS to work previously, but for some reason it wouldn't compile correctly on my system. Anecdotally, from what I've read, MKL should be faster than OpenBLAS, but much of it appears to be proprietary. You could certainly try swapping the BLAS library for BLIS or MKL. Assuming your libraries are compiled and installed correctly, it should only be a few lines changed, since the function signature for cblas_sgemm should be similar.
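For illustration: OpenBLAS, BLIS, and MKL all export the standard CBLAS symbol cblas_sgemm with the same parameter order, which is why swapping backends is mostly a link-time change. Below is a naive pure-Python reference (my own sketch, not any library's code) that mirrors that parameter order for the row-major, no-transpose case, so you can see what each argument means:

```python
# Mirrors the cblas_sgemm argument order (order/trans flags omitted):
#   cblas_sgemm(order, transA, transB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)
# Row-major, no transposes: A is M x K (leading dim lda), B is K x N (ldb),
# C is M x N (ldc), all passed as flat arrays, as the C API sees them.
def sgemm(M, N, K, alpha, A, lda, B, ldb, beta, C, ldc):
    """C = alpha * A @ B + beta * C (naive reference, for illustration only)."""
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i * lda + k] * B[k * ldb + j]
            C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j]
    return C

A = [1.0, 2.0,
     3.0, 4.0]      # 2x2
B = [5.0, 6.0,
     7.0, 8.0]      # 2x2
C = [0.0] * 4
sgemm(2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2)
# C is now [19.0, 22.0, 43.0, 50.0]
```

Because the real signature is identical across the three libraries, code calling cblas_sgemm needs no source changes when the linked BLAS is swapped.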

@Penguinehis

Hi, do you have the lines needed to compile with MKL? I'm testing on my i5-11400F (server) to see if I can get it faster than my R5 3600 server.
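I haven't checked this project's actual Makefile targets, but as a generic sketch: Intel's Single Dynamic Library interface lets any CBLAS program link against MKL with a single flag, so the usual change is replacing -lopenblas in the link line with -lmkl_rt (after sourcing the oneAPI environment script). Paths and defines below are illustrative, not taken from this repo:

```shell
# Hypothetical link lines; adapt to the repo's real Makefile targets.
# OpenBLAS build:
#   cc ... -lopenblas
# MKL via the Single Dynamic Library interface:
#   source /opt/intel/oneapi/setvars.sh    # puts MKL on the library path
#   cc ... -lmkl_rt -lpthread -lm -ldl
```

Intel also publishes a Link Line Advisor that generates the exact flags for your compiler, threading layer, and interface width.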

@Jacoby1218

If you mean Intel MKL, it's open source now under the Apache license (now called oneMKL): https://github.com/oneapi-src/oneMKL
I think this might be a big benefit for Intel Arc GPUs while remaining compatible with other systems as well (the library supports both CUDA and ROCm). Not a dev, so no idea whether that would actually be useful.
