No cuBLAS #101
Comments
Yes, see the Docker PR for how to do this.
It seems to be about OpenBLAS, not cuBLAS.
Well, you asked about OpenBLAS originally. cuBLAS should work exactly the same way.
Yeah, I wrote BLAS without making it clear what I was talking about, my bad. Still, I'm struggling to figure out how to get the micromamba environment used by ooba's web UI to accept the -DLLAMA_CUBLAS=ON parameter.
It needs to be compiled with an env var set:
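As a rough sketch (assuming a llama-cpp-python version that honors the CMAKE_ARGS and FORCE_CMAKE environment variables), the flag can be passed through a pip install like this:

```bash
# Sketch only: force a from-source build of llama-cpp-python with cuBLAS enabled.
# Assumes the package version supports the CMAKE_ARGS / FORCE_CMAKE variables.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall
```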
Well, I can compile the original llama.cpp that way, but when it comes to llama-cpp-python inside the micromamba env, things get too complicated for my limited abilities. If there's anything I can read that would bring me closer to a solution, I'd deeply appreciate it.
I'm sorry, but I'm struggling to understand how to build llama_cpp_python-0.1.36-cp310-cp310-win_amd64.whl with an n_batch of 512 instead of 8 and cuBLAS enabled.
Hard to believe, but with GPT's help it seems like I managed to figure out how to do it and build it.
Care to share? 😄
Though it looks just fine, I have no idea why my DLL isn't working.
The WHL looks perfect, aside from the fact that it doesn't work at all.
When I use my wheel built with cuBLAS enabled, it produces a 720 KB DLL file that somehow fails to work. Could anyone help with some ideas?
No luck. Regretfully, I can't do anything about it. My only hope is that cuBLAS wheels will be published later by someone of greater intellect.
Is the DLL file really at that location, E:\LLaMA\oobabooga-windows\installer_files\env\lib\site-packages\llama_cpp\llama.dll? It looks like a path issue.
Yeah, it's definitely there.
If I manually swap my DLL for a normal one, it immediately begins to load models, so it's more about the DLL itself than the path.

This issue is entirely CUDA related. If I build the DLL with cuBLAS off and then manually swap it in, it still loads all the models and everything. But with cuBLAS on, it doesn't.
cuBLAS won't work with any of the wheels, unfortunately; llama.cpp has to be compiled from source for it to work, so you either have to install from PyPI (which builds from the source distribution) or from GitHub.
@abetlen Could you elaborate on how you built it to get cuBLAS working? I've been trying this, but I'm not able to match the speeds I see when I just build llama.cpp with cuBLAS (150 ms/t vs 40 ms/t for prompt eval time). I'm on Ubuntu. Here's essentially what I tried:
I see that BLAS is enabled when loading models with llama-cpp-python, but the performance is still very slow compared to llama.cpp. I also tried copying the libllama.so file into _skbuild, just to see if that changed anything. Any clue where I've gone wrong here? Thanks!
Do you have n_batch set to 32 or higher (e.g. 512)? BLAS shows as active but isn't actually used with the default batch size of 8.
I do the following:
Then I go to
and change
to
after that
Now, in... Any ideas what exactly I'm doing wrong?
If I compile an .exe with CMake in literally the same way as mentioned above, it works flawlessly.
Why on earth doesn't it work as a DLL?
This should be the same, i.e.
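A rough sketch of a CMake configuration that builds the shared library (DLL / .so) with cuBLAS; BUILD_SHARED_LIBS is CMake's generic switch and LLAMA_CUBLAS is llama.cpp's own option, so treat the exact invocation as an assumption for your setup:

```bash
# Sketch: configure llama.cpp so the shared library is built with cuBLAS enabled.
cmake -S . -B build -DLLAMA_CUBLAS=ON -DBUILD_SHARED_LIBS=ON
cmake --build build --config Release
```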
@Priestru you should follow the development install instructions, i.e.:

```bash
git clone git@github.com:abetlen/llama-cpp-python.git
cd llama-cpp-python   # enter the cloned repo
git submodule update --init --recursive
# Will need to be re-run any time vendor/llama.cpp is updated
LLAMA_CUBLAS=1 python3 setup.py develop
```
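One way to sanity-check the resulting build (assuming the low-level binding llama_cpp.llama_print_system_info() is available in your version) is to look for BLAS = 1 in the system info string:

```python
# Sketch: verify that the installed llama_cpp build was compiled with BLAS/cuBLAS support.
# Assumes the low-level binding llama_print_system_info() exists in this version.
import llama_cpp

info = llama_cpp.llama_print_system_info().decode("utf-8")
print(info)
if "BLAS = 1" not in info:
    print("BLAS is NOT enabled in this build")
```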
Thanks, I gave it another shot this morning and got it to work. Part of the issue might be that I needed to raise the default n_batch to 512, but I also changed...
I can see the PID running in...
I'm still struggling with this issue. I even asked my friend to help, but nothing came to fruition.
I can't use
Then I do
That command doesn't work for me on Windows, so I manually change LLAMA_CUBLAS to ON and use... Everything seems fine, but I have no idea how to check whether anything works at all at this point. I tried randomly launching Python scripts everywhere, but no luck. How can I load a model using llama-cpp-python directly, without the oobabooga GUI?
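A minimal sketch of loading a model with the Python API directly; the model path below is just a placeholder, and n_batch is raised to 512 so BLAS actually gets used for prompt evaluation:

```python
# Sketch: load a GGML model with llama-cpp-python directly, no web UI involved.
# The model path is a placeholder; n_batch=512 so BLAS is used during prompt eval.
from llama_cpp import Llama

llm = Llama(model_path="./models/ggml-model-q4_0.bin", n_ctx=2048, n_batch=512)
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:", "\n"])
print(output["choices"][0]["text"])
```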
I'm trying to do it on a fresh WSL instance now.
This screams n_batch = 8 to me.
@Priestru n_batch is an argument you can set when initializing the LLM in your Python script. You should just be able to add n_batch=512 there.
Okay, I built a WSL version. I followed and copied the changes from ggml-org/llama.cpp#1128, and also did a brand new... It definitely updated my version to a newer one, because 3.4 can't load q4_3 but mine can. Why, for the love of God, is this so hard?
Okay, this one worked. It required...
But...
I did it on WSL. No idea how to do it on Windows, but it works with ooba now.
For anyone who may need this info: I acted like a barbarian. I commented out the ifs in the Makefile, forcing it to always build with CUDA. Then I manually placed libllama.so into...
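A sketch of that manual swap; the build flag and destination path below are examples only and will differ per environment (the Makefile route with LLAMA_CUBLAS=1 is an alternative to commenting out the ifs):

```bash
# Sketch of the manual workaround described above; paths are examples only.
# Build the shared library from the vendored llama.cpp with cuBLAS enabled.
cd llama-cpp-python/vendor/llama.cpp
make libllama.so LLAMA_CUBLAS=1
# Copy it over the library that the llama_cpp package loads
# (adjust the site-packages path to your own environment).
cp libllama.so /path/to/env/lib/python3.10/site-packages/llama_cpp/libllama.so
```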
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Is it possible to add an option to enable cuBLAS support, like in the original llama.cpp?