When compiling with cuBLAS, cmake ignores -DLLAMA_AVX2=OFF and builds a binary that attempts to use AVX2 #284
Comments
I wonder if this issue is what is happening on Linux as well in #272. I keep getting ILLEGAL INSTRUCTION on Linux every time I build a new library with cuBLAS support.
Is it possible that this is a bug in llama.cpp itself instead? See ggerganov/llama.cpp#809.
Can confirm @chen369's suggestion in #272 lets me compile successfully: cloning the repo, editing the vendor/llama.cpp CMakeLists.txt to set AVX2 OFF on line 56 and CUBLAS ON on line 70, and doing the pip install + setup from there (a sketch of the steps follows below). So a workaround exists - thanks @chen369!
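A rough sketch of that workaround, assuming the option names and CMakeLists.txt line numbers quoted in the comment (they may drift between llama.cpp revisions):

```bash
git clone --recursive https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python

# In vendor/llama.cpp/CMakeLists.txt, flip the option defaults, e.g.:
#   option(LLAMA_AVX2   "llama: enable AVX2" OFF)  # line ~56, was ON
#   option(LLAMA_CUBLAS "llama: use cuBLAS"  ON)   # line ~70, was OFF

# Install from the edited tree so the new defaults take effect:
pip install .
```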
Okay, I really don't know how I managed it earlier; I don't think the environment variables would have affected it even if I accidentally had them active. But yes, after checking again it turns out that cmake building llama.cpp with no arguments does not intelligently detect CUDA and the absence of AVX2, so that part is an upstream issue. cmake building llama.cpp with the flags passed explicitly does work, though (see the sketch below).
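For reference, a minimal sketch of configuring upstream llama.cpp with the flags spelled out explicitly instead of relying on auto-detection:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && mkdir build && cd build

# Disable AVX2 and enable cuBLAS explicitly at configure time:
cmake .. -DLLAMA_AVX2=OFF -DLLAMA_CUBLAS=ON
cmake --build . --config Release
```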
I have the same issue, but I use Linux. Is there a way for me to make llama.cpp not try to use AVX2 with cuBLAS?
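On Linux the same CMake flags can be forwarded through the CMAKE_ARGS and FORCE_CMAKE environment variables that llama-cpp-python's build reads; a minimal sketch, assuming a bash shell and a CUDA toolkit already installed:

```bash
# Force a CMake rebuild of the vendored llama.cpp with AVX2 disabled
# and cuBLAS enabled (the same flags used elsewhere in this issue):
CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_CUBLAS=ON" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If the AVX2 flag is still ignored (the subject of this issue), the CMakeLists.txt edit described above is the fallback.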
Expected Behavior
I have a CUDA-supporting card and a CPU that doesn't support AVX2, and I want to build llama-cpp-python for CUDA. I can compile the latest llama.cpp in my (x64!) Visual Studio environment with cmake, and it works, detecting CUDA and the lack of AVX2 out of the box without any arguments, giving me a binary that prints the expected system info:

```
n_threads = 4 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0
```

and runs perfectly fine. So theoretically it should be possible. With llama-cpp-python I run these commands:
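The exact command list did not survive extraction; reconstructing from the "transposed" variant quoted under Current Behavior, it was presumably along these lines (the pip flags are an assumption):

```bat
:: Presumed reconstruction - the original commands were lost in extraction.
set FORCE_CMAKE=1
set CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_CUBLAS=ON"
pip install llama-cpp-python --force-reinstall --no-cache-dir
```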
And I expect to get the same system info as I do for llama.cpp, AVX2=0 and BLAS=1. I also expect to be able to load models and run them!
Current Behavior
Instead I get this system info:
AVX2 = 1, and obviously when I try to run any model it just errors out with Windows error 0xc000001d (illegal instruction), because I don't actually have any AVX2 for it to use.
It also does the same thing if I transpose the arguments and use
```
set CMAKE_ARGS="-DLLAMA_CUBLAS=ON -DLLAMA_AVX2=OFF"
```
Environment and Context
- CPU: Intel Core i7-3770 (AVX but no AVX2)
- OS: Windows 10 Enterprise 64-bit, build 10.0.19044
- Compiler: Visual Studio 2022, cl.exe 19.35.32217.1 for x64
- cmake: 3.25.1-msvc1
- Python 3.10.4, pip 23.1.2
Failure Logs
Here is a verbose compile log: