Core dumped on trying to import from llama_cpp module when built with CUBLAS=on #412
Illegal instruction usually indicates that you've compiled with AVX512 support, but your environment doesn't support AVX512. Are you by chance compiling in a virtualized environment?
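A quick way to check what the CPU actually reports is to look at its flags; a minimal sketch, assuming a Linux host where /proc/cpuinfo is available:

# List the SIMD-related flags the CPU advertises; if no avx512* entries appear,
# a binary compiled with AVX512 will crash with an illegal instruction.
grep -oE 'avx512[a-z]*|avx2|fma|f16c|avx' /proc/cpuinfo | sort -u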
Well, it was compiled in the (base) conda environment, which I assume is not virtualized. Am I wrong about that? And why would it not detect that my CPU doesn't support AVX512? Building without CUBLAS, everything works and AVX512 is set to 0, since it's not supported. How can I prevent building with AVX512 support? This problem also occurs when building for text-generation-webui (which of course is inside a virtualized environment).
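If AVX512 really were the culprit, the corresponding llama.cpp CMake option can be switched off through CMAKE_ARGS. A minimal sketch, assuming LLAMA_AVX512 is the relevant switch in the bundled llama.cpp:

# Sketch: keep cuBLAS enabled but explicitly disable AVX512 code generation.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_AVX512=off" FORCE_CMAKE=1 \
  pip install llama-cpp-python --no-cache-dir --force-reinstall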
If it works without CUBLAS then I was wrong. There's an issue with some OS virtualization environments where they report x86_64 AVX512 support, but any code compiled in them with AVX512 causes illegal instruction errors. It is possible something similar is occurring with CUDA's …. Maybe try an older version of ….
Now, that was a good suggestion! I tried many versions and it seems to have started with version 0.1.53. So this fixes the basic issue of importing the module by doing: …
But since I can now also build this version for text-gen-webui, trying to load the model causes a new error (probably because of the old version). I will post that error here, and I think your idea about CPU instruction codes might still hold true.
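The command itself isn't shown above; as a sketch of what pinning a pre-0.1.53 release with cuBLAS looks like (0.1.52 is assumed to be the release immediately before the regression):

# Sketch: install a pinned, older release, bypassing any cached wheel.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install llama-cpp-python==0.1.52 --no-cache-dir --force-reinstall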
There have been some changes in the way that … works. You may want to revert to an older version of …. It may be a case of A depends on B depends on C. I always try to stay a few versions behind the "bleeding edge" so other people find the bugs first 😉
Can you make sense of what might cause the issue in the first place? I tried comparing versions …
Update: this issue only happens when building with CUBLAS, OPENBLAS, or CLBLAS; it does not happen when building without any of those flags for version …
So I used …
Looks like it's happening inside …
So could this problem be about memory allocation? I have a good amount of memory (64 GB) and 12 GB VRAM, but maybe … What are those weird lines that were introduced in llama-cpp-python version 0.1.53 (the version that suddenly doesn't work anymore), called "file magic" (that's so suspicious): …
By the way, just to make clear: building and using …
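For anyone wanting to reproduce that kind of trace, one way (a sketch, assuming gdb and python3 are on the PATH) is to run the failing import under a debugger and print the native backtrace at the point of the crash:

# Sketch: run the import under gdb; on SIGILL/SIGSEGV, print where it happened.
gdb --batch -ex run -ex bt --args python3 -c "import llama_cpp"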
The file magic is for versioning models, and … The key question is which version of … One way to do this is to build from source …
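The magic itself is just the first few bytes of the model file, so it can be inspected directly. A sketch, assuming xxd is installed and the model lives at ./model.bin (a hypothetical path):

# Sketch: dump the first 4 bytes of the model file; llama.cpp's loader compares
# this value against the magic numbers of the formats it knows how to read.
xxd -l 4 ./model.bin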
Thank you for still being with me. I actually found out what causes the issue, but I don't know what to make of it. Maybe some expert has an idea. When building …
When building the same source code of …
So this whole charade is about different build methods and has to do with …
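One way to compare the two build methods is to make pip's build verbose and search the log for the instruction-set flags CMake reports. A sketch, assuming a llama-cpp-python source checkout in the current directory:

# Sketch: build verbosely from source, keep the log, then look for the SIMD flags it mentions.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install . -v --no-cache-dir 2>&1 | tee build.log
grep -iE 'avx|fma|f16c' build.log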
At a guess, the … People seem to want to insist the issue is with …
@gjmulder Thanks a bunch for your help. You were right in the beginning: it was about CPU instructions, just not AVX512. Here is how I can build it successfully: …
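The successful command isn't shown above. Purely as an illustration, and given the Ivy Bridge-era CPUs reported below, a build that keeps cuBLAS but disables the newer extensions might look like this (LLAMA_AVX2, LLAMA_FMA, and LLAMA_F16C are assumed to be the relevant llama.cpp switches):

# Sketch: turn off the instruction sets an older CPU lacks while keeping cuBLAS.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off" \
  FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall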
My assumption was that you were using a CPU built within the last 10 years. AVX2, FMA, and F16C have been supported for around 10 years on most CPUs. AVX512 is much more recent and isn't supported by all desktop CPUs, but it is supported by a lot of cloud server CPUs.
OMG, it took me a whole day of trying to install with cuBLAS. Please pin this somewhere in the FAQ; this worked for me! (Ubuntu 22.04, WSL2, i7-3770k, GTX 1080) The corollary is that the build process for wheels is just horribly hard for people to debug; normally, when you build stuff outside of pip, googling errors and installing packages or adding env vars gets you there 90% of the time.
Hence the suggestion in the issue template to try and build …
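For anyone following that advice, building llama.cpp on its own helps separate llama.cpp problems from wheel-packaging problems. A sketch, assuming the cmake-based build and the main example binary of that era:

# Sketch: build llama.cpp standalone with cuBLAS and try to run it directly.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DLLAMA_CUBLAS=on
cmake --build build --config Release
./build/bin/main -m /path/to/model.bin -p "hello"   # model path is a placeholder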
This didn't help me, since it worked with …
Same as @RicoElectrico, this took me a whole day to figure out before stumbling on this. Please include it in the FAQ. Proxmox …
And my Ivy Bridge triggered this again when updating …
Hello everyone. …
Original issue description: I built llama-cpp-python successfully with CUBLAS on my system with the following command:
CUDACXX=/usr/local/cuda/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
System: Ubuntu 20.04, RTX 3060 12 GB, 64 GB RAM, CUDA 12.1.105