
Core dumped on trying to import from llama_cpp module when built with CUBLAS=on #412

Closed
m-from-space opened this issue Jun 21, 2023 · 20 comments
Labels: build, hardware

Comments

@m-from-space

  1. I installed llama-cpp-python successfully with CUBLAS on my system, using the following command:

CUDACXX=/usr/local/cuda/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

  2. When trying to use it, a severe crash happens on importing the module:
$ python
Python 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_cpp import Llama
Illegal instruction (core dumped)
  3. This also affects text-generation-webui with CUBLAS on, so I cannot load any llama.cpp model with it.

System: Ubuntu 20.04, RTX 3060 12 GB, 64 GB RAM, CUDA 12.1.105

@gjmulder
Contributor

Illegal instruction usually indicates that you've compiled with AVX512 support, but your environment doesn't support AVX512. Are you by chance compiling in a virtualized environment?

@gjmulder added the build and hardware labels Jun 21, 2023
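A quick way to check which SIMD extensions the host CPU actually reports (a minimal sketch in Python, assuming a Linux host where /proc/cpuinfo is available):

# List the SIMD extensions the CPU advertises on Linux.
# If the build enables an extension that is missing here, SIGILL is the expected result.
with open("/proc/cpuinfo") as f:
    flags = set(next(line for line in f if line.startswith("flags")).split())

for ext in ("avx", "avx2", "avx512f", "f16c", "fma"):
    print(ext, "yes" if ext in flags else "no")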
@m-from-space
Author

Illegal instruction usually indicates that you've compiled with AVX512 support, but your environment doesn't support AVX512. Are you by chance compiling in a virtualized environment?

Well, it was compiled in the (base) conda environment, which I assume is not virtualized. Am I wrong about that? And why would it not detect that my CPU doesn't support AVX512? When building without CUBLAS, everything works and AVX512 is set to 0, since it's not supported.

How can I prevent building with AVX512 support?

This problem also occurs when building for text-generation-webui (which of course is inside a virtualized environment).

@gjmulder
Contributor

If it works without CUBLAS then I was wrong.

There's an issue with some OS virtualization environments where they report x86_64 AVX512 support, but any code compiled in them with AVX512 causes illegal instruction errors. It is possible something similar is occurring with CUDA's nvcc compiler, which compiles code for a specific Nvidia architecture.

Maybe try an older version of llama-cpp-python?

@m-from-space
Author

Maybe try an older version of llama-cpp-python?

Now, that was a good suggestion! I tried many versions, and the problem seems to have started with version 0.1.53.

So the following fixes the basic issue of importing the module:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.52 --no-cache-dir

But now that I can also build this version for text-gen-webui, trying to load a model causes a new error (probably because of the old version). I will post the error here; I think your idea about CPU instruction sets might still hold true.

2023-06-21 20:31:59 INFO:Loading thebloke_llama_30b_ggml_q5_1...
2023-06-21 20:31:59 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/XXX/TextGenWebUI/text-generation-webui/server.py", line 62, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/XXX/TextGenWebUI/text-generation-webui/modules/models.py", line 65, in load_model
    output = load_func_map[loader](model_name)
  File "/XXX/TextGenWebUI/text-generation-webui/modules/models.py", line 237, in llamacpp_loader
    from modules.llamacpp_model import LlamaCppModel
  File "/XXX/TextGenWebUI/text-generation-webui/modules/llamacpp_model.py", line 12, in <module>
    from llama_cpp import Llama, LlamaCache, LogitsProcessorList
ImportError: cannot import name 'LogitsProcessorList' from 'llama_cpp' (/XXX/TextGenWebUI/installer_files/env/lib/python3.10/site-packages/llama_cpp/__init__.py)
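One way to confirm which llama-cpp-python the webui environment actually imports (a minimal sketch; the distribution name is the one pip installs):

import importlib.metadata
import llama_cpp

# Installed package version, as seen by this environment.
print(importlib.metadata.version("llama-cpp-python"))
# On 0.1.52 this prints False, which is exactly why the import above fails.
print(hasattr(llama_cpp, "LogitsProcessorList"))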

@gjmulder
Contributor

There have been some changes in the way that llama.cpp gets built, which may explain your issues if an older version works.

You may want to revert to an older version of text-generation-webui that doesn't depend on the newer version of llama-cpp-python if you're now getting import errors.

It may be a case of A depends on B depends on C. I always try to be a few versions behind the "bleeding edge" so other people find the bugs first 😉

@m-from-space
Author

Can you make sense of what might cause the issue in the first place? I tried comparing versions 0.1.52 and 0.1.53 of llama-cpp-python, but I am not deep enough into this topic to understand what's going on there.

@m-from-space
Author

Update: This issue only happens when building with CUBLAS, OPENBLAS, or CLBLAS; building version 0.1.65 (the current version) without any of those flags works fine.

@m-from-space
Author

So I used gdb on my system to make some sense of the error:

Starting program: /usr/bin/python3 -c from\ llama_cpp\ import\ Llama
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x00007ffff71898e2 in ggml_init ()
   from /home/XXX/.local/lib/python3.8/site-packages/llama_cpp/libllama.so

Looks like it's happening inside libllama.so while doing ggml_init().

@m-from-space
Author

So could this problem be about memory allocation? I have a good amount of memory (64 GB) and 12 GB of VRAM, but maybe llama-cpp-python is trying to do something illegal in that area?

What are those weird lines called "file magic" that were introduced in llama-cpp-python version 0.1.53 (the version that suddenly doesn't work anymore)? That sounds so suspicious:

LLAMA_FILE_MAGIC_GGML = ctypes.c_uint(0x67676D6C)

By the way, just to be clear: building and using llama.cpp itself (with CUBLAS) works flawlessly on my system, so it's not about that. It's about building llama-cpp-python with *BLAS support.
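For what it's worth, that constant is just the ASCII string "ggml" packed into a 32-bit integer. A minimal sketch of peeking at a model file's magic (the file name is illustrative, and a little-endian layout is assumed, matching how llama.cpp writes it on x86):

import struct

# 0x67676D6C spells "ggml"; newer formats use related magics such as "ggjt" (0x67676A74).
with open("thebloke_llama_30b_ggml_q5_1.bin", "rb") as f:  # illustrative file name
    (magic,) = struct.unpack("<I", f.read(4))              # magic stored as little-endian uint32
print(hex(magic))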

@gjmulder
Contributor

The file magic is for versioning models and llama.cpp. It was implemented months ago and is useful for determining if your llama.cpp is compatible with your model. Any v3 models will not work with older commits of llama.cpp. Backward compatibility of v2 quantized models with the latest llama.cpp is also not guaranteed. I always test with an fp16 v1 unquantized model, as it should be compatible with any version of llama.cpp (and therefore llama-cpp-python).

The key question is: which version of llama.cpp are you using, and which version of llama.cpp is llama-cpp-python using? There's continuous change in llama.cpp (e.g. model quantization, changes to CMake builds, improved CUDA support, CUBLAS support, etc.), so it is best to revert to the exact llama.cpp commit your llama-cpp-python is using and verify that it compiles and runs with no issues.

One way to do this is to build llama-cpp-python from source and then:

$ cd ./vendor/llama.cpp
$ mkdir build && cd build
$ cmake .. -DLLAMA_CUBLAS=ON   # plus the same CMake args passed to llama-cpp-python
$ cmake --build . --config Release
$ ./bin/main -m /path/to/model.bin   # etc.

@m-from-space
Author

One way to do this is to build llama-cpp-python from source

Thank you for still being with me.

I actually found out what causes the issue, but I don't know what to make of it. Maybe some expert has an idea.

When building llama.cpp with make using the following command, everything works with my GGML model, and GPU acceleration is also fine. That's what I stated earlier.

make LLAMA_CUBLAS=1

When building the same llama.cpp source code with cmake instead, using the following lines, the binaries (like main) throw the core dump error right at startup. The build itself throws no errors though, just some warnings.

mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release

So this whole charade is about different build methods and relates to llama.cpp directly, not llama-cpp-python. I just didn't think of the fact that I was building with plain make. There is still the question of what is happening here.

@gjmulder
Contributor

gjmulder commented Jun 24, 2023

At a guess, the llama.cpp build dev has updated the cmake build process but not the make build process. 25% of the bugs reported against llama-cpp-python are build related. I suspect the majority are due to llama.cpp, hence the text I added to the issue template here asking people to check that the same problem doesn't occur when building and running llama.cpp.

People seem to want to insist the issue is with llama-cpp-python, so these days I refer them to the supplied Dockerfiles, which have tightly controlled OS build environments, unless it is clearly a duplicate of a previously reported build issue.

@m-from-space
Author

@gjmulder Thanks a bunch for your help. You were right in the beginning: it was about CPU instructions, just not AVX512.

Here is how I can build it successfully:

CMAKE_ARGS="-DLLAMA_CUBLAS=ON -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

@gjmulder
Contributor

My assumption was that you were using a CPU built within the last 10 years. AVX2, FMA and F16C have been supported for around 10 years for most CPUs. AVX512 is much more recent and isn't supported by all desktop CPUs, but is supported by a lot of cloud server CPUs.

@RicoElectrico

CMAKE_ARGS="-DLLAMA_CUBLAS=ON -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

OMG, it took me a whole day trying to install with cuBLAS. Please pin this somewhere in the FAQ; this worked for me! (Ubuntu 22.04, WSL2, i7-3770K, GTX 1080)

The corollary is that the build process for wheels is just horribly hard for people to debug; normally, when you build stuff outside of pip, googling errors and installing packages or adding env vars gets you there 90% of the time.

@gjmulder
Contributor

Hence the suggestion in the issue template to try and build llama.cpp standalone as all the above issues are due to the llama.cpp build.

@m-from-space
Author

Hence the suggestion in the issue template to try and build llama.cpp standalone as all the above issues are due to the llama.cpp build.

This didn't help me, since building llama.cpp standalone worked for me, until I realized that the issue was building with cmake instead of make (which worked flawlessly). Maybe it would be wise to suggest building with make instead.

@James328

CMAKE_ARGS="-DLLAMA_CUBLAS=ON -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

Same as @RicoElectrico: this took me a whole day to figure out before stumbling on this issue. Please include it in the FAQ.

Proxmox
Ubuntu 22.04
Xeon E5 2667 v2
P40

@netrunnereve

And my Ivy Bridge triggered this again when updating llama-cpp-python. My PR ggml-org/llama.cpp#3273 should fix this issue permanently once it gets merged.

@dimaioksha

Hello everyone.
I've solved this issue by upgrading nvidia-cuda-toolkit from version 11.6 to 11.8 (the latest llama-cpp-python==0.2.29 is working well).
