Prerequisites
Please answer the following questions for yourself before submitting an issue.
[X] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
When run in a Kaggle notebook with the 'T4 x2' GPU accelerator, llama-cpp-python should work as expected.
Current Behavior
$ python -m llama_cpp
ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla T4
  Device 1: Tesla T4
CUDA error 222 at /tmp/pip-install-2ecmu5o2/llama-cpp-python_284b4b67e8bf4aecb8c75b3d2715bc08/vendor/llama.cpp/ggml-cuda.cu:1501: the provided PTX was compiled with an unsupported toolchain.
Running llama.cpp directly works as expected.
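For comparison, the direct build was along these lines (a sketch, assuming the Makefile-based cuBLAS build of the time; the model path is a placeholder):

# Sketch of the direct llama.cpp build; model path is a placeholder.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_CUBLAS=1 make                              # ggml-cuda.cu compiled by the local nvcc
./main -m /path/to/model.bin -ngl 32 -p "Hello"  # offload layers to the two T4s

A plausible (unverified) reason the direct build survives is that the Makefile compiles device code for the real architecture of the local GPU, so the older driver never has to JIT-compile PTX.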
Environment and Context
Free Kaggle notebook running with the 'T4 x2' GPU accelerator.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   45C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$ python3 --version => Python 3.10.10
$ make --version => GNU Make 4.3
$ g++ --version => g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
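Note the mismatch in the output above: the driver reports CUDA 11.4 (nvidia-smi), while the toolkit pip compiled with is 11.8 (nvcc). CUDA error 222 is CUDA_ERROR_UNSUPPORTED_PTX_VERSION: the driver's JIT refuses to load PTX emitted by a toolkit newer than it understands. A quick check to surface the mismatch:

# Compare the driver-supported CUDA version with the toolkit used to compile.
nvidia-smi | grep "CUDA Version"   # driver side  -> CUDA Version: 11.4
nvcc --version | grep release      # compile side -> release 11.8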
Failure Information (for bugs)
Steps to Reproduce
I published a repro at https://www.kaggle.com/randombk/bug-llama-cpp-python-cuda-222-repro
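Since the log above shows ggml_init_cublas firing from a bare python -m llama_cpp, no model file is needed to reproduce. A minimal sketch of what the notebook does (the cuBLAS build flags are an assumption, matching how the failing wheel was evidently compiled from source):

# Minimal repro sketch on the Kaggle 'T4 x2' image; no model file required.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python  # pip builds ggml-cuda.cu with nvcc 11.8
python3 -c "import llama_cpp"   # initialization hits CUDA error 222 under the 470 driver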
This looks like something to do with the llama.cpp Python binding (a separate project) rather than llama.cpp itself. This issue over there looks related: abetlen/llama-cpp-python#250
TL;DR: you probably compiled against (or are using a build compiled against) a different version of CUDA than the one where you're running it.
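A workaround consistent with that diagnosis (a sketch, not a verified fix: it assumes a CUDA 11.4 toolkit is installed at the hypothetical path /usr/local/cuda-11.4) is to force a rebuild against a toolkit no newer than what the driver supports:

# Rebuild the binding with a toolkit the 470 driver (CUDA 11.4) can load.
export CUDA_HOME=/usr/local/cuda-11.4   # hypothetical toolkit location
export PATH="$CUDA_HOME/bin:$PATH"      # make the build pick up this nvcc
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir llama-cpp-python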