
CUDA Error 222 when running in Kaggle notebook; Raw llama.cpp works without issue #1941

Closed
randombk opened this issue Jun 19, 2023 · 3 comments

randombk commented Jun 19, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When running in a Kaggle notebook with the 'T4 x2' GPU accelerator, llama-cpp-python should initialize CUDA and run inference, just as raw llama.cpp does in the same environment.

Current Behavior

#python -m 'llama_cpp'

ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla T4
  Device 1: Tesla T4
CUDA error 222 at /tmp/pip-install-2ecmu5o2/llama-cpp-python_284b4b67e8bf4aecb8c75b3d2715bc08/vendor/llama.cpp/ggml-cuda.cu:1501: the provided PTX was compiled with an unsupported toolchain.

Running llama.cpp directly works as expected.

Environment and Context

Free Kaggle notebook running with the 'T4 x2' GPU accelerator.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   45C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$ python3 --version => Python 3.10.10
$ make --version => GNU Make 4.3
$ g++ --version => g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Failure Information (for bugs)

Steps to Reproduce

I published a repro at https://www.kaggle.com/randombk/bug-llama-cpp-python-cuda-222-repro

KerfuffleV2 (Collaborator) commented

This looks like something to do with the llama.cpp Python binding (separate project) rather than llama.cpp itself. This issue over there looks related: abetlen/llama-cpp-python#250

TL;DR: you probably compiled against (or are using a build compiled against) a different CUDA version than the one available where you're running it.
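That mismatch is visible in the environment output above: nvidia-smi reports a CUDA 11.4 driver (470.161.03) while nvcc is 11.8, so PTX emitted by the newer toolchain cannot be JIT-compiled by the older driver. One possible workaround (a sketch, not something proposed in this thread; it assumes the LLAMA_CUBLAS CMake option and the CMAKE_ARGS/FORCE_CMAKE pass-through that llama-cpp-python documented at the time) is to force a source build that embeds real device code for the T4 (compute capability 7.5), so the driver never has to JIT newer PTX:

# Rebuild the wheel on the target machine, baking in sm_75 code for the T4
# so the CUDA 11.4 driver does not need to JIT PTX produced by the 11.8 toolchain.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=75" FORCE_CMAKE=1 \
    pip install --force-reinstall --no-cache-dir llama-cpp-python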

randombk (Author) commented

Oops, I filed this in the wrong repo. Closing.

randombk closed this as not planned on Jun 19, 2023.

wonkyoc commented Jul 26, 2023

I found that this also happens when the system tries to offload more layers to the GPUs than the model actually has. For instance:

llama_model_load_internal: offloaded 35/35 layers to GPU

If I try to offload 40 layers in this example, I see the same error.
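For anyone hitting this variant: the offload count is whatever you pass as -ngl/--n-gpu-layers (or n_gpu_layers in the Python binding), and per this report keeping it at or below the model's layer count avoids the failure. A minimal sketch with the llama.cpp binary (the model path is just a placeholder):

# Cap GPU offload at the model's actual layer count (35 in the log above);
# asking for more (e.g. 40) is what triggered the error described here.
./main -m ./models/ggml-model-q4_0.bin -ngl 35 -p "Hello"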
