
CUDA Error 222 when running in Kaggle notebook; Raw llama.cpp works without issue #1941

Closed
randombk opened this issue Jun 19, 2023 · 3 comments

randombk commented Jun 19, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When running in a Kaggle notebook with the 'T4 x2' GPU accelerator, llama-cpp-python should initialize CUDA and run inference, just as raw llama.cpp does in the same environment.

Current Behavior

#python -m 'llama_cpp'

ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla T4
  Device 1: Tesla T4
CUDA error 222 at /tmp/pip-install-2ecmu5o2/llama-cpp-python_284b4b67e8bf4aecb8c75b3d2715bc08/vendor/llama.cpp/ggml-cuda.cu:1501: the provided PTX was compiled with an unsupported toolchain.

Running llama.cpp directly works as expected.

Environment and Context

Free Kaggle notebook running with the 'T4 x2' GPU accelerator.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   45C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$ python3 --version => Python 3.10.10
$ make --version => GNU Make 4.3
$ g++ --version => g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Failure Information (for bugs)

Steps to Reproduce

I published a repro at https://www.kaggle.com/randombk/bug-llama-cpp-python-cuda-222-repro

KerfuffleV2 (Collaborator) commented

This looks like something to do with the llama.cpp Python binding (separate project) rather than llama.cpp itself. This issue over there looks related: abetlen/llama-cpp-python#250

TL;DR: you probably compiled against (or are using a build compiled against) a different CUDA version than the one available where you're running it.
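That mismatch is visible in the environment output above: nvidia-smi reports a CUDA 11.4 driver (470.161.03) while nvcc is 11.8, so PTX emitted by the newer toolchain cannot be JIT-compiled by the older driver. One possible workaround (a sketch, not something proposed in this thread; it assumes the LLAMA_CUBLAS CMake option and the CMAKE_ARGS/FORCE_CMAKE pass-through that llama-cpp-python documented at the time) is to force a source build that embeds real device code for the T4 (compute capability 7.5), so the driver never has to JIT newer PTX:

# Rebuild the wheel on the target machine, baking in sm_75 code for the T4
# so the CUDA 11.4 driver does not need to JIT PTX produced by the 11.8 toolchain.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=75" FORCE_CMAKE=1 \
    pip install --force-reinstall --no-cache-dir llama-cpp-python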

randombk (Author) commented

Oops, I filed this in the wrong repo. Closing.

randombk closed this as not planned on Jun 19, 2023.

wonkyoc commented Jul 26, 2023

I found that this also happens when the system tries to offload more layers to the GPUs than the model actually has. For instance:

llama_model_load_internal: offloaded 35/35 layers to GPU

If I try to offload 40 layers in this example, I see the same error.
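For anyone hitting this variant: the offload count is whatever you pass as -ngl/--n-gpu-layers (or n_gpu_layers in the Python binding), and per this report keeping it at or below the model's layer count avoids the failure. A minimal sketch with the llama.cpp binary (the model path is just a placeholder):

# Cap GPU offload at the model's actual layer count (35 in the log above);
# asking for more (e.g. 40) is what triggered the error described here.
./main -m ./models/ggml-model-q4_0.bin -ngl 35 -p "Hello"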
