
How to install with GPU support via cuBLAS and CUDA #250

Closed
DavidBurela opened this issue May 20, 2023 · 8 comments
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

DavidBurela commented May 20, 2023

Submitting and closing this issue to help anyone else searching for how to solve this. I'm including my error message, as that is where I was stuck with no results found on the web.
I have also captured exact step-by-step instructions in this README: https://github.com/DavidBurela/edgellm#edgellm

Install CUDA toolkit

You need to ensure you have the CUDA toolkit installed, as you need nvcc etc. on your PATH to compile correctly when you install via:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
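
A quick way to confirm the toolkit is actually visible before building (a suggested check, not part of the original post; the install location may differ on your system):

which nvcc        # should resolve, e.g. to something under /usr/local/cuda/bin
nvcc --version    # toolkit version; it should not be newer than what your driver supports (see below)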

Ensure you install the correct version of CUDA toolkit

When I installed with cuBLAS support and tried to run, I would get this error
the provided PTX was compiled with an unsupported toolchain.

I was able to pin the root cause down to the installed CUDA Toolkit version being newer than what my GPU drivers supported.
Run nvidia-smi, and note what version of CUDA is supported in the top right.
Here my GPU drivers support 12.0, so I can install CUDA toolkit 12.0.1
[Screenshot: nvidia-smi output, with the supported CUDA version shown in the top right]

Download & install the correct version

Direct download and install

https://developer.nvidia.com/cuda-toolkit-archive

Conda

If you are using Conda you can also download it directly into your environment

conda create -n condaexample python=3.11  # use a later Python version if needed
conda activate condaexample 
# Full list at https://anaconda.org/nvidia/cuda-toolkit
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit

Enable in code

# LlamaCpp here is LangChain's wrapper around llama-cpp-python
from langchain.llms import LlamaCpp

# CPU only
model = LlamaCpp(model_path="./models/model.bin", verbose=True, n_threads=8)

# GPU. Must specify the number of layers to offload into VRAM
model = LlamaCpp(model_path="./models/model.bin", verbose=True, n_threads=8, n_gpu_layers=20)
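
If you are not going through LangChain, a roughly equivalent sketch using llama-cpp-python directly (the model path is a placeholder; with verbose=True the load log reports how many layers were offloaded to the GPU):

from llama_cpp import Llama

# Placeholder path; point this at your local GGML model file
llm = Llama(model_path="./models/model.bin", n_threads=8, n_gpu_layers=20, verbose=True)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
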
DavidBurela changed the title from "Compiling with cuBLAS support" to "How to install with cuBLAS support with CUDA" on May 20, 2023
DavidBurela (Author) commented:

Closing the issue so it doesn't clog up the issue list, but it should still be searchable.

DavidBurela changed the title from "How to install with cuBLAS support with CUDA" to "How to install with GPU support via cuBLAS and CUDA" on May 20, 2023
gjmulder reopened this on May 20, 2023
gjmulder (Contributor) commented:

Great work @DavidBurela!

We need to document that n_gpu_layers should be set to a number that results in the model using just under 100% of VRAM, as reported by nvidia-smi. For example, for a 13B model on my 1080Ti, setting n_gpu_layers=40 (i.e. all layers in the model) uses about 10GB of the 11GB of VRAM the card provides.

Also, the number of threads should be set to the number of physical cores on the system. This is usually half the number of reported logical (hyperthreaded) cores, unless one is running in a VM, where the number of threads needs to be set to the number of virtual cores.
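
Not from the original comment, just a rough illustration of that guidance; psutil and the model path are assumptions, and n_gpu_layers=40 matches the 13B example above:

import os
import psutil  # assumed to be installed; used to count physical cores
from langchain.llms import LlamaCpp

# Use physical cores, falling back to logical cores if psutil can't determine them
n_threads = psutil.cpu_count(logical=False) or os.cpu_count()
model = LlamaCpp(model_path="./models/model.bin", n_threads=n_threads, n_gpu_layers=40, verbose=True)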

@abetlen should @DavidBurela's docs be put in the README.md?

gjmulder added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on May 20, 2023

Grubbly commented May 20, 2023

Just a note: if you are not using conda and have taken the route of installing the CUDA Toolkit via Nvidia's Developer Portal, you may encounter a cmake error when trying to install llama-cpp-python.

The Error

$ CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

...

-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.1.105")
      -- cuBLAS found
      -- The CUDA compiler identification is unknown
      CMake Error at /tmp/pip-build-env-u5gx6grn/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCUDACompiler.cmake:603 (message):
        Failed to detect a default CUDA architecture.

...

ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

The Fix

Add CUDA's bin folder to $PATH

export PATH="/usr/local/cuda/bin:$PATH"
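
To make the fix persistent and then retry the build (assuming a bash shell and the default /usr/local/cuda install location):

echo 'export PATH="/usr/local/cuda/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir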

gjmulder (Contributor) commented Jun 5, 2023

Closing. Please reopen if necessary.

gjmulder closed this as completed on Jun 5, 2023

uogbuji commented Jun 13, 2023

In my case, because I had a non-cuBLAS-enabled wheel hanging around, I had to force pip to rebuild using --no-cache-dir, so:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --no-cache-dir llama-cpp-python


Jawn78 commented Sep 6, 2023

@DavidBurela, this gives me

Traceback (most recent call last):
  File "\privateGPT\privateGPT.py", line 95, in <module>
    main()
  File "\privateGPT\privateGPT.py", line 48, in main  
    llm = LlamaCpp(model_path=model_path, max_tokens=model_n_ctx, n_batch=model_n_batch, callbacks=callbacks, verbose=False, n_threads=8, n_gpu_layers=20)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ME\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\load\serializable.py", line 74, in __init__
    super().__init__(**kwargs)
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LlamaCpp
__root__
  Could not load Llama model from path: F:\Manticore-13B.ggmlv3.q8_0.bin. Received error  (type=value_error)

ForwardForward commented:

For the installation and the solution that worked, see user jllllllllll's post:

Problem to install llama-cpp-python on Windows 10 with GPU NVidia Support CUBlast, BLAS = 0 #721
#721 (comment)

LukeLIN-web commented:

I used this approach:

conda create -n condaexample python=3.11  # use a later Python version if needed
conda activate condaexample 
# Full list at https://anaconda.org/nvidia/cuda-toolkit
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit

But why, after compiling with nvcc gemmwmma.cu -o a.out -lcublas -lcurand -arch=sm_80, does ./a.out still show: error while loading shared libraries: libcublas.so.12: cannot open shared object file: No such file or directory?
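
One likely cause, though it is not confirmed in this thread, is that the conda-installed CUDA libraries are not on the runtime linker path; pointing LD_LIBRARY_PATH at the active environment's lib directory is one way to check:

# Assumption: the conda cuda-toolkit package placed libcublas.so.12 under $CONDA_PREFIX/lib
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
./a.out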
