
How to install with GPU support via cuBLAS and CUDA #250

Closed
DavidBurela opened this issue May 20, 2023 · 8 comments
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

DavidBurela commented May 20, 2023

Submitting and closing this issue to help anyone else searching for how to solve this. I'm including my error message, as that is where I was stuck with no results found on the web.
I have also captured exact step-by-step instructions in this README: https://github.com/DavidBurela/edgellm#edgellm

Install CUDA toolkit

You need to ensure you have the CUDA toolkit installed, as you need nvcc etc. on your PATH to compile correctly when you install via:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
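
A quick way to confirm the toolkit is actually visible before building (a suggested check, not part of the original post; the install location may differ on your system):

which nvcc        # should resolve, e.g. to something under /usr/local/cuda/bin
nvcc --version    # toolkit version; it should not be newer than what your driver supports (see below)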

Ensure you install the correct version of CUDA toolkit

When I installed with cuBLAS support and tried to run, I would get this error
the provided PTX was compiled with an unsupported toolchain.

I was able to pin the root cause down to the installed CUDA Toolkit version being newer than what my GPU drivers supported.
Run nvidia-smi, and note what version of CUDA is supported in the top right.
Here my GPU drivers support 12.0, so I can install CUDA toolkit 12.0.1
[Screenshot: nvidia-smi output, with the supported CUDA version shown in the top right]

Download & install the correct version

Direct download and install

https://developer.nvidia.com/cuda-toolkit-archive

Conda

If you are using Conda you can also download it directly into your environment

conda create -n condaexample python=3.11  # use a later Python version if needed
conda activate condaexample 
# Full list at https://anaconda.org/nvidia/cuda-toolkit
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit

Enable in code

# LlamaCpp here is LangChain's wrapper around llama-cpp-python
from langchain.llms import LlamaCpp

# CPU only
model = LlamaCpp(model_path="./models/model.bin", verbose=True, n_threads=8)

# GPU. Must specify the number of layers to offload into VRAM
model = LlamaCpp(model_path="./models/model.bin", verbose=True, n_threads=8, n_gpu_layers=20)
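
If you are not going through LangChain, a roughly equivalent sketch using llama-cpp-python directly (the model path is a placeholder; with verbose=True the load log reports how many layers were offloaded to the GPU):

from llama_cpp import Llama

# Placeholder path; point this at your local GGML model file
llm = Llama(model_path="./models/model.bin", n_threads=8, n_gpu_layers=20, verbose=True)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
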
DavidBurela changed the title from "Compiling with cuBLAS support" to "How to install with cuBLAS support with CUDA" on May 20, 2023
DavidBurela (Author) commented:

Closing the issue so it doesn't clog up the issue list, but it should still be searchable.

DavidBurela changed the title from "How to install with cuBLAS support with CUDA" to "How to install with GPU support via cuBLAS and CUDA" on May 20, 2023
gjmulder reopened this on May 20, 2023
gjmulder (Contributor) commented:

Great work @DavidBurela!

We need to document that n_gpu_layers should be set to a number that results in the model using just under 100% of VRAM, as reported by nvidia-smi. For example, for a 13B model on my 1080Ti, setting n_gpu_layers=40 (i.e. all layers in the model) uses about 10GB of the 11GB of VRAM the card provides.

Also, the number of threads should be set to the number of physical cores on the system. This is usually half the number of reported logical (hyperthreaded) cores, unless one is running in a VM, where the number of threads needs to be set to the number of virtual cores.
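
Not from the original comment, just a rough illustration of that guidance; psutil and the model path are assumptions, and n_gpu_layers=40 matches the 13B example above:

import os
import psutil  # assumed to be installed; used to count physical cores
from langchain.llms import LlamaCpp

# Use physical cores, falling back to logical cores if psutil can't determine them
n_threads = psutil.cpu_count(logical=False) or os.cpu_count()
model = LlamaCpp(model_path="./models/model.bin", n_threads=n_threads, n_gpu_layers=40, verbose=True)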

@abetlen should @DavidBurela's docs be put in the README.md?

gjmulder added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on May 20, 2023

Grubbly commented May 20, 2023

Just a note: if you are not using conda and have taken the route of installing the CUDA Toolkit via Nvidia's Developer Portal, you may encounter a cmake error when trying to install llama-cpp-python.

The Error

$ CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

...

-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.1.105")
      -- cuBLAS found
      -- The CUDA compiler identification is unknown
      CMake Error at /tmp/pip-build-env-u5gx6grn/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCUDACompiler.cmake:603 (message):
        Failed to detect a default CUDA architecture.

...

ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

The Fix

Add CUDA's bin folder to $PATH

export PATH="/usr/local/cuda/bin:$PATH"
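
To make the fix persistent and then retry the build (assuming a bash shell and the default /usr/local/cuda install location):

echo 'export PATH="/usr/local/cuda/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir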

gjmulder (Contributor) commented Jun 5, 2023

Closing. Please reopen if necessary.

gjmulder closed this as completed on Jun 5, 2023

uogbuji commented Jun 13, 2023

In my case, because I had a non-cuBLAS-enabled wheel hanging around, I had to force pip to rebuild using --no-cache-dir, so:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --no-cache-dir llama-cpp-python


Jawn78 commented Sep 6, 2023

@DavidBurela, this gives me

Traceback (most recent call last):
  File "\privateGPT\privateGPT.py", line 95, in <module>
    main()
  File "\privateGPT\privateGPT.py", line 48, in main  
    llm = LlamaCpp(model_path=model_path, max_tokens=model_n_ctx, n_batch=model_n_batch, callbacks=callbacks, verbose=False, n_threads=8, n_gpu_layers=20)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ME\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\load\serializable.py", line 74, in __init__
    super().__init__(**kwargs)
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LlamaCpp
__root__
  Could not load Llama model from path: F:\Manticore-13B.ggmlv3.q8_0.bin. Received error  (type=value_error)

ForwardForward commented:

For the installation and the solution that worked, see user jllllllllll's post:

Problem to install llama-cpp-python on Windows 10 with GPU NVidia Support CUBlast, BLAS = 0 #721
#721 (comment)

LukeLIN-web commented:

I used this approach:

conda create -n condaexample python=3.11  # use a later Python version if needed
conda activate condaexample 
# Full list at https://anaconda.org/nvidia/cuda-toolkit
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit

But why, after compiling with nvcc gemmwmma.cu -o a.out -lcublas -lcurand -arch=sm_80, does ./a.out still show: error while loading shared libraries: libcublas.so.12: cannot open shared object file: No such file or directory?
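
One likely cause, though it is not confirmed in this thread, is that the conda-installed CUDA libraries are not on the runtime linker path; pointing LD_LIBRARY_PATH at the active environment's lib directory is one way to check:

# Assumption: the conda cuda-toolkit package placed libcublas.so.12 under $CONDA_PREFIX/lib
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
./a.out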
