Distribute wheels with cuBLAS support for all supported NVIDIA GPU architectures #400

Closed
jllllll opened this issue Jun 19, 2023 · 2 comments
Labels
build · duplicate (This issue or pull request already exists) · enhancement (New feature or request) · hardware (Hardware specific issue)

Comments


jllllll commented Jun 19, 2023

I recently discovered that llama-cpp-python can be compiled with cuBLAS support for all supported GPU architectures by setting the CUDAFLAGS environment variable to -arch=all. On Windows, I can use these commands in CMD:

set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on"
set "CUDAFLAGS=-arch=all -lcublas"
python -m pip install git+https://github.com/abetlen/llama-cpp-python

Note that, due to an issue with the current llama.cpp version in this repo, -lcublas has to be added as well in order to link the needed cuBLAS library. Setting the VERBOSE environment variable to 1 shows the full output of the build process. With that enabled, I can see that it is indeed building for all architectures, since it prints a warning about the deprecated Kepler architectures.
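
For reference, a rough Linux/macOS equivalent of the above (untested on my end; assumes the same flags carry over to a bash shell) would be:

# Build llama-cpp-python with cuBLAS for all supported GPU architectures (bash sketch)
export FORCE_CMAKE=1
export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
export CUDAFLAGS="-arch=all -lcublas"
# VERBOSE=1 makes the underlying build print full compiler invocations
VERBOSE=1 python -m pip install git+https://github.com/abetlen/llama-cpp-python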

The resulting wheel works on my own system, but that is to be expected. I have not been able to test if a wheel works on a different system.

This would greatly improve the user experience for text-generation-webui, especially for Windows users, since it eliminates the need to install Visual Studio.

@jllllll jllllll changed the title Distribute wheels with cuBLAS support for all NVIDIA GPU architectures Distribute wheels with cuBLAS support for all supported NVIDIA GPU architectures Jun 19, 2023

jllllll commented Jun 19, 2023

It seems this method no longer works with the latest llama.cpp, since its CMakeLists.txt was changed to use -arch=native.
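
A possible workaround (untested; assumes llama.cpp honors the standard CMAKE_CUDA_ARCHITECTURES variable rather than hardcoding -arch=native, and requires CMake 3.23+ for the "all" keyword) would be to set the architecture list through CMake instead of CUDAFLAGS:

rem Sketch of a CMD workaround using CMake's architecture selection
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
python -m pip install git+https://github.com/abetlen/llama-cpp-python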

gjmulder (Contributor) commented Jun 20, 2023

Duplicate of #243

@gjmulder gjmulder marked this as a duplicate of #243 Jun 20, 2023
@gjmulder gjmulder added the enhancement (New feature or request) label Jun 20, 2023
@jllllll jllllll closed this as not planned (duplicate) Jun 20, 2023