Distribute wheels with cuBLAS support for all supported NVIDIA GPU architectures #400
Labels: build, duplicate, enhancement, hardware
I recently discovered that llama-cpp-python can be compiled with cuBLAS support for all supported GPU architectures by setting the `CUDAFLAGS` environment variable to `-arch=all`. On Windows, I can use these commands in CMD:

Note that, due to an issue with the current llama.cpp version in this repo, `-lcublas` has to be added as well in order to link the needed cuBLAS library. Setting the `VERBOSE` environment variable to `1` lets you see the full output of the build process. Doing this, I can confirm that it is indeed building for all architectures, as it shows a warning about the deprecated Kepler architectures.

The resulting wheel works on my own system, but that is to be expected. I have not been able to test whether a wheel works on a different system.
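For reference, a build along these lines can be sketched as the following CMD commands. This is a reconstruction under assumptions, not the exact commands from this issue: the `CMAKE_ARGS`/`FORCE_CMAKE` variables are taken from the project's documented cuBLAS install flow, and the `CUDAFLAGS` value combines the `-arch=all` and `-lcublas` settings described above.

```shell
:: Hypothetical reconstruction -- assumes the documented cuBLAS build flow
:: of llama-cpp-python; CUDAFLAGS value is the one described in this issue.
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
set CUDAFLAGS=-arch=all -lcublas
set VERBOSE=1
pip install llama-cpp-python --no-cache-dir --force-reinstall
```

With `VERBOSE=1` set, the pip build log should show nvcc being invoked once per target architecture, including the deprecation warning for the Kepler (sm_3x) targets mentioned above.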
This will greatly improve the user experience for text-generation-webui. Especially for Windows users due to eliminating the need for Visual Studio.