
GPU not being used #7

Closed
drphero opened this issue Mar 25, 2024 · 3 comments

drphero commented Mar 25, 2024

I'm unable to use the llmware one-click installer because I'm using a cloud provider, which makes Docker a no-go. So I went with the llama_index one. Everything seems to be working, but extremely slowly.

llama_print_timings:        load time =   45200.28 ms
llama_print_timings:      sample time =      36.97 ms /   122 runs   (    0.30 ms per token,  3300.33 tokens per second)
llama_print_timings: prompt eval time =   96395.93 ms /  1019 tokens (   94.60 ms per token,    10.57 tokens per second)
llama_print_timings:        eval time =   21954.99 ms /   121 runs   (  181.45 ms per token,     5.51 tokens per second)
llama_print_timings:       total time =  118764.03 ms /  1140 tokens

This leads me to believe that the GPU (Quadro RTX 6000) is not being used. I saw that there is a check_gpu_enabled.py, so I edited the model path in it and got output containing BLAS = 0. As a side note, the model that is automatically downloaded is different from the one listed in the readme for llama_index.
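
For anyone checking the same thing, a minimal sketch of such a check with llama-cpp-python (the model path below is a placeholder, not the file that is actually downloaded):

from llama_cpp import Llama

# Placeholder path; point this at whatever GGUF model was downloaded.
llm = Llama(
    model_path="models/model.gguf",
    n_gpu_layers=-1,  # request that all layers be offloaded to the GPU
    verbose=True,     # prints the system info line, including BLAS = 0/1
)
# A cuBLAS-enabled build logs "BLAS = 1" at load time; "BLAS = 0" means CPU-only.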

Activating the environment and running pip show torch gives:

Name: torch
Version: 2.2.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: C:\Users\Shadow\prompt_quill\llama_index_pq\installer_files\env\Lib\site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: llama-index-embeddings-huggingface
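
As an aside, pip show only confirms that the package is installed; it doesn't say whether this particular torch build can see the GPU. A quick check from the same activated environment (though the slow path here is llama-cpp-python rather than torch):

python -c "import torch; print(torch.cuda.is_available())"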

Automatic1111, ComfyUI, Oobabooga, etc. all work fine with the GPU, so I must be missing something here. Any tips to get it to use the GPU?


drphero commented Mar 25, 2024

I found the somewhat hidden 'Setup visual studio community for llamacpp.odt'. I completed everything in there (Visual Studio with C++ was already set up for some ComfyUI nodes previously), and the problem persists.


drphero commented Mar 25, 2024

I think I figured out what the problem is. It seems that for some people, setting CMAKE_ARGS doesn't take effect. So I took the advice from abetlen/llama-cpp-python#284 (comment).
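
For context, the usual recipe for a cuBLAS build of llama-cpp-python on Windows, i.e. the one that doesn't take effect for some setups, looks roughly like this (flag name as of the 0.2.x releases, where the CUDA switch was still called LLAMA_CUBLAS):

set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --no-cache-dir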

So first, clone the llama-cpp-python repository with the --recurse-submodules option. Then in vendor/llama.cpp, edit CMakeLists.txt and change LLAMA_CUBLAS to ON.
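
Assuming git is on PATH, that step looks like this (the exact wording of the option line in CMakeLists.txt may vary between llama.cpp revisions):

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
rem in vendor\llama.cpp\CMakeLists.txt, change the option that looks like
rem   option(LLAMA_CUBLAS "llama: use cuBLAS" OFF)
rem so that it ends in ON)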

Then create a venv in the llama-cpp-python directory and run set FORCE_CMAKE=1 && pip install . -vv (note there must be no spaces around the =, or cmd sets a variable named FORCE_CMAKE followed by a space and the build never sees it). This will take a while to finish. Once that's done, copy the llama_cpp and llama_cpp_python-0.2.57.dist-info directories inside venv\Lib\site-packages and paste them into installer_files\env\Lib\site-packages; the full sequence is spelled out below.
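
Spelled out as a cmd session (destination path taken from the pip show output above; adjust to your own install location):

python -m venv venv
venv\Scripts\activate
set FORCE_CMAKE=1
pip install . -vv
rem copy the freshly built package into prompt_quill's embedded environment
xcopy /E /I venv\Lib\site-packages\llama_cpp C:\Users\Shadow\prompt_quill\llama_index_pq\installer_files\env\Lib\site-packages\llama_cpp
xcopy /E /I venv\Lib\site-packages\llama_cpp_python-0.2.57.dist-info C:\Users\Shadow\prompt_quill\llama_index_pq\installer_files\env\Lib\site-packages\llama_cpp_python-0.2.57.dist-info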

Only after doing it that way was I able to get it to use the GPU.

osi1880vr (Owner) commented

Interesting. I'll copy that into the somewhat hidden docs.
