GPU not being used #7
Comments
I found the somewhat hidden 'Setup visual studio community for llamacpp.odt'. I completed everything in there (Visual Studio with C++ was already set up for some ComfyUI nodes), and the problem persists.
I think I figured out what the problem is. It seems that for some people, setting CMAKE_ARGS has no effect, so I took the advice from abetlen/llama-cpp-python#284 (comment): first, clone the llama-cpp-python repository with the --recurse-submodules option. Then, in vendor/llama.cpp, edit CMakeLists.txt and change LLAMA_CUBLAS to ON. Then create a venv in the llama-cpp-python directory and run the install from there. Only after building it that way was I able to get it to use the GPU.
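Roughly, the steps look like this (a sketch only; the exact install command wasn't quoted above, so `pip install .` is an assumption, as is the in-tree venv layout):

```sh
# The usual route, which reportedly has no effect for some setups:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# The workaround: hard-code the flag in the vendored llama.cpp and build from source.
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python
# edit vendor/llama.cpp/CMakeLists.txt: set LLAMA_CUBLAS to ON
python -m venv venv
source venv/bin/activate        # on Windows: venv\Scripts\activate
pip install .                   # assumed install command
```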
Interesting, so I'll copy that into the somewhat hidden docs.
I'm unable to use the llmware one-click installer because I'm using a cloud provider, which makes Docker a no-go, so I went with the llama_index one. Everything seems to be working, but extremely slowly. This leads me to believe that the GPU (a Quadro RTX 6000) is not being used. I saw that there is a check_gpu_enabled.py, so I edited the model path in it, and the output contains `BLAS = 0`.
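For reference, a minimal stand-in for that check (not the repo's script; the model path and layer count below are placeholders) is:

```sh
# A CUDA-enabled llama-cpp-python build prints "BLAS = 1" among its
# system-info lines when a model is loaded with verbose output;
# "BLAS = 0" indicates a CPU-only build. Model path is a placeholder.
python -c "from llama_cpp import Llama; Llama(model_path='/path/to/model.bin', n_gpu_layers=32, verbose=True)"
```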
As a side note, the model that is automatically downloaded is different from what is listed in the readme for llama_index.

Activating the environment and running `pip show torch` gives: …

Automatic1111, ComfyUI, Oobabooga, etc. all work fine with the GPU, so I must be missing something here. Any tips to get it to use the GPU?
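A quick, separate sanity check that the environment's torch can see the GPU at all (this doesn't affect llama.cpp, which is compiled independently of torch):

```sh
# Prints whether torch sees CUDA and which CUDA version it was built with.
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```

Even if this prints True, llama-cpp-python has to be rebuilt with CUDA support for the `BLAS = 0` above to change.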