NVIDIA GPU and numpy #1979

MarioRossiGithub · 2024-06-21T13:48:11Z

Hi,
I'm trying to setup Private GPT on windows WSL.
I followed the instructions here and here but I'm not able to correctly run PGTP.
If I follow this instructions:
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
I'm able to run PGPT with numpy 1.26.4 but with BLAS=0 (CPU).

If I run this instead:
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
I get BLAS=1 (GPU) but it automatically upgrades numpy to a 2.x version and PGPT doesn't work because it gives an error like "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash".

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

packagex requires numpy x.y.z but you have numpy 2.0.0 which is incompatible.

Is there a way I can downgrade numpy AND use GPU (BLAS=1)?

The text was updated successfully, but these errors were encountered:

MarioRossiGithub · 2024-06-21T15:45:07Z

After several hours of troubleshooting I finally managed to solve the issue.

Install

First of all you have to install llama-cpp forcing a specific version of numpy<2:

CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0

Ensure to:

Update your windows drivers to the latest (I'm not really sure if this helped solve the issue but I did it anyway).
Reboot your system.

Run Private GPT:

PGPT_PROFILES=local make run

If this solves your problem, good, you're done.

If you instead stumble upon another error about "CUDA error: out of memory" and "TOKENIZERS_PARALLELISM=(true | false)", ensure to set this variable to true:

TOKENIZERS_PARALLELISM=true

Then rerun Private GPT as always:

PGPT_PROFILES=local make run

This solved the issue for me.
Now Private GPT uses my NVIDIA GPU, is super fast and replies in 2-3 seconds.

I also suppose the first command should be updated on the official documentation.

On a side note:
I have this warning at the end of the run that I do not quite understand and that I cannot solve. If someone has a suggestion, thanks in advance.

py.warnings - /home/<user>/.cache/pypoetry/virtualenvs/private-gpt-ta_62_V8-py3.11/lib/python3.11/site-packages/llama_cpp/llama.py:1054: RuntimeWarning: Detected duplicate leading "<s>" in prompt, this will likely reduce response quality, consider removing it...
  warnings.warn(

theodufort · 2024-06-21T23:44:40Z

After several hours of troubleshooting I finally managed to solve the issue.

Install

First of all you have to install llama-cpp forcing a specific version of numpy<2:
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
Ensure to:

Update your windows drivers to the latest (I'm not really sure if this helped solve the issue but I did it anyway).

Reboot your system.

Run Private GPT:
PGPT_PROFILES=local make run
If this solves your problem, good, you're done.

If you instead stumble upon another error about "CUDA error: out of memory" and "TOKENIZERS_PARALLELISM=(true | false)", ensure to set this variable to true:
TOKENIZERS_PARALLELISM=true
Then rerun Private GPT as always:
PGPT_PROFILES=local make run
This solved the issue for me. Now Private GPT uses my NVIDIA GPU, is super fast and replies in 2-3 seconds.

I also suppose the first command should be updated on the official documentation.

On a side note: I have this warning at the end of the run that I do not quite understand and that I cannot solve. If someone has a suggestion, thanks in advance.
py.warnings - /home/<user>/.cache/pypoetry/virtualenvs/private-gpt-ta_62_V8-py3.11/lib/python3.11/site-packages/llama_cpp/llama.py:1054: RuntimeWarning: Detected duplicate leading "<s>" in prompt, this will likely reduce response quality, consider removing it...
  warnings.warn(

Hey thank you for that numpy part!
Like you, I am also having a GPU memory problem, it seems that 7 of 8GBs fill up as soon as i start the UI and then sometime when the file is too big I see 8GB in NVTOP and then I get a memory error message:
CUDA out of memory. Tried to allocate 22.00 MiB. GPU 0 has a total capacty of 7.92 GiB of which 4.62 MiB is free. Including non-PyTorch memory, this process has 7.91 GiB memory in use. Of the allocated memory 2.99 GiB is allocated by PyTorch, and 41.72 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

jaluma · 2024-08-05T15:51:01Z

I've just opened a PR to add a CUDA-compatible dockerfile with these problems fixes, can you try?
#2044
@MarioRossiGithub @theodufort

MarioRossiGithub · 2024-08-06T11:28:15Z

Sorry, I cannot do it anymore.

3x3cut0r mentioned this issue Jun 24, 2024

Latest container issue will not start 3x3cut0r/docker#15

Closed

jaluma added the bug Something isn't working label Jul 8, 2024

jaluma mentioned this issue Aug 7, 2024

docs: add numpy issue to troubleshooting #2048

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NVIDIA GPU and numpy #1979

NVIDIA GPU and numpy #1979

MarioRossiGithub commented Jun 21, 2024 •

edited

Loading

MarioRossiGithub commented Jun 21, 2024 •

edited

Loading

theodufort commented Jun 21, 2024

Install

Ensure to:

Run Private GPT:

jaluma commented Aug 5, 2024

MarioRossiGithub commented Aug 6, 2024

NVIDIA GPU and numpy #1979

NVIDIA GPU and numpy #1979

Comments

MarioRossiGithub commented Jun 21, 2024 • edited Loading

MarioRossiGithub commented Jun 21, 2024 • edited Loading

Install

Ensure to:

Run Private GPT:

theodufort commented Jun 21, 2024

Install

Ensure to:

Run Private GPT:

jaluma commented Aug 5, 2024

MarioRossiGithub commented Aug 6, 2024

MarioRossiGithub commented Jun 21, 2024 •

edited

Loading

MarioRossiGithub commented Jun 21, 2024 •

edited

Loading