Running LLamaSharp on GPU #189
Hi, which backend package did you install for your project? There are three kinds of backends: CPU, CUDA and Metal (Mac). Be careful that the CPU and CUDA backends aren't installed at the same time.
So, when using LLamaSharp.Backend.Cuda12 and LLamaSharp, both version 0.5.1.
With this new setup do you get the "RuntimeError: The native library cannot be found" message, or does it run but on CPU only?
No errors, just running on CPU.
I think if you've got all of the dependencies it prefers to use the CPU ones at the moment, so you may have to delete all the non-CUDA DLLs.
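If you want to see which native binaries actually ended up in your build output, here's a quick diagnostic sketch (the search pattern is an assumption; the 0.5.x backends ship several libllama variants):

```csharp
using System;
using System.IO;

// List every llama-related native binary deployed next to the application.
// If both CPU and CUDA variants show up, LLamaSharp may pick the CPU one.
foreach (string file in Directory.GetFiles(AppContext.BaseDirectory, "*llama*",
                                           SearchOption.AllDirectories))
{
    Console.WriteLine(file);
}
```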
Yeah, it seems to pop out that error. Is there a way I can see what is missing?
Unfortunately no, as far as I know. I've always wondered why that exception doesn't just list exactly what it's missing :'( A couple of things to check:
Okay, here is the output of the cmd:
So, after downloading and installing CUDA (https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html)
I'm at a bit of a loss as to what could be the problem in that case :/ Does the normal llama.cpp demo application (main.exe) work on the GPU?
Did you rename the file? This has tripped up a few people I have helped with LLamaSharp.
I have a very similar issue. I have one machine, a 10-core i9 with 128 GB RAM and a 3090 GPU; LLamaSharp runs very fast and is certainly using the GPU. On another machine, a 56-core Xeon with 128 GB RAM and 4 NVIDIA A6000 GPUs, it runs dog slow and uses up almost all the CPU. It does seem to use one GPU, but not much. I have CUDA 12 and the latest NVIDIA drivers on each system. Output from both says they are using the GPU, but the 3090 is super fast and the other is super slow. I am using the 0.5.1 NuGet package and the 0.5.1 Backend.Cuda12 on both. Also, how would I get LLamaSharp to use more than one GPU?
Currently not possible, but I've just opened a PR to add support for it: #202. I don't personally use CUDA and I don't have multiple GPUs to test with, so if you could test it and confirm that it works for you that would be much appreciated!
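For anyone wanting to try it, usage should look roughly like this once the PR lands. This is a sketch only: the TensorSplits property name and float-array type are assumptions based on llama.cpp's tensor_split option, so check the PR for the final API.

```csharp
using LLama;
using LLama.Common;

// Sketch only (assumed API from PR #202): split work across 4 GPUs equally.
// TensorSplits mirrors llama.cpp's tensor_split array, where each entry is the
// proportion of the model assigned to the corresponding CUDA device.
var parameters = new ModelParams("models/llama-2-13b.Q4_K_M.gguf") // placeholder path
{
    GpuLayerCount = 40,                        // layers to offload to the GPUs
    TensorSplits = new float[] { 1, 1, 1, 1 }, // equal share per device
};

using var model = LLamaWeights.LoadFromFile(parameters);
```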
I would be delighted to try it, but I can't get it working at all with any GPU on the multi-GPU system. Also, FYI, the 70b model doesn't seem to work on my single-GPU system (128 GB RAM, 3090 card). Nonetheless, I will try the updated code; maybe it will fix the underlying issue.
So, I cloned the MultiGpu branch and tried it. Got a System.AccessViolationException when trying to load the 13b model, in native.cs, SafeLlamaModelHandle. Here is the model output:

ggml_init_cublas: found 4 CUDA devices:
llm_load_print_meta: EOS token = 2 '</s>'
Maybe you need to make changes in the CUDA 12 backend? Maybe you could debug on an Azure VM with multiple GPUs? I might be able to get one for a little while.
How did you set up your test? The error here makes it look like you've set that value explicitly. I assume you didn't actually do that, so I guess that there must be some mismatch between the C# and C++ sides, for example a field in the wrong place in the parameters struct.
Must be a mismatch, because I didn't set that in the ModelParams, just these:
I don't have any good ideas to debug that, I'm afraid. The relevant bit of llama.cpp is this struct, and the equivalent in that PR is this. As far as I can see those two agree with each other. I can't reproduce this on my PC (with just one GPU), which is odd, because if there really was something misaligned in that struct I'd expect everything to fail very quickly (even basic loading of weights).
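To illustrate why a layout mismatch produces garbage values rather than a clean error, here's a sketch of the failure mode (the struct and field names are made up for illustration, not the real llama_context_params):

```csharp
using System.Runtime.InteropServices;

// Hypothetical managed mirror of a native struct. With LayoutKind.Sequential the
// runtime reads each field at the offset implied by the declared order, so if the
// C# declaration adds, drops, resizes, or reorders a single field relative to the
// C++ definition, every later field is read from the wrong bytes. That's how an
// int like GpuLayerCount can come back as a huge nonsense value, and why a bad
// pointer-sized field further down can trigger an AccessViolationException.
[StructLayout(LayoutKind.Sequential)]
struct FakeContextParams
{
    public int  ContextSize;   // must match the native field's position and width
    public int  GpuLayerCount; // one misplaced field above shifts this offset
    public bool UseMemoryMap;  // bool marshaling size must also match the C side
}
```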
Is there a way I can test this for you, or could we have a Discord call?
I can get llama.cpp main to work with CPU, but it just drops back to the command line after a couple of seconds with CUDA:
I also tried renaming libllama-cuda12.dll (and the .so file for good measure). When that didn't work, I went a step further and copied libllama-cuda12.dll and the .so and pasted them with the names of the other .dll and .so files:
I followed some advice to replace the DLL and run it from the command line from the debug folder:
CUDA error 209 at D:\a\LLamaSharp\LLamaSharp\ggml-cuda.cu:6862: no kernel image is available for execution on the device

BTW, I don't have a "D" drive, and the code isn't in an "a" directory. Is this path hardwired somewhere in the code? Maybe it doesn't matter here, but it's curious.
That's just a path that was baked in from the build environment; it's not a problem.

"no kernel image is available for execution on the device" is the error that's causing you problems. It means the compiled code does not support your specific GPU (its compute capability). It's odd that llama.cpp isn't printing the same error, but I think that's the fundamental problem in both cases.
I thought I saw it over at llama.cpp, but the error message there was "invalid configuration file", coming from the same line of code (ggml-cuda.cu:6862). I don't know if it's related or not.
EDIT: I swapped GPUs from one of my PCs to the other. The RTX 2070 has intermittent performance on both PCs; the GTX 1070 has none at all.

If it helps, I'm experiencing similar GPU issues to the OP. I have tried the following combinations and have seen only one case where one of my GPUs intermittently functions:

PC1: Windows 10, GTX 1070
PC2: Windows 10, RTX 2070, CUDA 11.8

Results displayed below. I think this is a great library; I really hope to be able to use it. Thank you.
@atonalfreerider Hey, I noticed that you have compiled the DLL from llama.cpp yourself. Could you please further test whether the GPU is used when running directly with the llama.cpp examples? That will help us see whether it's an issue with llama.cpp or with LLamaSharp. :)
Someone posted a similar issue to llama.cpp today.

From llama.cpp I downloaded the latest release, b1429. I downgraded my CUDA SDK to 12.2 to match the release version and ran on the RTX 2070. I ran main.exe with several different models, including the exact same model that I was using in my comment above, with the same prompt, and experienced the same behavior as posted above. I also ran LLaMaSharp.Examples with choice (14), coding assistant, gave it the same prompt, and experienced the same behavior.

Here is the output from llama-bench.exe. I'll post it to the llama.cpp issues:
I may have found a partial explanation for the behavior; I explained on the llama.cpp issue ticket referenced above. It would appear from testing that the model needs to fit into the GPU memory in order to run efficiently. Please take this result with a grain of salt.
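As a rough rule of thumb you can scale GpuLayerCount down when the file won't fit. Here's a back-of-the-envelope sketch (it ignores the KV cache, assumes equally sized layers, and FreeVramBytes is a placeholder you'd read from nvidia-smi or NVML):

```csharp
using System;
using System.IO;

// Placeholder: free VRAM in bytes, e.g. read manually from nvidia-smi.
const long FreeVramBytes = 20L * 1024 * 1024 * 1024;
const int TotalLayers = 40; // a 13b Llama-2 model has 40 layers

// Scale the number of offloaded layers by how much of the file fits in VRAM.
long modelBytes = new FileInfo("models/llama-2-13b.Q4_K_M.gguf").Length; // placeholder path
int gpuLayers = modelBytes <= FreeVramBytes
    ? TotalLayers
    : (int)(TotalLayers * (double)FreeVramBytes / modelBytes);

Console.WriteLine($"Suggested GpuLayerCount: {gpuLayers}");
```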
Thank you for the feedback!
IMPORTANT UPDATE from the llama.cpp thread: it is now possible to disable VRAM overflow with driver version 564.01.

Guide: you have to individually add the path to your program executable to the NVIDIA 3D settings and opt out of RAM fallback.

WARNING: this will cause your program to crash, rather than just slow down, if you overflow the memory with a large model. But it should make full use of VRAM.
It seems to have been resolved by ggml-org/llama.cpp#3906. I think it will be included in a future release of LLamaSharp.
That 2-character change has caused a lot of grief 😆 😢 New binaries have been merged into master (through #249), which should speed everything up.
Hi, how did you run it on GPU? I cannot find the .dll files for CUDA. Where can I get those?
At the moment you need to have the CUDA toolkit installed (https://developer.nvidia.com/cuda-toolkit).
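If you want to sanity-check from C# that the CUDA driver and runtime are visible to your process, something like this works on .NET Core 3.0+ (the DLL names are the standard Windows ones: nvcuda ships with the GPU driver, cudart64_12 with the CUDA 12 toolkit):

```csharp
using System;
using System.Runtime.InteropServices;

// Probe the CUDA driver (nvcuda) and the CUDA 12 runtime (cudart64_12).
// If either fails to load, a CUDA backend won't be able to run.
foreach (string lib in new[] { "nvcuda", "cudart64_12" })
{
    if (NativeLibrary.TryLoad(lib, out IntPtr handle))
    {
        Console.WriteLine($"{lib}: loaded");
        NativeLibrary.Free(handle);
    }
    else
    {
        Console.WriteLine($"{lib}: NOT FOUND");
    }
}
```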
So I am currently using LLamaSharp like this:

But the issue I am encountering is that I can't seem to get this to run on my GPU; it only uses my CPU and RAM.
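For reference, here's a minimal sketch of GPU-offloaded loading on the 0.5.x API (the model path is a placeholder and the exact ModelParams members may differ between versions; the key point is that if GpuLayerCount is 0, everything runs on the CPU):

```csharp
using System;
using LLama;
using LLama.Common;

// Minimal sketch for LLamaSharp 0.5.x with LLamaSharp.Backend.Cuda12 installed.
var parameters = new ModelParams("models/llama-2-7b.Q4_K_M.gguf") // placeholder path
{
    ContextSize = 2048,
    GpuLayerCount = 32, // number of layers to offload; 0 means CPU only
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

await foreach (var token in executor.InferAsync("Hello, ", new InferenceParams()))
{
    Console.Write(token);
}
```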