Won't Use GPU+CPU in 1.78. #1225
Comments
How many layers is it currently offloading? Try offloading 1 or 2 fewer layers.
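The suggestion above can be sketched as a launch command. This is a minimal sketch, assuming the `--gpulayers` and `--model` flags of koboldcpp's CLI; the model path is a placeholder.

```shell
# Offload a couple fewer layers than the auto-detected maximum (49 here).
# "model.gguf" is a placeholder path.
python koboldcpp.py --model model.gguf --gpulayers 47
```

If it still falls back to CPU, keep stepping the layer count down until loading succeeds.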
I've tried everything from the maximum it suggests (49 layers) down to as low as 33, with the same results: no matter the layer offload, it says it failed and falls back to the CPU-only backend.
Alright, could you try running it in a command prompt terminal, then copying the console output (including the crash message) here?
Sorry for the long delay; here is hopefully what you wanted, while using a 13B LLM model.

CLBlast:
Processing Prompt [BLAS] (512 / 1247 tokens)
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -4
QF32 Matmul Failed (-4): [dims: 5120,5120,5120,512]
You may be out of VRAM. Please check if you have enough.

Cublas:
vulkan:
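For context, OpenCL error -4 is CL_MEM_OBJECT_ALLOCATION_FAILURE, i.e. a device memory allocation failed, which matches the "out of VRAM" hint. One way to watch VRAM headroom while the model loads (NVIDIA driver tooling, run in a second terminal):

```shell
# Poll GPU memory usage once per second while the model loads
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```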
Can you try with cublas, with lowvram enabled and flash attention disabled?
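A sketch of that configuration, assuming koboldcpp's `--usecublas` flag with its `lowvram` option (flash attention is off unless `--flashattention` is passed); the model path is a placeholder.

```shell
# CuBLAS backend with the low-VRAM option; flash attention stays disabled
# simply by not passing --flashattention. "model.gguf" is a placeholder.
python koboldcpp.py --model model.gguf --usecublas lowvram --gpulayers 41
```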
Cublas with lowvram, and flash attention I always have disabled.
And it just crashes after that line? |
It doesn't crash, but as the report says, it auto-falls back to CPU-only mode. It just makes the GPU do nothing for 40 of 41 layers and saddles the rest on the CPU, turning my Linux machine into a slow slideshow until I tell the terminal to stop.
If you mean |
Thank you, yet this is what I made this issue for: why is it only using 1 tensor? As I explained at the beginning, the last version I used let me use all my tensors for the AI and then the CPU for the rest. However, currently it seems that outside of 11B models the program cannot actively use my GPU together with the CPU, and as you already said, it just makes the GPU use 1 tensor while throttling my CPU with both operations.
No, it's the opposite: everything is working fine on the GPU, besides one tensor that's on the CPU.
Oh, I see. Sorry for all of this.
Describe the Issue
Using the latest update has caused issues with all models I run, mostly anything above 11B. In Vulkan, CLBlast, CuBLAS, and all legacy backends, the AI (with a character card injected) crashes with VRAM overflow rather than using the CPU and GPU together.
Essentially, instead of the AI model being split between GPU and CPU, it goes GPU-only and crashes.
Additional Information:
Using Ubuntu 24.04 Cinnamon, fully updated, with the latest SillyTavern as well. For reference, 1.73 works well with 11B and 13B using AI fallback, whereas now the 13B model won't fit into a 10GB RTX 3080 LHR; Ryzen 7 5700G and 128GB DDR4, to help gauge specs.
(I've never made issues on GitHub often enough to know if I'm doing it right. Sorry.)