Fallback from Vulkan to CPU #2411
Comments
Do you have any suggestions on how we can improve the stability of ggml and whisper.cpp to reduce crashes (aborts) and ensure they consistently return errors instead?
Hm, I haven't tested the Vulkan backend with The other error seems like the GPU device runs out of memory. I think your application can check if there is enough available memory before trying to load the Whisper model.
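A rough sketch of the pre-load check suggested above. It only shows the comparison itself; where the free-memory figure comes from is left to the application (for example, a query against the GPU backend), and the function name and the `headroom_bytes` parameter are assumptions made for illustration, not part of whisper.cpp or ggml.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: decides whether a model is likely to fit on the device.
// free_bytes     - free device memory as reported by the application's own query
// model_bytes    - size of the model to be loaded
// headroom_bytes - extra room for compute buffers (an assumed, tunable margin)
bool fits_in_device_memory(std::uint64_t free_bytes,
                           std::uint64_t model_bytes,
                           std::uint64_t headroom_bytes) {
    // Guard against unsigned wrap-around before adding the two sizes.
    if (model_bytes > std::numeric_limits<std::uint64_t>::max() - headroom_bytes) {
        return false;
    }
    return model_bytes + headroom_bytes <= free_bytes;
}
```

A check like this can only rule out the obvious out-of-memory case. It does not guarantee that loading will succeed, since driver or feature-support problems can still cause a failure.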
There are a lot of different issues with Vulkan. For instance, a new issue reports that Vulkan failed because the device doesn't support fp16 storage: ggerganov/llama.cpp#7620. How can we fall back to CPU when it fails? I'm considering using OpenVINO instead on Windows, but last time I checked it required special files to be installed and a special model file, so it wouldn't work any better than Vulkan in a desktop app.
I've noticed that CoreML/Metal includes a fallback mechanism to CPU. Since Vulkan has compatibility issues on many modern PCs, it would be great if Vulkan could have a similar fallback. Would you be able to outline the steps needed to implement a CPU fallback for Vulkan? I'm willing to work on it and collaborate with others to push this forward. Should I focus on this in the ggml repository or in whisper.cpp? Thanks!
I think the fallback mechanism only applies to operators that are not yet implemented on the backend. Are there such operators in the Vulkan backend? With the change that I just pushed, the memory usage should be reduced significantly. I will make a new |
The tiny model still fails to load on the latest commit with Vulkan, even though 1 GB of GPU memory is available.
Not that I'm aware of. I thought it fell back completely to the CPU. That should be useful.
@thewh1teagle Can you confirm that the memory allocation issue is now fixed with the latest commit on |
The memory allocation issue seems to be fixed in the latest version. However, many users are still reporting problems related to Vulkan. For example:
I believe providing an option to fall back to CPU-only inference would still be very useful, especially on Windows.
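For discussion, an application-level fallback could look roughly like the sketch below. `Context`, `Loader`, and `load_with_cpu_fallback` are hypothetical names. In a whisper.cpp application the loader could wrap `whisper_init_from_file_with_params` with `use_gpu` toggled in the context params, which is an assumption about how it would be wired up, not existing library behavior. This only helps when the GPU path reports failure by returning null; it cannot recover from an abort inside the backend.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a loaded model context.
struct Context {
    bool on_gpu;
};

// Hypothetical loader signature: returns nullptr when loading fails.
using Loader = std::function<Context *(const std::string & model_path, bool use_gpu)>;

struct LoadResult {
    Context * ctx;   // nullptr if both attempts failed
    bool used_gpu;   // true only if the GPU attempt succeeded
};

// Try the GPU backend first; if it reports failure, retry on the CPU.
LoadResult load_with_cpu_fallback(const Loader & load, const std::string & model_path) {
    assert(load && "a loader must be provided");
    if (Context * ctx = load(model_path, /*use_gpu=*/true)) {
        return {ctx, true};
    }
    // GPU load returned nullptr: fall back to CPU-only inference.
    return {load(model_path, /*use_gpu=*/false), false};
}
```

The caller can inspect `used_gpu` to tell the user that inference is running on the CPU and will be slower.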
Vulkan has a lot of bugs on Windows and Linux, but when it works, it is much faster than the CPU (10-20x faster).
I'm forced to use Vulkan in the project vibe, but many users report that it crashes on Windows and Linux.
Some of the errors:
PopOS
thewh1teagle/vibe#269
Ubuntu
Arch
thewh1teagle/vibe#267
Windows
thewh1teagle/vibe#266
thewh1teagle/vibe#263
Windows