[Enhancement] Simultaneous CLBLAS/CUBLAS instances. #1494
Comments
I like the idea of this because many folks will be scraping together whatever RAM and old, new, or mismatched GPU hardware they can find to maximise VRAM and throughput / model size (and having a clear way of specifying the split would help with this, as well as with possible future things like chaining across machines). A clear way of specifying which layers go to which device might also help with debugging code or performance problems on different GPUs, because anyone with both could simply compare relative throughput by switching different layers of the model between devices and re-running a test.
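The per-layer device mapping suggested above could be sketched like this. This is only a minimal illustration of the splitting idea; the device names (`"cuda0"`, `"opencl0"`, `"cpu"`) and memory figures are made up and do not correspond to any real llama.cpp API:

```python
# Hypothetical sketch: assign transformer layers to devices in proportion
# to each device's free memory. Everything here (names, numbers) is an
# illustrative assumption, not an actual llama.cpp interface.

def split_layers(n_layers, device_mem):
    """Map each layer index to a device, weighting by available memory."""
    total = sum(device_mem.values())
    assignment = {}
    start = 0
    devices = list(device_mem.items())
    for i, (dev, mem) in enumerate(devices):
        # The last device takes the remainder so every layer is covered.
        if i == len(devices) - 1:
            count = n_layers - start
        else:
            count = round(n_layers * mem / total)
        for layer in range(start, start + count):
            assignment[layer] = dev
        start += count
    return assignment

# Example: 32 layers split across a dGPU (8 GB free), an IGP (2 GB carved
# out of system RAM), and the CPU (6 GB) -- all figures are made up.
plan = split_layers(32, {"cuda0": 8, "opencl0": 2, "cpu": 6})
```

With the figures above, the dGPU gets the first half of the layers, the IGP a small middle slice, and the CPU the rest; swapping any slice to a different device and re-timing would give exactly the per-layer throughput comparison described in the comment.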
Also, while I am here, is simultaneous OpenBLAS/CUBLAS possible? I can't build with both at the same time, but it seems like OpenBLAS would be beneficial for CPU offloading unless CUBLAS is replicating that functionality.
I don't think that will work well, though. The many copies between devices will simply reduce the speed.
Hmmm, does CLBlast reduce generation speed on IGPs now? I would think the transfers would be fine over 1 PCIe bus and to 1 IGP.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Enhancement
If not already possible through a config I missed, would offloading some layers to CLBLAS and other layers to CUBLAS be viable? Or maybe offloading layers to multiple CLBLAS devices?
A common hardware config is a CPU with an IGP plus a discrete GPU, and this would allow the IGP to be utilized on systems with weak CPUs and low-VRAM dGPUs. Much more powerful four-channel IGPs are also rumored to be in development at Intel/AMD.
With the extra transfers and possible CPU bandwidth starvation, this may or may not even improve performance much... I'm not sure.
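A back-of-envelope model helps reason about the transfer concern: splitting layers across two backends adds one hidden-state copy over PCIe at each device boundary. All figures below (per-layer times, hidden-state size, PCIe bandwidth) are illustrative assumptions, not measurements:

```python
# Sketch of the trade-off: per-token latency is the sum of per-layer
# compute on each device, plus one PCIe transfer of the hidden state at
# every device boundary. All numbers are made-up assumptions.

def token_time_ms(plan, boundaries, hidden_bytes, pcie_gb_s):
    """plan maps device name -> (layer_count, ms_per_layer)."""
    compute = sum(count * ms for count, ms in plan.values())
    # bytes / (GB/s) gives seconds; convert to milliseconds.
    transfer = boundaries * hidden_bytes / (pcie_gb_s * 1e9) * 1e3
    return compute + transfer

# 24 layers on a fast dGPU (0.5 ms/layer), 8 on a slow IGP (2 ms/layer),
# one boundary, a 4096-dim fp16 hidden state (~8 KiB), ~8 GB/s over PCIe.
split = token_time_ms({"dgpu": (24, 0.5), "igp": (8, 2.0)},
                      boundaries=1, hidden_bytes=8192, pcie_gb_s=8.0)

# Baseline: all 32 layers on a weak CPU at 3 ms/layer, no transfers.
cpu_only = token_time_ms({"cpu": (32, 3.0)}, 0, 0, 1.0)
```

Under these assumptions the per-boundary transfer is on the order of microseconds per token, so the copies themselves are cheap; whether the split pays off depends almost entirely on how slow the extra device's layers are, which matches the "may or may not improve performance" caveat above.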