
[Enhancement] Simultaneous CLBLAS/CUBLAS instances. #1494


Closed
AlphaAtlas opened this issue May 17, 2023 · 5 comments

@AlphaAtlas

AlphaAtlas commented May 17, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Enhancement

If not already possible through a config I missed, would offloading some layers to CLBLAS and other layers to CUBLAS be viable? Or maybe offloading layers to multiple CLBLAS devices?

A common hardware config is a CPU with an IGP plus a discrete GPU, and this would allow the IGP to be utilized on systems with weak CPUs and low-VRAM dGPUs. Much more powerful four-channel IGPs are also rumored to be in development at Intel and AMD.

With the extra transfers and possible CPU memory-bandwidth starvation, this may or may not improve performance much... I'm not sure.
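To make the request concrete, here is a toy sketch of what a proportional layer-to-device assignment could look like. All names here (device labels, the `split_layers` helper) are hypothetical illustrations, not llama.cpp API:

```python
# Toy sketch: assign transformer layers to heterogeneous devices in
# proportion to user-supplied weights. Hypothetical helper, not
# llama.cpp API.

def split_layers(n_layers, device_weights):
    """Return a device label for each layer, proportional to weights.

    device_weights: e.g. {"cublas:0": 3.0, "clblast:0": 1.0}
    """
    total = sum(device_weights.values())
    devices = list(device_weights.items())
    assignment = []
    dev_idx = 0
    for layer in range(n_layers):
        # Advance to the next device once this device's share of layers
        # (its weight fraction) has been filled.
        while dev_idx < len(devices) - 1 and \
                (layer + 0.5) / n_layers >= \
                sum(w for _, w in devices[:dev_idx + 1]) / total:
            dev_idx += 1
        assignment.append(devices[dev_idx][0])
    return assignment

# A dGPU weighted 3x an IGP: 24 layers to CUBLAS, 8 to CLBlast.
plan = split_layers(32, {"cublas:0": 3.0, "clblast:0": 1.0})
print(plan.count("cublas:0"), plan.count("clblast:0"))  # 24 8
```

The weights play the same role as a user-visible "how much of the model goes to each device" knob; the open question in this issue is whether the runtime could dispatch the two contiguous layer ranges to two different BLAS backends.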

@deep-pipeline

I like this idea because many folks will be scraping together whatever RAM and whatever old, new, or mismatched GPU hardware they can find to maximise VRAM and throughput / model size (and having clarity of specification would help with this, as well as possible future things like chaining across machines).

Having a clear way of specifying which layers go to which device might also help with debugging code or performance problems on different GPUs, because anyone with both could compare relative throughput by switching different layers of the model to different devices and re-running a test.

@AlphaAtlas
Author

Also, while I am here, is simultaneous OpenBLAS/CUBLAS possible? I can't build with both at the same time, but it seems like OpenBLAS would be beneficial for the CPU-offloaded layers unless CUBLAS is already replicating that functionality.

@FNsi
Contributor

FNsi commented May 21, 2023

I don't think that will work well, though. The many copies between devices will simply reduce the speed.

@AlphaAtlas
Author

Hmmm, does CLBlast reduce generation speed on IGPs now?

I would think the transfers would be fine over 1 PCIe bus and to 1 IGP.
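A rough back-of-the-envelope calculation supports this. The dimensions below are assumed for a 7B-class model and the bandwidth figure is nominal, not measured, but they show that the per-token activations crossing one device boundary are tiny relative to PCIe bandwidth:

```python
# Back-of-envelope: cost of shipping activations across one device
# boundary per generated token. Assumed 7B-class dimensions and a
# nominal PCIe bandwidth figure, not measurements.

hidden_size = 4096       # assumed embedding width for a 7B-class model
bytes_per_val = 2        # fp16 activations
pcie_gb_per_s = 8        # assumed effective PCIe bandwidth (GB/s)

bytes_per_token = hidden_size * bytes_per_val        # 8192 bytes
transfer_s = bytes_per_token / (pcie_gb_per_s * 1e9)
print(f"{bytes_per_token} bytes, ~{transfer_s * 1e6:.1f} us "
      "per token per boundary")  # ~1 microsecond
```

At roughly a microsecond per token per boundary, the transfer itself would be negligible next to per-token compute; the real risk is more likely the IGP and CPU contending for the same memory bandwidth, as FNsi suggests.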

@github-actions github-actions bot added the stale label Mar 25, 2024
Contributor

github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
