
Use GPU for prompt ingestion and CPU for inference #1342


Closed
regstuff opened this issue May 6, 2023 · 2 comments
Labels
need more info (The OP should provide more details about the issue)

Comments

@regstuff

regstuff commented May 6, 2023

Not sure if a GitHub issue is the right forum for this question, but I was wondering if it's possible to use the GPU for prompt ingestion. I have an AMD GPU, and with CLBlast I get about 3x faster ingestion on long prompts compared to the CPU.
But a 12-thread CPU is faster than the GPU for inference by around 30%.
Was wondering if I could combine the two so I can eat my cake and have it too!

@Green-Sky
Collaborator

> But a 12-thread CPU is faster than the GPU for inference by around 30%.
> Was wondering if I could combine the two so I can eat my cake and have it too!

That is (or should be) already the case! Can you tell us more about your setup, etc.?

Quote from the README:

> Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). BLAS doesn't affect the normal generation performance.
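
In other words, a CLBlast build should already give you this split: prompt processing goes through the GPU BLAS path (as long as the batch size stays above 32), while token generation keeps running on your CPU threads. Roughly, assuming a recent CLBlast-enabled build (model path and prompt below are placeholders):

```sh
# Build with CLBlast so prompt processing can use the GPU BLAS path
# (requires the CLBlast library to be installed)
make clean && make LLAMA_CLBLAST=1

# -b 512 keeps the batch size above the BLAS threshold (32);
# -t 12 uses your 12 CPU threads for token generation
./main -m models/7B/ggml-model-q4_0.bin -t 12 -b 512 -p "your long prompt here"
```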

Green-Sky added the need more info label on May 6, 2023
@SlyEcho
Collaborator

SlyEcho commented May 7, 2023

It would be possible if the ggml executor could compute multiple nodes in parallel and choose where each one runs. Right now it can only split a single operation across multiple threads.
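
To make the distinction concrete, here is a toy sketch, not the actual ggml API; every type and function in it is hypothetical. It contrasts the current model (one node at a time, work split across CPU threads) with what hybrid per-node scheduling would need:

```c
/*
 * Toy sketch only, not the real ggml executor: all types and functions
 * here are hypothetical stand-ins used to illustrate the two models.
 */
#include <stdio.h>

enum op_kind { OP_MUL_MAT, OP_ADD };   /* simplified stand-ins for graph ops */

struct node { enum op_kind op; };      /* one node of the compute graph */

/* Placeholder "kernels"; real code would dispatch into ggml / CLBlast. */
static void run_on_cpu(const struct node *n, int n_threads) {
    printf("op %d on CPU, split across %d threads\n", (int)n->op, n_threads);
}
static void run_on_gpu(const struct node *n) {
    printf("op %d on GPU\n", (int)n->op);
}

/* Current model: walk the graph node by node; only the work inside the
 * current op is split across CPU threads. */
static void compute_graph_today(const struct node *nodes, int n_nodes, int n_threads) {
    for (int i = 0; i < n_nodes; i++) {
        run_on_cpu(&nodes[i], n_threads);
    }
}

/* Hybrid model: a scheduler picks a backend per node (e.g. large matmuls
 * on the GPU, everything else on the CPU) and, ideally, would also run
 * independent nodes concurrently. */
static void compute_graph_hybrid(const struct node *nodes, int n_nodes, int n_threads) {
    for (int i = 0; i < n_nodes; i++) {
        if (nodes[i].op == OP_MUL_MAT) {
            run_on_gpu(&nodes[i]);
        } else {
            run_on_cpu(&nodes[i], n_threads);
        }
    }
}

int main(void) {
    struct node graph[] = { { OP_MUL_MAT }, { OP_ADD }, { OP_MUL_MAT } };
    compute_graph_today(graph, 3, 12);
    compute_graph_hybrid(graph, 3, 12);
    return 0;
}
```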
