Description
Not sure if a GitHub issue is the right forum for this question, but I was wondering whether it's possible to use the GPU for prompt ingestion. I have an AMD GPU, and with CLBlast I get about 3x faster ingestion on long prompts than with the CPU.
But for inference (token generation), my 12-thread CPU is around 30% faster than the GPU.
Was wondering if I could combine the two so I can eat my cake and have it too!
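Here is a rough back-of-the-envelope sketch of why the split would help. It assumes that end-to-end time is just ingestion time plus generation time. The CPU baseline rates are made-up placeholders, not measurements. Only the ratios (GPU about 3x faster at ingestion, CPU about 30% faster at generation) come from what I described above.

```python
def total_seconds(prompt_tokens, gen_tokens, prompt_rate, gen_rate):
    """Estimate wall-clock time as ingestion time plus generation time (rates in tokens/s)."""
    return prompt_tokens / prompt_rate + gen_tokens / gen_rate


# Hypothetical CPU baseline rates in tokens/s (placeholders, not measured values).
CPU_PROMPT_RATE = 10.0
CPU_GEN_RATE = 6.5

# Ratios from the issue: GPU ~3x faster at ingestion, CPU ~30% faster at generation.
GPU_PROMPT_RATE = CPU_PROMPT_RATE * 3
GPU_GEN_RATE = CPU_GEN_RATE * 10 / 13  # CPU = 1.3x GPU, so GPU = CPU / 1.3


def compare(prompt_tokens, gen_tokens):
    """Return estimated seconds for CPU-only, GPU-only, and hybrid (GPU ingest, CPU generate)."""
    return {
        "cpu_only": total_seconds(prompt_tokens, gen_tokens, CPU_PROMPT_RATE, CPU_GEN_RATE),
        "gpu_only": total_seconds(prompt_tokens, gen_tokens, GPU_PROMPT_RATE, GPU_GEN_RATE),
        "hybrid": total_seconds(prompt_tokens, gen_tokens, GPU_PROMPT_RATE, CPU_GEN_RATE),
    }


if __name__ == "__main__":
    for name, secs in compare(prompt_tokens=1200, gen_tokens=130).items():
        print(f"{name}: {secs:.1f} s")
```

With a 1200-token prompt and 130 generated tokens, this gives about 140 s for CPU only, 66 s for GPU only, and 60 s for the hybrid. The hybrid always comes out ahead because it simply takes the faster device for each phase.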