Can't we use multiple GPUs independently? #2165
Comments
I won't implement it, but feel free to make a PR yourself if you want that feature.
I am working on refactoring the CUDA implementation, and one of the goals is to remove the global state. After that, associating a device or set of devices with a ggml-cuda context should be fairly straightforward.
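For illustration, here is a minimal sketch (in C, against the CUDA runtime) of what moving that global state into a per-backend context could look like. The `ggml_cuda_context` struct and the function names below are hypothetical, not the actual ggml-cuda code:

```c
// Hypothetical sketch only -- not the real ggml-cuda code. It illustrates the
// idea of replacing a process-wide device id with state owned by a context.

#include <cuda_runtime.h>
#include <stdlib.h>

// Before: one global decides the device for every call in the process.
// static int g_main_gpu = 0;

// After: each backend context carries its own device and its own stream.
typedef struct ggml_cuda_context {
    int          device;   // CUDA device this context is bound to
    cudaStream_t stream;   // stream owned by this context, no shared state
} ggml_cuda_context;

static ggml_cuda_context * ggml_cuda_context_new(int device) {
    ggml_cuda_context * ctx = malloc(sizeof(*ctx));
    ctx->device = device;
    cudaSetDevice(device);          // bind the calling thread to this GPU
    cudaStreamCreate(&ctx->stream); // work submitted through ctx stays on this device
    return ctx;
}

static void ggml_cuda_context_free(ggml_cuda_context * ctx) {
    cudaSetDevice(ctx->device);
    cudaStreamDestroy(ctx->stream);
    free(ctx);
}
```

With state held like this, two contexts created with different device ids could be driven from different threads without stepping on each other.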
@slaren is there an estimate for when it will be implemented? A week or two, or some months?
I hope to open a draft PR sometime this week, but it will still take some time (weeks) until it is ready to merge. If you need this now, I would suggest hacking it yourself in the meantime.
Couldn't you run multiple processes, with each using a different model on a different GPU? Or did I not understand correctly?
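For reference, a minimal sketch of that workaround in C: a launcher that forks one server process per GPU and hides the other devices from each child via `CUDA_VISIBLE_DEVICES`. The `./llama-server` binary name, its flags, and the model file names are placeholders for whatever frontend and models are actually used:

```c
// Sketch of the multi-process workaround: one child process per GPU, each
// restricted to a single device with CUDA_VISIBLE_DEVICES, so no in-process
// multi-GPU support is needed. Binary name, flags, and model files are placeholders.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    const char * models[] = { "model-a.bin", "model-b.bin" }; // placeholder model files
    const int n_gpus = 2;

    for (int gpu = 0; gpu < n_gpus; gpu++) {
        pid_t pid = fork();
        if (pid == 0) {
            char dev[8];
            snprintf(dev, sizeof(dev), "%d", gpu);
            setenv("CUDA_VISIBLE_DEVICES", dev, 1);   // child sees only this GPU
            execlp("./llama-server", "llama-server",
                   "-m", models[gpu], (char *) NULL); // placeholder command line
            perror("execlp");
            _exit(1);
        }
    }

    // Wait for all children; each serves its own model on its own GPU.
    while (wait(NULL) > 0) { }
    return 0;
}
```

Each child only ever sees one GPU, so nothing inside llama.cpp has to change; the cost is one process (and one copy of the runtime) per model.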
@SlyEcho In my project, llama.cpp is integrated into a Golang server, so it is much easier to initialize llama.cpp once and work with different contexts after that. It works fine with the CPU, and it's possible to use the CPU and a GPU at the same time too. It just doesn't work with multiple GPUs, and that's a shame. llama.cpp always assumes we are going to split one model between them, which is not the case when we want independent inference on each GPU.
How is it going? Multi-GPU parallel independent inference would be very useful for cloud LLM farms.
Multiple GPUs can help render frames much faster: higher FPS in games, improved multitasking, 4K gaming becomes a reality, and it might also enable a multi-monitor setup.
@slaren any movement on this feature?
We are getting closer; you can track the progress in the ggml repository. ggerganov/ggml#586 is the most recent update to the framework that will allow us to support this and other features, and it was just merged yesterday. It will still take a few weeks before this is ready to be used in llama.cpp, but we are working on it.
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I'm trying to use llama.cpp as a backend for scalable inference, and it seems the current architecture just doesn't allow multiple GPUs to work in parallel with different models.
From what I understand from reading the code, it always assumes we are going to SPLIT the same model between multiple GPUs, not use 1 .. N models on 1 .. N GPUs.
There are global vars like g_main_gpu, etc., and from my POV this should be set within the context, thus allowing inference on GPU0 from CTX0 and on GPU1 from CTX1, all at the same time.
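To make the request concrete, here is a hypothetical sketch of the desired behaviour: two contexts, each pinned to its own GPU, running inference in parallel from a single process. The `main_gpu` field and the `llama.h` entry points are written from memory and should be treated as assumptions; more importantly, the per-context isolation shown here is the feature being requested, not what the current code guarantees:

```c
// Hypothetical sketch of the requested behaviour -- not how llama.cpp works today.
// Each context is pinned to one GPU and the two models run fully independently.

#include "llama.h"
#include <pthread.h>

typedef struct {
    const char * model; // placeholder model file
    int          gpu;   // device this context should be pinned to
} job_t;

static void * run_model(void * arg) {
    const job_t * job = (const job_t *) arg;

    struct llama_context_params params = llama_context_default_params();
    params.main_gpu     = job->gpu; // desired: honoured per context, not globally
    params.n_gpu_layers = 99;       // offload all layers to that one GPU

    // Desired: this context only ever allocates on and computes with job->gpu.
    struct llama_context * ctx = llama_init_from_file(job->model, params);
    // ... tokenize and evaluate with this context here ...
    llama_free(ctx);
    return NULL;
}

int main(void) {
    job_t jobs[2] = {
        { "model-a.bin", 0 },
        { "model-b.bin", 1 },
    };
    pthread_t threads[2];
    for (int i = 0; i < 2; i++) {
        pthread_create(&threads[i], NULL, run_model, &jobs[i]);
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}
```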