ggml : add support for dynamic loading of backends #10469

Merged on Nov 25, 2024 (11 commits)

@slaren (Collaborator) commented Nov 23, 2024

Adds support for loading backends dynamically at runtime, without needing to link them into the build.

  • To build the backends as dynamic libraries, configure the CMake build with GGML_BACKEND_DL enabled
  • Adds the function ggml_backend_load(const char * path) to load a backend dynamically
  • Adds the convenience function ggml_backend_load_all(void) to load all the known backends
  • Adds the function ggml_backend_unload(ggml_backend_reg_t reg) to unregister and unload a backend
  • Adds the optional function ggml_backend_get_features to obtain a backend's list of feature flags. This replaces the calls to the ggml_cpu_has_xx functions from the CPU backend in llama.cpp
  • In addition to the CPU backend, the CUDA backend also implements ggml_backend_get_features, which returns the list of architectures included in the build and the build flags used, such as GGML_CUDA_FORCE_MMQ. Other backends should also implement this function to report compile-time flags and features.
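
As a rough illustration of how a caller might consume such a feature list, the sketch below formats name/value pairs into one summary string. The `backend_feature` struct, the `format_features` helper, and the convention that the array ends with an entry whose name is null are all assumptions made for this example; they stand in for whatever ggml_backend_get_features actually returns and are not copied from the ggml headers.

```cpp
#include <string>

// Hypothetical stand-in for one feature entry reported by a backend.
struct backend_feature {
    const char * name;   // e.g. "AVX2" or "FORCE_MMQ"
    const char * value;  // e.g. "1"
};

// Builds "NAME = value | NAME = value" from an array terminated by an
// entry whose name is nullptr (an assumed convention for this sketch).
std::string format_features(const backend_feature * features) {
    std::string out;
    if (features == nullptr) {
        return out;  // the backend does not report any features
    }
    for (const backend_feature * f = features; f->name != nullptr; ++f) {
        if (!out.empty()) {
            out += " | ";
        }
        out += f->name;
        out += " = ";
        out += (f->value != nullptr ? f->value : "");
    }
    return out;
}
```

A backend that does not implement the optional function would presumably be handled like the `nullptr` case here, producing an empty string.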

TODO

  • Version checking to avoid loading incompatible backends
  • Fix ggml_backend_load_all search paths
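
The version check listed above is still open in this PR, so the following is only a guess at its general shape: the host compares an API version reported by the loaded library against its own and refuses the backend on a mismatch. `HOST_API_VERSION`, `backend_descriptor`, and `check_backend` are hypothetical names invented for this sketch.

```cpp
#include <string>

// Hypothetical API version the host application was built against.
constexpr int HOST_API_VERSION = 1;

// Hypothetical descriptor a dynamically loaded backend might report.
struct backend_descriptor {
    int         api_version;
    std::string name;
};

enum class load_result { ok, incompatible_version };

// Accepts a backend only when its reported API version matches the host's.
load_result check_backend(const backend_descriptor & desc) {
    return desc.api_version == HOST_API_VERSION
        ? load_result::ok
        : load_result::incompatible_version;
}
```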

@github-actions bot added the labels testing, Nvidia GPU, examples, ggml, and SYCL on Nov 23, 2024
slaren and others added 3 commits November 24, 2024 19:12
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
use MODULE target type for dl backend
set backend output directory to the runtime directory
ggml_backend_load_all searches backends in the system path first, then in the executable directory

ggml-ci
@@ -251,7 +251,7 @@ endif
#

# keep standard at C11 and C++11
-MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon
+MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -DGGML_USE_CPU
Collaborator Author

GGML_USE_CPU now needs to be defined to use the CPU backend with the backend registry. This is necessary because the CPU backend may now be loaded dynamically, so it cannot be assumed that it is linked into the build. This may break other build scripts.

On Linux, it may also be necessary to link to dl for dlopen.
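
A minimal sketch of why the define matters, assuming the registry only adds the CPU backend when GGML_USE_CPU is set at compile time. The `builtin_backends` function is hypothetical and only mimics that pattern; it is not the registry code from this PR.

```cpp
#include <string>
#include <vector>

// Returns the names of the backends compiled into this binary.
// Without -DGGML_USE_CPU the list stays empty, which mirrors the
// "no devices" situation a build script can run into after this change.
std::vector<std::string> builtin_backends() {
    std::vector<std::string> names;
#ifdef GGML_USE_CPU
    names.push_back("CPU");
#endif
    return names;
}
```

Compiling this with `-DGGML_USE_CPU` yields one entry; compiling without it yields none.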

@Vali-98 Vali-98 Nov 27, 2024

May want to mention this change in: #9289

I spent a few hours scratching my head over why I had no devices.

On the side, when no devices are loaded, this causes a segfault due to cpu_dev being a nullptr:

llama.cpp/src/llama.cpp

Lines 7291 to 7292 in c9b00a7

auto * cpu_dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
auto * cpu_reg = ggml_backend_dev_backend_reg(cpu_dev);

We should probably assert here, or perhaps anywhere 0 devices are present, to let the user know something is wrong.
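
One possible shape for such a check, as a sketch only: fail with a readable error as soon as a device lookup returns a null pointer, before anything dereferences it. The `device` type and `require_device` helper are hypothetical stand-ins and are not part of ggml or llama.cpp.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a backend device handle.
struct device {};

// Returns dev unchanged, or throws a descriptive error when the lookup
// that produced it found nothing (dev == nullptr).
device * require_device(device * dev, const char * kind) {
    if (dev == nullptr) {
        throw std::runtime_error(
            std::string("no ") + kind +
            " device found; was the backend compiled in or loaded?");
    }
    return dev;
}
```

Wrapping the result of the CPU device lookup in a guard like this would turn the segfault into an error message that points at the missing backend.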

@slaren slaren merged commit 5931c1f into master Nov 25, 2024
55 checks passed
@slaren slaren deleted the sl/dl-backend-2 branch November 25, 2024 14:13
@slaren slaren mentioned this pull request Nov 25, 2024
4 tasks
@MaggotHATE
Contributor

Is ggml_backend_load_all() supposed to be called in static builds too? If I don't use it, there's a noticeable reduction in quality of generated answers. When used, it tries to load any backend .dll it can find, which probably shouldn't happen on a static build (especially with a backend - OPENBLAS, for example). Am I doing something wrong?

@slaren
Collaborator Author

slaren commented Dec 12, 2024

It is ok to call ggml_backend_load_all even on static builds since it allows loading external backends. If it cannot find dynamically loadable backends present in the search paths, it won't do anything. The reduction in quality that you are observing is not likely to be caused by this.

@MaggotHATE
Contributor

it allows loading external backends.

It does, but allocates memory again, essentially duplicating total memory usage. I suppose it's a mistake on my end? It shouldn't behave like that on a combination of a static build with a dynamic backend?

@slaren
Collaborator Author

slaren commented Dec 12, 2024

It could happen if you have a static backend and the same backend as a dynamic backend, but that does not happen normally, because backends built without GGML_BACKEND_DL enabled cannot be loaded dynamically.
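
For applications that want to rule out the duplicate case themselves, a guard along these lines might help. This is a hypothetical sketch of a name-keyed registry that rejects a second registration; it does not describe how the ggml registry behaves.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical registry that tracks backends by name.
class backend_registry {
public:
    // Returns true if the backend was newly registered, false if a
    // backend with the same name was already present.
    bool register_backend(const std::string & name) {
        return names_.insert(name).second;
    }

    std::size_t count() const { return names_.size(); }

private:
    std::unordered_set<std::string> names_;
};
```

With this, registering "CUDA" once as a static backend and once as a dynamic one leaves a single entry, and the second call reports the conflict through its return value.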

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* ggml : add support for dynamic loading of backends

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
4 participants