
CUDA backend #2310

Merged
merged 52 commits into main from add-cuda-support on May 15, 2024
Conversation

@cebtenzzre (Member) commented May 6, 2024

This PR adds opt-in CUDA support in the GPT4All UI and python bindings using the llama.cpp CUDA backend.

CUDA-enabled devices will appear as e.g. "CUDA: Tesla P40" on supported platforms, alongside their Vulkan counterparts. When one is selected, the CUDA backend will be used instead of the Kompute backend.

When CUDA is not available (e.g. because a compatible driver or GPU is not installed), CUDA devices simply do not appear. Running out of memory (OOM) causes a fallback to CPU, just as with Kompute.
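
As a rough sketch of the behavior described above (not the PR's actual implementation), a frontend could gate its CUDA device list on the CUDA runtime, so that machines without a compatible driver or GPU report no CUDA devices at all; listCudaDevices below is a hypothetical helper name:

```cpp
// Minimal sketch, not GPT4All's real code: enumerate CUDA devices and label
// them "CUDA: <name>" so they can be shown alongside the Vulkan entries.
#include <cuda_runtime.h>
#include <string>
#include <vector>

std::vector<std::string> listCudaDevices() {
    std::vector<std::string> names;
    int count = 0;
    // If no compatible driver or GPU is present, this call fails and the list
    // stays empty -- CUDA devices "simply do not appear", as described above.
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return names;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop {};
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess)
            names.push_back(std::string("CUDA: ") + prop.name); // e.g. "CUDA: Tesla P40"
    }
    return names;
}
```

Each returned string could then be offered in the device picker next to the Vulkan counterparts.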

The CUDA runtime libraries (cudart and cublas) are installed to the lib/ directory on Linux and Windows. Care is taken to ensure the driver component of CUDA, nvcuda.dll/libcuda.so, is not installed, as it ships with the graphics driver.

Other changes:

  • Scaffolding for the Vulkan and ROCm backends (both currently have serious limitations)
  • llama.cpp updated to commit ggerganov/llama.cpp@83330d8cd from May 8th
  • Device selection is now meaningful on macOS: "Metal" (new option) and "Auto" do the same thing, "CPU" now uses the CPU backend instead of Metal

While I was working on packaging the CUDA libraries, I cleaned up a lot of junk that was installed but not needed:

  • DLL import libraries on Windows
  • Static libraries, including Kompute and fmtlib
  • Various headers and cmake scripts from Kompute, Vulkan itself, and fmtlib
  • An extra copy of the llmodel lib (in bin/ on Linux, in lib/ on Windows)

Kompute is no longer working.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
General:
- Proper implementation of gpuDeviceName()
- Make usingGPUDevice() consistent with Kompute impl
- Disable multi-GPU when selecting a specific device (currently: always)

For the bindings:
- Abort instead of segfaulting if multiple LLMs are loaded
- Implement GPU device selection by name/vendor

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
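
As an illustration of the "GPU device selection by name/vendor" item in the commit message above, a minimal sketch of that kind of matching might look like the following; GpuDevice and selectDevice are made-up names, not the bindings' actual API:

```cpp
// Hypothetical sketch (not the bindings' real code): pick a GPU by matching
// the requested string against either the device name or its vendor.
#include <optional>
#include <string>
#include <vector>

struct GpuDevice {            // assumed fields, for illustration only
    std::string name;         // e.g. "Tesla P40"
    std::string vendor;       // e.g. "NVIDIA"
};

std::optional<GpuDevice> selectDevice(const std::vector<GpuDevice> &devices,
                                      const std::string &requested) {
    for (const auto &d : devices) {
        if (d.name == requested || d.vendor == requested)
            return d; // picking a specific device also implies single-GPU use here
    }
    return std::nullopt; // no match; the caller can fall back to CPU
}
```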
This file is part of the graphics driver and should not be bundled with
GPT4All.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
llama.cpp itself is unconditionally built as a static library.
Installing it with the GUI is pointless.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre linked an issue on May 10, 2024 that may be closed by this pull request
cebtenzzre self-assigned this on May 15, 2024
@manyoso (Collaborator) commented May 15, 2024

Can you create an offline installer with this change for Linux for testing, please?

Resolved review threads: gpt4all-backend/llamamodel.cpp (two threads), gpt4all-chat/chatllm.cpp
This will be a big release, so increment the minor version.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre merged commit d2a99d9 into main on May 15, 2024
6 of 19 checks passed
cebtenzzre deleted the add-cuda-support branch on May 15, 2024 at 19:27
cebtenzzre added a commit that referenced this pull request May 15, 2024
I don't understand why this is needed, but it works.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre added a commit that referenced this pull request May 15, 2024
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
manyoso pushed a commit that referenced this pull request May 15, 2024
I don't understand why this is needed, but it works.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre added a commit that referenced this pull request May 15, 2024
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre added a commit that referenced this pull request May 21, 2024
This matters now that #2310 removed the default of "Release" in
llama.cpp.cmake.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre added a commit that referenced this pull request May 28, 2024
n_ubatch defaults to 512, but as of the latest llama.cpp you cannot pass
more than n_ubatch tokens to the embedding model without hitting an
assertion failure.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
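
The n_ubatch limit described above means no single call to the embedding model can see more than n_ubatch tokens. A generic way to respect such a limit (not necessarily what this commit implements) is to split the input into chunks of at most n_ubatch tokens; chunkTokens below is a hypothetical helper:

```cpp
// Illustrative only: split a token sequence into chunks of at most n_ubatch
// tokens so no single embedding call exceeds the limit described above.
#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<std::vector<int32_t>> chunkTokens(const std::vector<int32_t> &tokens,
                                              size_t n_ubatch = 512) {
    std::vector<std::vector<int32_t>> chunks;
    for (size_t i = 0; i < tokens.size(); i += n_ubatch)
        chunks.emplace_back(tokens.begin() + i,
                            tokens.begin() + std::min(i + n_ubatch, tokens.size()));
    return chunks;
}
```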
cebtenzzre added a commit that referenced this pull request May 29, 2024
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Development

Successfully merging this pull request may close these issues:

[Feature] Add support for StarCoder2 model architecture
2 participants