CUDA backend #2310
Merged
Conversation
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Kompute is no longer working.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
General:
- Proper implementation of gpuDeviceName()
- Make usingGPUDevice() consistent with Kompute impl
- Disable multi-GPU when selecting a specific device (currently: always)

For the bindings:
- Abort instead of segfaulting if multiple LLMs are loaded
- Implement GPU device selection by name/vendor

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre force-pushed the add-cuda-support branch from c67f868 to 1a10587 on May 7, 2024 at 14:54
cebtenzzre force-pushed the add-cuda-support branch from 9474231 to 52370d9 on May 7, 2024 at 21:20
cebtenzzre force-pushed the add-cuda-support branch from 52370d9 to 2417105 on May 7, 2024 at 21:27
This file is part of the graphics driver and should not be bundled with GPT4All.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre force-pushed the add-cuda-support branch from ea6e118 to 9e8f7c3 on May 7, 2024 at 22:31
llama.cpp itself is unconditionally built as a static library. Installing it with the GUI is pointless.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Can you create an offline installer with this change for Linux for testing, please?
manyoso reviewed on May 15, 2024
manyoso reviewed on May 15, 2024
manyoso requested changes on May 15, 2024
This will be a big release, so increment the minor version.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
manyoso approved these changes on May 15, 2024
cebtenzzre added a commit that referenced this pull request on May 15, 2024
I don't understand why this is needed, but it works.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre added a commit that referenced this pull request on May 15, 2024
manyoso pushed a commit that referenced this pull request on May 15, 2024
cebtenzzre added a commit that referenced this pull request on May 15, 2024
cebtenzzre added a commit that referenced this pull request on May 21, 2024
This matters now that #2310 removed the default of "Release" in llama.cpp.cmake.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre added a commit that referenced this pull request on May 28, 2024
n_ubatch defaults to 512, but as of the latest llama.cpp you cannot pass more than n_ubatch tokens to the embedding model without hitting an assertion failure.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
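Given that constraint, callers have to split embedding input so that no single call receives more than n_ubatch tokens. A minimal chunking sketch under that assumption (hypothetical helper working on a plain token list, not actual llama.cpp or GPT4All code):

```python
def chunk_tokens(tokens: list[int], n_ubatch: int = 512) -> list[list[int]]:
    """Split a token sequence into pieces of at most n_ubatch tokens each,
    so that every piece stays under the llama.cpp embedding limit
    described above. Purely illustrative."""
    if n_ubatch <= 0:
        raise ValueError("n_ubatch must be positive")
    return [tokens[i:i + n_ubatch] for i in range(0, len(tokens), n_ubatch)]
```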
cebtenzzre added a commit that referenced this pull request on May 29, 2024
This PR adds opt-in CUDA support in the GPT4All UI and python bindings using the llama.cpp CUDA backend.
CUDA-enabled devices will appear as e.g. "CUDA: Tesla P40" on supported platforms, alongside their Vulkan counterparts. When one is selected, the CUDA backend will be used instead of the Kompute backend.
When CUDA is not available (e.g. a compatible driver or GPU is not installed), CUDA devices simply do not appear. OOM will cause fallback to CPU, just like with Kompute.
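The backend-prefixed device list described above can be illustrated with a small sketch. The helper name and the "Vulkan:" prefix are assumptions for illustration; only the "CUDA:" prefix is stated in this PR. Note that when the CUDA runtime is unavailable, the CUDA name list is simply empty, so no CUDA entries appear:

```python
def list_gpu_devices(cuda_names: list[str], vulkan_names: list[str]) -> list[str]:
    """Combine CUDA and Vulkan devices into one display list, prefixing
    each entry with its backend (e.g. "CUDA: Tesla P40"). When CUDA is
    unavailable, cuda_names is empty and only Vulkan entries remain."""
    devices = ["CUDA: " + n for n in cuda_names]
    devices += ["Vulkan: " + n for n in vulkan_names]  # prefix is assumed
    return devices
```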
The CUDA runtime libraries (cudart and cublas) are installed to the lib/ directory on Linux and Windows. Care is taken to make sure the driver component of CUDA, nvcuda.dll/libcuda.so, is not installed, as it ships with the graphics driver.
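A rough sketch of that bundling rule (hypothetical helper; the real logic lives in the build's install rules): runtime components such as cudart and cublas are bundled, while the driver component must never be:

```python
def should_bundle(filename: str) -> bool:
    """Decide whether a CUDA library belongs in the installer's lib/ dir.
    Runtime components (cudart, cublas) are bundled; the driver component
    (nvcuda.dll / libcuda.so) is not, since it ships with the graphics
    driver. Hypothetical helper for illustration."""
    name = filename.lower()
    # Driver component: never bundle, regardless of platform naming.
    if "nvcuda" in name or name.startswith("libcuda.so"):
        return False
    # Runtime components named in the PR description.
    return "cudart" in name or "cublas" in name
```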
Other changes:
While I was working on packaging the CUDA libraries, I cleaned up a lot of junk that was installed but not needed: