
[User] CUDA is broken #1756


Closed
howard0su opened this issue Jun 8, 2023 · 2 comments


howard0su (Collaborator) commented Jun 8, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

Crash when I enable two GPU cards. Garbage output when I enable one GPU card.

Two Cards:

PS C:\gpt\llama.cpp> .\build\bin\RelWithDebInfo\main.exe -m ..\en-models\7B\ggml-alpaca-7b-q4.bin -p "what is cuda?" -ngl 40
main: build = 635 (5c64a09)
main: seed  = 1686202333
ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla P100-PCIE-16GB
  Device 1: NVIDIA GeForce GTX 1070
llama.cpp: loading model from ..\en-models\7B\ggml-alpaca-7b-q4.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (Tesla P100-PCIE-16GB) as main device
llama_model_load_internal: mem required  = 1932.71 MB (+ 1026.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: offloading output layer to GPU
llama_model_load_internal: total VRAM used: 3987 MB
...................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 what is cuda?CUDA error 9 at C:\GPT\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
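For context on the message: CUDA error 9 is cudaErrorInvalidConfiguration, which the runtime returns when a kernel launch requests a grid/block configuration (or shared-memory size) the device cannot satisfy. The sketch below only illustrates how that error text is produced; the kernel, launch parameters, and CUDA_CHECK macro are hypothetical stand-ins for illustration, not the actual ggml-cuda.cu code at line 1574.

// Minimal sketch: reproduce "invalid configuration argument" (CUDA error 9)
// with an error-check-and-abort pattern similar in spirit to llama.cpp's.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "CUDA error %d at %s:%d: %s\n",       \
                    (int) err_, __FILE__, __LINE__,               \
                    cudaGetErrorString(err_));                    \
            exit(1);                                              \
        }                                                         \
    } while (0)

__global__ void dummy_kernel(float * x) {
    x[threadIdx.x] = 1.0f;
}

int main() {
    float * d_x = nullptr;
    CUDA_CHECK(cudaMalloc(&d_x, 2048 * sizeof(float)));

    // 2048 threads per block exceeds the 1024-thread limit of these GPUs,
    // so the launch fails with cudaErrorInvalidConfiguration (error 9).
    dummy_kernel<<<1, 2048>>>(d_x);
    CUDA_CHECK(cudaGetLastError());   // prints "invalid configuration argument"

    CUDA_CHECK(cudaFree(d_x));
    return 0;
}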

FlareP1 commented Jun 8, 2023

I suggest looking at the issues that are currently being worked on. There is a pull request that addresses the gibberish output with one GPU; maybe it also fixes the two-GPU issue?
Issue #1735
and
Pull #1735

howard0su (Collaborator, Author) commented

Yeah, it fixes this.
