Prerequisites
Please answer the following questions for yourself before submitting an issue.
I reviewed the Discussions and have a new bug or useful enhancement to share.
Expected Behavior
Using Mixtral (mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf) to summarize scientific article abstracts. Invoked either via ./server or via the chat examples with Bob. The error occurs in both cases, mostly with medium-sized inputs (approx. 500 tokens), even though the context window and the number of completion tokens are set accordingly.
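For reference, the equivalent settings expressed through the C API look roughly like the sketch below (example values throughout; the model path, n_ctx, and n_gpu_layers are illustrative assumptions, not the exact flags used here):

    // Minimal sketch: load the model and size the context window
    // (values are examples, not the reporter's actual settings).
    #include "llama.h"

    int main(void) {
        llama_backend_init(false /* numa */);

        llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 99; // offload all layers to the GPU(s)
        llama_model * model = llama_load_model_from_file(
            "mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf", mparams);

        llama_context_params cparams = llama_context_default_params();
        cparams.n_ctx = 4096; // room for a ~500-token abstract plus the completion
        llama_context * ctx = llama_new_context_with_model(model, cparams);

        // ... tokenize, llama_decode(), sample ...

        llama_free(ctx);
        llama_free_model(model);
        llama_backend_free();
        return 0;
    }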
Current Behavior
CUDA error 719 at ggml-cuda.cu:8021: unspecified launch failure
current device: 1
GGML_ASSERT: ggml-cuda.cu:8021: !"CUDA error"
[New LWP 2448911]
[New LWP 2448912]
[New LWP 2448924]
[New LWP 2449059]
[New LWP 2449065]
This GDB supports auto-downloading debuginfo from the following URLs: https://debuginfod.fedoraproject.org/
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f40b80f9577 in wait4 () from /lib64/libc.so.6
#0 0x00007f40b80f9577 in wait4 () from /lib64/libc.so.6
#1 0x0000000000424a2b in ggml_print_backtrace ()
#2 0x00000000004f4972 in ggml_cuda_op_mul_mat(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st* const&), bool) [clone .constprop.0] ()
#3 0x00000000004f76a0 in ggml_cuda_compute_forward ()
#4 0x0000000000450972 in ggml_graph_compute_thread ()
#5 0x0000000000454ea8 in ggml_graph_compute ()
#6 0x000000000047f1f9 in llama_decode_internal(llama_context&, llama_batch) ()
#7 0x000000000047ff36 in llama_decode ()
#8 0x0000000000418c12 in main ()
[Inferior 1 (process 2448904) detached]
Aborted (core dumped)
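For reference, the paired "CUDA error ..." / "GGML_ASSERT: ... !\"CUDA error\"" lines above are what llama.cpp's CUDA error-checking wrapper prints before aborting: the failing call's location, the error string, and the active device. A minimal sketch of that pattern (an assumption paraphrased from the log format, not the verbatim ggml-cuda.cu source):

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Every CUDA runtime call is wrapped so that a failure prints the
    // error and the active device, then aborts, which produces the
    // "CUDA error N at file:line" and "current device" lines above.
    static void cuda_check(cudaError_t err, const char * file, int line) {
        if (err != cudaSuccess) {
            int dev = -1;
            cudaGetDevice(&dev);
            fprintf(stderr, "CUDA error %d at %s:%d: %s\n",
                    (int) err, file, line, cudaGetErrorString(err));
            fprintf(stderr, "current device: %d\n", dev);
            abort(); // GGML_ASSERT(!"CUDA error") ends up here
        }
    }
    #define CUDA_CHECK(err) cuda_check((err), __FILE__, __LINE__)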
Environment and Context
Run on Linux, using current versions of all components.
The same happens on Windows with build 1680...
The message is a little different, though.
If I send a one-word message in main, it works. If it's the second message I send, or if the initial prompt is a little longer, then this error occurs:
CUDA error 1 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:8893: invalid argument
current device: 0
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:8893: !"CUDA error"
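For what it's worth, the two reports show different runtime errors: code 719 is cudaErrorLaunchFailure ("unspecified launch failure") and code 1 is cudaErrorInvalidValue ("invalid argument"). A small self-contained snippet to decode such raw codes with the stock CUDA runtime API:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Print the symbolic name and message for a raw CUDA error code,
    // e.g. the 719 and 1 seen in the logs above.
    static void decode(int code) {
        cudaError_t err = (cudaError_t) code;
        printf("%d = %s: %s\n", code, cudaGetErrorName(err), cudaGetErrorString(err));
    }

    int main(void) {
        decode(719); // cudaErrorLaunchFailure: unspecified launch failure
        decode(1);   // cudaErrorInvalidValue: invalid argument
        return 0;
    }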