Prerequisites
Please answer the following questions for yourself before submitting an issue.
I reviewed the Discussions and have a new bug or useful enhancement to share.
Expected Behavior
Using Mixtral (mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf) to summarize scientific article abstracts. Invoked either via ./server or via the chat examples with Bob. The error occurs in both cases, mostly with medium-sized inputs (approx. 500 tokens), even though the context window and the number of completion tokens are set accordingly.
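For reference, the equivalent settings expressed through the C API look roughly like the sketch below (example values throughout; the model path, n_ctx, and n_gpu_layers are illustrative assumptions, not the exact flags used here):

    // Minimal sketch: load the model and size the context window
    // (values are examples, not the reporter's actual settings).
    #include "llama.h"

    int main(void) {
        llama_backend_init(false /* numa */);

        llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 99; // offload all layers to the GPU(s)
        llama_model * model = llama_load_model_from_file(
            "mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf", mparams);

        llama_context_params cparams = llama_context_default_params();
        cparams.n_ctx = 4096; // room for a ~500-token abstract plus the completion
        llama_context * ctx = llama_new_context_with_model(model, cparams);

        // ... tokenize, llama_decode(), sample ...

        llama_free(ctx);
        llama_free_model(model);
        llama_backend_free();
        return 0;
    }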
Current Behavior
CUDA error 719 at ggml-cuda.cu:8021: unspecified launch failure
current device: 1
GGML_ASSERT: ggml-cuda.cu:8021: !"CUDA error"
[New LWP 2448911]
[New LWP 2448912]
[New LWP 2448924]
[New LWP 2449059]
[New LWP 2449065]
This GDB supports auto-downloading debuginfo from the following URLs: https://debuginfod.fedoraproject.org/
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f40b80f9577 in wait4 () from /lib64/libc.so.6
#0 0x00007f40b80f9577 in wait4 () from /lib64/libc.so.6
#1 0x0000000000424a2b in ggml_print_backtrace ()
#2 0x00000000004f4972 in ggml_cuda_op_mul_mat(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st* const&), bool) [clone .constprop.0] ()
#3 0x00000000004f76a0 in ggml_cuda_compute_forward ()
#4 0x0000000000450972 in ggml_graph_compute_thread ()
#5 0x0000000000454ea8 in ggml_graph_compute ()
#6 0x000000000047f1f9 in llama_decode_internal(llama_context&, llama_batch) ()
#7 0x000000000047ff36 in llama_decode ()
#8 0x0000000000418c12 in main ()
[Inferior 1 (process 2448904) detached]
Aborted (core dumped)
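For reference, the paired "CUDA error ..." / "GGML_ASSERT: ... !\"CUDA error\"" lines above are what llama.cpp's CUDA error-checking wrapper prints before aborting: the failing call's location, the error string, and the active device. A minimal sketch of that pattern (an assumption paraphrased from the log format, not the verbatim ggml-cuda.cu source):

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Every CUDA runtime call is wrapped so that a failure prints the
    // error and the active device, then aborts, which produces the
    // "CUDA error N at file:line" and "current device" lines above.
    static void cuda_check(cudaError_t err, const char * file, int line) {
        if (err != cudaSuccess) {
            int dev = -1;
            cudaGetDevice(&dev);
            fprintf(stderr, "CUDA error %d at %s:%d: %s\n",
                    (int) err, file, line, cudaGetErrorString(err));
            fprintf(stderr, "current device: %d\n", dev);
            abort(); // GGML_ASSERT(!"CUDA error") ends up here
        }
    }
    #define CUDA_CHECK(err) cuda_check((err), __FILE__, __LINE__)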
Environment and Context
Run on Linux, using current versions of all components.
The same happens on Windows with build 1680...
The message is a little different, though.
If I send a one-word message in main, it works. If it's the second message I send, or if the initial prompt is a little longer, then this error occurs:
CUDA error 1 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:8893: invalid argument
current device: 0
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:8893: !"CUDA error"
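For what it's worth, the two reports show different runtime errors: code 719 is cudaErrorLaunchFailure ("unspecified launch failure") and code 1 is cudaErrorInvalidValue ("invalid argument"). A small self-contained snippet to decode such raw codes with the stock CUDA runtime API:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Print the symbolic name and message for a raw CUDA error code,
    // e.g. the 719 and 1 seen in the logs above.
    static void decode(int code) {
        cudaError_t err = (cudaError_t) code;
        printf("%d = %s: %s\n", code, cudaGetErrorName(err), cudaGetErrorString(err));
    }

    int main(void) {
        decode(719); // cudaErrorLaunchFailure: unspecified launch failure
        decode(1);   // cudaErrorInvalidValue: invalid argument
        return 0;
    }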