
CUDA error 719 #4563

Closed
3 tasks done
Dyke-F opened this issue Dec 21, 2023 · 3 comments
Comments


Dyke-F commented Dec 21, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Using Mixtral mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf to summarize scientific article abstracts, either via ./server or via the chat examples with Bob. The error occurs in both cases, mostly with medium-sized inputs (approx. 500 tokens), even though the context window and number of completion tokens are set accordingly.
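For reference, a server invocation along these lines reproduces the setup described above. This is a sketch: the model path, context size, prediction limit, and GPU layer count are illustrative assumptions, not the reporter's exact command.

```shell
# Hypothetical invocation; -c (context size), -n (completion tokens) and
# -ngl (layers offloaded to the GPU) are assumed values for illustration.
./server -m ./models/mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf \
  -c 4096 -n 1024 -ngl 99
```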

Current Behavior

CUDA error 719 at ggml-cuda.cu:8021: unspecified launch failure
current device: 1
GGML_ASSERT: ggml-cuda.cu:8021: !"CUDA error"
[New LWP 2448911]
[New LWP 2448912]
[New LWP 2448924]
[New LWP 2449059]
[New LWP 2449065]

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f40b80f9577 in wait4 () from /lib64/libc.so.6
#0 0x00007f40b80f9577 in wait4 () from /lib64/libc.so.6
#1 0x0000000000424a2b in ggml_print_backtrace ()
#2 0x00000000004f4972 in ggml_cuda_op_mul_mat(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void ()(ggml_tensor const, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st* const&), bool) [clone .constprop.0] ()
#3 0x00000000004f76a0 in ggml_cuda_compute_forward ()
#4 0x0000000000450972 in ggml_graph_compute_thread ()
#5 0x0000000000454ea8 in ggml_graph_compute ()
#6 0x000000000047f1f9 in llama_decode_internal(llama_context&, llama_batch) ()
#7 0x000000000047ff36 in llama_decode ()
#8 0x0000000000418c12 in main ()
[Inferior 1 (process 2448904) detached]
Aborted (core dumped)
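Because CUDA kernel launches are asynchronous, error 719 ("unspecified launch failure") is typically reported at a later synchronization point rather than at the kernel that actually failed. A common way to localize it is sketched below; this assumes a local CUDA toolkit install and is not something the reporter ran.

```shell
# Force synchronous kernel launches so the error is raised at the
# actual launch site instead of a later sync point.
CUDA_LAUNCH_BLOCKING=1 ./server -m ./models/model.gguf -c 4096

# Optionally run under NVIDIA's compute-sanitizer: out-of-bounds device
# memory accesses often surface as "unspecified launch failure".
compute-sanitizer --tool memcheck ./server -m ./models/model.gguf -c 4096
```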

Environment and Context

Run on Linux, using current versions of all components.
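The issue template normally asks for concrete system details rather than "all current versions". A typical way to collect them (standard tools, assumed to be installed):

```shell
uname -a              # kernel version and architecture
nvidia-smi            # GPU model, driver version, VRAM
nvcc --version        # CUDA toolkit version
make --version | head -n1
```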

@paryska99

The same happens on Windows with build 1680...
The message is a little different, though.
If I send a one-word message in main, it works. If it's the second message I send, or the initial prompt is a little longer, then this error occurs:
CUDA error 1 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:8893: invalid argument
current device: 0
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:8893: !"CUDA error"


This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 2, 2024