Crash with multiple whisper states running at the same time CUDA #2177

Closed
bradmit opened this issue May 23, 2024 · 4 comments · Fixed by #2182

@bradmit (Contributor) commented May 23, 2024

I didn't have this issue with 1.5.5, but with 1.6.1 (I haven't tried 1.6.0), running multiple whisper_full_with_state calls at the same time ends up crashing in the CUDA backend when it frees memory.

The relevant part of the stack trace is below. I was testing the library with CUDA 12.4 on a new L4 card; I was previously testing on a T4 card, though I don't know whether that bears any relevance. I haven't tried 1.6.1 on the T4 test build. Running a single thread has no issue.

#0 0x00007f85d6f88b8f in raise () from /lib64/libc.so.6
#1 0x00007f85d6f5bea5 in abort () from /lib64/libc.so.6
#2 0x00007f8600ad947a in ggml_cuda_pool_vmm::free(void*, unsigned long) () from /opt1/resource/lib/libwhisper.so
#3 0x00007f8600ad1f00 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) () from /opt1/resource/lib/libwhisper.so
#4 0x00007f8600ad4bdc in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), bool) ()
from /opt1/resource/lib/libwhisper.so
#5 0x00007f8600ad5c79 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /opt1/resource/lib/libwhisper.so
#6 0x00007f8600ad7eee in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from /opt1/resource/lib/libwhisper.so
#7 0x00007f8600bd2b39 in ggml_backend_graph_compute () from /opt1/resource/lib/libwhisper.so
#8 0x00007f8600c217e1 in whisper_encode_internal(whisper_context&, whisper_state&, int, int, bool (*)(void*), void*) () from /opt1/resource/lib/libwhisper.so
#9 0x00007f8600c2194f in whisper_encode_with_state () from /opt1/resource/lib/libwhisper.so
#10 0x00007f8600c267f3 in whisper_lang_auto_detect_with_state () from /opt1/resource/lib/libwhisper.so
#11 0x00007f8600c345d5 in whisper_full_with_state () from /opt1/resource/lib/libwhisper.so
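
For reference, a minimal sketch of the usage pattern that triggers this, assuming the public whisper.h API: one shared whisper_context, one whisper_state per thread, and concurrent whisper_full_with_state calls. The model path, thread count, and dummy audio are placeholders.

// sketch: shared context, one whisper_state per thread, concurrent
// whisper_full_with_state calls on the CUDA backend
#include "whisper.h"
#include <thread>
#include <vector>

int main() {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true; // CUDA backend

    // placeholder model path
    struct whisper_context * ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);

    const int n_workers = 4;               // placeholder thread count
    std::vector<float> pcm(16000*5, 0.0f); // 5 s of silence as dummy 16 kHz input

    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i) {
        workers.emplace_back([&] {
            // each thread gets its own decoding state over the shared context
            struct whisper_state * state = whisper_init_state(ctx);

            struct whisper_full_params wparams =
                whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
            wparams.language = "auto"; // matches whisper_lang_auto_detect_with_state in the trace

            whisper_full_with_state(ctx, state, wparams, pcm.data(), (int) pcm.size());

            whisper_free_state(state);
        });
    }
    for (auto & w : workers) w.join();

    whisper_free(ctx);
    return 0;
}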

@ggerganov (Owner)

@bradmit Could you check if #2182 resolves the issues?

@bradmit (Contributor, Author) commented May 27, 2024

Doesn't look like it. I downloaded the master branch and used that...

[Current thread is 1 (Thread 0x7fef323f1000 (LWP 3420))]
Missing separate debuginfos, use: yum debuginfo-install boost-date-time-1.66.0-13.el8.x86_64 bzip2-libs-1.0.6-26.el8.x86_64 glibc-2.28-236.0.1.el8_9.12.x86_64 libgcc-8.5.0-20.0.3.el8.x86_64 libstdc++-8.5.0-20.0.3.el8.x86_64 libxml2-2.9.7-18.el8_9.x86_64 xz-libs-5.2.4-4.el8_6.x86_64 zlib-1.2.11-25.el8.x86_64
(gdb) where
#0 0x00007ff10dd25b8f in raise () from /lib64/libc.so.6
#1 0x00007ff10dcf8ea5 in abort () from /lib64/libc.so.6
#2 0x00007ff13787647a in ggml_cuda_pool_vmm::free(void*, unsigned long) () from /opt1/resource/lib/libwhisper.so
#3 0x00007ff13786ef00 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) () from /opt1/resource/lib/libwhisper.so
#4 0x00007ff137871bdc in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), bool) () from /opt1/resource/lib/libwhisper.so
#5 0x00007ff137872c79 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /opt1/resource/lib/libwhisper.so
#6 0x00007ff137874eee in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from /opt1/resource/lib/libwhisper.so
#7 0x00007ff13796fb39 in ggml_backend_graph_compute () from /opt1/resource/lib/libwhisper.so
#8 0x00007ff1379be838 in whisper_encode_internal(whisper_context&, whisper_state&, int, int, bool (*)(void*), void*) () from /opt1/resource/lib/libwhisper.so
#9 0x00007ff1379be94f in whisper_encode_with_state () from /opt1/resource/lib/libwhisper.so
#10 0x00007ff1379c37f3 in whisper_lang_auto_detect_with_state () from /opt1/resource/lib/libwhisper.so
#11 0x00007ff1379d15d5 in whisper_full_with_state () from /opt1/resource/lib/libwhisper.so

@ggerganov (Owner)

It's not merged, so you need to use the gg/backend-per-state branch.
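
For anyone following along, the branch can be checked out like this (assuming the remote for ggerganov/whisper.cpp is named origin):

git fetch origin
git checkout gg/backend-per-state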

@bradmit (Contributor, Author) commented May 27, 2024

My mistake. The branch is good. No crash.
