Crash with multiple whisper states running at the same time CUDA #2177

Closed
bradmit opened this issue May 23, 2024 · 4 comments · Fixed by #2182

@bradmit (Contributor) commented May 23, 2024

I didn't have this issue with 1.5.5, but with 1.6.1 (I haven't tried 1.6.0), running multiple whisper_full_with_state calls at the same time ends up crashing in the CUDA backend when it frees memory.

The relevant part of the stack trace is below. I was testing the library with CUDA 12.4 on a new L4 card; I was previously testing on a T4 card, though I don't know whether that bears any relevance. I haven't tried 1.6.1 on the T4 test build. Running a single thread has no issue.

#0 0x00007f85d6f88b8f in raise () from /lib64/libc.so.6
#1 0x00007f85d6f5bea5 in abort () from /lib64/libc.so.6
#2 0x00007f8600ad947a in ggml_cuda_pool_vmm::free(void*, unsigned long) () from /opt1/resource/lib/libwhisper.so
#3 0x00007f8600ad1f00 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) () from /opt1/resource/lib/libwhisper.so
#4 0x00007f8600ad4bdc in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), bool) ()
from /opt1/resource/lib/libwhisper.so
#5 0x00007f8600ad5c79 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /opt1/resource/lib/libwhisper.so
#6 0x00007f8600ad7eee in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from /opt1/resource/lib/libwhisper.so
#7 0x00007f8600bd2b39 in ggml_backend_graph_compute () from /opt1/resource/lib/libwhisper.so
#8 0x00007f8600c217e1 in whisper_encode_internal(whisper_context&, whisper_state&, int, int, bool (*)(void*), void*) () from /opt1/resource/lib/libwhisper.so
#9 0x00007f8600c2194f in whisper_encode_with_state () from /opt1/resource/lib/libwhisper.so
#10 0x00007f8600c267f3 in whisper_lang_auto_detect_with_state () from /opt1/resource/lib/libwhisper.so
#11 0x00007f8600c345d5 in whisper_full_with_state () from /opt1/resource/lib/libwhisper.so
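
For reference, a minimal sketch of the usage pattern that triggers this, assuming the public whisper.h API: one shared whisper_context, one whisper_state per thread, and concurrent whisper_full_with_state calls. The model path, thread count, and dummy audio are placeholders.

// sketch: shared context, one whisper_state per thread, concurrent
// whisper_full_with_state calls on the CUDA backend
#include "whisper.h"
#include <thread>
#include <vector>

int main() {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true; // CUDA backend

    // placeholder model path
    struct whisper_context * ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);

    const int n_workers = 4;               // placeholder thread count
    std::vector<float> pcm(16000*5, 0.0f); // 5 s of silence as dummy 16 kHz input

    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i) {
        workers.emplace_back([&] {
            // each thread gets its own decoding state over the shared context
            struct whisper_state * state = whisper_init_state(ctx);

            struct whisper_full_params wparams =
                whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
            wparams.language = "auto"; // matches whisper_lang_auto_detect_with_state in the trace

            whisper_full_with_state(ctx, state, wparams, pcm.data(), (int) pcm.size());

            whisper_free_state(state);
        });
    }
    for (auto & w : workers) w.join();

    whisper_free(ctx);
    return 0;
}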

@ggerganov (Owner)

@bradmit Could you check if #2182 resolves the issues?

@bradmit (Contributor, Author) commented May 27, 2024

Doesn't look like it. I downloaded the master branch and used that...

[Current thread is 1 (Thread 0x7fef323f1000 (LWP 3420))]
Missing separate debuginfos, use: yum debuginfo-install boost-date-time-1.66.0-13.el8.x86_64 bzip2-libs-1.0.6-26.el8.x86_64 glibc-2.28-236.0.1.el8_9.12.x86_64 libgcc-8.5.0-20.0.3.el8.x86_64 libstdc++-8.5.0-20.0.3.el8.x86_64 libxml2-2.9.7-18.el8_9.x86_64 xz-libs-5.2.4-4.el8_6.x86_64 zlib-1.2.11-25.el8.x86_64
(gdb) where
#0 0x00007ff10dd25b8f in raise () from /lib64/libc.so.6
#1 0x00007ff10dcf8ea5 in abort () from /lib64/libc.so.6
#2 0x00007ff13787647a in ggml_cuda_pool_vmm::free(void*, unsigned long) () from /opt1/resource/lib/libwhisper.so
#3 0x00007ff13786ef00 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) () from /opt1/resource/lib/libwhisper.so
#4 0x00007ff137871bdc in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), bool) () from /opt1/resource/lib/libwhisper.so
#5 0x00007ff137872c79 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /opt1/resource/lib/libwhisper.so
#6 0x00007ff137874eee in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from /opt1/resource/lib/libwhisper.so
#7 0x00007ff13796fb39 in ggml_backend_graph_compute () from /opt1/resource/lib/libwhisper.so
#8 0x00007ff1379be838 in whisper_encode_internal(whisper_context&, whisper_state&, int, int, bool (*)(void*), void*) () from /opt1/resource/lib/libwhisper.so
#9 0x00007ff1379be94f in whisper_encode_with_state () from /opt1/resource/lib/libwhisper.so
#10 0x00007ff1379c37f3 in whisper_lang_auto_detect_with_state () from /opt1/resource/lib/libwhisper.so
#11 0x00007ff1379d15d5 in whisper_full_with_state () from /opt1/resource/lib/libwhisper.so

@ggerganov (Owner)

It's not merged, so you need to use the gg/backend-per-state branch.
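
For anyone following along, the branch can be checked out like this (assuming the remote for ggerganov/whisper.cpp is named origin):

git fetch origin
git checkout gg/backend-per-state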

@bradmit (Contributor, Author) commented May 27, 2024

My mistake. The branch is good. No crash.
