
ROCm port - compilation ok, but garbage output when run. #1453

Closed
mega-ice opened this issue Nov 8, 2023 · 5 comments

mega-ice commented Nov 8, 2023

When compiled with hipBLAS support for ROCm, the test run takes a very long time and produces garbage output.
(I have working ROCm builds of llama.cpp, exllamav2, etc. on the same machine: Ubuntu, ROCm 7.1, RDNA3 AMD card.)

I have tested both the cmake and make build options. I tried the --debug-mode switch, but no log was generated.
P.S. When monitoring GPU VRAM usage, it appears that only about 1/3 of the model size is loaded.
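One thing that might be worth ruling out (an assumption on my part, not a confirmed fix) is that the build is compiled for the wrong GPU ISA; the HIP CMake toolchain accepts an `AMDGPU_TARGETS` setting for this, and an RX 7800 XT reports `gfx1101`:

```shell
# Confirm which gfx ISA the card actually reports (expect gfx1101 for an RX 7800 XT):
rocminfo | grep -o "gfx[0-9a-f]*" | sort -u

# Rebuild with the target pinned explicitly (adjust the gfx value to your GPU):
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
  cmake -DWHISPER_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1101 .
cmake --build . --clean-first
```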

run output:

./bin/main -m models/ggml-large-v2.bin -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load: model ctx     = 2951.27 MB
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7800 XT, compute capability 11.0
whisper_model_load: model size    = 2950.66 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   40.47 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   59.40 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.980 --> 00:00:30.000]     later  ... f  as-  awe lo .  ... amP go cl g    ... l 'm   -  acer   Pittsburgh   l


whisper_print_timings:     load time =  3813.30 ms
whisper_print_timings:     fallbacks =   2 p /   0 h
whisper_print_timings:      mel time =    14.47 ms
whisper_print_timings:   sample time =   597.67 ms /  1100 runs (    0.54 ms per run)
whisper_print_timings:   encode time =  3796.02 ms /     1 runs ( 3796.02 ms per run)
whisper_print_timings:   decode time = 73741.14 ms /  1095 runs (   67.34 ms per run)
whisper_print_timings:   prompt time =   465.94 ms /     3 runs (  155.31 ms per run)
whisper_print_timings:    total time = 82719.89 ms
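For context, the timings above work out to a real-time factor far above 1. The numbers below are copied straight from the log; the script is plain arithmetic, not part of whisper.cpp:

```python
# Real-time factor (RTF) for the run above: processing time / audio length.
# RTF well above 1.0 means much slower than real time, matching the ~83 s
# spent transcribing an 11 s clip.
total_ms = 82719.89   # whisper_print_timings: total time
audio_s = 11.0        # length of samples/jfk.wav

rtf = (total_ms / 1000.0) / audio_s
print(f"RTF: {rtf:.2f}x")  # RTF: 7.52x
```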

compilation output:


/ai/whisper_project/whisper.cpp$ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -DWHISPER_HIPBLAS=ON

-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ice/ai/whisper_project/whisper.cpp
ice@ubuntu:~/ai/whisper_project/whisper.cpp$ cmake --build .
[ 6%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:242:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:252:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:262:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:266:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:474:75: warning: suggest braces around initialization of subobject [-Wmissing-braces]
static cudaStream_t g_cudaStreams[GGML_CUDA_MAX_DEVICES][MAX_STREAMS] = { nullptr };
^~~~~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2235:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2235:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2256:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2245:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2246:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2296:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2292:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2292:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2329:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2329:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2350:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2339:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2340:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2383:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2383:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2421:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2421:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2442:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2431:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2432:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2495:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2495:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2535:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2535:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2556:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
...
redacted
...

/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3626:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5169:9: note: in instantiation of function template specialization 'mul_mat_q4_0' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3625:5: note: in instantiation of function template specialization 'mul_mat_q<32, 2, 4, true, block_q4_0, 64, 64, 8, &allocate_tiles_q4_0, &load_tiles_q4_0, 4, &vec_dot_q4_0_q8_1_mul_mat>' requested here
mul_mat_q<QK4_0, QR4_0, QI4_0, true, block_q4_0, mmq_x, mmq_y, nwarps, allocate_tiles_q4_0<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5169:9: note: in instantiation of function template specialization 'mul_mat_q4_0' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2350:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3695:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5210:9: note: in instantiation of function template specialization 'mul_mat_q4_1' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3694:5: note: in instantiation of function template specialization 'mul_mat_q<32, 2, 4, true, block_q4_1, 64, 64, 8, &allocate_tiles_q4_1, &load_tiles_q4_1, 4, &vec_dot_q4_1_q8_1_mul_mat>' requested here
mul_mat_q<QK4_1, QR4_1, QI4_1, true, block_q4_1, mmq_x, mmq_y, nwarps, allocate_tiles_q4_1<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5210:9: note: in instantiation of function template specialization 'mul_mat_q4_1' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2350:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3695:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5214:9: note: in instantiation of function template specialization 'mul_mat_q4_1' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3694:5: note: in instantiation of function template specialization 'mul_mat_q<32, 2, 4, true, block_q4_1, 64, 64, 8, &allocate_tiles_q4_1, &load_tiles_q4_1, 4, &vec_dot_q4_1_q8_1_mul_mat>' requested here
mul_mat_q<QK4_1, QR4_1, QI4_1, true, block_q4_1, mmq_x, mmq_y, nwarps, allocate_tiles_q4_1<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5214:9: note: in instantiation of function template specialization 'mul_mat_q4_1' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2442:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3762:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5255:9: note: in instantiation of function template specialization 'mul_mat_q5_0' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3761:5: note: in instantiation of function template specialization 'mul_mat_q<32, 2, 4, false, block_q5_0, 64, 64, 8, &allocate_tiles_q5_0, &load_tiles_q5_0, 4, &vec_dot_q5_0_q8_1_mul_mat>' requested here
mul_mat_q<QK5_0, QR5_0, QI5_0, false, block_q5_0, mmq_x, mmq_y, nwarps, allocate_tiles_q5_0<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5255:9: note: in instantiation of function template specialization 'mul_mat_q5_0' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2442:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3762:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5259:9: note: in instantiation of function template specialization 'mul_mat_q5_0' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3761:5: note: in instantiation of function template specialization 'mul_mat_q<32, 2, 4, false, block_q5_0, 64, 64, 8, &allocate_tiles_q5_0, &load_tiles_q5_0, 4, &vec_dot_q5_0_q8_1_mul_mat>' requested here
mul_mat_q<QK5_0, QR5_0, QI5_0, false, block_q5_0, mmq_x, mmq_y, nwarps, allocate_tiles_q5_0<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5259:9: note: in instantiation of function template specialization 'mul_mat_q5_0' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2556:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3829:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5300:9: note: in instantiation of function template specialization 'mul_mat_q5_1' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3828:5: note: in instantiation of function template specialization 'mul_mat_q<32, 2, 4, true, block_q5_1, 64, 64, 8, &allocate_tiles_q5_1, &load_tiles_q5_1, 4, &vec_dot_q5_1_q8_1_mul_mat>' requested here
mul_mat_q<QK5_1, QR5_1, QI5_1, true, block_q5_1, mmq_x, mmq_y, nwarps, allocate_tiles_q5_1<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5300:9: note: in instantiation of function template specialization 'mul_mat_q5_1' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2556:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3829:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5304:9: note: in instantiation of function template specialization 'mul_mat_q5_1' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3828:5: note: in instantiation of function template specialization 'mul_mat_q<32, 2, 4, true, block_q5_1, 64, 64, 8, &allocate_tiles_q5_1, &load_tiles_q5_1, 4, &vec_dot_q5_1_q8_1_mul_mat>' requested here
mul_mat_q<QK5_1, QR5_1, QI5_1, true, block_q5_1, mmq_x, mmq_y, nwarps, allocate_tiles_q5_1<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5304:9: note: in instantiation of function template specialization 'mul_mat_q5_1' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2663:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3896:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5345:9: note: in instantiation of function template specialization 'mul_mat_q8_0' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3895:5: note: in instantiation of function template specialization 'mul_mat_q<32, 1, 8, false, block_q8_0, 64, 64, 8, &allocate_tiles_q8_0, &load_tiles_q8_0, 8, &vec_dot_q8_0_q8_1_mul_mat>' requested here
mul_mat_q<QK8_0, QR8_0, QI8_0, false, block_q8_0, mmq_x, mmq_y, nwarps, allocate_tiles_q8_0<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5345:9: note: in instantiation of function template specialization 'mul_mat_q8_0' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2663:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3896:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5349:9: note: in instantiation of function template specialization 'mul_mat_q8_0' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3895:5: note: in instantiation of function template specialization 'mul_mat_q<32, 1, 8, false, block_q8_0, 64, 64, 8, &allocate_tiles_q8_0, &load_tiles_q8_0, 8, &vec_dot_q8_0_q8_1_mul_mat>' requested here
mul_mat_q<QK8_0, QR8_0, QI8_0, false, block_q8_0, mmq_x, mmq_y, nwarps, allocate_tiles_q8_0<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5349:9: note: in instantiation of function template specialization 'mul_mat_q8_0' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2753:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3963:9: note: in instantiation of function template specialization 'load_tiles_q2_K<32, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5390:9: note: in instantiation of function template specialization 'mul_mat_q2_K' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3962:5: note: in instantiation of function template specialization 'mul_mat_q<256, 4, 16, false, block_q2_K, 128, 32, 8, &allocate_tiles_q2_K, &load_tiles_q2_K, 2, &vec_dot_q2_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR2_K, QI2_K, false, block_q2_K, mmq_x, mmq_y, nwarps, allocate_tiles_q2_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5390:9: note: in instantiation of function template specialization 'mul_mat_q2_K' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2753:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3963:9: note: in instantiation of function template specialization 'load_tiles_q2_K<32, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5394:9: note: in instantiation of function template specialization 'mul_mat_q2_K' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3962:5: note: in instantiation of function template specialization 'mul_mat_q<256, 4, 16, false, block_q2_K, 128, 32, 8, &allocate_tiles_q2_K, &load_tiles_q2_K, 2, &vec_dot_q2_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR2_K, QI2_K, false, block_q2_K, mmq_x, mmq_y, nwarps, allocate_tiles_q2_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5394:9: note: in instantiation of function template specialization 'mul_mat_q2_K' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2874:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4032:9: note: in instantiation of function template specialization 'load_tiles_q3_K<128, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5437:9: note: in instantiation of function template specialization 'mul_mat_q3_K' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4031:5: note: in instantiation of function template specialization 'mul_mat_q<256, 4, 16, false, block_q3_K, 32, 128, 8, &allocate_tiles_q3_K, &load_tiles_q3_K, 2, &vec_dot_q3_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR3_K, QI3_K, false, block_q3_K, mmq_x, mmq_y, nwarps, allocate_tiles_q3_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5437:9: note: in instantiation of function template specialization 'mul_mat_q3_K' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2874:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4032:9: note: in instantiation of function template specialization 'load_tiles_q3_K<128, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5441:9: note: in instantiation of function template specialization 'mul_mat_q3_K' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4031:5: note: in instantiation of function template specialization 'mul_mat_q<256, 4, 16, false, block_q3_K, 32, 128, 8, &allocate_tiles_q3_K, &load_tiles_q3_K, 2, &vec_dot_q3_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR3_K, QI3_K, false, block_q3_K, mmq_x, mmq_y, nwarps, allocate_tiles_q3_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5441:9: note: in instantiation of function template specialization 'mul_mat_q3_K' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3092:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4101:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5483:9: note: in instantiation of function template specialization 'mul_mat_q4_K' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3137:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4100:5: note: in instantiation of function template specialization 'mul_mat_q<256, 2, 32, true, block_q4_K, 32, 64, 8, &allocate_tiles_q4_K, &load_tiles_q4_K, 8, &vec_dot_q4_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR4_K, QI4_K, true, block_q4_K, mmq_x, mmq_y, nwarps, allocate_tiles_q4_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5483:9: note: in instantiation of function template specialization 'mul_mat_q4_K' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3092:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4101:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5487:9: note: in instantiation of function template specialization 'mul_mat_q4_K' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3137:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4100:5: note: in instantiation of function template specialization 'mul_mat_q<256, 2, 32, true, block_q4_K, 32, 64, 8, &allocate_tiles_q4_K, &load_tiles_q4_K, 8, &vec_dot_q4_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR4_K, QI4_K, true, block_q4_K, mmq_x, mmq_y, nwarps, allocate_tiles_q4_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5487:9: note: in instantiation of function template specialization 'mul_mat_q4_K' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3273:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4168:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5528:9: note: in instantiation of function template specialization 'mul_mat_q5_K' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3329:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4167:5: note: in instantiation of function template specialization 'mul_mat_q<256, 2, 32, true, block_q5_K, 32, 64, 8, &allocate_tiles_q5_K, &load_tiles_q5_K, 8, &vec_dot_q5_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR5_K, QI5_K, true, block_q5_K, mmq_x, mmq_y, nwarps, allocate_tiles_q5_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5528:9: note: in instantiation of function template specialization 'mul_mat_q5_K' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3273:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4168:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5532:9: note: in instantiation of function template specialization 'mul_mat_q5_K' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3329:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4167:5: note: in instantiation of function template specialization 'mul_mat_q<256, 2, 32, true, block_q5_K, 32, 64, 8, &allocate_tiles_q5_K, &load_tiles_q5_K, 8, &vec_dot_q5_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR5_K, QI5_K, true, block_q5_K, mmq_x, mmq_y, nwarps, allocate_tiles_q5_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5532:9: note: in instantiation of function template specialization 'mul_mat_q5_K' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3402:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4237:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5573:9: note: in instantiation of function template specialization 'mul_mat_q6_K' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4236:5: note: in instantiation of function template specialization 'mul_mat_q<256, 2, 32, false, block_q6_K, 32, 64, 8, &allocate_tiles_q6_K, &load_tiles_q6_K, 8, &vec_dot_q6_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR6_K, QI6_K, false, block_q6_K, mmq_x, mmq_y, nwarps, allocate_tiles_q6_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5573:9: note: in instantiation of function template specialization 'mul_mat_q6_K' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3402:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4237:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5577:9: note: in instantiation of function template specialization 'mul_mat_q6_K' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
^~~~
{ }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:4236:5: note: in instantiation of function template specialization 'mul_mat_q<256, 2, 32, false, block_q6_K, 32, 64, 8, &allocate_tiles_q6_K, &load_tiles_q6_K, 8, &vec_dot_q6_K_q8_1_mul_mat>' requested here
mul_mat_q<QK_K, QR6_K, QI6_K, false, block_q6_K, mmq_x, mmq_y, nwarps, allocate_tiles_q6_K<mmq_y>,
^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:5577:9: note: in instantiation of function template specialization 'mul_mat_q6_K' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
107 warnings generated when compiling for host.
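For reference, nearly all of the warnings above are the same two patterns repeated across template instantiations: a C-style cast that drops `const` (`-Wcast-qual`) and a nested array initialized without inner braces (`-Wmissing-braces`). Both are benign here. As an illustration only (a sketch with a hypothetical stand-in struct, not the actual `ggml-cuda.cu` code), the warning-free equivalents look like this:

```cpp
// Minimal stand-in for the quantization block type seen in the log
// (hypothetical layout, for illustration only).
struct block_q4_K { unsigned char scales[12]; };

// Keep const on the pointee instead of casting it away: no -Wcast-qual.
inline const block_q4_K * as_blocks(const void * vx) {
    return static_cast<const block_q4_K *>(vx);
}

// Same idea for the scales pointer: const is preserved through the cast.
inline const int * first_scales(const block_q4_K * bxi) {
    return reinterpret_cast<const int *>(bxi->scales);
}

// Explicit inner braces zero-initialize every element and
// silence -Wmissing-braces:
//   float sum[MMQ_ROWS][MMQ_COLS] = {{0.0f}};
```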
[ 6%] Built target ggml-rocm
[ 12%] Building C object CMakeFiles/whisper.dir/ggml.c.o
[ 18%] Building C object CMakeFiles/whisper.dir/ggml-alloc.c.o
[ 25%] Building C object CMakeFiles/whisper.dir/ggml-backend.c.o
[ 31%] Building C object CMakeFiles/whisper.dir/ggml-quants.c.o
[ 37%] Building CXX object CMakeFiles/whisper.dir/whisper.cpp.o
/home/ice/ai/whisper_project/whisper.cpp/whisper.cpp:161:29: warning: unused function 'ggml_mul_mat_pad' [-Wunused-function]
static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
^
1 warning generated.
[ 43%] Linking CXX shared library libwhisper.so
[ 43%] Built target whisper
[ 50%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 56%] Building CXX object examples/CMakeFiles/common.dir/common-ggml.cpp.o
[ 62%] Linking CXX static library libcommon.a
[ 62%] Built target common
[ 68%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 75%] Linking CXX executable ../../bin/main
[ 75%] Built target main
[ 81%] Building CXX object examples/bench/CMakeFiles/bench.dir/bench.cpp.o
[ 87%] Linking CXX executable ../../bin/bench
[ 87%] Built target bench
[ 93%] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
[100%] Linking CXX executable ../../bin/quantize
[100%] Built target quantize

@mega-ice mentioned this issue Nov 8, 2023
@mkiol (Contributor) commented Nov 8, 2023

I can confirm that it works fine on an RX6950XT. I use ROCm 5.6.

CMake by default generates compilation for the following architectures:

  • gfx900
  • gfx906
  • gfx908
  • gfx90a
  • gfx1030

You can add more with CMAKE_CXX_FLAGS, for instance:

cmake -DWHISPER_HIPBLAS=ON -DCMAKE_CXX_FLAGS="--offload-arch=gfx1100 --offload-arch=gfx1102 --offload-arch=gfx1103"

Maybe the problem is a wrong GPU architecture?

@ardfork (Contributor) commented Nov 9, 2023

Tested the latest commit (953419c) both with the Makefile and with CMake on my machine, with the same sample, and got the correct result.

As for what mkiol mentioned, the Makefile should pick up your correct arch. CMake defaults to multiple archs; I'm not sure whether AMD included gfx1100 in the current ROCm version. I believe the best way to force it is to add -DAMDGPU_TARGETS='gfx1100'. You can check which ISAs you compiled for with roc-obj-ls libwhisper.so.
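Putting the suggestions in this thread together, a build that pins a single GPU target and then verifies which ISAs actually ended up in the library might look like the following. This is a sketch; the `gfx1100` target and build paths are assumptions for an RDNA3 card, and the exact cmake layout depends on your checkout:

```shell
# Configure with hipBLAS and force one GPU architecture (assumed RDNA3 here).
cmake -S . -B build -DWHISPER_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100
cmake --build build -j"$(nproc)"

# Verify which code objects were actually embedded in the shared library.
roc-obj-ls build/libwhisper.so
```

If `roc-obj-ls` does not list your card's ISA, the kernels will be launched for the wrong architecture, which can produce garbage output rather than a hard error.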

@mega-ice (Author)

I did a clean compile again and everything works, even without specifying the GPU architecture flag (which is what I was using originally). Most likely it was a glitch in my system, because I had been running different LLM implementations before compiling whisper.cpp. So let this thread be a warning for people who don't try restarting when a program behaves strangely. :-$

but anyway, thanks for the ideas! :)

@ccbadd commented Jun 15, 2024

What is the command line to compile for rocm using make?

@mega-ice (Author)

> What is the command line to compile for rocm using make?

A lot of code has changed since the end of last year. Compiling with make is no longer possible (not all targets have been defined).
As for cmake, I am unable to compile with HIPBLAS due to the CUDA flash attention code.
