Bug: GGML assert with bf16, RTX 3090 #8234

Closed as not planned
@micsthepick

Description

What happened?

./llama-server -ngl 99 -cb -c 65536 -np 32 -m models/Phi-3-mini-128k-instruct/ggml-model-bf16.gguf 
...
GGML_ASSERT: ggml/src/ggml-cuda.cu:1257: to_fp32_cuda != nullptr
[New LWP 934430]
[New LWP 934432]
[New LWP 934433]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fb1ba523c7f in __GI___wait4 (pid=934542, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007fb1ba523c7f in __GI___wait4 (pid=934542, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000559119a6c7eb in ggml_print_backtrace ()
#2  0x000055911992c1b5 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) ()
#3  0x000055911992e781 in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), void (*)(float const*, void*, long, long, long, long, ggml_type, CUstream_st*)) ()
#4  0x000055911992f7a5 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) ()
#5  0x0000559119933cff in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#6  0x0000559119abb4bb in ggml_backend_sched_graph_compute_async ()
#7  0x0000559119b0d7b0 in llama_decode ()
#8  0x0000559119bcd039 in llama_init_from_gpt_params(gpt_params&) ()
#9  0x0000559119c78495 in server_context::load_model(gpt_params const&) ()
#10 0x0000559119913d7a in main ()
[Inferior 1 (process 934429) detached]
./start_phi.sh: line 1: 934429 Aborted 

Name and Version

./llama-server --version
version: 3265 (72272b8)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux, Windows

Relevant log output

No response

Metadata

    Labels

    bug-unconfirmed, medium severity
