Commit f5ef5cf

ggml-cuda : perform cublas mat mul of quantized types as f16 (ggml-org#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU
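The three points in the commit message can be read as one dispatch decision: take the fp16 cuBLAS path only when the source type is f16 or quantized, the device is Volta or newer, and a single GPU is in use. The host-side sketch below illustrates that reading. The helper `use_f16_cublas` and its parameters are hypothetical and do not come from `ggml-cuda.cu`. The value `700` for `CC_VOLTA` is an assumption based on the usual `major*100 + minor*10` encoding of compute capability (Volta is 7.0, Turing is 7.5, which would explain the rename).

```cpp
// Assumed encoding: compute capability = major*100 + minor*10, so Volta (7.0) -> 700.
constexpr int CC_VOLTA = 700;

// Hypothetical helper, not taken from ggml-cuda.cu: decides whether a matrix
// multiplication could take the fp16 cuBLAS path described in the commit message.
bool use_f16_cublas(int compute_capability, bool src_is_f16_or_quantized, int device_count) {
    // "disable fp16 mat mul completely with multi GPU"
    if (device_count > 1) {
        return false;
    }
    // fp16 path only on Volta or newer, and only for f16 or quantized source types
    return compute_capability >= CC_VOLTA && src_is_f16_or_quantized;
}
```

Under these assumptions, a single Turing GPU (750) with a quantized weight matrix would take the fp16 path, while a Pascal GPU (610) or any multi-GPU setup would fall back to the previous behaviour.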

1 parent 40e07a6 commit f5ef5cf

1 file changed: ggml-cuda.cu (+122, -72 lines changed)
