metal : the Q3_K and Q4_K kernels with LLAMA_QKK_64=1 are broken #3276

Closed
ggerganov opened this issue Sep 20, 2023 · 0 comments · Fixed by #4705
Labels
bug Something isn't working

The following commands fail to generate coherent text:

LLAMA_QKK_64=1 make -j && ./main -m tmp/mnt/models/open-llama/3B-v2/ggml-model-q4_k.gguf -p "I believe the meaning of life is" -t 8 -ngl 1

LLAMA_QKK_64=1 make -j && ./main -m tmp/mnt/models/open-llama/3B-v2/ggml-model-q3_k.gguf -p "I believe the meaning of life is" -t 8 -ngl 1

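Generation works correctly on the CPU (Arm and x86).
It also works on Metal with the following patch, which relaxes the ne11 condition so that the ne11 == 1 case is also dispatched to the mul_mm kernels: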

diff --git a/ggml-metal.m b/ggml-metal.m
index 1139ee3..ed9857f 100644
--- a/ggml-metal.m
+++ b/ggml-metal.m
@@ -889,7 +889,7 @@ void ggml_metal_graph_compute(
                                 src1t == GGML_TYPE_F32 &&
                                 [ctx->device supportsFamily:MTLGPUFamilyApple7] &&
                                 ne00%32 == 0 &&
-                                ne11 > 1) {
+                                ne11 >= 1) {
                                 switch (src0->type) {
                                     case GGML_TYPE_F32:  [encoder setComputePipelineState:ctx->pipeline_mul_mm_f32_f32];  break;
                                     case GGML_TYPE_F16:  [encoder setComputePipelineState:ctx->pipeline_mul_mm_f16_f32];  break;

So it seems the issue is in the kernel_mul_mat_q4_K_f32 kernel in the QK_K == 64 branch:

https://github.com/ggerganov/llama.cpp/blob/a40f2b656fab364ce0aff98dbefe9bd9c3721cc9/ggml-metal.metal#L1576-L1663
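
To make the reasoning behind the patch concrete, here is a minimal sketch of the dispatch condition in plain C. The names `pick_path`, `kernel_path` and `min_ne11` are hypothetical and do not appear in ggml-metal.m. The sketch assumes that the fallback, when the condition fails, is the per-type matrix-vector kernel (for Q4_K, kernel_mul_mat_q4_K_f32).

```c
#include <stdbool.h>

/* Hypothetical model of the condition touched by the patch above.
 * None of these names come from ggml-metal.m; they only illustrate
 * which kernel family handles a given shape. */
typedef enum {
    PATH_MUL_MM,      /* matrix-matrix kernels (pipeline_mul_mm_*) */
    PATH_MUL_MAT_VEC  /* assumed fallback: per-type matrix-vector kernels */
} kernel_path;

/* min_ne11 == 2 models the original check (ne11 > 1);
 * min_ne11 == 1 models the patched check (ne11 >= 1). */
static kernel_path pick_path(bool src1_is_f32, bool has_apple7,
                             long ne00, long ne11, long min_ne11) {
    if (src1_is_f32 && has_apple7 && ne00 % 32 == 0 && ne11 >= min_ne11) {
        return PATH_MUL_MM;
    }
    return PATH_MUL_MAT_VEC;
}
```

Under these assumptions, single-token generation (ne11 == 1) takes the matrix-vector path with the original check and the mul_mm path with the patched check. That is consistent with the fault being in the matrix-vector kernel rather than in mul_mm.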

This might have been broken by #2615, but I haven't tested that yet.
