metal : small-batch mat-mul kernels #10581

Merged 4 commits on Dec 3, 2024

@ggerganov ggerganov commented Nov 29, 2024

Improve Metal performance for small batch sizes (<= 8)

make -j && ./bin/llama-bench \
  -m ../models/qwen2.5-3b-coder/ggml-model-q4_0.gguf \
  -m ../models/qwen2.5-3b-coder/ggml-model-q4_1.gguf \
  -m ../models/qwen2.5-3b-coder/ggml-model-q4_k.gguf \
  -m ../models/qwen2.5-3b-coder/ggml-model-q5_k.gguf \
  -m ../models/qwen2.5-3b-coder/ggml-model-q6_k.gguf \
  -m ../models/qwen2.5-3b-coder/ggml-model-iq4_nl.gguf \
  -m ../models/qwen2.5-3b-coder/ggml-model-q8_0.gguf \
  -m ../models/qwen2.5-3b-coder/ggml-model-f16.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-q4_0.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-q4_1.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-q4_k.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-q5_k.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-q6_k.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-iq4_nl.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
  -m ../models/qwen2.5-7b-coder/ggml-model-f16.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-q4_0.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-q4_1.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-q4_k.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-q5_k.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-q6_k.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-iq4_nl.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-q8_0.gguf \
  -m ../models/qwen2.5-14b-coder/ggml-model-f16.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q4_0.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q4_1.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q4_k.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q5_k.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q6_k.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-iq4_nl.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q8_0.gguf \
  -m ../models/qwen2.5-32b-coder-instruct/ggml-model-f16.gguf \
  -p 1,2,3,4,5,6,7,8 -n 0 -fa 1 -t 1
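
The model list in the command above is a regular grid of four model directories and eight quantization types, so it can be generated rather than typed out. The following is a minimal sketch, not part of the PR: the helper name `build_bench_args` and the `models_root` parameter are hypothetical, and only the flags and paths already shown in the command are used.

```python
import shlex

# Model directories and quantization types, in the order used by the command above.
MODEL_DIRS = [
    "qwen2.5-3b-coder",
    "qwen2.5-7b-coder",
    "qwen2.5-14b-coder",
    "qwen2.5-32b-coder-instruct",
]
QUANTS = ["q4_0", "q4_1", "q4_k", "q5_k", "q6_k", "iq4_nl", "q8_0", "f16"]


def build_bench_args(models_root="../models"):
    """Return the llama-bench argument list (hypothetical helper, for illustration)."""
    args = ["./bin/llama-bench"]
    for model_dir in MODEL_DIRS:
        for quant in QUANTS:
            args += ["-m", f"{models_root}/{model_dir}/ggml-model-{quant}.gguf"]
    # Same trailing flags as the original command: -p 1..8, -n 0, -fa 1, -t 1.
    prompt_sizes = ",".join(str(n) for n in range(1, 9))
    args += ["-p", prompt_sizes, "-n", "0", "-fa", "1", "-t", "1"]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted after `make -j &&`.
    print(shlex.join(build_bench_args()))
```

Keeping the arguments as a list also makes it easy to pass them to `subprocess.run` directly, without going through a shell.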
M2 Ultra
model size backend fa test t/s (master) t/s (PR) speedup
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp1 103.51 ± 0.44 101.58 ± 3.72 0.98
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp2 148.82 ± 4.18 141.95 ± 4.29 0.95
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp3 181.31 ± 2.40 173.71 ± 4.32 0.96
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp4 195.01 ± 5.86 201.25 ± 1.53 1.03
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp5 91.39 ± 0.28 209.06 ± 4.30 2.29
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp6 110.09 ± 1.10 223.83 ± 3.34 2.03
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp7 128.26 ± 1.98 229.85 ± 4.12 1.79
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp8 144.31 ± 1.91 260.05 ± 3.82 1.80
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp1 100.27 ± 0.40 98.12 ± 2.25 0.98
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp2 150.89 ± 1.07 145.80 ± 0.34 0.97
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp3 184.53 ± 0.53 177.67 ± 2.02 0.96
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp4 198.12 ± 0.45 205.09 ± 0.89 1.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp5 93.77 ± 0.07 215.70 ± 1.41 2.30
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp6 112.26 ± 0.16 229.39 ± 1.35 2.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp7 130.85 ± 0.10 235.22 ± 1.83 1.80
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp8 149.93 ± 0.15 269.63 ± 1.84 1.80
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp1 91.69 ± 1.38 86.70 ± 2.47 0.95
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp2 122.44 ± 0.16 121.30 ± 1.36 0.99
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp3 142.49 ± 0.17 141.52 ± 0.76 0.99
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp4 151.12 ± 0.22 156.37 ± 1.58 1.03
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp5 79.39 ± 0.04 177.85 ± 1.30 2.24
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp6 95.17 ± 0.25 179.18 ± 1.31 1.88
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp7 111.06 ± 0.34 175.66 ± 1.04 1.58
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp8 126.79 ± 0.12 198.07 ± 1.00 1.56
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp1 72.53 ± 0.23 69.08 ± 2.89 0.95
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp2 93.73 ± 0.28 92.89 ± 0.93 0.99
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp3 104.63 ± 0.26 103.75 ± 0.60 0.99
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp4 108.82 ± 0.17 146.24 ± 0.52 1.34
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp5 73.65 ± 0.09 167.83 ± 0.84 2.28
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp6 88.14 ± 0.14 165.03 ± 0.61 1.87
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp7 102.66 ± 0.12 162.18 ± 0.68 1.58
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp8 117.24 ± 0.15 182.82 ± 0.85 1.56
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp1 78.30 ± 0.16 76.63 ± 2.31 0.98
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp2 101.74 ± 0.13 101.38 ± 1.12 1.00
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp3 114.76 ± 0.23 113.90 ± 0.71 0.99
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp4 118.91 ± 0.09 153.46 ± 1.50 1.29
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp5 75.34 ± 0.05 172.64 ± 0.95 2.29
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp6 90.45 ± 0.20 170.42 ± 0.99 1.88
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp7 105.34 ± 0.18 168.04 ± 0.97 1.60
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp8 120.47 ± 0.14 186.88 ± 0.66 1.55
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp1 87.77 ± 0.42 84.21 ± 1.19 0.96
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp2 112.54 ± 0.15 124.63 ± 1.17 1.11
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp3 130.12 ± 0.40 141.64 ± 0.74 1.09
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp4 139.27 ± 0.45 158.47 ± 1.28 1.14
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp5 81.86 ± 0.15 185.89 ± 0.69 2.27
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp6 97.98 ± 0.11 182.10 ± 0.21 1.86
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp7 113.93 ± 0.08 178.31 ± 0.21 1.57
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp8 130.36 ± 0.46 205.57 ± 0.37 1.58
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp1 69.44 ± 1.83 70.41 ± 0.45 1.01
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp2 96.25 ± 2.72 126.66 ± 0.20 1.32
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp3 111.80 ± 2.43 163.08 ± 0.68 1.46
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp4 118.85 ± 0.49 200.40 ± 0.41 1.69
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp5 89.39 ± 0.62 207.24 ± 0.26 2.32
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp6 107.13 ± 0.62 210.91 ± 0.39 1.97
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp7 124.46 ± 1.41 229.20 ± 0.05 1.84
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp8 142.26 ± 1.47 260.76 ± 0.27 1.83
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp1 39.48 ± 0.55 39.50 ± 0.67 1.00
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp2 49.12 ± 0.69 83.67 ± 1.24 1.70
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp3 52.16 ± 0.34 117.29 ± 2.09 2.25
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp4 60.95 ± 0.43 149.43 ± 2.68 2.45
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp5 90.45 ± 0.11 174.40 ± 3.87 1.93
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp6 108.13 ± 0.28 150.62 ± 3.31 1.39
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp7 125.99 ± 1.45 170.60 ± 3.20 1.35
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp8 143.42 ± 2.57 193.36 ± 3.40 1.35
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp1 55.74 ± 0.18 55.56 ± 0.18 1.00
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp2 81.41 ± 0.11 76.23 ± 0.17 0.94
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp3 99.10 ± 0.15 96.47 ± 0.24 0.97
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp4 106.32 ± 0.08 112.64 ± 0.36 1.06
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp5 48.50 ± 0.06 116.69 ± 0.11 2.41
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp6 58.17 ± 0.17 125.42 ± 0.17 2.16
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp7 67.79 ± 0.16 128.60 ± 0.12 1.90
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp8 77.46 ± 0.22 146.78 ± 0.17 1.89
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp1 100.01 ± 0.31 99.96 ± 0.51 1.00
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp2 150.73 ± 0.50 146.20 ± 0.33 0.97
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp3 184.06 ± 0.62 178.79 ± 0.29 0.97
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp4 198.40 ± 0.43 206.25 ± 0.51 1.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp5 94.08 ± 0.06 217.42 ± 0.88 2.31
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp6 112.73 ± 0.06 230.23 ± 0.53 2.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp7 130.81 ± 0.05 236.16 ± 0.16 1.81
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp8 149.56 ± 0.14 270.31 ± 0.35 1.81
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp1 49.75 ± 0.79 49.94 ± 0.19 1.00
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp2 67.93 ± 0.15 67.93 ± 0.15 1.00
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp3 78.51 ± 0.16 78.38 ± 0.14 1.00
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp4 83.13 ± 0.42 82.00 ± 0.12 0.99
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp5 40.19 ± 0.13 89.64 ± 0.04 2.23
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp6 48.05 ± 0.04 95.06 ± 0.13 1.98
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp7 55.96 ± 0.06 90.80 ± 0.07 1.62
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp8 63.98 ± 0.06 101.02 ± 0.17 1.58
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp1 40.31 ± 0.42 40.25 ± 0.20 1.00
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp2 52.07 ± 0.04 51.84 ± 0.14 1.00
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp3 56.94 ± 0.34 56.77 ± 0.35 1.00
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp4 58.97 ± 0.08 77.26 ± 0.12 1.31
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp5 36.78 ± 0.06 86.95 ± 0.09 2.36
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp6 44.11 ± 0.10 88.79 ± 0.22 2.01
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp7 51.27 ± 0.07 86.39 ± 0.16 1.69
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp8 58.59 ± 0.05 97.49 ± 0.10 1.66
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp1 42.24 ± 0.17 42.06 ± 0.15 1.00
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp2 52.53 ± 0.03 52.47 ± 0.12 1.00
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp3 57.91 ± 0.04 57.70 ± 0.04 1.00
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp4 59.81 ± 0.08 79.01 ± 0.09 1.32
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp5 38.59 ± 0.01 87.93 ± 0.10 2.28
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp6 46.36 ± 0.10 91.72 ± 0.29 1.98
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp7 53.99 ± 0.14 88.80 ± 0.15 1.64
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp8 61.65 ± 0.10 97.00 ± 0.24 1.57
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp1 43.42 ± 0.15 43.43 ± 0.08 1.00
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp2 59.08 ± 0.12 64.13 ± 0.11 1.09
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp3 67.03 ± 0.07 73.73 ± 0.17 1.10
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp4 71.00 ± 0.18 82.61 ± 0.29 1.16
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp5 42.53 ± 0.06 97.26 ± 0.15 2.29
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp6 50.93 ± 0.06 92.80 ± 0.28 1.82
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp7 59.31 ± 0.12 88.06 ± 0.49 1.48
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp8 67.61 ± 0.03 100.83 ± 0.58 1.49
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp1 35.42 ± 0.65 35.47 ± 0.43 1.00
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp2 48.77 ± 0.65 66.12 ± 0.27 1.36
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp3 54.75 ± 0.47 86.90 ± 0.78 1.59
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp4 57.09 ± 0.09 105.82 ± 1.47 1.85
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp5 44.95 ± 0.11 109.84 ± 1.94 2.44
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp6 53.93 ± 0.45 109.45 ± 0.98 2.03
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp7 62.50 ± 0.20 118.56 ± 1.76 1.90
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp8 71.09 ± 0.24 133.41 ± 1.53 1.88
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp1 21.04 ± 0.10 21.07 ± 0.07 1.00
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp2 25.01 ± 0.05 43.02 ± 0.41 1.72
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp3 27.19 ± 0.04 60.72 ± 0.20 2.23
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp4 32.91 ± 0.05 74.23 ± 2.22 2.26
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp5 48.01 ± 0.31 89.55 ± 0.49 1.87
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp6 57.61 ± 0.53 75.01 ± 0.49 1.30
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp7 66.53 ± 0.14 85.14 ± 0.65 1.28
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp8 76.36 ± 0.63 97.31 ± 0.97 1.27
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp1 29.25 ± 0.05 29.20 ± 0.06 1.00
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp2 37.11 ± 0.07 37.93 ± 0.05 1.02
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp3 42.27 ± 0.03 50.55 ± 0.04 1.20
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp4 44.16 ± 0.08 58.00 ± 0.08 1.31
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp5 25.01 ± 0.03 60.65 ± 0.05 2.43
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp6 29.97 ± 0.03 60.26 ± 0.11 2.01
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp7 34.86 ± 0.01 60.86 ± 0.05 1.75
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp8 39.79 ± 0.02 69.41 ± 0.09 1.74
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp1 27.03 ± 0.06 27.03 ± 0.06 1.00
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp2 34.15 ± 0.05 37.19 ± 0.04 1.09
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp3 38.02 ± 0.03 49.71 ± 0.22 1.31
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp4 39.36 ± 0.03 57.54 ± 0.11 1.46
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp5 24.96 ± 0.04 59.83 ± 0.07 2.40
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp6 29.96 ± 0.05 59.29 ± 0.01 1.98
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp7 34.83 ± 0.05 59.96 ± 0.12 1.72
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp8 39.83 ± 0.04 68.54 ± 0.08 1.72
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp1 25.89 ± 0.06 25.94 ± 0.06 1.00
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp2 32.06 ± 0.06 32.00 ± 0.03 1.00
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp3 35.42 ± 0.11 35.34 ± 0.05 1.00
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp4 36.82 ± 0.02 42.05 ± 0.05 1.14
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp5 20.64 ± 0.03 46.80 ± 0.07 2.27
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp6 24.79 ± 0.04 45.84 ± 0.05 1.85
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp7 28.85 ± 0.05 42.90 ± 0.04 1.49
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp8 32.92 ± 0.08 48.09 ± 0.13 1.46
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp1 20.66 ± 0.13 20.51 ± 0.02 0.99
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp2 24.03 ± 0.01 24.05 ± 0.02 1.00
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp3 25.49 ± 0.04 25.48 ± 0.02 1.00
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp4 26.06 ± 0.02 39.32 ± 0.12 1.51
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp5 18.89 ± 0.05 44.53 ± 0.04 2.36
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp6 22.64 ± 0.05 42.35 ± 0.02 1.87
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp7 26.33 ± 0.02 40.34 ± 0.12 1.53
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp8 30.10 ± 0.04 45.64 ± 0.10 1.52
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp1 20.55 ± 0.03 20.54 ± 0.04 1.00
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp2 23.18 ± 0.01 23.16 ± 0.02 1.00
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp3 24.45 ± 0.03 24.41 ± 0.01 1.00
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp4 24.86 ± 0.03 40.36 ± 0.04 1.62
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp5 19.61 ± 0.03 44.46 ± 0.03 2.27
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp6 23.52 ± 0.05 43.93 ± 0.02 1.87
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp7 27.37 ± 0.02 42.27 ± 0.03 1.54
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp8 31.28 ± 0.03 46.69 ± 0.05 1.49
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp1 22.45 ± 0.06 22.56 ± 0.07 1.00
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp2 28.20 ± 0.02 31.86 ± 0.08 1.13
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp3 30.84 ± 0.01 37.83 ± 0.12 1.23
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp4 32.05 ± 0.04 41.81 ± 0.05 1.30
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp5 22.18 ± 0.04 49.22 ± 0.07 2.22
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp6 26.55 ± 0.04 44.49 ± 0.16 1.68
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp7 30.90 ± 0.02 37.65 ± 0.12 1.22
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp8 35.24 ± 0.03 43.38 ± 0.30 1.23
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp1 17.84 ± 0.08 17.85 ± 0.05 1.00
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp2 21.15 ± 0.02 32.86 ± 0.02 1.55
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp3 22.40 ± 0.01 45.93 ± 0.09 2.05
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp4 22.84 ± 0.02 55.71 ± 0.07 2.44
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp5 23.98 ± 0.02 57.98 ± 0.07 2.42
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp6 28.76 ± 0.04 53.49 ± 0.04 1.86
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp7 33.45 ± 0.02 57.51 ± 0.12 1.72
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp8 38.24 ± 0.06 65.18 ± 0.10 1.70
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp1 9.76 ± 0.06 9.73 ± 0.04 1.00
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp2 11.14 ± 0.01 20.23 ± 0.04 1.82
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp3 11.75 ± 0.01 29.31 ± 0.05 2.49
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp4 14.26 ± 0.02 37.59 ± 0.32 2.64
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp5 26.38 ± 0.11 45.37 ± 0.31 1.72
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp6 31.58 ± 0.11 34.41 ± 0.18 1.09
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp7 36.69 ± 0.14 39.02 ± 0.07 1.06
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp8 41.82 ± 0.10 44.55 ± 0.10 1.07

build: 991f8aa (4239)
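
The speedup column can be checked against the two throughput columns. The sketch below parses a few rows copied from the M2 Ultra table and recomputes the ratio; it assumes the first t/s value is the baseline and the second is this PR, which is consistent with the reported ratios. The regular expression and the `parse_row` helper are illustrative, not part of llama.cpp.

```python
import re

# Three rows copied verbatim from the M2 Ultra table.
ROWS = """\
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp4 195.01 ± 5.86 201.25 ± 1.53 1.03
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp5 91.39 ± 0.28 209.06 ± 4.30 2.29
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp4 14.26 ± 0.02 37.59 ± 0.32 2.64
"""

# One whitespace-separated row: model, size, backend, fa, test, two "mean ± std" cells, speedup.
ROW_RE = re.compile(
    r"^(?P<model>.+?) (?P<size>[\d.]+ GiB) (?P<backend>\S+) (?P<fa>\d) (?P<test>pp\d+) "
    r"(?P<old>[\d.]+) ± [\d.]+ (?P<new>[\d.]+) ± [\d.]+ (?P<speedup>[\d.]+)$"
)


def parse_row(line):
    """Parse one table row and recompute speedup = new t/s / old t/s."""
    match = ROW_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    old, new = float(match["old"]), float(match["new"])
    return {
        "model": match["model"],
        "test": match["test"],
        "reported": float(match["speedup"]),
        "recomputed": round(new / old, 2),
    }


if __name__ == "__main__":
    for line in ROWS.splitlines():
        row = parse_row(line)
        print(row["model"], row["test"], row["reported"], row["recomputed"])
```

For these three rows the recomputed ratio matches the reported speedup to two decimal places.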

M1 Pro
model size backend fa test t/s (master) t/s (PR) speedup
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp1 71.88 ± 0.91 72.07 ± 0.73 1.00
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp1 72.24 ± 0.12 71.35 ± 2.35 0.99
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp2 94.02 ± 0.08 89.35 ± 0.26 0.95
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp3 103.71 ± 0.12 125.13 ± 0.14 1.21
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp4 110.12 ± 0.06 140.56 ± 0.14 1.28
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp5 83.42 ± 0.07 155.18 ± 0.18 1.86
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp6 99.85 ± 0.04 151.34 ± 0.11 1.52
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp7 116.20 ± 0.04 145.15 ± 0.09 1.25
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp8 132.87 ± 0.07 164.57 ± 0.07 1.24
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp1 61.36 ± 0.10 61.34 ± 0.16 1.00
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp1 61.41 ± 0.10 61.43 ± 0.19 1.00
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp2 73.71 ± 0.92 74.27 ± 0.12 1.01
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp3 79.77 ± 0.11 79.91 ± 0.05 1.00
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp4 83.43 ± 0.04 105.77 ± 0.12 1.27
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp5 71.31 ± 0.08 110.56 ± 0.11 1.55
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp6 85.53 ± 0.03 107.73 ± 0.09 1.26
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp7 99.54 ± 0.08 110.39 ± 0.10 1.11
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp8 113.82 ± 0.07 122.02 ± 0.14 1.07
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp1 46.28 ± 0.13 46.03 ± 0.32 0.99
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp1 46.33 ± 0.08 45.26 ± 0.78 0.98
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp2 60.96 ± 0.08 76.28 ± 0.10 1.25
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp3 67.21 ± 0.03 114.75 ± 0.31 1.71
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp4 71.15 ± 0.07 135.15 ± 0.34 1.90
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp5 82.02 ± 0.02 149.47 ± 0.15 1.82
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp6 98.18 ± 0.07 137.05 ± 0.07 1.40
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp7 114.26 ± 0.03 139.25 ± 0.14 1.22
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp8 130.61 ± 0.05 153.15 ± 0.99 1.17
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp1 40.60 ± 0.05 39.92 ± 0.22 0.98
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp1 40.50 ± 0.26 40.00 ± 0.45 0.99
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp2 47.66 ± 0.56 44.10 ± 0.13 0.93
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp3 49.87 ± 0.41 66.90 ± 0.30 1.34
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp4 51.91 ± 0.16 71.58 ± 0.26 1.38
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp5 41.91 ± 0.07 77.09 ± 0.26 1.84
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp6 50.22 ± 0.15 73.08 ± 0.19 1.46
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp7 58.22 ± 0.38 68.96 ± 0.14 1.18
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp8 66.55 ± 0.23 77.88 ± 0.17 1.17
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp1 33.15 ± 0.18 33.16 ± 0.19 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp1 33.14 ± 0.19 33.18 ± 0.19 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp2 36.75 ± 0.13 36.75 ± 0.12 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp3 38.25 ± 0.12 38.26 ± 0.11 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp4 39.18 ± 0.09 52.31 ± 0.15 1.34
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp5 36.56 ± 0.01 54.24 ± 0.12 1.48
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp6 43.78 ± 0.01 52.56 ± 0.04 1.20
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp7 50.95 ± 0.09 52.13 ± 0.08 1.02
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp8 58.25 ± 0.01 58.03 ± 0.12 1.00
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp1 23.74 ± 0.11 23.75 ± 0.10 1.00
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp1 23.77 ± 0.10 23.76 ± 0.10 1.00
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp2 26.14 ± 0.02 37.02 ± 0.12 1.42
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp3 27.29 ± 0.04 58.21 ± 0.22 2.13
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp4 27.92 ± 0.03 66.85 ± 0.20 2.39
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp5 41.06 ± 0.03 72.93 ± 0.21 1.78
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp6 49.16 ± 0.18 63.97 ± 0.19 1.30
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp7 56.99 ± 0.15 64.78 ± 0.19 1.14
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp8 65.09 ± 0.17 71.21 ± 0.15 1.09
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp1 21.82 ± 0.03 21.83 ± 0.04 1.00
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp1 21.83 ± 0.02 21.85 ± 0.01 1.00
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp2 24.47 ± 0.02 22.65 ± 0.01 0.93
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp3 25.54 ± 0.01 35.48 ± 0.02 1.39
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp4 25.96 ± 0.04 37.49 ± 0.04 1.44
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp5 22.53 ± 0.01 41.14 ± 0.05 1.83
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp6 27.02 ± 0.01 37.65 ± 0.05 1.39
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp7 31.50 ± 0.01 35.59 ± 0.01 1.13
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp8 35.94 ± 0.03 40.10 ± 0.03 1.12
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp1 17.75 ± 0.02 17.68 ± 0.03 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp1 17.76 ± 0.02 17.69 ± 0.05 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp2 19.17 ± 0.02 19.11 ± 0.02 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp3 19.76 ± 0.03 19.72 ± 0.02 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp4 20.01 ± 0.01 27.45 ± 0.02 1.37
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp5 19.28 ± 0.01 28.70 ± 0.03 1.49
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp6 23.10 ± 0.02 27.31 ± 0.00 1.18
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp7 26.91 ± 0.01 26.84 ± 0.01 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp8 30.72 ± 0.01 29.84 ± 0.02 0.97
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp1 12.49 ± 0.03 12.48 ± 0.02 1.00
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp1 12.48 ± 0.04 12.49 ± 0.02 1.00
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp2 13.58 ± 0.02 19.03 ± 0.02 1.40
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp3 14.02 ± 0.01 31.95 ± 0.04 2.28
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp4 14.21 ± 0.01 34.42 ± 0.08 2.42
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp5 22.25 ± 0.02 37.49 ± 0.02 1.68
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp6 26.66 ± 0.02 33.12 ± 0.04 1.24
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp7 31.07 ± 0.02 33.16 ± 0.03 1.07
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp8 35.45 ± 0.03 36.25 ± 0.04 1.02

build: 64ed209 (4240)

@github-actions github-actions bot added the testing, ggml, and Apple Metal labels Nov 29, 2024
@ggerganov ggerganov force-pushed the gg/metal-mul-mv-new-save4 branch from 8074ca8 to f45c40e December 2, 2024 09:00
@ggerganov ggerganov marked this pull request as ready for review December 2, 2024 18:31
@ggerganov ggerganov merged commit 0115df2 into master Dec 3, 2024
48 checks passed
@ggerganov ggerganov deleted the gg/metal-mul-mv-new-save4 branch December 3, 2024 09:52
netrunnereve pushed a commit to netrunnereve/llama.cpp that referenced this pull request Dec 4, 2024
* metal : small-batch mat-mul kernels

ggml-ci

* metal : add rest of types

ggml-ci

* metal : final adjustments

ggml-ci

* metal : add comments

ggml-ci
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024