vulkan: Increase workgroup size for GLU, for performance #14345


Open · jeffbolznv wants to merge 2 commits into cisc/unary-reglu-geglu-swiglu

Conversation

jeffbolznv (Collaborator):

@CISC @0cc4m I noticed Vulkan performance was much worse for token generation (tg) in #14158 due to the small workgroup size. This change restores the performance:

before:
Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\glm-4-9b-chat-Q4_0.gguf -fa 1 -n 128 -p 512 --prio 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| chatglm 9B Q4_0                |   5.08 GiB |     9.40 B | Vulkan     |  99 |  1 |           pp512 |      3369.87 ± 10.71 |
| chatglm 9B Q4_0                |   5.08 GiB |     9.40 B | Vulkan     |  99 |  1 |           tg128 |         57.09 ± 0.21 |

build: ab46d11d (5752)

after:
Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\glm-4-9b-chat-Q4_0.gguf -fa 1 -n 128 -p 512 --prio 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| chatglm 9B Q4_0                |   5.08 GiB |     9.40 B | Vulkan     |  99 |  1 |           pp512 |      3404.32 ± 11.38 |
| chatglm 9B Q4_0                |   5.08 GiB |     9.40 B | Vulkan     |  99 |  1 |           tg128 |         73.71 ± 0.24 |

build: 065b990f (5753)
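
For readers unfamiliar with the shader in question: the fix raises the compute shader's local workgroup size so each row's work is spread over enough threads to keep the GPU busy during token generation. Below is a minimal GLSL sketch of the row-per-workgroup scheme, assuming a fused up/gate row layout; the buffer names, push constants, and the local size of 512 are illustrative, not the exact values from this PR:

```glsl
// Illustrative sketch, not the actual ggml-vulkan source.
// One workgroup per row; threads stride across the columns.
// With a small local_size_x, token generation (few, short rows)
// leaves most of the GPU idle; raising it restores occupancy.
#version 450

layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;

layout(std430, binding = 0) readonly  buffer A { float data_a[]; };
layout(std430, binding = 1) writeonly buffer D { float data_d[]; };

layout(push_constant) uniform P { uint ncols; } p;

void main() {
    const uint row = gl_WorkGroupID.x;
    // Assumed layout: each input row holds ncols "up" values followed
    // by ncols "gate" values (hypothetical; the real op has variants).
    for (uint col = gl_LocalInvocationID.x; col < p.ncols; col += gl_WorkGroupSize.x) {
        const float x = data_a[row * 2u * p.ncols + col];
        const float g = data_a[row * 2u * p.ncols + p.ncols + col];
        // swiglu-style gating: silu(x) * gate
        data_d[row * p.ncols + col] = (x / (1.0f + exp(-x))) * g;
    }
}
```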

jeffbolznv requested review from CISC and 0cc4m on June 23, 2025 at 13:33
The github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jun 23, 2025
CISC (Collaborator) left a comment:

I honestly don't know anything about the Vulkan backend, but if you say so I'm sure this is good. :)

CISC (Collaborator) commented on Jun 23, 2025:

Out of curiosity, is there a similar tg boost for models with split up/gate?

jeffbolznv (Collaborator, Author):

This was fixing a regression vs what's in master, so it's just recovering the performance we already had. I've only tested this one model.

CISC (Collaborator) commented on Jun 23, 2025:

> This was fixing a regression vs what's in master, so it's just recovering the performance we already had. I've only tested this one model.

I understood; what I was asking is whether you could check if there was a similar regression for split up/gate too?

jeffbolznv (Collaborator, Author):

There very likely was. Can you suggest a model to test?

CISC (Collaborator) commented on Jun 23, 2025:

Qwen3 or something?

jeffbolznv (Collaborator, Author):

Yes, there is a similar issue with Qwen3, which this mostly fixes. But it's still 1-2% slower. I think I need to change the shader to do one element per thread rather than a row per workgroup. I'll push another commit later today.
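
A sketch of the one-element-per-thread layout described above, under the same illustrative assumptions as the earlier sketch (fused up/gate halves per row; the names, push constants, and dispatch math are hypothetical, not the PR's actual code):

```glsl
// Illustrative sketch: flatten rows x columns into one index so each
// invocation computes exactly one output element. The host would then
// dispatch ceil(nelems / 512) workgroups regardless of row length,
// so small-row token-generation workloads still fill the GPU.
#version 450

layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;

layout(std430, binding = 0) readonly  buffer A { float data_a[]; };
layout(std430, binding = 1) writeonly buffer D { float data_d[]; };

layout(push_constant) uniform P { uint ncols; uint nelems; } p;

void main() {
    const uint i = gl_GlobalInvocationID.x;
    if (i >= p.nelems) return;   // guard the ragged last workgroup
    const uint row = i / p.ncols;
    const uint col = i % p.ncols;
    const float x = data_a[row * 2u * p.ncols + col];
    const float g = data_a[row * 2u * p.ncols + p.ncols + col];
    data_d[i] = (x / (1.0f + exp(-x))) * g;
}
```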

jeffbolznv (Collaborator, Author):

tg perf with Qwen3 is now marginally faster than with master.
