vulkan: Increase workgroup size for GLU, for performance #14345
base: cisc/unary-reglu-geglu-swiglu
I honestly don't know anything about the Vulkan backend, but if you say so I'm sure this is good. :)
Out of curiosity, is there a similar tg boost for models with split up/gate?
This was fixing a regression vs what's in master, so it's just recovering the performance we already had. I've only tested this one model.
Understood. I guess what I was asking is whether you could check if there was a similar regression for split up/gate too?
There very likely was. Can you suggest a model to test?
Qwen3 or something?
Yes, there is a similar issue with Qwen3, which this mostly fixes. But it's still 1-2% slower. I think I need to change the shader to do one element per thread rather than a row per workgroup. I'll push another commit later today.
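For readers unfamiliar with the two mappings mentioned above, here is a rough CPU-side sketch in Python of how the work could be divided in each case. It is not the actual shader: the function names, the `gate`/`up` operand names, the flat indexing, and the workgroup size of 512 are all assumptions made for illustration only.

```python
import math

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def glu_row_per_workgroup(gate, up, n_rows, n_cols, wg_size):
    """Hypothetical mapping: one workgroup per row, threads stride over columns."""
    out = [0.0] * (n_rows * n_cols)
    for row in range(n_rows):              # each row is one workgroup
        for tid in range(wg_size):         # threads inside that workgroup
            for col in range(tid, n_cols, wg_size):
                i = row * n_cols + col
                out[i] = silu(gate[i]) * up[i]
    return out, n_rows                     # workgroups dispatched == rows

def glu_element_per_thread(gate, up, n_rows, n_cols, wg_size):
    """Hypothetical mapping: one flat element per thread, rows are ignored."""
    total = n_rows * n_cols
    n_workgroups = (total + wg_size - 1) // wg_size   # ceiling division
    out = [0.0] * total
    for wg in range(n_workgroups):
        for tid in range(wg_size):
            i = wg * wg_size + tid
            if i < total:                  # guard the partially filled last workgroup
                out[i] = silu(gate[i]) * up[i]
    return out, n_workgroups

# Illustrative shape only: a single row of 1000 elements (assumed, not measured).
n_rows, n_cols, wg_size = 1, 1000, 512
gate = [0.01 * i - 5.0 for i in range(n_rows * n_cols)]
up = [0.002 * i for i in range(n_rows * n_cols)]

out_a, wgs_a = glu_row_per_workgroup(gate, up, n_rows, n_cols, wg_size)
out_b, wgs_b = glu_element_per_thread(gate, up, n_rows, n_cols, wg_size)
print(wgs_a, wgs_b)  # 1 2
```

Both functions compute the same values; only the number of workgroups launched differs. With a single row, the per-row mapping launches one workgroup regardless of row length, while the per-element mapping scales the workgroup count with the total element count.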
…an one row per workgroup
tg perf with qwen3 is now marginally faster than with master.
@CISC @0cc4m I noticed Vulkan perf was much worse for tg in #14158 due to the small workgroup size. This change restores the performance: