LWS autotune ends up with a poor choice #5661

Open
magnumripper opened this issue Feb 8, 2025 · 0 comments

I frequently see that a manual bump of LWS gives a boost you wouldn't want to miss. I'm not sure why, but the current autotune somehow fails. I'm currently working on a kernel that does 4373 c/s with the autotuned work size of only 32, while it achieves 5150 c/s using 256. I'd like to have those 17%, please!

Interestingly enough, judging by its own measurements the autotune seems to do the right thing:

```
Calculating best LWS for GWS=4096
Testing LWS=32 GWS=4096 ... 199.775 ms+
Testing LWS=64 GWS=4096 ... 200.720 ms
Testing LWS=128 GWS=4096 ... 205.342 ms
Testing LWS=256 GWS=4096 ... 383.430 ms
```

No wonder it picks 32. Yet, specifying LWS=256 manually ends up much faster. How come?

I tried adding `__attribute__((work_group_size_hint(256, 1, 1)))` to the kernels, but it doesn't help: a query of `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` still returns 32 on NVIDIA and 64 on AMD, which is probably intended (it does say *multiple*).
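For reference, this is roughly how that query is made on the host side. A minimal sketch assuming an already-built `cl_kernel` and a `cl_device_id` (the helper name `preferred_lws_multiple` is made up here); it needs an OpenCL runtime and device to actually run:

```c
#include <stdio.h>
#include <CL/cl.h>

/* Query the preferred work-group size multiple for a built kernel.
 * Note this is a performance hint (a *multiple*, not a target size),
 * which would explain why it still returns 32/64 regardless of any
 * work_group_size_hint attribute in the kernel source. */
static size_t preferred_lws_multiple(cl_kernel kernel, cl_device_id device)
{
    size_t multiple = 0;
    cl_int err = clGetKernelWorkGroupInfo(kernel, device,
            CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
            sizeof(multiple), &multiple, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetKernelWorkGroupInfo() failed: %d\n", err);
        return 0;
    }
    return multiple; /* e.g. 32 (NVIDIA warp) or 64 (AMD wavefront) */
}
```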

Using `__attribute__((reqd_work_group_size(256, 1, 1)))` feels like overkill even if it had worked. More importantly, it fails miserably with our self-tests (which can be worked around with `--skip-self-test`) and then with our autotune. The latter is not fixable: I'm not aware of any OpenCL query that actually tells us that number. So we'd also have to hard-code that size into the host code... and then we'd not need to query it 🙄. The runtime optimizer can benefit from it though, so there's that.
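For completeness, this is the kernel-side attribute in question, as a sketch with a placeholder kernel name and body. Once it is in place, the runtime rejects any enqueue whose local size differs from the declared one (`clEnqueueNDRangeKernel` returns `CL_INVALID_WORK_GROUP_SIZE`), which is presumably why self-tests and autotune, which probe several local sizes, break with it:

```c
/* OpenCL C kernel source (sketch; "example_kernel" and its body are
 * placeholders, not the actual kernel under discussion).
 * reqd_work_group_size pins the local size exactly: launching with
 * any other local size fails with CL_INVALID_WORK_GROUP_SIZE. */
__attribute__((reqd_work_group_size(256, 1, 1)))
__kernel void example_kernel(__global const uint *in, __global uint *out)
{
    size_t gid = get_global_id(0);
    out[gid] = in[gid];
}
```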
