Performance gemv vs gemm #83
Comments
You are right, the GEMV kernel is not particularly fast if the matrix is rotated. It's been a while since I looked at it and I completely forgot about it. But you can see similar results if you look in the Also on my system with the latest version of CLBlast I see this behaviour:
For now, you can get decent performance again if you rotate the matrix (either use column-major layout or set the transpose option). I'll take a more in-depth look at the kernel soon and try to improve it for rotated matrices. I'll keep you up to date.
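To make the layout/transpose relationship concrete, here is a minimal CPU sketch of a reference GEMV that takes both a layout and a transpose flag. The names `Layout` and `gemv_ref` are made up for this illustration and are not CLBlast's API; the point is only the index arithmetic, which shows that a row-major buffer read without transposition is addressed exactly like the same buffer read as column-major with transposition.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only, not the CLBlast API.
enum class Layout { RowMajor, ColMajor };

// Reference y = op(A) * x, where A is stored as `rows` x `cols` with leading
// dimension `ld`, and op(A) is A or A^T depending on `transpose`.
std::vector<float> gemv_ref(Layout layout, bool transpose,
                            std::size_t rows, std::size_t cols,
                            const std::vector<float>& a, std::size_t ld,
                            const std::vector<float>& x) {
  const std::size_t out_len = transpose ? cols : rows;
  const std::size_t in_len = transpose ? rows : cols;
  std::vector<float> y(out_len, 0.0f);
  for (std::size_t i = 0; i < out_len; ++i) {
    for (std::size_t j = 0; j < in_len; ++j) {
      // (r, c) is the element of the stored matrix A that op(A)(i, j) maps to.
      const std::size_t r = transpose ? j : i;
      const std::size_t c = transpose ? i : j;
      // Row-major walks columns contiguously; column-major walks rows contiguously.
      const std::size_t idx =
          (layout == Layout::RowMajor) ? r * ld + c : c * ld + r;
      y[i] += a[idx] * x[j];
    }
  }
  return y;
}
```

For a 2x3 buffer `{1, 2, 3, 4, 5, 6}`, calling `gemv_ref(Layout::RowMajor, false, 2, 3, a, 3, x)` and `gemv_ref(Layout::ColMajor, true, 3, 2, a, 3, x)` touches the same addresses in the same order. Which of the two access patterns a GPU kernel handles well is a property of the kernel, which is why switching layout or transpose can change the runtime so much.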
I've designed a new kernel for the rotated case. It has much better data locality since it now loads a tile of matrix A into the local memory. This also enables coalescing. On my device this already improves performance to the clBLAS level (old and new experiments below each other for comparison):
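The tiling idea can be sketched on the CPU as follows. This is not the actual CLBlast kernel, only an emulation of the technique described above: a small tile of A is first copied into a scratch buffer (standing in for OpenCL local memory) and the partial dot products are then computed from that buffer. `TILE` and `gemv_tiled` are hypothetical names chosen for this example, and A is assumed to be row-major.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile size for illustration, not a CLBlast tuning parameter.
constexpr std::size_t TILE = 4;

// CPU emulation of a tiled GEMV: y = A * x, with A stored row-major (m x n).
std::vector<float> gemv_tiled(std::size_t m, std::size_t n,
                              const std::vector<float>& a,
                              const std::vector<float>& x) {
  std::vector<float> y(m, 0.0f);
  float tile[TILE][TILE];  // stands in for local memory shared by a work-group
  for (std::size_t i0 = 0; i0 < m; i0 += TILE) {
    const std::size_t ih = std::min(TILE, m - i0);
    for (std::size_t j0 = 0; j0 < n; j0 += TILE) {
      const std::size_t jw = std::min(TILE, n - j0);
      // Load phase: the inner loop reads consecutive addresses of A. On a GPU,
      // neighbouring work-items would issue these reads, so they can coalesce.
      for (std::size_t i = 0; i < ih; ++i)
        for (std::size_t j = 0; j < jw; ++j)
          tile[i][j] = a[(i0 + i) * n + (j0 + j)];
      // Compute phase: each output row accumulates its partial sum from the tile.
      for (std::size_t i = 0; i < ih; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < jw; ++j)
          acc += tile[i][j] * x[j0 + j];
        y[i0 + i] += acc;
      }
    }
  }
  return y;
}
```

Separating the load phase from the compute phase is the design choice that matters here: the global-memory reads can follow whatever order is contiguous in memory, while the computation is free to walk the tile in the other direction.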
The new kernel can already be found in the |
When this is ready for the main, please put up a visible reminder, and I'll retune CLBlast for the devices I have.
This is now merged into the |
It improved quite a bit to 2.17 ms. Thanks for the help!
Hello,
I encountered a weird runtime difference between the gemv and the gemm routines.
When I run both with the input M=4096, N=1, K=4096 on my GTX480, the gemm routine takes 3.04 ms and the gemv routine takes 5.51 ms. I would have expected gemv to be faster than gemm because it is made for such an input. Could it be that gemv isn't yet optimized for a GTX480, or is it normal that it is slower? With cuBLAS it is the other way around: cublasSgemv is almost 2 times faster than cublasSgemm.
I call the gemv routine like this:
./clblast_client_xgemv -m 4096 -n 4096 -alpha 1 -beta 0 -warm_up true -runs 100
Greetings,
Jan