blas_ API: for sgemm of armv8a, only 4x4 microkernel can be used? #133
Comments
Hi, where did you see that only the 4x4 microkernel can be used?
Does code size affect the performance of small GEMMs?
In general I don't expect code size to affect the performance of small GEMMs much, at least once the code is loaded into the instruction cache, as happens with multiple calls to GEMM routines.
What if it runs only once?
Then for small matrices it may be that the overhead of loading data and code from main memory is the limiting factor.
(Some of the answers given here are applicable to this question as well.)
@hfp thanks for sharing the link to your issue, interesting reading!
Thank you for your previous reply. I would like to ask: is it reasonable to run multiple times and average the performance of small-scale GEMM?
IMO it is, as this is a rather common case in practice. On the other hand, you can always build an example where both code and data are cold.
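To illustrate the two measurements discussed above, here is a minimal sketch that times the first call separately from the average over repeated calls. Everything in it is an assumption made for illustration: `naive_sgemm` is a hypothetical pure-Python stand-in, not the library's sgemm or any of its microkernels, and `time_gemm` and the 8x8x8 size are arbitrary choices. The first call only approximates a "cold" run; a truly cold measurement would need a fresh process or explicit cache eviction.

```python
import time


def naive_sgemm(m, n, k, a, b, c):
    """Hypothetical stand-in for a real sgemm call: C += A * B.

    Matrices are flat row-major lists: A is m x k, B is k x n, C is m x n.
    """
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] += acc


def time_gemm(m, n, k, repeats=50):
    """Return (first-call time, warm average time, result matrix C)."""
    a = [1.0] * (m * k)
    b = [2.0] * (k * n)
    c = [0.0] * (m * n)

    # First call: code and data have not been touched by this routine yet.
    t0 = time.perf_counter()
    naive_sgemm(m, n, k, a, b, c)
    first = time.perf_counter() - t0

    # Repeated calls: code and operands are likely resident in cache.
    t0 = time.perf_counter()
    for _ in range(repeats):
        naive_sgemm(m, n, k, a, b, c)
    warm_avg = (time.perf_counter() - t0) / repeats

    return first, warm_avg, c


first, warm_avg, c = time_gemm(8, 8, 8)
print(f"first call: {first * 1e6:.1f} us, warm average: {warm_avg * 1e6:.1f} us")
```

Reporting both numbers, rather than only the average, makes it visible how much of a single small GEMM call is startup overhead.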