Hi Michael,
Thanks for sharing this data. This is very useful.
What data precision are we dealing with: single or double precision?
Which version of AOCL was tried out?
Dear AOCL Team,
I'm currently working to improve NumPy's `matmul` for the strided case, and I ran a large grid search with different BLAS frameworks; see numpy/numpy#23752 (comment).
Here is a repost of the plots:
blas_benchmark_v2.pdf
The plots show the performance improvement of each BLAS framework, including the copy, over naïve matrix multiplication.
AOCL is based on BLIS. It is clearly visible that for the case `n=100`, AOCL provides a substantial improvement over BLIS (see purple shimmer). However, that is not the case for smaller matrices. Some countermeasures have been taken, and they left a triangular pattern in the performance chart.

I wonder whether performance for smaller matrices can be improved with the help of these plots. If there is interest, I can run more benchmarks, produce more plots like these, and also provide some code.
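
For orientation, here is a minimal sketch of how a single point of such a comparison could be measured. It is not the code behind the plots. It assumes double precision, a step of 2 for the strided views, and `np.einsum` (default `optimize=False`) as a stand-in for the naïve, non-BLAS baseline. The helper names `make_strided`, `baseline_matmul`, `copy_then_matmul`, `best_time` and `speedup` are hypothetical.

```python
import timeit

import numpy as np


def make_strided(n, step, rng):
    """Return an n x n non-contiguous float64 view (assumed layout)."""
    base = rng.standard_normal((n * step, n * step))
    return base[::step, ::step]  # a view with enlarged strides, not a copy


def baseline_matmul(a, b):
    """Stand-in for the naive baseline: einsum without contraction optimization."""
    return np.einsum("ij,jk->ik", a, b)


def copy_then_matmul(a, b):
    """Copy both operands to contiguous buffers, then multiply."""
    return np.matmul(np.ascontiguousarray(a), np.ascontiguousarray(b))


def best_time(fn, a, b, number=200, repeat=5):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: fn(a, b))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup(n, step=2, seed=0):
    """Ratio baseline time / (copy + matmul) time for n x n strided inputs."""
    rng = np.random.default_rng(seed)
    a = make_strided(n, step, rng)
    b = make_strided(n, step, rng)
    return best_time(baseline_matmul, a, b) / best_time(copy_then_matmul, a, b)


if __name__ == "__main__":
    for n in (4, 16, 100):
        print(f"n={n}: speedup {speedup(n):.2f}x")
```

A ratio above 1 means that copying plus `matmul` beat the baseline for that size. The result depends on which BLAS library NumPy is linked against, and for small `n` Python call overhead is likely to account for a large share of the time.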
Best from Berlin, Michael