Performance of halut matmul_online #18
Hi :-) Thank you for the question. I get your thinking :-) `np.matmul` calls into an optimized BLAS library. The linking in numpy happens around here: …
I did some (very simple) optimization in Python for … with … (`halutmatmul/src/python/halutmatmul/functions.py`, lines 16 to 27, at commit 4655152).
This is compiled just in time. So if you run one warmup of … But in the end, you will probably not beat the BLAS implementation in terms of speed. That is why we argue for very simple custom hardware support (see paper). I hope this helps :-)
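The warmup point above can be illustrated with a small, library-agnostic timing helper. This is a sketch, not code from halutmatmul: `time_calls` and `make_jit_like_kernel` are hypothetical names, and the "kernel" only simulates a one-off compilation cost with `time.sleep`, so the difference between a cold and a warmed-up measurement is easy to see.

```python
import time
from typing import Callable


def time_calls(fn: Callable[[], object], repeats: int = 1000, warmup: int = 1) -> float:
    """Total wall-clock seconds for `repeats` calls of `fn`, measured only
    after `warmup` untimed calls have absorbed any one-off startup cost."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as JIT compilation
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


def make_jit_like_kernel(compile_seconds: float = 0.2) -> Callable[[], int]:
    """Hypothetical stand-in for a JIT-compiled function: the first call
    pays a simulated 'compilation' delay, later calls are cheap."""
    compiled = False

    def kernel() -> int:
        nonlocal compiled
        if not compiled:
            time.sleep(compile_seconds)  # simulated compilation, paid once
            compiled = True
        return sum(range(100))  # the cheap steady-state work

    return kernel


if __name__ == "__main__":
    cold = time_calls(make_jit_like_kernel(), warmup=0)  # includes the one-off delay
    warm = time_calls(make_jit_like_kernel(), warmup=1)  # steady-state cost only
    print(f"no warmup:   {cold:.3f} s")
    print(f"with warmup: {warm:.3f} s")
```

With halutmatmul installed, the same helper could wrap `lambda: np.matmul(A_test, B)` and `lambda: hm.matmul_online(A_test)`. A warmup only removes one-off costs; as noted above, the steady-state comparison against BLAS is a separate question.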
Thanks for the information. Very useful.
Hi, I am testing the example Python code on an Intel Xeon machine. I run `np.matmul(A_test, B)` and `hm.matmul_online(A_test)` 1000 times each to compare their run times. I expected halutmatmul to be much faster; however, it turned out to take much longer:
Total time taken to np matmul 1000 times: 0.05877375602722168 seconds
Total time taken to halut matmul 1000 times: 1.6328861713409424 seconds
Is there anything I am missing? Thanks!
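Converting the two reported totals into per-call averages makes the gap concrete. The sketch below only does arithmetic on the numbers quoted in the question; `per_call_us` is a hypothetical helper name.

```python
def per_call_us(total_seconds: float, calls: int = 1000) -> float:
    """Average microseconds per call, given a total measured over `calls` calls."""
    return total_seconds / calls * 1e6


# Totals reported in the question, each for 1000 calls.
np_total = 0.05877375602722168
halut_total = 1.6328861713409424

print(f"np.matmul:        {per_call_us(np_total):.1f} us/call")    # 58.8
print(f"hm.matmul_online: {per_call_us(halut_total):.1f} us/call")  # 1632.9
print(f"ratio:            {halut_total / np_total:.1f}x")           # 27.8
```

So each `hm.matmul_online` call averaged roughly 1.6 ms against roughly 59 µs for `np.matmul`, about 28 times slower over these 1000 calls.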