SGEMM broken with 1.6.2 on Intel ARC #533
This is most likely because of #503, which added new tuning results for the A770, as shared here. Apparently they are not (always?) valid, or there is a bug in either of your A770 drivers. |
Dear Cedric, |
Thanks, but I don't have an A770 to test it. Could you or @0cc4m apply those tuning results, re-compile the library, and then run the tests? |
I suggest @0cc4m give it a try to check urgent test cases. |
No, that tuning doesn't work for me. I even tuned it myself and it still fails the tests. It seems the Linux driver (mesa version 24.0.1) for A770 has some issue that means the results are wrong when CLBlast is tuned. Very odd. |
I see. Please try installing the Intel compute runtime if you are not already using it. The tuning is optimized for the Intel NEO platform, not the open-source Mesa platform. |
@tangjinchuan Sorry, my bad. Somehow I had it in my head that the Intel OpenCL driver is in mesa, but of course it's separate. I'm using |
@0cc4m Please give the latest version [24.05.28454.6] a go: https://github.com/intel/compute-runtime/releases, all the install steps are there. If the problem persists, try turning the PC off and on again; I once had a similar problem after updating on Windows. Meanwhile, it would be very helpful to have a test case that multiplies two matrices (just give two matrices that trigger the problem) so that I can see whether it produces the same problem on my card. Locating problems in an LLM inference framework would be too much for me. |
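A two-matrix test case of the kind requested above could be checked against a simple host-side reference. The following is only a minimal sketch under stated assumptions: `reference_gemm`, `max_abs_error`, and `make_matrix` are hypothetical helper names, the matrices are row-major, and the GPU call itself is not shown (the `candidate` buffer is a placeholder for whatever the library under test returns).

```python
import random


def reference_gemm(a, b, m, n, k, alpha=1.0, beta=0.0, c=None):
    """Naive row-major GEMM: returns alpha * (A @ B) + beta * C.

    a is m x k, b is k x n, c (optional) is m x n, all as flat lists.
    """
    out = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            prev = c[i * n + j] if c is not None else 0.0
            out[i * n + j] = alpha * acc + beta * prev
    return out


def max_abs_error(x, y):
    """Largest element-wise absolute difference between two flat buffers."""
    return max(abs(u - v) for u, v in zip(x, y))


def make_matrix(rows, cols, seed):
    """Reproducible random matrix so the same inputs can be shared in a report."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(rows * cols)]


m, n, k = 4, 3, 5
a = make_matrix(m, k, seed=1)
b = make_matrix(k, n, seed=2)
expected = reference_gemm(a, b, m, n, k)

# Placeholder (assumption): replace this with the result buffer read back
# from the GPU library being tested.
candidate = list(expected)

print("max abs error:", max_abs_error(expected, candidate))
```

Fixed seeds keep the inputs reproducible, so two people with different cards can compare results on identical matrices. A single-precision GPU result would normally be compared against a small tolerance rather than for exact equality.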
I'm not using Windows, the issue is persistent across reboots and I'm not using llama.cpp to test it. I'm just running It still fails on intel-compute-runtime version |
Thanks! I will give the test case a try tomorrow. In the meantime, any Windows users are welcome to give it a go as well. |
@CNugteren Could you please have a look at the results? It only produced ":". Is this correct? I used OpenBLAS as the comparison library, and the exe did not show any test cases like the ones in @0cc4m's output. |
@0cc4m I remember that one test update between 1.6.1 and 1.6.2 is here: afb3d8a |
No that is not testing anything. I think it just crashes silently? The test results should look like output_1.6.1.txt in the first message in this thread. BTW, it could also be that there is a bug in CLBlast or in the tuner. If @0cc4m has time, perhaps he/she could change some values in the tuning database, re-compile, and re-run the tests. E.g. starting from the GEMM values themselves in https://github.com/CNugteren/CLBlast/pull/503/files#diff-aad42a9516d39f8f4e703348e73e0de9a15f068041676d7a3d32510864abb9d9R164. If we know which line in that PR causes the issues, then we are one step further. Then, next we could play with individual values of the tuning parameters, and try to find out which combinations of parameters are invalid, and which ones are still valid. But that might be some work. |
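The search for the offending line in the tuning results, as suggested above, can be organized as a bisection. This is only a sketch under strong assumptions: exactly one entry is bad, entries do not interact, and `tests_pass` is a hypothetical callback that would apply the given subset of new entries, re-compile, and run the tests. Here it is simulated, and the entry names are placeholders rather than real database keys.

```python
def find_bad_entry(entries, tests_pass):
    """Bisect for the single entry whose presence makes the tests fail.

    tests_pass(applied) must return True when the tests pass with only the
    entries in `applied` taken from the new tuning results (all others left
    at their previous, known-good values).
    """
    candidates = list(entries)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        if tests_pass(first):
            # The first half is fine, so the bad entry is in the second half.
            candidates = candidates[half:]
        else:
            candidates = first
    return candidates[0]


# Simulated setup (assumption): placeholder entry names, with "entry_c" bad.
entries = ["entry_a", "entry_b", "entry_c", "entry_d", "entry_e"]


def simulated_tests_pass(applied):
    return "entry_c" not in applied


print(find_bad_entry(entries, simulated_tests_pass))
```

Each step costs one re-compile and test run, so a database section with N changed entries needs roughly log2(N) rounds instead of N. If several entries are bad or they interact, this simple scheme no longer applies as written.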
The Xe laptop I used to compile the test cases also produces only a single ":". I guess the problem is not specific to the A770. |
It could be an issue with your reference library (your CBLAS library). Perhaps you can open a separate issue for this, but it might not be CLBlast related. Please report the output of the test binary with the |
@CNugteren I mistook the tuner for a test case. It still does not work with either OpenBLAS or clBLAS.
|
@tangjinchuan Perhaps you can open a separate issue for this. Please report the output of the test binary with the -verbose flag given and give some details about the reference library and your system. |
@CNugteren It seems that the A770 has problems with versions 153 -162. I only tried these 3 versions. In my own app I only use SGEMM for a plain A*B product, with the other parameters set to match that case, so I had not noticed anything wrong. I would also like to mention that Intel has 32-bit support, while the long type is handled by emulation. |
Dear all, |
@tangjinchuan Thanks for testing and filing a report. @0cc4m Thanks for reporting the issue. Since you didn't provide any further debug info and since I don't have an A770 myself, I decided to simply revert the tuning results for now and add a note to the README about this. See #539. I'll leave this issue open in case anyone wants to debug this further. |
This issue is very likely solved with #543. If anyone with an A770 has the time to build CLBlast and the tests from source and run them, that would be great. |
Thank you, I'll test it within the next few days if no one else does it first. |
That's wonderful. |
@CNugteren I built the latest master branch and ran |
output_1.6.1.txt
output_1.6.2.txt
In 1.6.1 SGEMM gives the correct results on Intel ARC (tested on an A770 on Arch Linux), but with 1.6.2 the results are wrong.
Let me know if more data is needed.