LoongArch64: fixed cscal and zscal #5078
I wonder if this will lead us down the same path of adding a special flag for array zeroing vs IEEE compliance as with non-complex SCAL :(
cb8cc3f to 038e0fb
You reminded me that we also need to add flags for cscal and zscal.
Test output using MKL 2024.2 version:
The same output as MKL when using reference BLAS Version 3.12.0.
This PR will introduce new issues to s/zscal and needs to be revised. (It seems that other platforms also need modifications to avoid the above issues.)
I submitted a PR #5081 attempting to fix the implementation in C.
Thanks, I'll try to take a stab at the other implementations over the weekend.
038e0fb to 6b27f17
Unfortunately this appears to have broken most of the pre-existing special handling of
I removed some special-case handling code because I felt its correctness was questionable.
The output is:
6b27f17 to 2da86b8
Lines 61 to 67 in 76db346
Adding special value checks for each number is likely to cause a performance drop and make optimization with assembly much more difficult. Should we consider simplifying it?
For the parameters `float x[2] = {NaN, NaN}` and `float alpha[2] = {0.0, 0.0}`, the optimized `cscal` interface does not directly copy `0.0` to `x` but continues performing complex multiplication, resulting in an output of `{NaN, NaN}`. The optimized `zscal` has the same issue. This problem was detected in LAPACK tests, but the existing OpenBLAS test cases do not cover this scenario. It may be considered for inclusion in future test cases.