I've found a few issues in some of the floating-point conversion functions that only occur in non-standard rounding modes (round toward positive infinity, toward negative infinity, or toward zero, rather than the default and probably-sane round to nearest).
I've been working on "making SIMDe on x86 match ARMv8 exactly," and this shakes out some really weird bugs over time.
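The core of the issue is that the use of pow(2, n) in simde_vcvth_n_f16_s16(int16_t a, const int n) and the related fixed-point conversion functions only works properly in round-to-nearest. Outside round-to-nearest, pow(2, n) is not guaranteed to return exactly 2^n, so the division by it is no longer exact, and in round-to-zero or round-to-negative-infinity mode the final result ends up one LSB off the correct value.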
The fix is simple enough:

```diff
- HEDLEY_STATIC_CAST(simde_float64_t, a) / pow(2, n)));
+ HEDLEY_STATIC_CAST(simde_float64_t, a) / (UINT64_C(1) << n)));
```
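The 64-bit types need a different form, because 64 is a valid fixed-point shift and 1 << 64 is not a valid value (shifting a 64-bit integer by 64 is undefined behavior in C). With these changes, the results match hardware exactly.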
The problem is that proving this in the test suite is... a challenge. It requires very hardware-specific incantations to set the floating-point rounding mode, and I'm not sure a lot of x86-isms in the test suite are welcome.
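This, however, is a simple test program on x86 that shows the nature of the problem and the subtle difference in results. It sets MXCSR to round to negative infinity (see https://help.totalview.io/current/HTML/index.html#page/TotalView/Intelx86MXSCRRegister.html). If you're not used to reading floating point as hex... sorry. :/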
Again, this is only an issue in the non-standard rounding modes, but the fix to match hardware exactly is also fairly straightforward.
If this is something of interest, I can push a fix for it. But as SIMDe generally doesn't deal with non-standard rounding modes properly, I wanted to ask first.
Syonyk added a commit to Syonyk/simde that referenced this issue on Jan 6, 2025:
As demonstrated by test code in simd-everywhere#1260, the behavior of pow() in non-round-to-nearest rounding modes is not exact. This causes behavior divergence from ARMv8 hardware when not using round-to-nearest. The updated forms match hardware properly across a range of values. The tests are not updated to handle rounding modes, as doing this in a cross-platform way is not trivial. However, all existing test vectors pass properly, and in more detailed testing, these changes are closer to hardware.