-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Floats are problematic #39
Comments
IEEE 754 states that implementations should use the MSB of the mantissa being set to indicate a quiet NaN, however MIPS before the 2008 revision and PA-RISC use the MSB of the mantissa being clear to indicate a quiet NaN. |
@programmerjake Ah, thank you, I kept hearing the detail of swapped qNaN and sNaNs but for some reason "it's the most significant bit in the mantissa" kept sliding from my mind while writing from this up. This makes sense. MIPS doesn't seem to have a lot of usage, but it is on our Platform Support list and we do build hosts on it. Anyways, for this reason we might run into troubles exploiting its vector ISAs, like we might with Neon v1, but this difference at least seems more soluble and I would prefer if we could at least get basic functionality together (i.e. scalar fallbacks). And really it was IEEE754-1985's fault for needlessly avoiding specifying which way around it should go. It doesn't seem like there's a clear target division between the different eras of MIPS targets in rustc's targets list, and MIPS64 is older than 2008. I suppose we have to assume all MIPS FPUs are Like That, then? |
AFAICT, the mips32r6 and mips64r6 targets are the ones that switch to the non-weird NaN encodings: |
Ah. Those are on tier 3 support under the mipsisa{32,64}{,el} names, and it's not even clear we can build |
So, I saw this thread and wanted to add something since I didn't see it discussed: do we want to provide any guarantee on the rounding behaviour of the reduce operations? In particular, for x86, I did some searching around and didn't find anything… all the docs I've seen places just say that HADDPD "adds everything" and doesn't specify the rounding behaviour, which I assume would be equivalent to performing a single round after the operation (all other versions don't make sense). However, if we guaranteed this at the library level, that means that any implementation that's forced to rely on software instead of dedicated instructions will have a performance penalty. I also don't know how the existing software implementations deal with this. |
Somewhat related to #235. IEEE 754 specifies that reductions may associate in any order, and may use any wider intermediate format. I'm partial to that because it's specified in the standard, fast, and I don't think it sacrifices much. If you want to guarantee any particular behavior, there's always the option to use swizzles. |
I'm a bit confused how swizzles guarantee the behaviour if the ordering is always unspecified; do you mean just performing the sums manually instead of using the reduction methods? |
Yeah, sorry, I just mean manually implementing the order and reduction function. |
It is well known that floating point code often merits extra caution due to the details of rounding math, but the fact that floating point numbers are more interesting to handle applies not just to software interacting with floating point numbers and the floating point environment (all of which can affect our API significantly), but also applies to hardware implementations of both vector and scalar floating point units, so to both our SIMD code and our scalar fallbacks. These aren't bugs per se because it's not a bug, it's a feature, but they are features requiring enhanced attention to detail. This is related to rust-lang/unsafe-code-guidelines#237 and rust-lang/rust#73328 as well.
Most domains involving SIMD code use floats extensively, so this library has a "front-row seat" to problems with floating point operations. Thus, in order for SIMD code to be reasonably portable, we should try to discover where these... unique... implementations lie and decide how to work around them, or at least inform our users of the Fun Facts we learn.
Arch-specific concerns remain for:
32-bit x86
The x87 80-bit "long double" float registers can do interesting things to NaN, and in general if a floating point value ends up in them and experiences an operation, this can introduce extra precision that may lead to incorrect mathematical conclusions later on. Further, Rust no longer supports MMX code at all because their interaction with the x87 registers were just entirely too much trouble. Altogether, this is probably why the x86-64 System V ABI specifies usage of the XMM registers for handling floating point values, which do not have these problems.
32-bit ARM
There are so many different floating point implementations on 32-bit ARM that
armclang
includes a compiler flag for FPU architecture and another one for float ABI.MIPS
NaNs sometimes do weird things on this platform, resulting in some of the packed_simd tests getting the very interesting number
-3.0
.Wasm
Technically Wasm is IEEE754 compliant but it is very "...technically!" here because it specifies a canonicalizing behavior on NaNs that constitutes an interesting choice amongst architectures and may not be expected by a programmer that is used to e.g. being able to rely on NaN bitfields being fairly stable, so we will want to watch out for it when implementing our initial floating point test suite.
The Good News
We have reasonable confidence in float behavior in x86's XMM-and-later registers (so from SSE2 onwards), aarch64's Neon v2 registers, PowerPC, and z/Architecture (
s390x
). As far as we know, on these architectures ordinary binary32 and binary64s are what they say they are, support basic operations in a reasonably consistent fashion, and nothing particularly weird occurs even with NaNs. So we are actually in a pretty good position for actually using most vector ISAs! It's just all the edge cases that pile up.Main Takeaway
We want to extensively test even "simple" scalar operations on floating point numbers, especially casts and bitwhacking, or anything else that might possibly be affected by denormal numbers or NaN, so as to surface known and unknown quirks in floating point architectures and how LLVM handles them.
The text was updated successfully, but these errors were encountered: