-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Quiet NaNs on x86 hosts #75338
JIT: Quiet NaNs on x86 hosts #75338
Conversation
Since the ABI specifies that float values are returned on x87 stack, and since the fld instruction will quiet NaNs, it is very easy for the bit representation of float literals to change during JIT. For example, inlining of a function or link-time code generation changing the ABI of a function can cause such behavior. This changes the JIT to quiet all floating point NaNs stored as JIT IR constants on this host platform. Fix dotnet#75221
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsSince the ABI specifies that float values are returned on x87 stack, and since the fld instruction will quiet NaNs, it is very easy for the bit representation of float literals to change during JIT. For example, inlining of a function or link-time code generation changing the ABI of a function can cause such behavior. This changes the JIT to quiet all floating point NaNs stored as JIT IR constants on this host platform. Also adds printing of the bit representation when NaNs are printed as part of IR. Fix #75221
|
@jkotas @tannergooding does this seem reasonable? It potentially introduces differences in cross compilation and host compilation scenarios (although those are likely there today in this scenario too). If you think the fix is reasonable we can also do this generally on all platforms instead. |
// returns the value is inlined or not by the C++ compiler. To get around the | ||
// nondeterminism we quiet the NaNs ahead of time as a best-effort fix. | ||
// | ||
double FloatingPointUtils::normalize(double value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me and is should be within the allowed handling by IEEE 754.
The potential downside is that some advanced user is trying to utilize such constants with an advanced SIMD algorithm and the relevant floating-point bitwise manipulation instructions (andps
, orps
, xorps
, etc).
I expect the most likely bitwise values used will be:
- AllBitsSet - A qNaN
- NegativeZero - Only sign bit set
- PositiveInfinity - All exponent bits set
- All mantissa bits set - A subnormal number
None of these should be prone to complications from this normalization. All mantissa bits set
has a potential separate issue where it "can" be subject to an FPU mode where subnormal values are normalized to Zero. However, we don't support this and none of our current target platforms do this except for Arm32 where it doesn't have a relevant control switch for SIMD (scalar logic is not subject to the same issue).
Unfortunately we cannot do something clever like utilize the relevant MMX instructions to bypass FLD
. This is because while mm0
maps to FP(0)
, the MMX registers only expose the lower 64-bits and set the upper 16-bits to all ones, there-by changing the value that a x87 FPU instruction would see.
We could, however, change the managed calling convention to just not use the x87 FPU entirely. Then this value normalization would only be a consideration when interoping with native code. That may also be faster and simplify some of the overall handling given that we don't use the x87 FPU for anything other than returning floating-point values on x86.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, however, change the managed calling convention to just not use the x87 FPU entirely. Then this value normalization would only be a consideration when interoping with native code. That may also be faster and simplify some of the overall handling given that we don't use the x87 FPU for anything other than returning floating-point values on x86.
The original problem here is in the JIT's code generated by MSVC. We get non-determinism because the code produced by MSVC sometimes will quiet the NaNs and sometimes won't, depending on inlining/link-time code generation.
Changing the managed calling convention here is a bit more work than I'd like to take on at this point, but I agree that it is likely users can hit the same behavior with the JIT that we are hitting with MSVC, depending on inlining.
We seem to be wrapping more and more of C/C++ float operations to make them predictable for the JIT purposes. Should we introduce a new a type in the JIT codebase for this purpose?
|
I've talked about this a bit with @AndyAyersMS who did this on a previous compiler. Not sure if we have a ready-to-go implementation of such a type somewhere that we could just use.
I don't think it would be enough to only treat floating point constants this way, at least if we want to guarantee determinism between debug/checked/release and cross-compiling vs host-compiling. The recent other example I can remember in the area was #58063 (comment) which was just internal JIT calculations using floats. Both these examples are x86 only, so seems like this would primarily be an investment in x86 (but maybe there are other cases I have missed?). |
I don't know of any.
I wonder if we could mitigate most of these issues by building the x86 jit with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should take this fix but consider thinking whether something more comprehensive is warranted.
Right, I mentioned this as a solution in #75221 too. I tried it but I hit many linkers errors, so sort of gave up on it as I wasn't even sure if changing the default was feasible for us. It does fix the problem if you mark the problematic functions as |
I'm going to merge this as-is at this point. If we identify more broadly impacting issues due to the use of floats we should definitely consider investing in some of the alternatives to solve this for-good. |
Since the ABI specifies that float values are returned on x87 stack, and since the fld instruction will quiet NaNs, it is very easy for the bit representation of float literals to change during JIT. For example, inlining of a function or link-time code generation changing the ABI of a function can cause such behavior. This changes the JIT to quiet all floating point NaNs stored as JIT IR constants on this host platform.
Also adds printing of the bit representation when NaNs are printed as part of IR.
Fix #75221