
JIT: Quiet NaNs on x86 hosts #75338

Merged 1 commit into dotnet:main on Sep 12, 2022

Conversation

jakobbotsch
Member

Since the ABI specifies that float values are returned on the x87 stack, and since the fld instruction quiets NaNs, it is very easy for the bit representation of float literals to change during JIT compilation. For example, inlining of a function or link-time code generation changing the ABI of a function can cause such behavior. This changes the JIT to quiet all floating-point NaNs stored as JIT IR constants on this host platform.

Also adds printing of the bit representation when NaNs are displayed as part of the IR.

Fix #75221
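As an illustration only (not the actual code added by this change; the helper name is hypothetical), quieting amounts to forcing the top mantissa bit on whenever a value is a NaN:

#include <cstdint>
#include <cstring>
#include <cmath>

// Hypothetical sketch: if the value is any kind of NaN, set the most
// significant mantissa bit (the "quiet" bit) so that a later x87 fld/fstp
// round trip can no longer change the bit pattern.
static double QuietNaN(double value)
{
    if (std::isnan(value))
    {
        uint64_t bits;
        std::memcpy(&bits, &value, sizeof(bits));
        bits |= 0x0008000000000000ULL; // quiet bit of the double mantissa
        std::memcpy(&value, &bits, sizeof(value));
    }
    return value;
}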

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 9, 2022
@ghost ghost assigned jakobbotsch Sep 9, 2022
@ghost

ghost commented Sep 9, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@jakobbotsch
Member Author

jakobbotsch commented Sep 9, 2022

@jkotas @tannergooding does this seem reasonable? It potentially introduces differences between cross-compilation and host-compilation scenarios (although those are likely there today in this scenario too). If you think the fix is reasonable we can also do this generally on all platforms instead.

// returns the value is inlined or not by the C++ compiler. To get around the
// nondeterminism we quiet the NaNs ahead of time as a best-effort fix.
//
double FloatingPointUtils::normalize(double value)
Member

This seems reasonable to me and should be within the allowed handling by IEEE 754.


The potential downside is that some advanced user might be trying to utilize such constants with an advanced SIMD algorithm and the relevant floating-point bitwise manipulation instructions (andps, orps, xorps, etc.).

I expect the most likely bitwise values used will be:

  • AllBitsSet - A qNaN
  • NegativeZero - Only sign bit set
  • PositiveInfinity - All exponent bits set
  • All mantissa bits set - A subnormal number

None of these should be prone to complications from this normalization. All mantissa bits set has a separate potential issue where it can be subject to an FPU mode that flushes subnormal values to zero. However, we don't support that mode, and none of our current target platforms do this except for Arm32, which doesn't have a relevant control switch for SIMD (scalar logic is not subject to the same issue).
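For reference, a sketch of the single-precision bit patterns listed above (these values follow from the IEEE 754 layout; they are not code from this PR):

#include <cstdint>

// IEEE 754 single-precision bit patterns for the constants mentioned above.
const uint32_t AllBitsSet       = 0xFFFFFFFFu; // sign, exponent, and mantissa all set: a quiet NaN
const uint32_t NegativeZero     = 0x80000000u; // only the sign bit set
const uint32_t PositiveInfinity = 0x7F800000u; // all exponent bits set, mantissa zero
const uint32_t AllMantissaBits  = 0x007FFFFFu; // exponent zero: the largest subnormal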


Unfortunately we cannot do something clever like utilizing the relevant MMX instructions to bypass FLD. This is because, while mm0 maps to FP(0), the MMX registers only expose the lower 64 bits and set the upper 16 bits to all ones, thereby changing the value that an x87 FPU instruction would see.

We could, however, change the managed calling convention to just not use the x87 FPU entirely. Then this value normalization would only be a consideration when interoperating with native code. That may also be faster and simplify some of the overall handling, given that we don't use the x87 FPU for anything other than returning floating-point values on x86.

@jakobbotsch
Member Author
Sep 9, 2022

We could, however, change the managed calling convention to just not use the x87 FPU entirely. Then this value normalization would only be a consideration when interoperating with native code. That may also be faster and simplify some of the overall handling, given that we don't use the x87 FPU for anything other than returning floating-point values on x86.

The original problem here is in the JIT's own code as compiled by MSVC. We get non-determinism because the code produced by MSVC will sometimes quiet the NaNs and sometimes won't, depending on inlining/link-time code generation.

Changing the managed calling convention here is a bit more work than I'd like to take on at this point, but I agree that it is likely users can hit the same behavior with the JIT that we are hitting with MSVC, depending on inlining.
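To illustrate the underlying behavior, here is a hedged sketch (assuming x86 MSVC, where double return values travel through ST(0); the function name is made up):

#include <cstdint>
#include <cstdio>
#include <cstring>

// Returns a signaling-NaN constant. On x86 the value is returned on the x87
// stack; the fld of a double-precision sNaN raises a masked invalid-operation
// exception and produces the quieted NaN, so the caller may observe a
// different bit pattern than the one stored here. Whether this happens can
// depend on whether the call is inlined.
__declspec(noinline) double MakeSignalingNaN()
{
    uint64_t snanBits = 0x7FF0000000000001ULL; // exponent all ones, quiet bit clear
    double d;
    std::memcpy(&d, &snanBits, sizeof(d));
    return d;
}

int main()
{
    double d = MakeSignalingNaN();
    uint64_t observed;
    std::memcpy(&observed, &d, sizeof(observed));
    // On an x86 host this may print 0x7FF8000000000001 (quiet bit set) instead
    // of the original 0x7FF0000000000001.
    std::printf("0x%016llX\n", (unsigned long long)observed);
    return 0;
}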

@jkotas
Member

jkotas commented Sep 9, 2022

We seem to be wrapping more and more C/C++ float operations to make them predictable for JIT purposes. Should we introduce a new type in the JIT codebase for this purpose?

  • This type would use int32/int64 as underlying type
  • All floating point operations would be implemented on this type in a predictable way
  • Change all places where we pass around floating point constants in the JIT to use this type
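A rough sketch of what such a wrapper could look like (illustrative only; the type name and shape below are not from the JIT codebase):

#include <cstdint>
#include <cstring>

// Illustrative sketch: a double wrapper whose storage is a raw 64-bit integer,
// so that copying, comparing, and hashing constants never goes through host
// floating-point registers.
struct JitDouble
{
    uint64_t bits;

    static JitDouble FromDouble(double d)
    {
        JitDouble result;
        std::memcpy(&result.bits, &d, sizeof(result.bits));
        return result; // struct return avoids the x87 double-return path
    }

    double ToDouble() const
    {
        double d;
        std::memcpy(&d, &bits, sizeof(d));
        return d;
    }

    bool operator==(const JitDouble& other) const
    {
        return bits == other.bits; // bitwise equality, not FP comparison semantics
    }

    // Arithmetic would be implemented here in a platform-independent way
    // (for example via a software floating-point library) so results do not
    // depend on the host compiler's code generation choices.
};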

@jakobbotsch
Member Author

We seem to be wrapping more and more C/C++ float operations to make them predictable for JIT purposes. Should we introduce a new type in the JIT codebase for this purpose?

  • This type would use int32/int64 as underlying type
  • All floating point operations would be implemented on this type in a predictable way

I've talked about this a bit with @AndyAyersMS who did this on a previous compiler. Not sure if we have a ready-to-go implementation of such a type somewhere that we could just use.

  • Change all places where we pass around floating point constants in the JIT to use this type

I don't think it would be enough to only treat floating point constants this way, at least if we want to guarantee determinism between debug/checked/release and cross-compiling vs host-compiling. The other recent example I can remember in this area was #58063 (comment), which was just internal JIT calculations using floats. Both of these examples are x86 only, so it seems like this would primarily be an investment in x86 (but maybe there are other cases I have missed?).

@AndyAyersMS
Member

I've talked about this a bit with @AndyAyersMS who did this on a previous compiler. Not sure if we have a ready-to-go implementation of such a type somewhere that we could just use.

I don't know of any.

this would primarily be an investment in x86

I wonder if we could mitigate most of these issues by building the x86 JIT with __vectorcall? As far as I know the MSVC compiler (when SSE2 is allowed) tries to avoid x87 except at ABI boundaries or when it runs into inline asm or similar.

@AndyAyersMS
Member

I think we should take this fix, but we should keep thinking about whether something more comprehensive is warranted.

@jakobbotsch
Member Author

I wonder if we could mitigate most of these issues by building the x86 JIT with __vectorcall? As far as I know the MSVC compiler (when SSE2 is allowed) tries to avoid x87 except at ABI boundaries or when it runs into inline asm or similar.

Right, I mentioned this as a solution in #75221 too. I tried it but hit many linker errors, so I sort of gave up on it as I wasn't even sure if changing the default was feasible for us. It does fix the problem if you mark the problematic functions as __vectorcall.
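For reference, a minimal sketch of the kind of per-function annotation being described (MSVC-specific; the function name is made up):

// With __vectorcall on x86 MSVC, float/double parameters and return values
// travel in XMM registers rather than on the x87 stack, so no fld/fstp is
// involved and NaN bit patterns are preserved across the call.
double __vectorcall PassThrough(double value)
{
    return value; // returned in XMM0 under __vectorcall
}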

@jakobbotsch
Member Author

I'm going to merge this as-is at this point. If we identify more broadly impacting issues due to the use of floats, we should definitely consider investing in some of the alternatives to solve this for good.

@jakobbotsch jakobbotsch merged commit 99d21b9 into dotnet:main Sep 12, 2022
@jakobbotsch jakobbotsch deleted the fix-75221 branch September 12, 2022 09:40
@ghost ghost locked as resolved and limited conversation to collaborators Oct 12, 2022
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

JIT: check/release diff in arm SPMI libraries_tests.pmi