Wrong signs on division producing NaN #55131

Closed
dtolnay opened this issue Oct 16, 2018 · 16 comments
Labels
A-floating-point: Area: Floating point numbers and arithmetic
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
T-lang: Relevant to the language team, which will review and decide on the PR/issue.

Comments

@dtolnay
Member

dtolnay commented Oct 16, 2018

Noticed this while playing with #54235.

fn f(x: f64) -> f64 {
    0f64 / x
}

fn main() {
    println!("{:?}", (0f64 / 0f64).is_sign_negative());
    println!("{:?}", f(0f64).is_sign_negative());
}

As of rustc 1.31.0-nightly (46880f4 2018-10-15) on x86_64-unknown-linux-gnu, in debug mode this program prints false true and in release mode prints false false. Two of my expectations are violated:

  • The output should be consistent between debug mode and release mode.
  • The first and second println should print the same value.

(Happy to reconsider if these expectations are unfounded.)

@dtolnay
Member Author

dtolnay commented Oct 16, 2018

Compilers 1.19 and older consistently print false false which aligns with my expectations; 1.20 and newer behave as above.

@hanna-kruppe
Contributor

Evidently LLVM does not guarantee the sign of NaNs, just as it does not guarantee the signaling bit or payload. I can't say I would have known that, but it doesn't surprise me either.

Two observations that explain these discrepancies:

  • (0f64 / 0f64) is constant folded even in debug mode (by IRBuilder), while f(0f64) obviously is only constant folded when inlined, i.e., in release mode.
  • When constant folding a floating point computation that results in a NaN, LLVM prefers 0x7FF8000000000000 (which has positive sign). Apparently your CPU differs and produces a negative NaN for the runtime division.
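To make the two observations concrete, here is a minimal sketch (not taken from the original report) that inspects the bit pattern mentioned above and compares it against a division the optimizer is asked not to fold. The `describe` helper is a hypothetical name introduced only for illustration, and `std::hint::black_box` is a best-effort hint rather than a guarantee, so the sign printed for the runtime division may differ between machines and build modes.

```rust
use std::hint::black_box;

/// Hypothetical helper: returns (is the sign bit set?, raw bits) for inspection.
fn describe(x: f64) -> (bool, u64) {
    (x.is_sign_negative(), x.to_bits())
}

fn main() {
    // The pattern LLVM is said to prefer when folding: sign bit clear,
    // exponent all ones, most significant mantissa bit set.
    let folded = f64::from_bits(0x7FF8_0000_0000_0000);
    assert!(folded.is_nan());
    assert!(!folded.is_sign_negative());

    // The same pattern with the sign bit set is still a NaN, just a "negative" one.
    let negative = f64::from_bits(0xFFF8_0000_0000_0000);
    assert!(negative.is_nan());
    assert!(negative.is_sign_negative());

    // black_box hides the operands from the optimizer (best effort), so this
    // division is likely to be performed by the CPU instead of being folded.
    let runtime = black_box(0f64) / black_box(0f64);
    assert!(runtime.is_nan());

    println!("folded:  {:?}", describe(folded));
    // The sign reported here is not guaranteed; it depends on the hardware.
    println!("runtime: {:?}", describe(runtime));
}
```

Only `is_nan()` is asserted for the runtime result, since the sign is exactly the part that varies.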

@estebank estebank added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jan 19, 2019
@RalfJung
Member

RalfJung commented Mar 3, 2020

In other words, the semantics of floating point operations would be something like "if the result is a NaN, non-deterministically pick any legal NaN representation". This non-determinism explains why debug and release builds differ in behavior.

I wonder if we should make Miri pick a random NaN payload and sign and signalling bit, just to drive home this point...

@RalfJung RalfJung added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Mar 3, 2020
@RalfJung
Member

@hanna-kruppe notes that "NaNs are unstable under copying" seems rather excessive and in fact people might rely on NaN payloads being preserved on copy.

A less drastic alternative is to say that every single FP operation (arithmetic and intrinsics and whatnot, but not copying), when it returns a NaN, non-deterministically picks any NaN representation.
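As a rough illustration of what the less drastic option would and would not promise, the sketch below tags a quiet NaN with a payload, moves it through a few plain copies, and then feeds it to an arithmetic operation. The payload value `0x1234` and the `copy_around` helper are made up for this example. The copy assertion reflects the proposed semantics and is an assumption about the target; some targets (32-bit x86 using the x87 unit, as far as I know) are known to be problematic here.

```rust
/// Hypothetical helper: moves the value through a few plain copies
/// (no arithmetic) and returns the resulting bit pattern.
fn copy_around(x: f64) -> u64 {
    let a = x;
    let b = [a; 2];
    b[1].to_bits()
}

fn main() {
    // A quiet NaN carrying an illustrative payload in its low mantissa bits.
    let bits: u64 = 0x7FF8_0000_0000_1234;
    let tagged = f64::from_bits(bits);

    // Under "copies preserve NaNs", the bits survive unchanged.
    assert_eq!(copy_around(tagged), bits);

    // Under "FP operations pick any NaN", only NaN-ness can be relied on
    // after arithmetic; sign and payload are unspecified.
    let result = tagged + 1.0;
    assert!(result.is_nan());
    println!("after arithmetic: {:#018x}", result.to_bits());
}
```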

@hanna-kruppe
Contributor

I believe it was @Lokathor who made this point, though I don't disagree.

However, I have doubts whether either option is enough to explain away the behavior LLVM can produce today. Don't have time to summarize but here's a link to the Zulip discussion for future reference: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/floating.20point.20semantics

@RalfJung
Member

> However, I have doubts whether either option is enough to explain away the behavior LLVM can produce today.

I did not see anything in that discussion that makes it sound like either option wouldn't work -- my current impression is that both correctly describe LLVM behavior. What did I miss? (Not urgent, just respond when you have time again.)

@hanna-kruppe
Contributor

Specifically https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/floating.20point.20semantics/near/194786318, and the whole earlier discussion about how combinations of other optimizations can result in different uses of the same value (in Rust / the initial LLVM IR) observing different results. We talked about how floats should perhaps be "frozen" when moving into the integer domain, but this does not currently happen. As I said in the message linked above, LLVM can currently eliminate the float<->int bitcasts/transmutes/etc. that we do have (even if one might argue that it shouldn't).

@RalfJung
Member

Hm, okay. If LLVM will duplicate casts, then that would indeed contradict a "typed copy messes up NaN" semantics.

For the "FP operations pick arbitrary NaN" semantics, I suppose LLVM will also happily duplicate floating point operations since it considers them deterministic?

But together with "NaNs are not preserved", that actually leads to a contradiction, and if we can make LLVM do the right optimizations in the right order we can likely show a miscompilation from this.

@hanna-kruppe
Contributor

Right, I believe there are potential miscompilations lurking there, but they're probably very difficult to tease out -- maybe even impossible today, if the stars don't align.

@RalfJung
Member

RalfJung commented Sep 9, 2020

Would it be worth bringing this up with LLVM? Seems like either they should clarify that NaN payloads are not preserved by some of their FP operations, or else they should consider this a bug. The former might be a problem because people compile browsers with LLVM and those browsers' JS/wasm runtimes might want to actually carry data in NaN payloads...

@programmerjake
Member

> In other words, the semantics of floating point operations would be something like "if the result is a NaN, non-deterministically pick any legal NaN representation". This non-determinism explains why debug and release builds differ in behavior.
>
> I wonder if we should make Miri pick a random NaN payload and sign and signalling bit, just to drive home this point...

One note: the IEEE 754 fp standard requires the result of arithmetic operations to not be signaling NaNs.
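For reference, a small sketch of how one might check the quiet/signaling distinction on a given machine. It assumes the common binary64 convention in which a set most significant mantissa bit marks a quiet NaN (some legacy targets use the opposite convention). `QUIET_BIT` and `is_quiet_nan` are names invented for this example.

```rust
use std::hint::black_box;

/// Assumption: the most significant mantissa bit marks a quiet NaN
/// (the usual IEEE 754-2008 binary64 convention).
const QUIET_BIT: u64 = 0x0008_0000_0000_0000;

/// Hypothetical helper: true if `x` is a NaN with the quiet bit set.
fn is_quiet_nan(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & QUIET_BIT) != 0
}

fn main() {
    // A division the optimizer is asked (best effort) not to fold.
    let runtime = black_box(0f64) / black_box(0f64);
    println!("0/0 is a quiet NaN: {}", is_quiet_nan(runtime));

    // A hand-built pattern with the quiet bit clear and a nonzero mantissa,
    // i.e. what would be a signaling NaN under the convention assumed above.
    let hand_built = f64::from_bits(0x7FF0_0000_0000_0001);
    println!("hand-built pattern is quiet: {}", is_quiet_nan(hand_built));

    // Non-NaN values are never reported as quiet NaNs.
    assert!(!is_quiet_nan(f64::INFINITY));
    assert!(!is_quiet_nan(1.0));
}
```

The program prints rather than asserts the NaN cases, because what the hardware actually produces is the point under discussion.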

@workingjubilee
Member

What the IEEE 754 FP standard says and what the implementation does are very different things, in practice, per #10186

@RalfJung
Member

If we follow wasm, then Miri could pick any arithmetic NaN. Whether and how that aligns with being signalling or not, I do not know.

@RalfJung
Member

RalfJung commented Nov 23, 2022

Based on this I am inclined to declare this not-a-bug: NaN-producing operations do not have a well-defined sign, so there cannot be a 'wrong' sign. This is the semantics both in LLVM and wasm. I think Rust should follow suit.

@Muon

Muon commented Jan 24, 2023

This is definitely permissible according to IEEE 754. The only guarantee is that the result of 0/0 is a quiet NaN. The sign bit is not required to be the same between two divisions. Although the target FPU usually produces only specific NaNs, Rust does not (presently) promise that it upholds the semantics of the target FPU.

@RalfJung
Member

RalfJung commented Aug 4, 2023

Closing in favor of #73328: we are not guaranteeing anything about the sign of a NaN produced by 0.0 / 0.0. (This matches, for instance, the WebAssembly specification.) Better documentation of all this is clearly required, that's what the other issue is about.
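For code that hashes or compares floats by their bits, one possible way to live with this resolution is to canonicalize NaNs before looking at the bits. This is only a sketch: `canonical_bits` is a hypothetical helper, not an existing standard library function, and the choice of `0x7FF8_0000_0000_0000` as the canonical pattern is arbitrary.

```rust
/// Hypothetical helper: maps every NaN to one fixed bit pattern and leaves
/// all other values untouched, so bitwise comparisons ignore NaN sign/payload.
fn canonical_bits(x: f64) -> u64 {
    if x.is_nan() {
        0x7FF8_0000_0000_0000
    } else {
        x.to_bits()
    }
}

fn main() {
    let positive = f64::from_bits(0x7FF8_0000_0000_0000);
    let negative = f64::from_bits(0xFFF8_0000_0000_0000);

    // The raw bits differ in the sign bit...
    assert_ne!(positive.to_bits(), negative.to_bits());
    // ...but the canonical form treats both as the same NaN.
    assert_eq!(canonical_bits(positive), canonical_bits(negative));

    // Whatever NaN 0.0 / 0.0 yields, it canonicalizes to the same pattern.
    assert_eq!(canonical_bits(0.0 / 0.0), canonical_bits(positive));

    // Non-NaN values, including signed zeros, keep their own bits.
    assert_ne!(canonical_bits(0.0), canonical_bits(-0.0));
}
```

Code that only needs to know whether a value is a NaN can simply call `is_nan()` and avoid inspecting the sign at all.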

@RalfJung RalfJung closed this as completed Aug 4, 2023

8 participants