No way to get reliable precision of floating-point operations #7333
Maybe I'm missing something but how exactly is this related to language/compiler? The example division expression generates the expected IL code:
Beyond that it's up to the runtime/JIT/CPU to deal with it. |
The compiler is permitted to emit code in which the division is computed (rounded) to an excess precision, which is then rounded to `float` precision by the cast. I should also note that the CIL specification permits values to be stored with excess precision, too. It is not clear from the CIL specification whether the division operation itself is permitted to produce a result with excess precision. The spec says only:
Absent requirements in the language spec, the programmer cannot rely on the behavior. |
So this is about changing the language specification in such a way that a compiler is required to emit the proper casts even though the specification cannot actually guarantee that the operation won't be performed with excess precision? I suppose something like:
instead of
|
@mikedn No, the proposal would be to change the language to enable some way for the code to be written in which precision is guaranteed, without any weasel-wording about an underlying implementation that would deprive the language guarantee of any meaning. It is possible that implementing that modified spec might require a modified CLI specification and/or CLR implementation. |
@CarolEidt Can the CLI spec be strengthened such that single-precision division (for one example among many) is performed using IEC 60559:1989 rules for that precision (even if the result is stored in higher precision)? If it were changed, would Microsoft's implementations have to change to comply with the modified spec? |
Hmm, but that's simply unimplementable (or at least not reasonably implementable) on architectures that lack single-precision operations. The typical example would be x86 processors without SSE support (granted, those are pretty much history these days), but there may be others; PowerPC, it seems. |
@gafter I believe that what you are requesting is already supported. Rounding a result to a given precision is the same as performing the computation in that precision (unless you are speaking of library methods - which may not actually produce precise results for the given precision). So, if you cast each intermediate result to float, you will get the same answer as if the computation was all done in float. I don't see the value in changing this, except to the extent that some implementations may have faster float operations than double. If there is a hardware implementation that produces a different result for a float divide vs. the case where the operands are both rounded to float, and then the divide is evaluated as double, then rounded to float, then I would think that would be a bug in the hardware. |
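A minimal sketch (not from the thread; the helper name is made up) of the "cast each intermediate result" pattern described above: each explicit (float) cast forces the intermediate value to be rounded to single precision before it is used again.

```csharp
static class SinglePrecisionSketch
{
    // Compute a*b + c with every intermediate result rounded to float.
    public static float MulAdd(float a, float b, float c)
    {
        float product = (float)(a * b); // cast rounds the product to float
        return (float)(product + c);    // cast rounds the sum to float
    }
}
```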
@CarolEidt You say
I don't see why this would be true. Computing a result as a double causes rounding to occur to that (double) precision. Converting that result afterwards to a single causes another rounding of that already rounded result to occur. Why would that sequence of two rounding operations be guaranteed to produce the same result as if the original (exact) operation had been rounded to the narrower precision? |
@gafter The spec doesn't allow the JIT to elide the cast of the input values to floats. Those are operations that it cannot omit. It CAN choose to perform the operation as a double operation, but since the domain of IEEE double is a superset of the domain of IEEE float, those input values are identical in semantic when represented as double. So, in either case it is operating on the same input values, and the result, even if rounded twice (e.g. to double and then float), will still be the same. |
@CarolEidt How do you figure "the result, even if rounded twice (e.g. to double and then float), will still be the same"? Imagine the following scenario
Are you asserting that this cannot happen? See also http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/ |
From http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Since p=24 for float and q=53 for double, your assertion is correct. |
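For reference, the double-rounding condition being invoked here (paraphrased, not quoted from the linked document): if $x$ is the exact result of a single addition, subtraction, multiplication, division, or square root of $p$-bit operands, then

$$\operatorname{rnd}_p\bigl(\operatorname{rnd}_q(x)\bigr) = \operatorname{rnd}_p(x) \qquad \text{whenever } q \ge 2p + 2.$$

With $p = 24$ (float) and $q = 53$ (double), we have $53 \ge 2 \cdot 24 + 2 = 50$, so computing one such operation in double and then rounding to float gives the same result as computing it directly in float.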
This is an issue though for double precision, no? |
@mburbea On x86 (32 bit), the CRT (and by extension the CLR) sets the precision to double. It is possible to change this by calling out to native code, but by default this is the setting, and it avoids this issue. On x64, the SSE registers are used and extended precision is not supported. Granted, one could argue that this is a potential issue in the specification (as it leaves this open to implementation choice), but I think it is moot for practical purposes. |
When you say "practical purposes" I think you mean whether or not people performing numerical analysis will consider .NET and C# a viable language and platform to work with. Given |
/cc @MadsTorgersen FYI, the current C# spec (and implementation!) does not appear to provide any way to perform floating-point operations (like |
This kind of stuff happened in the past; the best-known example is Direct3D 9.0, which had the habit of changing the x87 precision to single. Anyway, that only affects 32-bit and it's more or less ancient history.
But there's nothing the C# language/compiler can do about this anyway, short of emulating floating-point operations. At best you could try to convince the JIT team to switch the x86 JIT to use SSE :) |
@mikedn I agree that a solution to this issue is unlikely to reside entirely in the C# compiler. |
@mikedn the x86 port of RyuJIT will be using SSE. However, for this issue, unless a program explicitly changes the evaluation mode the evaluation will be in double anyway. I think that @gafter is actually more concerned that this is not a requirement of the spec, and there is also no way to force it to happen. |
Carol, does this hold true for things like sin, cos, and the other trig functions?
|
@mburbea I don't know the answer to that. They simply invoke the CRT functions. |
@CarolEidt Are you saying that for @gafter's example RyuJIT x86 will generate
Currently there aren't any float overloads of those functions, so the question is a bit off. In any case, the current x86 JIT uses the x87 transcendental instructions, and the precision of those instructions is up for debate.
Hmm, I don't know what compiler & options are used to compile the desktop x64 CLR, but to my surprise the CRT function calls are replaced by x87 transcendental instructions. Odd; I was almost sure that VC++ always uses the library implementations on x64, and on x86 when SSE is enabled. |
@mikedn: Intel doesn't seem to agree with you that the precision is up for debate; their blog entry on the subject, posted the same day as the Random ASCII one, is pretty clear about the fact that the transcendental instructions get VERY inaccurate in the vicinity of various multiples of pi/2. (Confusingly, this cannot happen for values that have been rounded to single-precision: single precision's 24 bits can't cancel out enough bits of the effectively 68-bit approximation of pi that the CPU uses for range reduction to cause any problems, and the result actually has MORE significant bits than for doubles in the worst case.) |
Closing, as I do not believe there is enough interest to pursue this. |
@gafter, It might actually be worth noting that we do now have both single and double precision Math functions (System.Math and System.MathF). We also are now (as of a couple days ago) fully IEEE-compliant in relation to the inputs and required outputs for special mathematical functions of the IEEE Std 754:2008 Spec (at least on RyuJIT). Test validation to ensure we remain IEEE compliant was just merged minutes ago. |
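For concreteness, a small sketch (not from the thread) of the single- and double-precision APIs mentioned above: `MathF` operates on `float` throughout, whereas casting the `Math` result only rounds at the end.

```csharp
using System;

class MathFSketch
{
    static void Main()
    {
        float x = 0.5f;
        Console.WriteLine(MathF.Sin(x));        // single-precision implementation
        Console.WriteLine((float)Math.Sin(x));  // double-precision result rounded to float
    }
}
```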
This issue is also of interest with respect to deterministic simulation in peer-to-peer online games. Such games implement networked multiplayer by passing inputs between peers and then running the entire simulation on each peer. (In a similar vein, an input stream can be saved and replayed.) For this to work, the simulation must produce the same results on every peer - games may even support PC vs. console vs. tablet gameplay, requiring reproducibility across very different CPU architectures. We implemented such a peer-to-peer game using C# (Homeworld: Deserts of Kharak), and we decided that we could not guarantee repeatable floating point operations, so our simulation uses fixed point math implemented in software. Needless to say, if anyone can assure me that repeatable floating point calculations can be achieved in C# I would love to get back all those CPU cycles we are wasting in our math library!! :-) |
Having an option for deterministic float arithmetic in C# is something that would be seriously useful for many people and is really hard to get without compiler support. Even adding extra casts to force truncation is not sufficient, makes code hard to read and write, and is error-prone.

We were experiencing test failures on our build server that were not reproducible locally and were caused by non-deterministic float arithmetic between the two machines. These were so annoying and hard to debug that we were forced to rewrite our simulation in custom fixed-point arithmetic just to be able to debug things and move forward.

The downside is that working with fixed-point numbers really sucks, seriously! One does not appreciate the power of floats until they are taken away. One has to be extremely careful while writing arithmetic code. For example, you need to multiply first, then divide, otherwise you lose significant precision. Try computing x^4: it either overflows the fixed range or is zero (for x ~ 0.1).

These days, C# is also being used more in the gaming industry, where many titles rely on having identical game state on separate machines. These are usually games with a large game state, like RTS games, where the whole state cannot be synced and only players' actions are exchanged over the network.

Please reconsider this issue. Thank you! Here are some relevant Stack Overflow threads: |
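A minimal sketch (not the commenter's actual library; the Q48.16 format and all names are assumptions) of the ordering pitfall described above: dividing before multiplying truncates the quotient to the fixed-point resolution, and that error is then amplified by the subsequent multiplication.

```csharp
using System;

// Q48.16 fixed point: the value is Raw / 2^16, stored in a signed 64-bit integer.
readonly struct Fix16
{
    const int FracBits = 16;
    public readonly long Raw;
    Fix16(long raw) => Raw = raw;

    public static Fix16 FromDouble(double d) => new Fix16((long)(d * (1 << FracBits)));
    public double ToDouble() => (double)Raw / (1 << FracBits);

    // Note: both operators can overflow for large operands; this sketch ignores that.
    public static Fix16 operator *(Fix16 a, Fix16 b) => new Fix16((a.Raw * b.Raw) >> FracBits);
    public static Fix16 operator /(Fix16 a, Fix16 b) => new Fix16((a.Raw << FracBits) / b.Raw);
}

class OrderingDemo
{
    static void Main()
    {
        Fix16 a = Fix16.FromDouble(3), b = Fix16.FromDouble(7), c = Fix16.FromDouble(1000);

        // Multiply first, then divide: ~428.5714 (close to the exact 3000/7).
        Console.WriteLine((a * c / b).ToDouble());

        // Divide first, then multiply: 3/7 is truncated to a multiple of 2^-16
        // before being scaled by 1000, giving ~428.558.
        Console.WriteLine((a / b * c).ToDouble());
    }
}
```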
Can you please expand on that? Under what circumstances is that not sufficient? Can you please give an example where you cannot force the use of IEEE arithmetic? |
Thanks for the follow-up, Neal. I was referring to this SO answer, which notes:
I was unable to find the spec it refers to; do you know which one it means? However, even if extra casts to force truncation were enough, it is rather tedious and error-prone to require humans to do this across an entire codebase. For example, expressions … Unfortunately, I do not have a specific example where casts are not sufficient. Do you think they should be sufficient? |
What CPUs and what versions of .NET are involved?
Some of that may be subject to hardware specifics and configuration. There's not a lot that the language can do in this case, short of using emulated FP.
At least on .NET Core such casts shouldn't be necessary, as all float operations are emitted as single-precision SSE instructions, e.g.:

```asm
C5FA59050D000000 vmulss xmm0, dword ptr [reloc @RWD00]
```

Of course, one can never exclude the possibility of having bugs in the JIT compilers. But if that's the case, a repro would be needed. |
Yes. See #7333 (comment) in particular the quoted part in bold. |
Ok, so I have resurrected our old test that was failing, and it is still failing. I have extracted it (and all its deps) to a separate repo, linked below. Now, the interesting part is that when I run the test via a console app, I often get different results for debug and release modes, but they are consistent with our build server and my colleague's machine (all have Intel CPUs). However, when I run the same code via NUnit, it passes on my machine and on my colleague's machine, but it fails on our build server. So is NUnit doing something funny here? By any chance, could anyone try this too? The test is not very scientific, but that's what we had before. Thanks! https://github.com/NightElfik/FloatDeterminismTest |
It's worth noting that you are targeting .NET Framework 4.7. This means that you will get RyuJIT if running in 64-bit mode. RyuJIT will use the SSE/SSE2 instructions that are IEEE 754 compliant and round directly to the target precision. You will get the legacy JIT if running in 32-bit mode. The legacy JIT uses the x87 FPU stack while setting the rounding precision to 64-bits. On .NET Core (I believe 2.0 and later), you will always get RyuJIT and it will always use SSE/SSE2 for computations. |
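One small diagnostic that can help when comparing results across machines (a generic sketch, not something from the thread): record which runtime and bitness each run actually used, since that determines whether the SSE-based RyuJIT or the legacy x87-based JIT is in play.

```csharp
using System;
using System.Runtime.InteropServices;

class RuntimeReport
{
    static void Main()
    {
        Console.WriteLine(RuntimeInformation.FrameworkDescription); // e.g. ".NET Framework 4.8" or ".NET 8.0"
        Console.WriteLine(RuntimeInformation.ProcessArchitecture);  // X86, X64, Arm64, ...
        Console.WriteLine(Environment.Is64BitProcess);              // 64-bit on .NET Framework 4.6+ => RyuJIT
    }
}
```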
The results I get for
The results I get for
The results I get for
|
The same goes for |
Notably in There are likely others as well, but I haven't gone through the rest of the code extensively. |
Note: I tend to agree that this is somewhat of a hard space to get right. It's really easy to miss a necessary cast to get consistent results. That said, I think a good-enough solution here is to just write an analyzer to check this stuff and tell you when you're missing a cast. |
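A rough sketch of such an analyzer (assumed names and a deliberately simplistic rule, not an existing tool): it flags float/double binary operations whose result is not immediately consumed by an explicit cast.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class FloatRoundingAnalyzer : DiagnosticAnalyzer
{
    static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "FP0001",
        title: "Floating-point operation is not explicitly rounded",
        messageFormat: "Cast this floating-point operation to force rounding to its declared precision",
        category: "Determinism",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxNodeAction(Analyze,
            SyntaxKind.AddExpression, SyntaxKind.SubtractExpression,
            SyntaxKind.MultiplyExpression, SyntaxKind.DivideExpression);
    }

    static void Analyze(SyntaxNodeAnalysisContext ctx)
    {
        var node = (BinaryExpressionSyntax)ctx.Node;
        var type = ctx.SemanticModel.GetTypeInfo(node, ctx.CancellationToken).Type;
        if (type is null ||
            (type.SpecialType != SpecialType.System_Single && type.SpecialType != SpecialType.System_Double))
            return;

        // Walk out of any parentheses; an enclosing explicit cast is what
        // guarantees rounding to the declared precision.
        SyntaxNode parent = node.Parent;
        while (parent is ParenthesizedExpressionSyntax)
            parent = parent.Parent;

        if (parent is not CastExpressionSyntax)
            ctx.ReportDiagnostic(Diagnostic.Create(Rule, node.GetLocation()));
    }
}
```

This is intentionally naive: a real analyzer would also need to consider compound assignments, conversions to wider types, and similar cases.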
There are some floating-point operations that do not have defined results, and for which Intel and AMD processors give different results. See, for example, #37772 where we make the compiler resilient to these differences during constant-folding. We are not going to paper over these differences. |
@gafter, Do you mean |
No, I mean Intel and AMD. I believe they differ in some circumstances when truncating a floating-point value to an integral value and the result is undefined. I could be wrong. |
I don't believe that is the case. Both of them document for the x87 FPU stack and SSE+ instructions that the "indefinite integer value" is returned when the converted result cannot be represented. Intel documents this value as:
AMD documents this value as:
So both are ultimately the same. |
How do you explain the output of this program (on Intel and AMD)?

```csharp
using System;

class Program
{
    static void Main()
    {
        double aslocal;
        aslocal = 65535.17567;
        Console.WriteLine(unchecked((short)aslocal));
        aslocal = 65536.17567;
        Console.WriteLine(unchecked((short)aslocal));
    }
}
```
|
Both Intel and AMD produce the same result (whether using the x87 FPU or SSE+):
Both inputs generate the same assembly.

SSE+:

```asm
00007FFDC205093F vmovsd xmm0,qword ptr [address]
00007FFDC2050948 vmovsd qword ptr [stack_addr],xmm0
00007FFDC205094E vcvttsd2si ecx,mmword ptr [stack_addr]
00007FFDC2050954 movsx rcx,cx
00007FFDC2050958 call 00007FFE17952980
00007FFDC205095D nop
```

x87 FPU:

```asm
00DF08C7 DD 05 10 09 DF 00  fld qword ptr ds:[addr]
00DF08CD DD 5D C0           fstp qword ptr [stack_addr]
00DF08D0 F2 0F 10 45 C0     movsd xmm0,mmword ptr [stack_addr]
00DF08D5 F2 0F 2C C8        cvttsd2si ecx,xmm0
00DF08D9 0F BF C9           movsx ecx,cx
00DF08DC E8 37 11 1C 73     call 73FB1A18
00DF08E1 90                 nop
```

The SSE+ instructions that support converting (…) The x87 FPU instructions that support converting (…) Given that the instruction emitted only supports 32-bit/64-bit results, the results are … This is probably a runtime issue that should be logged, checked against the back-compat bar, and addressed as appropriate. |
I logged dotnet/runtime#461 |
Thank you Tanner and Neal for the discussion. Would it be possible to document the float arithmetic behavior and have it as something people can refer to as "ground truth"? Something that would show up when people search for "C# float determinism" or similar. Maybe a blog post or Medium post? Currently this search query turns up many outdated answers. |
Was there ever any follow-up on this, with respect to a "deterministic" or "strict" mode? Echoing the above, there tends to be a ton of varying information out there on the subject. For other perspectives: I know that some Rust libraries (Rapier, for example) offer cross-platform floating-point determinism, guaranteed by using software implementations of trig functions etc. to avoid hardware differences there. As well, Unity has previously explored determinism with their LLVM-based Burst compiler (they found some issues with denormals between ARMv7 and ARMv8).
There appears to be no way to compute floating-point operations in IEEE precision. This undermines C# as a candidate language for numerical analysis.
For example, suppose I have two variables
If I compute their quotient
the program is permitted to compute the quotient using (and rounding to) "excess" precision, and then round that result to `float` precision due to the cast. Because of the two separate rounding steps (first to the excess precision, and then to the required precision), that is not the same as computing the quotient in the appropriate IEEE precision. The same problem occurs for both `float` and `double`.

See also #3896 for one possible approach to addressing this in the language. This is also related to #7262.
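A hypothetical illustration of the pattern in question (the names are placeholders, not taken from the elided example above): the spec allows the division itself to be computed in a wider precision, and the explicit cast then performs a second rounding.

```csharp
class QuotientSketch
{
    static float Quotient(float numerator, float denominator)
    {
        // May be computed (and first rounded) in a wider precision,
        // then rounded again to float by the cast.
        return (float)(numerator / denominator);
    }
}
```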