No way to get reliable precision of floating-point operations #7333
Maybe I'm missing something but how exactly is this related to language/compiler? The example division expression generates the expected IL code:
Beyond that it's up to the runtime/JIT/CPU to deal with it. |
The compiler is permitted to emit code in which the division is computed (rounded) to an excess precision, which is then rounded to `float` precision by the cast. I should also note that the CIL specification permits values to be stored with excess precision, too. It is not clear from the CIL specification whether the division operation itself is permitted to produce a result with excess precision. The spec says only:
Absent requirements in the language spec, the programmer cannot rely on the behavior. |
So this is about changing the language specification in such a way that a compiler is required to emit the proper casts even though the specification cannot actually guarantee that the operation won't be performed with excess precision? I suppose something like:
instead of
|
@mikedn No, the proposal would be to change the language to enable some way for the code to be written in which precision is guaranteed, without any weasel-wording about an underlying implementation that would deprive the language guarantee of any meaning. It is possible that implementing that modified spec might require a modified CLI specification and/or CLR implementation. |
@CarolEidt Can the CLI spec be strengthened such that single-precision division (for one example among many) is performed using IEC 60559:1989 rules for that precision (even if the result is stored in higher precision)? If it were changed, would Microsoft's implementations have to change to comply with the modified spec? |
Hmm, but that's simply unimplementable (or at least not reasonably implementable) on architectures that lack single-precision operations. The typical example would be x86 processors without SSE support (granted, those are pretty much history these days), but there may be others; PowerPC, it seems. |
@gafter I believe that what you are requesting is already supported. Rounding a result to a given precision is the same as performing the computation in that precision (unless you are speaking of library methods - which may not actually produce precise results for the given precision). So, if you cast each intermediate result to float, you will get the same answer as if the computation was all done in float. I don't see the value in changing this, except to the extent that some implementations may have faster float operations than double. If there is a hardware implementation that produces a different result for a float divide vs. the case where the operands are both rounded to float, and then the divide is evaluated as double, then rounded to float, then I would think that would be a bug in the hardware. |
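A minimal sketch (not from the thread; the helper name is made up) of the "cast each intermediate result" pattern described above: each explicit (float) cast forces the intermediate value to be rounded to single precision before it is used again.

```csharp
static class SinglePrecisionSketch
{
    // Compute a*b + c with every intermediate result rounded to float.
    public static float MulAdd(float a, float b, float c)
    {
        float product = (float)(a * b); // cast rounds the product to float
        return (float)(product + c);    // cast rounds the sum to float
    }
}
```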
@CarolEidt You say
I don't see why this would be true. Computing a result as a double causes rounding to occur to that (double) precision. Converting that result afterwards to a single causes another rounding of that already rounded result to occur. Why would that sequence of two rounding operations be guaranteed to produce the same result as if the original (exact) operation had been rounded to the narrower precision? |
@gafter The spec doesn't allow the JIT to elide the cast of the input values to floats. Those are operations that it cannot omit. It CAN choose to perform the operation as a double operation, but since the domain of IEEE double is a superset of the domain of IEEE float, those input values are identical in semantic when represented as double. So, in either case it is operating on the same input values, and the result, even if rounded twice (e.g. to double and then float), will still be the same. |
@CarolEidt How do you figure "the result, even if rounded twice (e.g. to double and then float), will still be the same"? Imagine the following scenario
Are you asserting that this cannot happen? See also http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/ |
From http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Since p=24 for float and q=53 for double, your assertion is correct. |
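For reference, the double-rounding condition being invoked here (paraphrased, not quoted from the linked document): if $x$ is the exact result of a single addition, subtraction, multiplication, division, or square root of $p$-bit operands, then

$$\operatorname{rnd}_p\bigl(\operatorname{rnd}_q(x)\bigr) = \operatorname{rnd}_p(x) \qquad \text{whenever } q \ge 2p + 2.$$

With $p = 24$ (float) and $q = 53$ (double), we have $53 \ge 2 \cdot 24 + 2 = 50$, so computing one such operation in double and then rounding to float gives the same result as computing it directly in float.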
This is an issue though for double precision, no? |
@mburbea On x86 (32 bit), the CRT (and by extension the CLR) sets the precision to double. It is possible to change this by calling out to native code, but by default this is the setting, and it avoids this issue. On x64, the SSE registers are used and extended precision is not supported. Granted, one could argue that this is a potential issue in the specification (as it leaves this open to implementation choice), but I think it is moot for practical purposes. |
When you say "practical purposes" I think you mean whether or not people performing numerical analysis will consider .NET and C# a viable language and platform to work with. Given |
/cc @MadsTorgersen FYI, the current C# spec (and implementation!) does not appear to provide any way to perform floating-point operations (like |
This kind of stuff happened in the past; the best-known example is Direct3D 9.0, which had the habit of changing the x87 precision to single. Anyway, that only affects 32-bit and it's more or less ancient history.
But there's nothing the C# language/compiler can do about this anyway, short of emulating floating-point operations. At best you could try to convince the JIT team to switch the x86 JIT to use SSE :) |
@mikedn I agree that a solution to this issue is unlikely to reside entirely in the C# compiler. |
@mikedn the x86 port of RyuJIT will be using SSE. However, for this issue, unless a program explicitly changes the evaluation mode the evaluation will be in double anyway. I think that @gafter is actually more concerned that this is not a requirement of the spec, and there is also no way to force it to happen. |
Carol, does this hold true for things like sin, cos, and the other trig functions?
|
@mburbea I don't know the answer to that. They simply invoke the CRT functions. |
@CarolEidt Are you saying that for @gafter's example RyuJIT x86 will generate
Currently there aren't any float overloads of those functions, so the question is a bit off. In any case, the current x86 JIT uses the x87 transcendental instructions, and the precision of those instructions is up for debate.
Hmm, I don't know what compiler & options are used to compile the desktop x64 CLR, but to my surprise the CRT function calls are replaced by x87 transcendental instructions. Odd; I was almost sure that VC++ always uses the library implementations on x64, and on x86 when SSE is enabled. |
@mikedn: Intel doesn't seem to agree with you that the precision is up for debate; their blog entry on the subject, posted the same day as the Random ASCII one, is pretty clear about the fact that the transcendental instructions get VERY inaccurate in the vicinity of various multiples of pi/2. (Confusingly, this cannot happen for values that have been rounded to single-precision: single precision's 24 bits can't cancel out enough bits of the effectively 68-bit approximation of pi that the CPU uses for range reduction to cause any problems, and the result actually has MORE significant bits than for doubles in the worst case.) |
Closing, as I do not believe there is enough interest to pursue this. |
@gafter, It might actually be worth noting that we do now have both single and double precision Math functions (System.Math and System.MathF). We also are now (as of a couple days ago) fully IEEE-compliant in relation to the inputs and required outputs for special mathematical functions of the IEEE Std 754:2008 Spec (at least on RyuJIT). Test validation to ensure we remain IEEE compliant was just merged minutes ago. |
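For concreteness, a small sketch (not from the thread) of the single- and double-precision APIs mentioned above: `MathF` operates on `float` throughout, whereas casting the `Math` result only rounds at the end.

```csharp
using System;

class MathFSketch
{
    static void Main()
    {
        float x = 0.5f;
        Console.WriteLine(MathF.Sin(x));        // single-precision implementation
        Console.WriteLine((float)Math.Sin(x));  // double-precision result rounded to float
    }
}
```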
This issue is also of interest with respect to deterministic simulation in peer-to-peer online games. Such games implement networked multiplayer by passing inputs between peers and then running the entire simulation on each peer. (In a similar vein, an input stream can be saved and replayed.) For this to work, the simulation must produce the same results on every peer - games may even support PC vs. console vs. tablet gameplay, requiring reproducibility across very different CPU architectures. We implemented such a peer-to-peer game using C# (Homeworld: Deserts of Kharak), and we decided that we could not guarantee repeatable floating point operations, so our simulation uses fixed point math implemented in software. Needless to say, if anyone can assure me that repeatable floating point calculations can be achieved in C# I would love to get back all those CPU cycles we are wasting in our math library!! :-) |
Having an option for deterministic float arithmetic in C# is something that would be seriously useful for many people and is really hard to get without compiler support. Even adding extra casts to force truncation is not sufficient, makes code hard to read and write, and is error-prone.

We were experiencing test failures on our build server that were not reproducible locally and were caused by non-deterministic float arithmetic between the two machines. These were so annoying and hard to debug that we were forced to rewrite our simulation in custom fixed-point arithmetic just to be able to debug things and move forward.

The downside is that working with fixed-point numbers really sucks, seriously! One does not appreciate the power of floats until they are taken away. One has to be extremely careful while writing arithmetic code. For example, you need to multiply first, then divide, otherwise you lose significant precision. Try computing x^4: it either overflows the fixed range or is zero (for x ~ 0.1).

These days, C# is also being used more in the gaming industry, where many titles rely on having identical game state on separate machines. These are usually games with a large game state, like RTS games, where the whole state cannot be synced and only players' actions are exchanged over the network.

Please reconsider this issue. Thank you! Here are some relevant Stack Overflow threads: |
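A minimal sketch (not the commenter's actual library; the Q48.16 format and all names are assumptions) of the ordering pitfall described above: dividing before multiplying truncates the quotient to the fixed-point resolution, and that error is then amplified by the subsequent multiplication.

```csharp
using System;

// Q48.16 fixed point: the value is Raw / 2^16, stored in a signed 64-bit integer.
readonly struct Fix16
{
    const int FracBits = 16;
    public readonly long Raw;
    Fix16(long raw) => Raw = raw;

    public static Fix16 FromDouble(double d) => new Fix16((long)(d * (1 << FracBits)));
    public double ToDouble() => (double)Raw / (1 << FracBits);

    // Note: both operators can overflow for large operands; this sketch ignores that.
    public static Fix16 operator *(Fix16 a, Fix16 b) => new Fix16((a.Raw * b.Raw) >> FracBits);
    public static Fix16 operator /(Fix16 a, Fix16 b) => new Fix16((a.Raw << FracBits) / b.Raw);
}

class OrderingDemo
{
    static void Main()
    {
        Fix16 a = Fix16.FromDouble(3), b = Fix16.FromDouble(7), c = Fix16.FromDouble(1000);

        // Multiply first, then divide: ~428.5714 (close to the exact 3000/7).
        Console.WriteLine((a * c / b).ToDouble());

        // Divide first, then multiply: 3/7 is truncated to a multiple of 2^-16
        // before being scaled by 1000, giving ~428.558.
        Console.WriteLine((a / b * c).ToDouble());
    }
}
```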
Can you please expand on that? Under what circumstances is that not sufficient? Can you please give an example where you cannot force the use of IEEE arithmetic? |
Thanks for the follow-up, Neal. I was referring to this SO answer, which notes:
I was unable to find the spec it refers to; do you know which one it means? However, even if extra casts to force truncation were enough, it is rather tedious and error-prone to require humans to do this across an entire codebase. For example, expressions … Unfortunately, I do not have a specific example where casts are not sufficient. Do you think they should be sufficient? |
What CPUs and what versions of .NET are involved?
Some of that may be subject to hardware specifics and configuration. There's not a lot that the language can do in this case, short of using emulated FP.
At least on .NET Core such casts shouldn't be necessary, as all float operations are emitted as single-precision SSE instructions, e.g.:

```asm
C5FA59050D000000 vmulss xmm0, dword ptr [reloc @RWD00]
```

Of course, one can never exclude the possibility of having bugs in the JIT compilers. But if that's the case, a repro would be needed. |
Yes. See #7333 (comment) in particular the quoted part in bold. |
Ok, so I have resurrected our old test that was failing, and it is still failing. I have extracted it (and all its deps) to a separate repo, linked below. Now, the interesting part is that when I run the test via a console app, I often get different results for debug and release modes, but they are consistent with our build server and my colleague's machine (all have Intel CPUs). However, when I run the same code via NUnit, it passes on my machine and on my colleague's machine, but it fails on our build server. So is NUnit doing something funny here? By any chance, could anyone try this too? The test is not very scientific, but that's what we had before. Thanks! https://github.com/NightElfik/FloatDeterminismTest |
It's worth noting that you are targeting .NET Framework 4.7. This means that you will get RyuJIT if running in 64-bit mode. RyuJIT will use the SSE/SSE2 instructions that are IEEE 754 compliant and round directly to the target precision. You will get the legacy JIT if running in 32-bit mode. The legacy JIT uses the x87 FPU stack while setting the rounding precision to 64-bits. On .NET Core (I believe 2.0 and later), you will always get RyuJIT and it will always use SSE/SSE2 for computations. |
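One small diagnostic that can help when comparing results across machines (a generic sketch, not something from the thread): record which runtime and bitness each run actually used, since that determines whether the SSE-based RyuJIT or the legacy x87-based JIT is in play.

```csharp
using System;
using System.Runtime.InteropServices;

class RuntimeReport
{
    static void Main()
    {
        Console.WriteLine(RuntimeInformation.FrameworkDescription); // e.g. ".NET Framework 4.8" or ".NET 8.0"
        Console.WriteLine(RuntimeInformation.ProcessArchitecture);  // X86, X64, Arm64, ...
        Console.WriteLine(Environment.Is64BitProcess);              // 64-bit on .NET Framework 4.6+ => RyuJIT
    }
}
```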
The results I get for
The results I get for
The results I get for
|
The same goes for |
Notably in There are likely others as well, but I haven't gone through the rest of the code extensively. |
Note: I tend to agree that this is somewhat of a hard space to get right. It's really easy to miss a necessary cast to get consistent results. That said, I think a good-enough solution here is to just write an analyzer to check this stuff and tell you when you're missing a cast. |
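A rough sketch of such an analyzer (assumed names and a deliberately simplistic rule, not an existing tool): it flags float/double binary operations whose result is not immediately consumed by an explicit cast.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class FloatRoundingAnalyzer : DiagnosticAnalyzer
{
    static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "FP0001",
        title: "Floating-point operation is not explicitly rounded",
        messageFormat: "Cast this floating-point operation to force rounding to its declared precision",
        category: "Determinism",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxNodeAction(Analyze,
            SyntaxKind.AddExpression, SyntaxKind.SubtractExpression,
            SyntaxKind.MultiplyExpression, SyntaxKind.DivideExpression);
    }

    static void Analyze(SyntaxNodeAnalysisContext ctx)
    {
        var node = (BinaryExpressionSyntax)ctx.Node;
        var type = ctx.SemanticModel.GetTypeInfo(node, ctx.CancellationToken).Type;
        if (type is null ||
            (type.SpecialType != SpecialType.System_Single && type.SpecialType != SpecialType.System_Double))
            return;

        // Walk out of any parentheses; an enclosing explicit cast is what
        // guarantees rounding to the declared precision.
        SyntaxNode parent = node.Parent;
        while (parent is ParenthesizedExpressionSyntax)
            parent = parent.Parent;

        if (parent is not CastExpressionSyntax)
            ctx.ReportDiagnostic(Diagnostic.Create(Rule, node.GetLocation()));
    }
}
```

This is intentionally naive: a real analyzer would also need to consider compound assignments, conversions to wider types, and similar cases.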
There are some floating-point operations that do not have defined results, and for which Intel and AMD processors give different results. See, for example, #37772 where we make the compiler resilient to these differences during constant-folding. We are not going to paper over these differences. |
@gafter, Do you mean |
No, I mean Intel and AMD. I believe they differ in some circumstances when truncating a floating-point value to an integral value and the result is undefined. I could be wrong. |
I don't believe that is the case. Both of them document for the x87 FPU stack and SSE+ instructions that the "indefinite integer value" is returned when the converted result cannot be represented. Intel documents this value as:
AMD documents this value as:
So both are ultimately the same. |
How do you explain the output of this program (on Intel and AMD)?

```csharp
using System;

class Program
{
    static void Main()
    {
        double aslocal;
        aslocal = 65535.17567;
        Console.WriteLine(unchecked((short)aslocal));
        aslocal = 65536.17567;
        Console.WriteLine(unchecked((short)aslocal));
    }
}
```
|
Both Intel and AMD produce the same result (whether using the x87 FPU or SSE+):
Both inputs generate the same assembly.

SSE+:

```asm
00007FFDC205093F vmovsd xmm0,qword ptr [address]
00007FFDC2050948 vmovsd qword ptr [stack_addr],xmm0
00007FFDC205094E vcvttsd2si ecx,mmword ptr [stack_addr]
00007FFDC2050954 movsx rcx,cx
00007FFDC2050958 call 00007FFE17952980
00007FFDC205095D nop
```

x87 FPU:

```asm
00DF08C7 DD 05 10 09 DF 00  fld qword ptr ds:[addr]
00DF08CD DD 5D C0           fstp qword ptr [stack_addr]
00DF08D0 F2 0F 10 45 C0     movsd xmm0,mmword ptr [stack_addr]
00DF08D5 F2 0F 2C C8        cvttsd2si ecx,xmm0
00DF08D9 0F BF C9           movsx ecx,cx
00DF08DC E8 37 11 1C 73     call 73FB1A18
00DF08E1 90                 nop
```

The SSE+ instructions that support converting (…) The x87 FPU instructions that support converting (…) Given that the instruction emitted only supports 32-bit/64-bit results, the results are … This is probably a runtime issue that should be logged, checked against the back-compat bar, and addressed as appropriate. |
I logged dotnet/runtime#461 |
Thank you Tanner and Neal for the discussion. Would it be possible to document the float arithmetic behavior and have it as something people can refer to as "ground truth"? Something that would show up when people search for "C# float determinism" or similar. Maybe a blog post or Medium post? Currently this search query turns up many outdated answers. |
Was there ever any follow-up on this, with respect to a "deterministic" or "strict" mode? Echoing the above, there tends to be a ton of varying information out there on the subject. For other perspectives: I know that some Rust libraries (Rapier, for example) offer cross-platform floating-point determinism, guaranteed by using software implementations of trig functions etc. to avoid hardware differences there. As well, Unity has previously explored determinism with their LLVM-based Burst compiler (they found some issues with denormals between ARMv7 and ARMv8).
There appears to be no way to compute floating-point operations in IEEE precision. This undermines C# as a candidate language for numerical analysis.
For example, suppose I have two variables
If I compute their quotient
the program is permitted to compute the quotient using (and rounding to) "excess" precision, and then round that result to `float` precision due to the cast. Because of the two separate rounding steps (first to the excess precision, and then to the required precision), that is not the same as computing the quotient in the appropriate IEEE precision. The same problem occurs for both `float` and `double`.

See also #3896 for one possible approach to addressing this in the language. This is also related to #7262.
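A hypothetical illustration of the pattern in question (the names are placeholders, not taken from the elided example above): the spec allows the division itself to be computed in a wider precision, and the explicit cast then performs a second rounding.

```csharp
class QuotientSketch
{
    static float Quotient(float numerator, float denominator)
    {
        // May be computed (and first rounded) in a wider precision,
        // then rounded again to float by the cast.
        return (float)(numerator / denominator);
    }
}
```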