Bizarre decimal-to-double behavior difference on same machine, same code #102149

cyanite · 2024-05-13T10:18:32Z

Description

I made a dotnet class library with a single method:

public static double ConvertForMe(decimal d)
{
       return (double)d;
}

I then wrote a small console program that calls this, like this:

var dec = -4619.7403751459050600M;
var doubleDll = DoubleConversion.DoubleConversion.ConvertForMe(dec);
Console.WriteLine($"Decimal:   {dec}");
Console.WriteLine($"Double dll: {doubleDll}");.

I get:

Decimal:   -4619,7403751459050600
Double dll: -4619,740375145904

That's fine, I guess, and this can be reproduced on various machines and online dotnet playgrounds. However, on the same machine using the same, previously compiled, class library, but calling it from inside a larger .NET 8-based server we have running there in test, we instead get:

Decimal:   -4619,7403751459050600
Double dll: -4619,740375145905

Which amounts to a difference in the LSB in the Int64 representation of the double (using BitConverter). Since this is financial software, people are wondering about this, and I can't explain it. This only seems to happen on this particular machine (and only when invoked via our larger server, using Assembly.Load and IoC).

Reproduction Steps

As described above. We haven't been able to reproduce the abnormal result in any other contexts so far.

Expected behavior

The version ending in 4. Well, it's hard to say. I think the version ending in 5 seems more correct, but every other context, machine and online playground seems to give the one ending in 4.

Actual behavior

When we call the method in the dll from our (ultimately .NET 8 console program started as Windows service), we get the version ending in 5.

Regression?

No response

Known Workarounds

No response

Configuration

The problem happens on the same machine, but the environment is slightly different. Common environment variables that might influence this (but that are present both in the 4 and 5 case):

DOTNET_gcServer: 1

Environment variables unique to the case where we get the abnormal behavior:

COMPlus_ReadyToRun: 0
CORECLR_ENABLE_PROFILING: 1
CORECLR_PROFILER: {some guid}
CORECLR_PROFILER_PATH_64: some-path-to-dynatrace/.../oneagentloader.dll

I have tried to manually set the COMPlus variable but it doesn't reproduce the abnormal behavior. I haven't investigated if the tracing can be related.

Other configuration:
.NET 8.0.2
Microsoft Windows 10.0.14393

Both programs run as framework-dependent.

Other information

No response

The text was updated successfully, but these errors were encountered:

dotnet-policy-service · 2024-05-13T10:45:02Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

huoyaoyuan · 2024-05-13T10:57:18Z

The actual output doesn't change. What has changed is the formatting of double. You can use BitConverter.DoubleToInt64Bits to verify the underlying bits of double.

See also #59982 etc.

cyanite · 2024-05-13T11:33:35Z

Right, I will check that. The weird part isn't how the number converts, but how it converts, or formats, differently on the same machine, though. But I will check the bits to see if they are identical. Also, it does seem to be related to Dynatrace in some way, which I'll also check.

cyanite · 2024-05-13T12:28:59Z

@huoyaoyuan it seems it's not just the string formatting that's different. The raw bits in the "4" case are C0B20BBD8939BE6E whereas they are C0B20BBD8939BE6F in the "5" case.

The abnormal case does, however, stop when Dynatrace is not used, for some reason.

huoyaoyuan · 2024-05-13T13:04:04Z

What's your .NET Framework version installed on your system? I can't reproduce it with .NET Framework 4.8.1.

cyanite · 2024-05-13T13:11:17Z

I assume we have .NET 4.8 installed, I don't know. Whatever comes with Windows 10 version 14393 (which is from 2016, "Redstone"). But all these programs are .NET Core (.NET 8). The installed runtimes on the machine are

Microsoft.AspNetCore.App 8.0.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 8.0.2 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

cyanite · 2024-05-13T13:18:58Z

All right, the situation with "the problem disappered when Dynatrace was removed" isn't quite that clear. It's much more bizzare:

When the server is started, and the test is done, the normal result is returned (the call is made from a client, which provides the number; the server sends back a long string with output in various formats including the raw bits).

When the test is repeated, without restarting the server (which runs as a Windows service), the correct value is still returned... for a while. After about 1-2 minutes, it starts returning the abnormal value (i.e. the "5" case) instead, from the same running program.

jkotas · 2024-05-13T13:24:38Z

Looks like a problem with tiered compilation. Repro:

for (;;)
{
    var dec = -4619.7403751459050600M;
    var doubleDll = ConvertForMe(dec);
    Console.WriteLine($"Decimal:   {dec}");
    Console.WriteLine($"Double dll: {doubleDll}");
}


static double ConvertForMe(decimal d)
{
       return (double)d;
}

It starts with -4619.740375145904 and changes to -4619.740375145905 after some time.

dotnet-policy-service · 2024-05-13T13:25:01Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

cyanite · 2024-05-13T13:51:21Z

@jkotas that sounds promising. I can't exactly reproduce that myself (I also tried with variously switching tiered compilation off or asking for aggressive optimization for the method, or nothing at all), but would the cause then be that the two JIT compilers have a behaviroial difference?

jkotas · 2024-05-13T13:59:00Z

I can't exactly reproduce that myself

Try:

dotnet new console
<copy and paste my snippet into Program.cs>
dotnet run -c release

would the cause then be that the two JIT compilers have a behaviroial difference?

Yes, it looks like a bug in the JIT compiler.

JulieLeeMSFT · 2024-05-13T16:19:08Z

@EgorBo, PTAL.

EgorBo · 2024-05-13T16:25:31Z

From a quick look, it looks to be somewhere in CoreLib behaving differently for R2R and Tier0 or Tier1. Most likely VarR8FromDec function

Clockwork-Muse · 2024-05-13T17:09:39Z

Since this is financial software

As a side question, why are you using double here, then? Why not keep it in decimal for the entire flow state? Obviously this is potentially a bug we should fix, but my next reaction would be seeing if I could remove the class of problem entirely.

EgorBo · 2024-05-13T17:20:30Z

Alright, so looks like the issue is in "unsigned int -> float" conversion difference for AVX512 (JIT) and SSE2 (crossgen)

EgorBo · 2024-05-13T17:25:06Z

Minimal repro:

using System;
using System.Runtime.CompilerServices;

class Prog
{
    public static unsafe void Main()
    {
        double dbl = Test(9303915604039947368);
        Console.WriteLine(*(ulong*)&dbl);
    }

    [MethodImpl(MethodImplOptions.AggressiveOptimization |
                MethodImplOptions.NoInlining)]
    static double Test(ulong x) => x;
}

It prints different values with DOTNET_EnableAVX512F=0 and DOTNET_EnableAVX512F=1 on a CPU with AVX512F

EgorBo · 2024-05-13T17:49:59Z

Test AVX512 codegen:

; Assembly listing for method Prog:Test(ulong):double (FullOpts)
       vcvtusi2sd xmm0, rcx
       ret

Test no-AVX512F:

; Assembly listing for method Prog:Test(ulong):double (FullOpts)
       vxorps   xmm0, xmm0, xmm0
       vcvtsi2sd xmm0, xmm0, rcx
       test     rcx, rcx
       jge      SHORT G_M52525_IG03
       vaddsd   xmm0, xmm0, qword ptr [reloc @RWD00]
G_M52525_IG03:
       ret      

RWD00  	dq	43F0000000000000h

@tannergooding does this ring a bell to you? it sounds like AVX512's impl is correct so probably we had a bug in our pre-AVX512 impl?

tannergooding · 2024-05-13T17:59:45Z

#43895 tracks the issue, it's known and something we should fix.

It's just been pushed out release after release as "low priority". Given we normalized the floating-point to integer conversions this release, it'd be good for us to also fix the integer to floating-point conversions.

cyanite · 2024-05-14T07:47:47Z

Since this is financial software

As a side question, why are you using double here, then? Why not keep it in decimal for the entire flow state? Obviously this is potentially a bug we should fix, but my next reaction would be seeing if I could remove the class of problem entirely.

Right. It's life insurance mathematics. Double is used to do expensive calculations that would take too long with decimal. This isn't a money amount as such.

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 13, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 13, 2024

huoyaoyuan added area-System.Numerics and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 13, 2024

jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Numerics labels May 13, 2024

JulieLeeMSFT assigned EgorBo May 13, 2024

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 13, 2024

JulieLeeMSFT modified the milestones: 8.0.x, 9.0.0 May 13, 2024

EgorBo mentioned this issue May 13, 2024

Fix ulong->double conversion on x64 (without avx512f) #102179

Merged

EgorBo closed this as completed in #102179 May 15, 2024

vzarytovskii mentioned this issue May 22, 2024

.NET 9preview5 dotnet/fsharp#17224

Merged

github-actions bot locked and limited conversation to collaborators Jun 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bizarre decimal-to-double behavior difference on same machine, same code #102149

Bizarre decimal-to-double behavior difference on same machine, same code #102149

cyanite commented May 13, 2024 •

edited

Loading

dotnet-policy-service bot commented May 13, 2024

huoyaoyuan commented May 13, 2024

cyanite commented May 13, 2024

cyanite commented May 13, 2024

huoyaoyuan commented May 13, 2024

cyanite commented May 13, 2024

cyanite commented May 13, 2024

jkotas commented May 13, 2024

dotnet-policy-service bot commented May 13, 2024

cyanite commented May 13, 2024

jkotas commented May 13, 2024

JulieLeeMSFT commented May 13, 2024

EgorBo commented May 13, 2024 •

edited

Loading

Clockwork-Muse commented May 13, 2024

EgorBo commented May 13, 2024

EgorBo commented May 13, 2024 •

edited

Loading

EgorBo commented May 13, 2024 •

edited

Loading

tannergooding commented May 13, 2024

cyanite commented May 14, 2024

Bizarre decimal-to-double behavior difference on same machine, same code #102149

Bizarre decimal-to-double behavior difference on same machine, same code #102149

Comments

cyanite commented May 13, 2024 • edited Loading

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

dotnet-policy-service bot commented May 13, 2024

huoyaoyuan commented May 13, 2024

cyanite commented May 13, 2024

cyanite commented May 13, 2024

huoyaoyuan commented May 13, 2024

cyanite commented May 13, 2024

cyanite commented May 13, 2024

jkotas commented May 13, 2024

dotnet-policy-service bot commented May 13, 2024

cyanite commented May 13, 2024

jkotas commented May 13, 2024

JulieLeeMSFT commented May 13, 2024

EgorBo commented May 13, 2024 • edited Loading

Clockwork-Muse commented May 13, 2024

EgorBo commented May 13, 2024

EgorBo commented May 13, 2024 • edited Loading

EgorBo commented May 13, 2024 • edited Loading

tannergooding commented May 13, 2024

cyanite commented May 14, 2024

cyanite commented May 13, 2024 •

edited

Loading

EgorBo commented May 13, 2024 •

edited

Loading

EgorBo commented May 13, 2024 •

edited

Loading

EgorBo commented May 13, 2024 •

edited

Loading