Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe cast confuses JIT Inline Optimizer #50543

Closed
gerhard17 opened this issue Jan 15, 2021 · 1 comment
Closed

Unsafe cast confuses JIT Inline Optimizer #50543

gerhard17 opened this issue Jan 15, 2021 · 1 comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead

Comments

@gerhard17
Copy link

gerhard17 commented Jan 15, 2021

Version Used: NET5

Not a hard bug - the suboptimal code is still logical correct.

Steps to Reproduce:

Compiler settings

  <OutputType>Exe</OutputType>
  <TargetFramework>net5</TargetFramework>
  <TieredCompilation>false</TieredCompilation>
  <Optimize>true</Optimize>

Assume following example (stripped down from real source code) which should load a struct with a kind of literal value.
In the Test() method, three variables x, y, z are initialized with 1.0, 2.0 and 1.0 again.

public struct MyStruct
{
	public ulong A0, A1, A2, A3;
	public ulong B0, B1, B2, B3;
}

public sealed class TestCase
{
	[MethodImpl(MethodImplOptions.AggressiveInlining)]
	public static void LoadInline(out MyStruct result, ulong value) {
		result = default;
		result.A0 = value;
	}

	[MethodImpl(MethodImplOptions.AggressiveInlining)]
	public static void LoadInline(out MyStruct result, double value) {
		// Branch is JIT inline removable, when value is a compile time constant
		if (value == 1.0) {
			LoadInline(out result, 1);
			return;
		}
		// Branch is JIT inline removable, when value is a compile time constant
		if (value == 2.0) {
			LoadInline(out result, 2);
			return;
		}
		// Use default Value
		LoadInline(out result, 0);
		// Strange behavior, when code is active
		// ulong ul = Unsafe.As<double, ulong>(ref value); // <-- LINE A
		// result.A1 = ul;                                 // <-- LINE B
	}

	[MethodImpl(MethodImplOptions.NoInlining)]
	public static void SomeUsage(in MyStruct result) {
		;
	}

	[MethodImpl(MethodImplOptions.NoInlining)]
	public static void Test() {
		LoadInline(out var x, 1.0);
		LoadInline(out var y, 2.0);
		LoadInline(out var z, 1.0);
		SomeUsage(x);
		SomeUsage(y);
		SomeUsage(z);
	}
}

Case 1: When you compile to a optimized release build following optimal code is generated for the three LoadInline() calls.
[SkipLocalsInit is NOT used, so at the begin of the Test() method local memory is already initialized]

			LoadInline(out var x, 1.0);
00007FFB079BAA0B  mov         dword ptr [rsp+0A8h],1  
			LoadInline(out var y, 2.0);
00007FFB079BAA16  mov         dword ptr [rsp+68h],2  
			LoadInline(out var z, 1.0);
00007FFB079BAA1E  mov         dword ptr [rsp+28h],1  

Case 2: When LINE A and LINE B are both activated, following sub optimal (but working) code is produced.
The second and third call perform a redundant zero out of the struct. Interestingly the first call is still optimal. Observe the fact that the method arguments in call 1 and call 3 are the same, but produce different code.

		LoadInline(out var x, 1.0);
00007FFB079CAA0E  mov         qword ptr [rsp+0A8h],1  
		LoadInline(out var y, 2.0);
00007FFB079CAA1A  vxorps      xmm0,xmm0,xmm0  
00007FFB079CAA1E  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079CAA24  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079CAA2A  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079CAA33  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079CAA3C  mov         qword ptr [rsp+68h],2  
		LoadInline(out var z, 1.0);
00007FFB079CAA45  vxorps      xmm0,xmm0,xmm0  
00007FFB079CAA49  vmovdqu     xmmword ptr [rsp+28h],xmm0  
00007FFB079CAA4F  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079CAA55  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079CAA5B  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079CAA61  mov         qword ptr [rsp+28h],1  

Case 3: Even more strange - but not my real world use case - the optimizer gets totaly confused, when only LINE A is active and LINE B is commented out. The code is still working, but obviously the JIT constant branch removal has totaly failed.

		LoadInline(out var x, 1.0);
00007FFB079EAA0E  vmovsd      xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0318h (07FFB079EACC8h)]  
00007FFB079EAA16  vmovsd      qword ptr [rsp+30h],xmm0  
00007FFB079EAA1C  vmovsd      xmm0,qword ptr [rsp+30h]  
00007FFB079EAA22  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0328h (07FFB079EACD8h)]  
00007FFB079EAA2A  jp          InlineStructInitializeBug.TestCase.Test()+0B7h (07FFB079EAA67h)  
00007FFB079EAA2C  jne         InlineStructInitializeBug.TestCase.Test()+0B7h (07FFB079EAA67h)  
00007FFB079EAA2E  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAA32  vmovdqu     xmmword ptr [rsp+0B8h],xmm0  
00007FFB079EAA3B  vmovdqu     xmmword ptr [rsp+0C8h],xmm0  
00007FFB079EAA44  vmovdqu     xmmword ptr [rsp+0D8h],xmm0  
00007FFB079EAA4D  vmovdqu     xmmword ptr [rsp+0E8h],xmm0  
00007FFB079EAA56  mov         qword ptr [rsp+0B8h],1  
00007FFB079EAA62  jmp         InlineStructInitializeBug.TestCase.Test()+0131h (07FFB079EAAE1h)  
00007FFB079EAA67  vmovsd      xmm0,qword ptr [rsp+30h]  
00007FFB079EAA6D  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0338h (07FFB079EACE8h)]  
00007FFB079EAA75  jp          InlineStructInitializeBug.TestCase.Test()+0FFh (07FFB079EAAAFh)  
00007FFB079EAA77  jne         InlineStructInitializeBug.TestCase.Test()+0FFh (07FFB079EAAAFh)  
00007FFB079EAA79  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAA7D  vmovdqu     xmmword ptr [rsp+0B8h],xmm0  
00007FFB079EAA86  vmovdqu     xmmword ptr [rsp+0C8h],xmm0  
00007FFB079EAA8F  vmovdqu     xmmword ptr [rsp+0D8h],xmm0  
00007FFB079EAA98  vmovdqu     xmmword ptr [rsp+0E8h],xmm0  
00007FFB079EAAA1  mov         qword ptr [rsp+0B8h],2  
00007FFB079EAAAD  jmp         InlineStructInitializeBug.TestCase.Test()+0131h (07FFB079EAAE1h)  
00007FFB079EAAAF  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAAB3  vmovdqu     xmmword ptr [rsp+0B8h],xmm0  
00007FFB079EAABC  vmovdqu     xmmword ptr [rsp+0C8h],xmm0  
00007FFB079EAAC5  vmovdqu     xmmword ptr [rsp+0D8h],xmm0  
00007FFB079EAACE  vmovdqu     xmmword ptr [rsp+0E8h],xmm0  
00007FFB079EAAD7  xor         ecx,ecx  
00007FFB079EAAD9  mov         qword ptr [rsp+0B8h],rcx  
		LoadInline(out var y, 2.0);
00007FFB079EAAE1  vmovsd      xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0348h (07FFB079EACF8h)]  
00007FFB079EAAE9  vmovsd      qword ptr [rsp+28h],xmm0  
00007FFB079EAAEF  vmovsd      xmm0,qword ptr [rsp+28h]  
00007FFB079EAAF5  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0358h (07FFB079EAD08h)]  
00007FFB079EAAFD  jp          InlineStructInitializeBug.TestCase.Test()+0181h (07FFB079EAB31h)  
00007FFB079EAAFF  jne         InlineStructInitializeBug.TestCase.Test()+0181h (07FFB079EAB31h)  
00007FFB079EAB01  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAB05  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079EAB0B  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079EAB14  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079EAB1D  vmovdqu     xmmword ptr [rsp+0A8h],xmm0  
00007FFB079EAB26  mov         qword ptr [rsp+78h],1  
00007FFB079EAB2F  jmp         InlineStructInitializeBug.TestCase.Test()+01EFh (07FFB079EAB9Fh)  
00007FFB079EAB31  vmovsd      xmm0,qword ptr [rsp+28h]  
00007FFB079EAB37  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0368h (07FFB079EAD18h)]  
00007FFB079EAB3F  jp          InlineStructInitializeBug.TestCase.Test()+01C3h (07FFB079EAB73h)  
00007FFB079EAB41  jne         InlineStructInitializeBug.TestCase.Test()+01C3h (07FFB079EAB73h)  
00007FFB079EAB43  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAB47  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079EAB4D  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079EAB56  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079EAB5F  vmovdqu     xmmword ptr [rsp+0A8h],xmm0  
00007FFB079EAB68  mov         qword ptr [rsp+78h],2  
00007FFB079EAB71  jmp         InlineStructInitializeBug.TestCase.Test()+01EFh (07FFB079EAB9Fh)  
00007FFB079EAB73  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAB77  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079EAB7D  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079EAB86  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079EAB8F  vmovdqu     xmmword ptr [rsp+0A8h],xmm0  
00007FFB079EAB98  xor         ecx,ecx  
00007FFB079EAB9A  mov         qword ptr [rsp+78h],rcx  
		LoadInline(out var z, 1.0);
00007FFB079EAB9F  vmovsd      xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0378h (07FFB079EAD28h)]  
00007FFB079EABA7  vmovsd      qword ptr [rsp+20h],xmm0  
00007FFB079EABAD  vmovsd      xmm0,qword ptr [rsp+20h]  
00007FFB079EABB3  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0388h (07FFB079EAD38h)]  
00007FFB079EABBB  jp          InlineStructInitializeBug.TestCase.Test()+0236h (07FFB079EABE6h)  
00007FFB079EABBD  jne         InlineStructInitializeBug.TestCase.Test()+0236h (07FFB079EABE6h)  
00007FFB079EABBF  vxorps      xmm0,xmm0,xmm0  
00007FFB079EABC3  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079EABC9  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079EABCF  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079EABD5  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079EABDB  mov         qword ptr [rsp+38h],1  
00007FFB079EABE4  jmp         InlineStructInitializeBug.TestCase.Test()+0292h (07FFB079EAC42h)  
00007FFB079EABE6  vmovsd      xmm0,qword ptr [rsp+20h]  
00007FFB079EABEC  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0398h (07FFB079EAD48h)]  
00007FFB079EABF4  jp          InlineStructInitializeBug.TestCase.Test()+026Fh (07FFB079EAC1Fh)  
00007FFB079EABF6  jne         InlineStructInitializeBug.TestCase.Test()+026Fh (07FFB079EAC1Fh)  
00007FFB079EABF8  vxorps      xmm0,xmm0,xmm0  
00007FFB079EABFC  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079EAC02  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079EAC08  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079EAC0E  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079EAC14  mov         qword ptr [rsp+38h],2  
00007FFB079EAC1D  jmp         InlineStructInitializeBug.TestCase.Test()+0292h (07FFB079EAC42h)  
00007FFB079EAC1F  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAC23  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079EAC29  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079EAC2F  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079EAC35  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079EAC3B  xor         ecx,ecx  
00007FFB079EAC3D  mov         qword ptr [rsp+38h],rcx  

Expected Behavior:

All three variants should produce the same optimal code.

			LoadInline(out var x, 1.0);
00007FFB079BAA0B  mov         dword ptr [rsp+0A8h],1  
			LoadInline(out var y, 2.0);
00007FFB079BAA16  mov         dword ptr [rsp+68h],2  
			LoadInline(out var z, 1.0);
00007FFB079BAA1E  mov         dword ptr [rsp+28h],1  

Actual Behavior:

Case 1: The optimal code :-)
Case 2: For large enough structs the code and runtime behavior is very bad and seems unsymetric between the same method calls.
Case 3: This is not a real use case because a senseless Unsafe cast is made without any usage. But no warning is raised, when you accidentally compile such a source code.

Off Topic:
It would be great - especially for my use case - when a method argument could be restricted to a compile time constant.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Jan 15, 2021
@gerhard17
Copy link
Author

Moved to dotnet/runtime#47082

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

No branches or pull requests

1 participant