JIT: fix bug returning small structs on linux x64 #18563

AndyAyersMS · 2018-06-20T05:45:06Z

The jit was retyping all calls with small struct returns as 32 bit ints
when generating code for linux x64. When the results of those calls were
assigned to fields the jit would use 32 bit stores which could corrupt
neighboring fields.

The fix is to keep the small int types for 8 and 16 bit structs so that
the corresponding stores are the right size.

Fixes #18522.
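The failure can be modeled outside the JIT. The following is a minimal C++ sketch, not JIT code, of the buggy and fixed stores from the GitHub_18522 listing: a 2-byte struct field at offset 8 of an object whose neighboring `ushort` at offset 10 holds 170. The layout and the helper names (`Obj`, `storeWide`, `storeNarrow`, `neighborSurvives`) are hypothetical and were chosen to mirror `mov word ptr [rax+10], 170` followed by the store through `rax+8`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical object image: 2-byte struct field at offset 8, ushort neighbor at offset 10.
struct Obj {
    unsigned char bytes[16];
};

// Buggy pattern: the call was retyped as a 32-bit int, so assigning its
// result to the 2-byte field became a 4-byte store (offsets 8..11).
void storeWide(Obj& o, uint32_t callResult) {
    std::memcpy(o.bytes + 8, &callResult, 4);
}

// Fixed pattern: the call keeps its 16-bit type, so the store is 2 bytes (offsets 8..9).
void storeNarrow(Obj& o, uint16_t callResult) {
    std::memcpy(o.bytes + 8, &callResult, 2);
}

// Returns true if the neighbor at offset 10 still holds 170 after assigning 0 to the field.
bool neighborSurvives(bool useFix) {
    Obj o{};
    uint16_t neighbor = 170;
    std::memcpy(o.bytes + 10, &neighbor, 2);
    if (useFix) {
        storeNarrow(o, 0);
    } else {
        storeWide(o, 0);
    }
    uint16_t after;
    std::memcpy(&after, o.bytes + 10, 2);
    return after == 170;
}
```

`neighborSurvives(false)` reports corruption and `neighborSurvives(true)` does not, which matches the `cmp word ptr [rax+10], 170` check in the test.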

AndyAyersMS · 2018-06-20T05:54:45Z

@CarolEidt PTAL
cc @dotnet/jit-contrib

No diffs for Windows. For Linux this causes diffs in both crossgen and PMI:

Crossgen Diffs for System.Private.CoreLib.dll, framework assemblies for x64 linuxnonjit.dll
Summary: (Lower is better)
Total bytes of diff: 2434 (0.00% of base)
    diff is a regression.
Top file regressions by size (bytes):
        1098 : System.Data.Common.dasm (0.04% of base)
         240 : Microsoft.DotNet.ProjectModel.dasm (0.05% of base)
         218 : Newtonsoft.Json.dasm (0.02% of base)
         198 : Microsoft.CodeAnalysis.CSharp.dasm (0.00% of base)
         198 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00% of base)
Top file improvements by size (bytes):
        -112 : System.Private.CoreLib.dasm (0.00% of base)
          -4 : System.Linq.Parallel.dasm (0.00% of base)
          -4 : System.Net.HttpListener.dasm (0.00% of base)
28 total files with size differences (3 improved, 25 regressed), 102 unchanged.
Top method regessions by size (bytes):
          84 : System.Data.Common.dasm - SqlByte:op_Explicit(struct):struct (18 methods)
          80 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Equals(ref):bool:this (2 methods)
          78 : System.Data.Common.dasm - SqlBoolean:op_Explicit(struct):struct (18 methods)
          64 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Combine(ref):ref (2 methods)
          42 : Microsoft.DotNet.ProjectModel.dasm - DependencyContextBuilder:GetCompilationOptions(ref):ref (2 methods)
Top method improvements by size (bytes):
         -68 : System.Private.CoreLib.dasm - DomainNeutralILStubClass:IL_STUB_CLRtoWinRT():struct:this (18 methods)
         -12 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:GetPoint():struct:this (2 methods)
         -12 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:GetSize():struct:this (2 methods)
         -10 : System.Private.CoreLib.dasm - ValueTuple:Create():struct (2 methods)
         -10 : System.Private.CoreLib.dasm - Task:Yield():struct (2 methods)
271 total methods with size differences (11 improved, 260 regressed), 141547 unchanged.
Completed analysis in 233.40s

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 linuxnonjit.dll
Summary: (Lower is better)
Total bytes of diff: 839 (0.00% of base)
    diff is a regression.
Top file regressions by size (bytes):
         554 : System.Data.Common.dasm (0.04% of base)
         120 : Microsoft.DotNet.ProjectModel.dasm (0.06% of base)
         105 : Newtonsoft.Json.dasm (0.02% of base)
          51 : xunit.runner.utility.dotnet.dasm (0.05% of base)
          48 : xunit.execution.dotnet.dasm (0.02% of base)
Top file improvements by size (bytes):
        -339 : System.Reflection.Metadata.dasm (-0.09% of base)
         -38 : System.Net.HttpListener.dasm (-0.02% of base)
27 total files with size differences (2 improved, 25 regressed), 103 unchanged.
Top method regessions by size (bytes):
          42 : System.Data.Common.dasm - SqlByte:op_Explicit(struct):struct (9 methods)
          40 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Equals(ref):bool:this
          39 : System.Data.Common.dasm - SqlBoolean:op_Explicit(struct):struct (9 methods)
          32 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Combine(ref):ref
          21 : Microsoft.DotNet.ProjectModel.dasm - DependencyContextBuilder:GetCompilationOptions(ref):ref
Top method improvements by size (bytes):
         -94 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSignature(byref):struct:this (4 methods)
         -76 : System.Reflection.Metadata.dasm - CustomAttributeDecoder`1:DecodeValue(struct,struct):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSpecificationSignature(byref):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeLocalSignature(byref):struct:this (4 methods)
         -38 : System.Net.HttpListener.dasm - <CloseNetworkConnectionAsync>d__50:MoveNext():this
297 total methods with size differences (16 improved, 281 regressed), 169407 unchanged.
Completed analysis in 133.55s

PMI diffs on the new test case:

 ; Assembly listing for method GitHub_18522:Main():int
 ; Emitting BLENDED_CODE for X64 CPU with AVX
 ; optimized code
 ; rsp based frame
 ; partially interruptible
 ; Final local variable assignments
 ;
 ;# V00 OutArgs      [V00    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
 ;  V01 tmp1         [V01,T00] (  3,  6   )   byref  ->  rax        
-;  V02 tmp2         [V02    ] (  3,  3   )  struct ( 8) [rsp+0x10]   do-not-enreg[SF] ld-addr-op
-;  V03 tmp3         [V03    ] (  3,  3   )  ushort  ->  [rsp+0x10]   do-not-enreg[] V02.F0(offs=0x00) P-DEP
-;  V04 cse0         [V04,T01] (  4,  4   )    long  ->  [rsp+0x08]  
+;* V02 tmp2         [V02    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op
+;* V03 tmp3         [V03    ] (  0,  0   )  ushort  ->  zero-ref    V02.F0(offs=0x00) P-INDEP
+;  V04 cse0         [V04,T01] (  4,  4   )    long  ->  [rsp+0x00]  
 ;
-; Lcl frame size = 24
+; Lcl frame size = 8
 G_M49330_IG01:
-       sub      rsp, 24
+       push     rax
 G_M49330_IG02:
        mov      rdi, 0xD1FFAB1E
        mov      esi, 3
        call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
        mov      rax, 0xD1FFAB1E
        mov      rax, gword ptr [rax]
        mov      word  ptr [rax+10], 170
        mov      rax, 0xD1FFAB1E
        mov      rax, gword ptr [rax]
        add      rax, 8
-       mov      word  ptr [rsp+10H], 0
-       mov      word  ptr [rsp+10H], 0
-       mov      edi, dword ptr [rsp+10H]
-       mov      dword ptr [rax], edi         // bug: wide store
+       mov      word  ptr [rax], 0           // fix: narrow store (+ const)
        mov      rax, 0xD1FFAB1E
        mov      rax, gword ptr [rax]
        cmp      word  ptr [rax+10], 170
        je       SHORT G_M49330_IG04
        xor      eax, eax
 G_M49330_IG03:
-       add      rsp, 24
+       add      rsp, 8
        ret      
 G_M49330_IG04:
        mov      eax, 100
 G_M49330_IG05:
-       add      rsp, 24
+       add      rsp, 8
        ret      
-; Total bytes of code 118, prolog size 4 for method GitHub_18522:Main():int
+; Total bytes of code 100, prolog size 1 for method GitHub_18522:Main():int

 ; Assembly listing for method GitHub_18522:M113():struct
 ; Emitting BLENDED_CODE for X64 CPU with AVX
 ; optimized code
 ; partially interruptible
 ; Final local variable assignments
 ;
-;  V00 loc0         [V00    ] (  2,  2   )  struct ( 8) [rsp+0x00]   do-not-enreg[SF] must-init ld-addr-op
+;* V00 loc0         [V00    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op
 ;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
-;  V02 tmp1         [V02    ] (  2,  2   )  ushort  ->  [rsp+0x00]   do-not-enreg[] V00.F0(offs=0x00) P-DEP
+;* V02 tmp1         [V02    ] (  0,  0   )  ushort  ->  zero-ref    V00.F0(offs=0x00) P-INDEP
 ;
-; Lcl frame size = 8
+; Lcl frame size = 0
 G_M32228_IG01:
-       push     rax
-       xor      rax, rax
-       mov      qword ptr [rsp], rax
 G_M32228_IG02:
-       mov      word  ptr [rsp], 0
-       mov      eax, dword ptr [rsp]
+       xor      eax, eax
 G_M32228_IG03:
-       add      rsp, 8
        ret      
-; Total bytes of code 21, prolog size 7 for method GitHub_18522:M113():struct
+; Total bytes of code 3, prolog size 0 for method GitHub_18522:M113():struct

AndyAyersMS · 2018-06-20T06:02:09Z

I have not yet figured out if this is a regression. I believe the bug requires that the small struct be assigned directly to a field which is perhaps not as common as one might imagine.

jakobbotsch · 2018-06-20T11:03:30Z

This fixes most examples (jakobbotsch/Fuzzlyn@a603f0e), but some still remain unfixed:

I can recheck when you make updates. 😄

EDIT: Some of these examples needed rereducing. See below.

AndyAyersMS · 2018-06-20T14:43:51Z

@jakobbotsch thanks much! I'll look at the new examples. It's really nice to have Fuzzlyn double-checking things. Keep up the good work.

Arm and Arm64 failures look like the masked load tests. I thought these were disabled?

CarolEidt · 2018-06-20T15:08:31Z

> Arm and Arm64 failures look like the masked load tests. I thought these were disabled?

@jashook indicates that they still need to be removed from the .lst files, and he'll do so shortly.

CarolEidt

LGTM

jashook

LGTM. Could you add the test to the arm/arm64 .lst files so the regression is run on Windows arm(64) as well?

jashook · 2018-06-20T15:12:53Z

#18569 addresses arm(64) failures

AndyAyersMS · 2018-06-20T15:37:48Z

(edited: oops, was using old jits -- with the fix above things look to be correct...)

First new example 11923495345812789064 looks ok from a diff standpoint:

;;; windows
       48B96829AC14F8010000 mov      rcx, 0x1F814AC2968
       488B09               mov      rcx, gword ptr [rcx]
       4883C108             add      rcx, 8
       C60100               mov      byte  ptr [rcx], 0

;;; linux
       48BF6829055807020000 mov      rdi, 0x20758052968
       488B3F               mov      rdi, gword ptr [rdi]
       4883C708             add      rdi, 8
       C60700               mov      byte  ptr [rdi], 0

briansull

Looks Good
I see that the bug was indeed in the #ifdef UNIX section.

jakobbotsch · 2018-06-20T18:33:12Z

> (edited: oops, was using old jits -- with the fix above things look to be correct...)

Oops. It appears the bug is still there before reduction, but the reduced examples I posted do not exhibit it (i.e., the reducer needs to be rerun to surface the variant that still fails).
I have verified that at least 11049252875418439527 and 19651690852725464 have the issue.

AndyAyersMS · 2018-06-20T18:47:04Z

In 11049252875418439527 there is a 6 byte struct S0 that is returned and assigned. Windows assigns it in two steps, 4 bytes then 2. Linux does it in a single 8 byte step.

;;; windows
       8B10                 mov      edx, dword ptr [rax]
       8911                 mov      dword ptr [rcx], edx
       668B5004             mov      dx, word  ptr [rax+4]
       66895104             mov      word  ptr [rcx+4], dx

;;; linux
       488B4016             mov      rax, qword ptr [rax+22]
       4889470E             mov      qword ptr [rdi+14], rax   // writes an extra 2 bytes

The implication is that if we are indeed going to return non-power-of-two-sized structs in registers, then we need a different strategy to unpack them on the caller side: we can't just map them to the normal power-of-two int type sizes (unless it's "safe" to expand the destination size too, which it sometimes is). So my fix above likely only works if the struct being copied is 1, 2, or 4 bytes. We also need to properly handle sizes 3, 5, 6, and 7.
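The size arithmetic behind this can be sketched as follows: for a struct of up to 8 bytes returned in one integer register, find the smallest power-of-two int size that encloses it and count how many bytes past the struct a store of that int would write. The function names are illustrative, not JIT APIs. Only sizes 1, 2, 4 and 8 come out exact; 3, 5, 6 and 7 overshoot (6 overshoots by the 2 bytes seen in the Linux listing above).

```cpp
#include <cassert>
#include <cstddef>

// Smallest power-of-two integer size (1, 2, 4 or 8) that can hold a struct
// of `size` bytes. Assumes 1 <= size <= 8 (single-register returns only).
std::size_t enclosingIntSize(std::size_t size) {
    std::size_t s = 1;
    while (s < size) {
        s *= 2;
    }
    return s;
}

// Bytes beyond the end of the struct that a store of the enclosing int
// type would overwrite at the destination.
std::size_t overwrittenBytes(std::size_t size) {
    return enclosingIntSize(size) - size;
}
```

For example, `overwrittenBytes(6)` is 2 and `overwrittenBytes(4)` is 0.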

jakobbotsch · 2018-06-20T19:27:50Z

Here are the rereduced examples:

It may also be informative to look at the commit diff:
jakobbotsch/Fuzzlyn@299fdaa

Sorry about the confusion earlier.

AndyAyersMS · 2018-06-20T19:54:15Z

No problem. I haven't checked them all, but I suspect all these new cases have S0 sizes that are not powers of two.

One thought as to how to fix the size 3, 5, 6, 7 cases is to flag when the "primitive type" produced by getPrimitiveTypeForStruct is larger than the actual type returned, and force the call to return the result to a temp which can be safely widened to a power of two size. Then reinterpret the temp as a struct and assign from the temp to the destination.

Not sure if this is viable though.

It may also be clunky if/when we inline, as we may see what look like type mismatches that could block inlines, or reinterpret/copy chains (struct -> int -> int -> struct) that don't optimize away nicely.
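The temp-based idea can be modeled in plain C++: let the widened result land in an 8-byte temp, where writing all 8 bytes is harmless, then reinterpret the temp as the struct and copy only `sizeof(S0)` bytes into the real destination. This is a sketch under stated assumptions, not the JIT's implementation: `S0` is a hypothetical 6-byte struct standing in for the one in example 11049252875418439527, and `calleeReturnsInRegister`, `assignViaTemp` and `Holder` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 6-byte struct like S0 in the fuzzer example.
struct S0 {
    uint16_t a, b, c;
};
static_assert(sizeof(S0) == 6, "S0 is expected to be 6 bytes");

// Models a callee returning S0 in one 64-bit register: the first 6 bytes of
// the register image hold the struct; the rest is unspecified (non-zero here).
uint64_t calleeReturnsInRegister(S0 value) {
    uint64_t reg = 0xDEAD000000000000ull;
    std::memcpy(&reg, &value, sizeof(S0));
    return reg;
}

// Caller side: spill to a widened temp, then copy only sizeof(S0) bytes out.
void assignViaTemp(S0* dest, uint64_t reg) {
    uint64_t temp = reg;                   // safe to write all 8 bytes here
    std::memcpy(dest, &temp, sizeof(S0));  // reinterpret temp as S0, narrow copy
}

// Destination struct with a field right after the 6-byte S0.
struct Holder {
    S0 s;
    uint16_t neighbor;
};

bool tempPreservesNeighbor() {
    Holder h{};
    h.neighbor = 0x1234;
    assignViaTemp(&h.s, calleeReturnsInRegister(S0{1, 2, 3}));
    return h.neighbor == 0x1234 && h.s.a == 1 && h.s.b == 2 && h.s.c == 3;
}
```

The extra copy through the temp is the cost this approach trades for correctness, which is why the thread worries about whether such copy chains optimize away.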

AndyAyersMS · 2018-06-20T19:59:43Z

And just for the record:

On windows x86/x64 we won't return non-power-of-2 sized structs in registers.
Windows arm32 and arm64 should have similar bugs with the 3, 5, 6, 7 size cases unless there is some compensating magic I have overlooked.

CarolEidt · 2018-06-20T20:50:38Z

I am probably thinking about this too simplistically, but I think the right thing to do is to not change the type in the non-power-of-2 case, and simply allow struct assignments to be created as needed. Then morph should decide (hopefully correctly) when a copy is required.
The wrinkle is that there may be assumptions that we don't assign a non-scalar from a call. But I don't see why it shouldn't work, as long as the LHS of the assignment is a node that returns a valid struct handle from gtGetStructHandleIfPresent().

AndyAyersMS · 2018-06-20T21:10:12Z

Yeah, I had that idea too -- basically just use the windows rules.

By doing this we would (I believe) diverge from the SysV ABI for some kinds of small structs. Not sure if that has a ripple effect anywhere.

CarolEidt · 2018-06-20T21:23:06Z

> By doing this we would (I believe) diverge from the SysV ABI

I was actually assuming that the RHS of the struct assignment could simply be the call (i.e. the register result), and we would handle any necessary temp creation in fgMorphCopyBlock, but it is probably the case that it would require more complexity than I'm imagining.

AndyAyersMS · 2018-06-20T21:34:00Z

Ah, I see. I might need to disentangle things a bit to make sure we still think of it as something returned by value ... but it might work.

AndyAyersMS · 2018-06-20T23:09:40Z

What I'm leaning towards is roughly:

add a new struct passing kind, something like SPK_EnclosingPrimitiveType -- to indicate that the value is returned in a single register that is larger than the actual return type. In those cases, getReturnTypeForStruct would return TYP_STRUCT.
ReturnTypeDesc::InitializeStructReturnType would handle this case like it does SPK_PrimitiveType but with different asserts, keeping the return type as TYP_STRUCT.
Because of this we'd no longer retype those odd-sized struct returning calls, and morph would handle them as struct assigns.
We would still retype if the struct was 1, 2, 4, or 8 bytes and would get somewhat better codegen for those cases.
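The classification proposed above can be sketched as a mapping from struct size to passing kind. This is a simplified model, not the actual `Compiler::getReturnTypeForStruct` logic: it looks only at the size of a struct already known to be returned in one integer register, and `SPK_Other`, `classifyReturn` and `shouldRetypeCall` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the passing kinds discussed in the thread.
enum StructPassingKind {
    SPK_PrimitiveType,  // size exactly matches an int type: retype the call
    SPK_EnclosingType,  // fits in one register but is smaller than the int type
    SPK_Other           // hypothetical: anything this sketch does not model
};

StructPassingKind classifyReturn(std::size_t structSize) {
    switch (structSize) {
        case 1: case 2: case 4: case 8:
            return SPK_PrimitiveType;
        case 3: case 5: case 6: case 7:
            return SPK_EnclosingType;
        default:
            return SPK_Other;
    }
}

// Per the plan, only exact-size structs get their calls retyped; the
// enclosing case stays TYP_STRUCT and morph handles it as a struct assign.
bool shouldRetypeCall(std::size_t structSize) {
    return classifyReturn(structSize) == SPK_PrimitiveType;
}
```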

AndyAyersMS · 2018-06-20T23:14:15Z

Still wondering about the cases returned via two registers, though maybe the really odd ones require ARM64?

@jakobbotsch if you get a chance to tweak Fuzzlyn, you might try playing around with cases where the assigned structs are 9-15 bytes in size.

CarolEidt · 2018-06-20T23:17:34Z

@AndyAyersMS - that sounds like the right direction.

> We would still retype if the struct was 1, 2, 4, or 8 bytes and would get somewhat better codegen for those cases.

It would be good to (eventually) consider whether we could get the same codegen without retyping there. IIRC fgMorphCopyBlock should take care of those cases via fgMorphOneAsgBlockOp, as long as we can defer any temp creation until that time.

AndyAyersMS · 2018-06-20T23:24:40Z

I'm willing to give it a try now, since I end up having to special case power of 2 sizes, and I could just toss all that.

AndyAyersMS · 2018-06-21T03:52:08Z

Updated to try and handle 3, 5, 6, and 7 byte struct cases. May be a bit ragged, but is holding up on the handful of tests I have locally.

Will post diff stats later and look at more examples.

AndyAyersMS · 2018-06-21T05:39:46Z

No new PMI or Crossgen FX diffs for Linux. Will run PMI diffs on tests overnight.

AndyAyersMS · 2018-06-21T07:48:11Z

No new test diffs other than on the newly added test case. Diffs there:

-       mov      rdi, qword ptr [rdi+22]
-       mov      qword ptr [rax+14], rdi    // store 8 bytes at rax + 14 -- bad
+       lea      rdi, bword ptr [rdi+22]
+       add      rax, 14
+       mov      esi, dword ptr [rdi]
+       mov      dword ptr [rax], esi       // store 4 bytes at rax + 14 -- ok
+       mov      si, word  ptr [rdi+4]
+       mov      word  ptr [rax+4], si      // store 2 bytes at rax + 18 -- ok
...
-; Total bytes of code 124, prolog size 10 for method GitHub_18522_1:Main():int
+; Total bytes of code 136, prolog size 10 for method GitHub_18522_1:Main():int
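The fixed code copies the 6-byte struct as a 4-byte piece followed by a 2-byte piece, so no store extends past the struct. The sketch below is an illustrative model of that shape, splitting a copy into descending power-of-two pieces. It is an assumption about the general pattern, not the JIT's actual block-copy unroller, and `copyPieces` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits a copy of `size` bytes into descending power-of-two pieces
// (8, 4, 2, 1), so the total written is exactly `size` bytes.
std::vector<std::size_t> copyPieces(std::size_t size) {
    std::vector<std::size_t> pieces;
    for (std::size_t chunk = 8; chunk > 0; chunk /= 2) {
        while (size >= chunk) {
            pieces.push_back(chunk);
            size -= chunk;
        }
    }
    return pieces;
}
```

For the 6-byte case this gives 4 + 2, matching the `dword` and `word` stores in the listing; a 3-byte struct becomes 2 + 1.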

jakobbotsch · 2018-06-21T09:47:16Z

I can confirm that fixes all the previous examples, and also a few others I thought were unrelated.
jakobbotsch/Fuzzlyn@d8c9dbd
(ignore 11949841452559086614, something very strange seems to be going on here, I'm trying to get to the bottom of that one...)
EDIT: Seems the original cause of 11949841452559086614 was indeed this bug, but the reducer managed to reduce it to a different bug causing the same behavior. Now the question is if that one is new...

jakobbotsch · 2018-06-21T14:57:11Z

> @jakobbotsch if you get a chance to tweak Fuzzlyn, you might try playing around with cases where the assigned structs are 9-15 bytes in size.

Fuzzlyn already (probabilistically) generates structs of this size. It does not appear to have found anything in this area.

CarolEidt

LGTM - thanks!
@jashook also requested that you add the tests to tests\arm\Tests.lst and tests\arm64\Tests.lst.

CarolEidt · 2018-06-21T14:55:02Z

src/jit/importer.cpp

@@ -8465,7 +8465,14 @@ GenTree* Compiler::impFixupCallStructReturn(GenTreeCall* call, CORINFO_CLASS_HAN
        if (retRegCount == 1)
        {
            // struct returned in a single register
-            call->gtReturnType = retTypeDesc->GetReturnRegType(0);
+            // retype iff struct size exactly matches integer type size.
+            if (retTypeDesc->IsEnclosingType())


Is there a reason that you didn't just reverse the condition and avoid the empty stmt?

I had some JITDUMP stuff there at one point. Will update.

CarolEidt · 2018-06-21T14:59:47Z

src/jit/lclvars.cpp

@@ -141,7 +141,8 @@ void Compiler::lvaInitTypeRef()
        Compiler::structPassingKind howToReturnStruct;
        var_types                   returnType = getReturnTypeForStruct(retClsHnd, &howToReturnStruct);

-        if (howToReturnStruct == SPK_PrimitiveType)
+        // We can safely widen the return type for enclosed structs.
+        if ((howToReturnStruct == SPK_PrimitiveType) || (howToReturnStruct == SPK_EnclosingType))


It's interesting that the compRetNativeType is widened. I assume, then, that morph is taking care of the necessary transformation to ensure that the SPK_EnclosingType case is handled, so that codegen gets what it needs for this case?

Hmm, not sure. I should look at this part more closely.

There is some code in morph that can be simplified now, as SPK_PrimitiveType implies power-of-two sizes.

That code was for arg passing, not for return values. So nothing to do there.

AndyAyersMS · 2018-06-21T15:45:59Z

Did a bit of testing over on OSX with older dotnet SDKs, and as far as I can tell this bug goes back to at least Core 1.1.

AndyAyersMS · 2018-06-22T21:59:20Z

@dotnet/jit-contrib two big changes since you all last reviewed:

for the "enclosing type" small structs, copy return value to temp during import if we know the call will remain a call
if we defer doing copy to temp in the importer because the call is an inline candidate, do it later once we've resolved the fate of the inline.
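The timing of the copy to a temp, as described in the two items above, can be modeled as a small decision function over the fate of the inline candidate. This is a hypothetical sketch, not importer code. It ignores tail call candidates, which the change also defers, and the enum and function names are illustrative.

```cpp
#include <cassert>

// Hypothetical model of when an "enclosing type" (3, 5, 6 or 7 byte)
// call result gets copied to a temp.
enum class InlineFate { NotCandidate, Inlined, NotInlined };

enum class SpillPoint {
    AtImport,          // call will remain a call: copy to temp in the importer
    HandledInInlinee,  // inlined: no call remains here; any call that feeds the
                       // inlinee's return value is handled on its own
    AfterInlining      // candidate that was not inlined: deferred copy to temp
};

SpillPoint whereToSpill(InlineFate fate) {
    switch (fate) {
        case InlineFate::NotCandidate:
            return SpillPoint::AtImport;
        case InlineFate::Inlined:
            return SpillPoint::HandledInInlinee;
        case InlineFate::NotInlined:
            return SpillPoint::AfterInlining;
    }
    return SpillPoint::AtImport;  // unreachable; silences missing-return warnings
}
```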

Test cases to hopefully cover the spectrum of possibilities fairly well.

Still thinking of ditching the "enclosing type" concept, since everywhere we check it we have the class handle and so can refetch the size and compare it to the primitive size.

Now see a few more diffs in normal PMI runs over FX.

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 linuxnonjit.dll
Summary:
(Lower is better)
Total bytes of diff: 839 (0.00% of base)
    diff is a regression.
Top file regressions by size (bytes):
         554 : System.Data.Common.dasm (0.04% of base)
         120 : Microsoft.DotNet.ProjectModel.dasm (0.06% of base)
         105 : Newtonsoft.Json.dasm (0.02% of base)
          51 : xunit.runner.utility.dotnet.dasm (0.05% of base)
          48 : xunit.execution.dotnet.dasm (0.02% of base)
Top file improvements by size (bytes):
        -339 : System.Reflection.Metadata.dasm (-0.09% of base)
         -38 : System.Net.HttpListener.dasm (-0.02% of base)
27 total files with size differences (2 improved, 25 regressed), 103 unchanged.
Top method regessions by size (bytes):
          42 : System.Data.Common.dasm - SqlByte:op_Explicit(struct):struct (9 methods)
          40 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Equals(ref):bool:this
          39 : System.Data.Common.dasm - SqlBoolean:op_Explicit(struct):struct (9 methods)
          32 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Combine(ref):ref
          21 : Microsoft.DotNet.ProjectModel.dasm - DependencyContextBuilder:GetCompilationOptions(ref):ref
Top method improvements by size (bytes):
         -94 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSignature(byref):struct:this (4 methods)
         -76 : System.Reflection.Metadata.dasm - CustomAttributeDecoder`1:DecodeValue(struct,struct):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSpecificationSignature(byref):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeLocalSignature(byref):struct:this (4 methods)
         -38 : System.Net.HttpListener.dasm - <CloseNetworkConnectionAsync>d__50:MoveNext():this
297 total methods with size differences (16 improved, 281 regressed), 169409 unchanged.

The overall diff impact is still small and the regressions are mostly necessary diffs. Some of the new diffs happen in methods that return a byte-sized struct:

 ; ============================================================
 ; Assembly listing for method System.Reflection.Metadata.MethodSignature`1[__Canon][System.__Canon]:get_ReturnType():ref:this
 ; Emitting BLENDED_CODE for X64 CPU with AVX

 G_M15744_IG01:
 
 G_M15744_IG02:
-       8B4710               mov      eax, dword ptr [rdi+16]
+       480FBE4710           movsx    rax, byte  ptr [rdi+16]
 
 G_M15744_IG03:
        C3                   ret      
 
-; Total bytes of code 4, prolog size 0 for method System.Reflection.Metadata.MethodSignature`1[Int32][System.Int32]:get_Header():struct:this
+; Total bytes of code 6, prolog size 0 for method System.Reflection.Metadata.MethodSignature`1[Int32][System.Int32]:get_Header():struct:this

Previously, bytes 1-3 of the returned value could contain unrelated data, since the 4-byte load read past the 1-byte struct.
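The load side of this diff can be modeled as follows: reading 4 bytes starting at a 1-byte struct pulls neighboring bytes into the result, while the new `movsx` reads one byte and extends it. This is a sketch on a hypothetical memory image. The byte values and the names `loadWide`, `loadNarrow` and `returnsOnlyStructByte` are illustrative, and the sketch assumes a signed field, as `movsx` suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Old code: mov eax, dword ptr [rdi+16] reads the struct byte plus 3 neighbors.
uint32_t loadWide(const unsigned char* obj) {
    uint32_t v;
    std::memcpy(&v, obj + 16, 4);
    return v;
}

// New code: movsx rax, byte ptr [rdi+16] reads one byte and sign-extends it.
int64_t loadNarrow(const unsigned char* obj) {
    return static_cast<int64_t>(static_cast<int8_t>(obj[16]));
}

// Builds a hypothetical object with a 1-byte struct (value 5) at offset 16
// and unrelated data at offset 17; reports whether the result is exactly 5.
bool returnsOnlyStructByte(bool narrow) {
    unsigned char obj[24] = {};
    obj[16] = 5;     // the 1-byte struct
    obj[17] = 0xAB;  // unrelated neighboring data
    return narrow ? loadNarrow(obj) == 5 : loadWide(obj) == 5;
}
```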

AndyAyersMS · 2018-06-22T22:04:29Z

Hmm, looks like a rebase is going to be needed. When I do that I'll also reorganize so all the code commits are first (and squashed) and the test commits second (likewise squashed). Let me see how this holds up in testing before doing any of that.

AndyAyersMS · 2018-06-22T22:21:14Z

The new CoreFX legs (checked, release) seem to be generally broken.

No diffs on windows over FX with both crossgen and PMI.
No new diffs for linux over FX via crossgen.

AndyAyersMS · 2018-06-22T23:38:29Z

The new _7 test failed for arm32/arm64. Investigating.
Arm64 seems to be writing 4 bytes after the call instead of 3:

            bl      GitHub_18522_7:M16():struct
            str     w0, [x19]

AndyAyersMS · 2018-06-23T07:43:24Z

Looks like ARM64 / Windows needs similar "return via temp" logic for the 3, 5, 6, 7 size cases... so more cases to handle.

briansull · 2018-06-25T17:39:52Z

src/jit/compiler.cpp

+            // we need to preserve small types.
+            useType           = GetEightByteType(structDesc, 0);
+            howToReturnStruct = SPK_PrimitiveType;
+        }


This looks correct. You might want to add a comment noting that for the other cases useType will remain TYP_UNKNOWN instead of changing to TYP_INT.

Added comment.

briansull · 2018-06-25T17:43:24Z

src/jit/compiler.cpp

+            else
+            {
+                // Currently: 3, 5, 6, or 7 byte structs
+                assert(structSize <= genTypeSize(useType));


This assert should use a strict less-than:

assert(structSize < genTypeSize(useType));

briansull · 2018-06-25T17:46:12Z

src/jit/compiler.cpp

-        // Set 'useType' to the type of the first eightbyte item
-        useType = GetEightByteType(structDesc, 0);
-        assert(structDesc.passedInRegisters == true);
+        if (structDesc.eightByteClassifications[0] == SystemVClassificationTypeSSE)


You could also add:

assert(structSize <= sizeof(double));

The code at line 1031 that finalizes the value of useType depends on this property.

briansull · 2018-06-25T17:50:14Z

src/jit/compiler.h

@@ -4223,6 +4227,8 @@ class Compiler
    {
        SPK_Unknown,       // Invalid value, never returned
        SPK_PrimitiveType, // The struct is passed/returned using a primitive type.
+        SPK_EnclosingType, // Like SPK_Primitive type, but used for return types that
+                           //  are not a power of two byte size.


Comment should be more like:

// Like SPK_PrimitiveType, but used for return types that
// require a primitive type temp that is larger than the struct size.
// Currently used for structs of sizes: 3, 5, 6, or 7 bytes.

I am still not sure if I'm keeping this -- let me work through arm32 / arm64 and see how I feel once that's fixed.

AndyAyersMS · 2018-06-25T18:58:12Z

Rebased to get past test list conflicts...

AndyAyersMS · 2018-06-25T19:12:23Z

BTW, I still expect arm32/arm64 to fail on some of the new tests -- in fact I need to understand why the 6 byte struct cases are not failing. Perhaps there is magic somewhere that I could extend to the 3 byte case.

AndyAyersMS · 2018-06-25T21:11:09Z

Result is as expected: Arm64 fails all the new tests; Arm32 only fails the _7 test since that tests a 3 byte struct, not a 6 byte struct. No magic.

AndyAyersMS · 2018-06-25T22:12:47Z

Ok, hopefully this is functionally complete. SPK_EnclosingType is still there and I've updated the comment per Brian's suggestion.

AndyAyersMS · 2018-06-26T02:11:53Z

FMA test failed on Ubuntu -- no PMI diffs in this test, so am going to wager this was our old friend Random (see #10585).

@dotnet-jit retest Ubuntu x64 Checked Innerloop Build and Test

AndyAyersMS · 2018-06-26T06:54:55Z

Re FMA failing: also #18467

jakobbotsch · 2018-06-26T07:47:00Z

tests/arm/Tests.lst

+Categories=EXPECTED_PASS
+HostStyle=0
+
+[GitHub_18522_6.cmd_11903]


_11906? Same below.

Is this actually important?

Not sure -- but now I remember there is a tool to regenerate these, so I suppose we should just use that.

tests\scripts\lst_creator.py

It would be nice if the tool output included the command line used to create that output.

The tool is a little funky because it does a recursive discovery of the tests. Hopefully with the work @sbomer is doing this legacy stuff just goes away. I think the easier approach here is to just keep it as it is, the numbering is not extremely important, it is just output by the tool to make sure the tags are unique. GitHub_18522_6.cmd_11903 should be unique enough.

Note that the tool will strip the number anyways, so this is not going to break its input either.

The jit was retyping all calls with small struct returns as power of two sized ints when generating code for linux x64 and arm32/arm64. When the results of those calls were assigned to fields the jit would use overly wide stores which could corrupt neighboring fields. The fix is to keep better track of the smallest power of two enclosing int type size for the struct. If this exactly matches the struct size then the result of the call can be used in an arbitrary context. If the enclosing type is larger, the call's result must be copied to a temp and then the temp can be reinterpreted as the struct for subsequent uses.

Defer retyping inline candidates and tail call candidates. Then handle deferred updates for cases where calls don't get inlined.

Unlike Windows x64, these ABIs return 3 (and 5,6,7) byte structs in a register. So make the necessary transformations for those cases too. Note the actual behavior change is triggered by the code in `getPrimitiveTypeForStruct` so there are no ifdefs at the transform points here.

Add test cases for 6 byte structs showing the various situations the jit must handle: callee is not an inline candidate, is an inline candidate and gets inlined, or is an inline candidate that does not get inlined. In the case where the callee gets inlined, handle the transitive cases where the callee's return value itself comes from a call. Add a 3 byte test case to get coverage on arm32. Add new tests to the arm32/arm64 test lists.

AndyAyersMS · 2018-06-26T16:24:29Z

Rebased and reordered so all the code commits (unsquashed, will squash on merge) come first and all the test commits (now squashed) come second.

AndyAyersMS · 2018-06-27T19:43:08Z

@briansull @CarolEidt would appreciate one more round of review (mainly those last two non-test commits) as this is a new part of the change that extends the handling to windows ABIs for arm.

briansull · 2018-06-25T19:36:03Z

src/jit/flowgraph.cpp

-#if FEATURE_MULTIREG_RET
-
-    // Did we record a struct return class handle above?
+#if FEATURE_MULTIREG_RET || defined(UNIX_AMD64_ABI)


You didn't need to add UNIX_AMD64_ABI to this ifdef
as line 556 of target.h defines this as 1 for UNIX_AMD64_ABI:

#define FEATURE_MULTIREG_RET 1 // Support for returning a single value in more than one register

Latest version just has FEATURE_MULTIREG_RET.

The jit was retyping all calls with small struct returns as power of two sized ints when generating code for linux x64 and arm32/arm64. When the results of those calls were assigned to fields the jit would use overly wide stores which could corrupt neighboring fields. The fix is to keep better track of the smallest power of two enclosing int type size for the struct. If this exactly matches the struct size then the result of the call can be used in an arbitrary context. If the enclosing type is larger, the call's result must be copied to a temp and then the temp can be reinterpreted as the struct for subsequent uses. Defer retyping inline candidates and tail call candidates. Then handle deferred updates for cases where calls don't get inlined. Add test cases for 6 byte structs showing the various situations the jit must handle: callee is not an inline candidate, is an inline candidate and gets inlined, or is an inline candidate that does not get inlined. Add a 3 byte test case to get coverage on arm32. Add new tests to the arm32/arm64 test lists. Commit migrated from dotnet/coreclr@2f2a9b1

CarolEidt approved these changes Jun 20, 2018


jashook approved these changes Jun 20, 2018


briansull reviewed Jun 20, 2018


CarolEidt approved these changes Jun 21, 2018


briansull reviewed Jun 25, 2018


AndyAyersMS force-pushed the Fix18522 branch from c6a8f92 to e17fd4c June 25, 2018 18:56

AndyAyersMS mentioned this pull request Jun 26, 2018

Fix value numbering when selecting a constant #18627

Merged

jakobbotsch reviewed Jun 26, 2018


AndyAyersMS added 7 commits June 26, 2018 09:14

Extend fix to handle struct that are not exact integer type sizes.

bdc5aac

handle short structs via temps really

44287dc

Handle inline candidate cases

352b645

Defer retyping inline candidates and tail call candidates. Then handle deferred updates for cases where calls don't get inlined.

review feedback

1830186

AndyAyersMS force-pushed the Fix18522 branch from 2b31db3 to eecd311 June 26, 2018 16:22

briansull approved these changes Jun 27, 2018


AndyAyersMS merged commit 2f2a9b1 into dotnet:master Jun 28, 2018

AndyAyersMS deleted the Fix18522 branch June 28, 2018 08:00

jkotas mentioned this pull request Jun 28, 2018

Update CoreClr, CoreFx, CoreSetup, ProjectNTfs, ProjectNTfsTestILC to preview1-26628-04, preview1-26628-03, preview1-26628-01, beta-26628-00, beta-26628-00, respectively (master) dotnet/corefx#30725

Closed
