Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

JIT: fix bug returning small structs on linux x64 #18563

Merged
merged 7 commits into from
Jun 28, 2018

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Jun 20, 2018

The jit was retyping all calls with small struct returns as 32 bit ints
when generating code for linux x64. When the results of those calls were
assigned to fields the jit would use 32 bit stores which could corrupt
neighboring fields.

The fix is to keep the small int types for 8 and 16 byte bit structs so that
the corresponding stores are the right size.

Fixes #18522.

@AndyAyersMS
Copy link
Member Author

@CarolEidt PTAL
cc @dotnet/jit-contrib

No diffs for windows. For linux this causes diff in both crossgen and pmi:

Crossgen Diffs for System.Private.CoreLib.dll, framework assemblies for x64 linuxnonjit.dll
Summary: (Lower is better)
Total bytes of diff: 2434 (0.00% of base)
    diff is a regression.
Top file regressions by size (bytes):
        1098 : System.Data.Common.dasm (0.04% of base)
         240 : Microsoft.DotNet.ProjectModel.dasm (0.05% of base)
         218 : Newtonsoft.Json.dasm (0.02% of base)
         198 : Microsoft.CodeAnalysis.CSharp.dasm (0.00% of base)
         198 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00% of base)
Top file improvements by size (bytes):
        -112 : System.Private.CoreLib.dasm (0.00% of base)
          -4 : System.Linq.Parallel.dasm (0.00% of base)
          -4 : System.Net.HttpListener.dasm (0.00% of base)
28 total files with size differences (3 improved, 25 regressed), 102 unchanged.
Top method regessions by size (bytes):
          84 : System.Data.Common.dasm - SqlByte:op_Explicit(struct):struct (18 methods)
          80 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Equals(ref):bool:this (2 methods)
          78 : System.Data.Common.dasm - SqlBoolean:op_Explicit(struct):struct (18 methods)
          64 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Combine(ref):ref (2 methods)
          42 : Microsoft.DotNet.ProjectModel.dasm - DependencyContextBuilder:GetCompilationOptions(ref):ref (2 methods)
Top method improvements by size (bytes):
         -68 : System.Private.CoreLib.dasm - DomainNeutralILStubClass:IL_STUB_CLRtoWinRT():struct:this (18 methods)
         -12 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:GetPoint():struct:this (2 methods)
         -12 : System.Private.CoreLib.dasm - CLRIPropertyValueImpl:GetSize():struct:this (2 methods)
         -10 : System.Private.CoreLib.dasm - ValueTuple:Create():struct (2 methods)
         -10 : System.Private.CoreLib.dasm - Task:Yield():struct (2 methods)
271 total methods with size differences (11 improved, 260 regressed), 141547 unchanged.
Completed analysis in 233.40s

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 linuxnonjit.dll
Summary: (Lower is better)
Total bytes of diff: 839 (0.00% of base)
    diff is a regression.
Top file regressions by size (bytes):
         554 : System.Data.Common.dasm (0.04% of base)
         120 : Microsoft.DotNet.ProjectModel.dasm (0.06% of base)
         105 : Newtonsoft.Json.dasm (0.02% of base)
          51 : xunit.runner.utility.dotnet.dasm (0.05% of base)
          48 : xunit.execution.dotnet.dasm (0.02% of base)
Top file improvements by size (bytes):
        -339 : System.Reflection.Metadata.dasm (-0.09% of base)
         -38 : System.Net.HttpListener.dasm (-0.02% of base)
27 total files with size differences (2 improved, 25 regressed), 103 unchanged.
Top method regessions by size (bytes):
          42 : System.Data.Common.dasm - SqlByte:op_Explicit(struct):struct (9 methods)
          40 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Equals(ref):bool:this
          39 : System.Data.Common.dasm - SqlBoolean:op_Explicit(struct):struct (9 methods)
          32 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Combine(ref):ref
          21 : Microsoft.DotNet.ProjectModel.dasm - DependencyContextBuilder:GetCompilationOptions(ref):ref
Top method improvements by size (bytes):
         -94 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSignature(byref):struct:this (4 methods)
         -76 : System.Reflection.Metadata.dasm - CustomAttributeDecoder`1:DecodeValue(struct,struct):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSpecificationSignature(byref):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeLocalSignature(byref):struct:this (4 methods)
         -38 : System.Net.HttpListener.dasm - <CloseNetworkConnectionAsync>d__50:MoveNext():this
297 total methods with size differences (16 improved, 281 regressed), 169407 unchanged.
Completed analysis in 133.55s

PMI diffs on the new test case:

 ; Assembly listing for method GitHub_18522:Main():int
 ; Emitting BLENDED_CODE for X64 CPU with AVX
 ; optimized code
 ; rsp based frame
 ; partially interruptible
 ; Final local variable assignments
 ;
 ;# V00 OutArgs      [V00    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
 ;  V01 tmp1         [V01,T00] (  3,  6   )   byref  ->  rax        
-;  V02 tmp2         [V02    ] (  3,  3   )  struct ( 8) [rsp+0x10]   do-not-enreg[SF] ld-addr-op
-;  V03 tmp3         [V03    ] (  3,  3   )  ushort  ->  [rsp+0x10]   do-not-enreg[] V02.F0(offs=0x00) P-DEP
-;  V04 cse0         [V04,T01] (  4,  4   )    long  ->  [rsp+0x08]  
+;* V02 tmp2         [V02    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op
+;* V03 tmp3         [V03    ] (  0,  0   )  ushort  ->  zero-ref    V02.F0(offs=0x00) P-INDEP
+;  V04 cse0         [V04,T01] (  4,  4   )    long  ->  [rsp+0x00]  
 ;
-; Lcl frame size = 24
+; Lcl frame size = 8
 G_M49330_IG01:
-       sub      rsp, 24
+       push     rax
 G_M49330_IG02:
        mov      rdi, 0xD1FFAB1E
        mov      esi, 3
        call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
        mov      rax, 0xD1FFAB1E
        mov      rax, gword ptr [rax]
        mov      word  ptr [rax+10], 170
        mov      rax, 0xD1FFAB1E
        mov      rax, gword ptr [rax]
        add      rax, 8
-       mov      word  ptr [rsp+10H], 0
-       mov      word  ptr [rsp+10H], 0
-       mov      edi, dword ptr [rsp+10H]
-       mov      dword ptr [rax], edi         // bug: wide store
+       mov      word  ptr [rax], 0           // fix: narrow store (+ const)
        mov      rax, 0xD1FFAB1E
        mov      rax, gword ptr [rax]
        cmp      word  ptr [rax+10], 170
        je       SHORT G_M49330_IG04
        xor      eax, eax
 G_M49330_IG03:
-       add      rsp, 24
+       add      rsp, 8
        ret      
 G_M49330_IG04:
        mov      eax, 100
 G_M49330_IG05:
-       add      rsp, 24
+       add      rsp, 8
        ret      
-; Total bytes of code 118, prolog size 4 for method GitHub_18522:Main():int
+; Total bytes of code 100, prolog size 1 for method GitHub_18522:Main():int

 ; Assembly listing for method GitHub_18522:M113():struct
 ; Emitting BLENDED_CODE for X64 CPU with AVX
 ; optimized code
 ; partially interruptible
 ; Final local variable assignments
 ;
-;  V00 loc0         [V00    ] (  2,  2   )  struct ( 8) [rsp+0x00]   do-not-enreg[SF] must-init ld-addr-op
+;* V00 loc0         [V00    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op
 ;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
-;  V02 tmp1         [V02    ] (  2,  2   )  ushort  ->  [rsp+0x00]   do-not-enreg[] V00.F0(offs=0x00) P-DEP
+;* V02 tmp1         [V02    ] (  0,  0   )  ushort  ->  zero-ref    V00.F0(offs=0x00) P-INDEP
 ;
-; Lcl frame size = 8
+; Lcl frame size = 0
 G_M32228_IG01:
-       push     rax
-       xor      rax, rax
-       mov      qword ptr [rsp], rax
 G_M32228_IG02:
-       mov      word  ptr [rsp], 0
-       mov      eax, dword ptr [rsp]
+       xor      eax, eax
 G_M32228_IG03:
-       add      rsp, 8
        ret      
-; Total bytes of code 21, prolog size 7 for method GitHub_18522:M113():struct
+; Total bytes of code 3, prolog size 0 for method GitHub_18522:M113():struct

@AndyAyersMS
Copy link
Member Author

I have not yet figured out if this is a regression. I believe the bug requires that the small struct be assigned directly to a field which is perhaps not as common as one might imagine.

@jakobbotsch
Copy link
Member

jakobbotsch commented Jun 20, 2018

This fixes most examples (jakobbotsch/Fuzzlyn@a603f0e), but some still remain unfixed:

I can recheck when you make updates. 😄

EDIT: Some of these examples needed rereducing. See below.

@AndyAyersMS
Copy link
Member Author

@jakobbotsch thanks much! I'll look at the new examples. It's really nice to have Fuzzlyn double-checking things. Keep up the good work.

Arm and Arm64 failures look like the masked load tests. I though these were disabled?

@CarolEidt
Copy link

Arm and Arm64 failures look like the masked load tests. I though these were disabled?

@jashook indicates that they still need to be removed from the .lst files, and he'll do so shortly.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@jashook jashook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, could you add the test to the arm/arm64 lstFiles so the regression is run on windows arm(64) as well?

@jashook
Copy link

jashook commented Jun 20, 2018

#18569 addresses arm(64) failures

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 20, 2018

(edited: oops, was using old jits -- with the fix above things look to be correct...)

First new example 11923495345812789064 looks ok from a diff standpoint:

;;; windows
       48B96829AC14F8010000 mov      rcx, 0x1F814AC2968
       488B09               mov      rcx, gword ptr [rcx]
       4883C108             add      rcx, 8
       C60100               mov      byte  ptr [rcx], 0

;;; linux
       48BF6829055807020000 mov      rdi, 0x20758052968
       488B3F               mov      rdi, gword ptr [rdi]
       4883C708             add      rdi, 8
       C60700               mov      byte  ptr [rdi], 0

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good
I see that the bug was indeed in the #ifdef UNIX section

@jakobbotsch
Copy link
Member

(edited: oops, was using old jits -- with the fix above things look to be correct...)

Ooops. It appears the bug is still there before reduction, but the reduced example I have posted does not have the bug (i.e. the reducer needs to be rerun again to display the new variant required).
I have verified that at least 11049252875418439527 and 19651690852725464 have the issue.

@AndyAyersMS
Copy link
Member Author

In 11049252875418439527 there is a 6 byte struct S0 returned an assigned. Windows assigns to it in two steps 4 bytes then 2. Linux does it in one step of 8 bytes.

;;; windows
       8B10                 mov      edx, dword ptr [rax]
       8911                 mov      dword ptr [rcx], edx
       668B5004             mov      dx, word  ptr [rax+4]
       66895104             mov      word  ptr [rcx+4], dx

;;; linux
       488B4016             mov      rax, qword ptr [rax+22]
       4889470E             mov      qword ptr [rdi+14], rax   // writes an extra 2 bytes

Implication is that if we are indeed going to return non-power of two sized structs in registers then we need a different strategy to unpack these on the caller side -- we can't just map them to the normal power of two int type sizes (unless it's "safe" to expand the destination size too, which it sometimes is). So my fix above likely only works if the struct being copied is 1, 2, or 4 bytes. We also need to properly handle sizes 3, 5, 6, and 7.

@jakobbotsch
Copy link
Member

jakobbotsch commented Jun 20, 2018

Here are the rereduced examples:

It may also be informative to look at the commit diff:
jakobbotsch/Fuzzlyn@299fdaa

Sorry about the confusion earlier.

@AndyAyersMS
Copy link
Member Author

No problem. I haven't checked them all, but I suspect all these new cases have S0 sizes that are not powers of two.

One thought as to how to fix the size 3, 5, 6, 7 cases is to flag when the "primitive type" produced by getPrimitiveTypeForStruct is larger than the actual type returned, and force the call to return the result to a temp which can be safely widened to a power of two size. Then reinterpret the temp as a struct and assign from the temp to the destination.

Not sure if this is viable though.

Also may be clunky if/when we inline as we may see what look like type mismatches that could block inlines, or reinterpret/copy chains (struct -> int -> int -> struct) that don't nicely optimize away.

@AndyAyersMS
Copy link
Member Author

And just for the record:

  • On windows x86/x64 we won't return non-power-of-2 sized structs in registers.
  • Windows arm32 and arm64 should have similar bugs with the 3, 5, 6, 7 size cases unless there is some compensating magic I have overlooked.

@CarolEidt
Copy link

I am probably thinking about this too simplistically, but I think the right thing to do is to not change the type in the non-power-of-2 case, and simply allow struct assignments to be created as needed. Then morph should decide (hopefully correctly) when a copy is required.
The wrinkle is that it may be the case that there are assumptions that we don't assign a non-scalar from a call. But I don't know why it shouldn't work, as long as the LHS of the assignment is a node that returns a valid struct handle from gtGetStructHandleIfPresent()

@AndyAyersMS
Copy link
Member Author

Yeah, I had that idea too -- basically just use the windows rules.

By doing this we would (I believe) diverge from the SysV ABI for some kinds of small structs. Not sure if that has a ripple effect anywhere.

@CarolEidt
Copy link

By doing this we would (I believe) diverge from the SysV ABI

I was actually assuming that the RHS of the struct assignment could simply be the call (i.e. the register result), and we would handle any necessary temp creation in fgMorphCopyBlock, but it is probably the case that it would require more complexity than I'm imagining.

@AndyAyersMS
Copy link
Member Author

Ah, I see. I might need to disentangle things a bit to make sure we still think of it as something returned by value ... but it might work.

@AndyAyersMS
Copy link
Member Author

What I'm leaning towards is roughly:

  • add a new struct passing kind, something like SPK_EnclosingPrimitiveType -- to indicate that the value is returned in a single register that is larger than the actual return type. In those cases, getReturnTypeForStruct would return TYP_STRUCT.
  • ReturnTypeDesc::InitializeStructReturnType would handle this case like it does SPK_PrimitiveType but with different asserts, keeping the return type as TYP_STRUCT.
  • Because of this we'd no longer retype those odd-sized struct returning calls, and morph would handle them as struct assigns.
  • We would still retype if the struct was 1, 2, 4, or 8 bytes and would get somewhat better codegen for those cases.

@AndyAyersMS
Copy link
Member Author

Still wondering about the returned via two register cases, though maybe the really odd ones require ARM64?

@jakobbotsch if you get a chance to tweak Fuzzlyn, you might try playing around with cases where the assigned structs are 9-15 bytes in size.

@CarolEidt
Copy link

@AndyAyersMS - that sounds like the right direction.

We would still retype if the struct was 1, 2, 4, or 8 bytes and would get somewhat better codegen for those cases.

It would be good to (eventually) consider whether we could get the same codegen without retyping there. IIRC fgMorphCopyBlock should take care of those cases via fgMorphOneAsgBlockOp, as long as we can defer any temp creation until that time.

@AndyAyersMS
Copy link
Member Author

I'm willing to give it a try now, since I end up having to special case power of 2 sizes, and I could just toss all that.

@AndyAyersMS
Copy link
Member Author

Updated to try and handle 3, 5, 6, and 7 byte struct cases. May be a bit ragged, but is holding up on the handful of tests I have locally.

Will post diff stats later and look at more examples.

@AndyAyersMS
Copy link
Member Author

No new PMI or Crossgen FX diffs for Linux. Will run PMI diffs on tests overnight.

@AndyAyersMS
Copy link
Member Author

No new test diffs other than on the newly added test case. Diffs there:

-       mov      rdi, qword ptr [rdi+22]
-       mov      qword ptr [rax+14], rdi    // store 8 bytes at rax + 14 -- bad
+       lea      rdi, bword ptr [rdi+22]
+       add      rax, 14
+       mov      esi, dword ptr [rdi]
+       mov      dword ptr [rax], esi       // store 4 bytes at rax + 14 -- ok
+       mov      si, word  ptr [rdi+4]
+       mov      word  ptr [rax+4], si      // store 2 bytes at rax + 18 -- ok
...
-; Total bytes of code 124, prolog size 10 for method GitHub_18522_1:Main():int
+; Total bytes of code 136, prolog size 10 for method GitHub_18522_1:Main():int

@jakobbotsch
Copy link
Member

jakobbotsch commented Jun 21, 2018

I can confirm that fixes all the previous examples, and also a few others I thought were unrelated.
jakobbotsch/Fuzzlyn@d8c9dbd
(ignore 11949841452559086614, something very strange seems to be going on here, I'm trying to get to the bottom of that one...)
EDIT: Seems the original cause of 11949841452559086614 was indeed this bug, but the reducer managed to reduce it to a different bug causing the same behavior. Now the question is if that one is new...

@jakobbotsch
Copy link
Member

@jakobbotsch if you get a chance to tweak Fuzzlyn, you might try playing around with cases where the assigned structs are 9-15 bytes in size.

Fuzzlyn already (probabilistically) generates structs of this size. It does not appear to have found anything in this area.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!
@jashook also requested you add the tests to tests\arm\Tests.lst and tests\arm64\Tests.lst

@@ -8465,7 +8465,14 @@ GenTree* Compiler::impFixupCallStructReturn(GenTreeCall* call, CORINFO_CLASS_HAN
if (retRegCount == 1)
{
// struct returned in a single register
call->gtReturnType = retTypeDesc->GetReturnRegType(0);
// retype iff struct size exactly matches integer type size.
if (retTypeDesc->IsEnclosingType())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that you didn't just reverse the condition and avoid the empty stmt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some JITDUMP stuff there at one point. Will update.

@@ -141,7 +141,8 @@ void Compiler::lvaInitTypeRef()
Compiler::structPassingKind howToReturnStruct;
var_types returnType = getReturnTypeForStruct(retClsHnd, &howToReturnStruct);

if (howToReturnStruct == SPK_PrimitiveType)
// We can safely widen the return type for enclosed structs.
if ((howToReturnStruct == SPK_PrimitiveType) || (howToReturnStruct == SPK_EnclosingType))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that the compRetNativeType is widened. I assume, then, that morph is taking care of the necessary transformation to ensure that the SPK_EnclosingType case is handled, so that codegen gets what it needs for this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure. I should look at this part more closely.

There is some code in morph that can be simplified now as SPK_Primitive type implies power of two sizes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code was for arg passing, not for return values. So nothing to do there.

@AndyAyersMS
Copy link
Member Author

Did a bit of testing over on OSX with older dotnet SDKs, and as far as I can tell this bug goes back to at least Core 1.1.

@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib two big changes since you all last reviewed:

  • for the "enclosing type" small structs, copy return value to temp during import if we know the call will remain a call
  • if we defer doing copy to temp in the importer because the call is an inline candidate, do it later once we've resolved the fate of the inline.

Test cases to hopefully cover the spectrum of possibilities fairly well.

Still thinking of ditching the "enclosing type" concept since everwhere we check it we have the class handle and so can refetch the size and compare it to the primitive size.

Now see a few more diffs in normal PMI runs over FX.

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 linuxnonjit.dll
Summary:
(Lower is better)
Total bytes of diff: 839 (0.00% of base)
    diff is a regression.
Top file regressions by size (bytes):
         554 : System.Data.Common.dasm (0.04% of base)
         120 : Microsoft.DotNet.ProjectModel.dasm (0.06% of base)
         105 : Newtonsoft.Json.dasm (0.02% of base)
          51 : xunit.runner.utility.dotnet.dasm (0.05% of base)
          48 : xunit.execution.dotnet.dasm (0.02% of base)
Top file improvements by size (bytes):
        -339 : System.Reflection.Metadata.dasm (-0.09% of base)
         -38 : System.Net.HttpListener.dasm (-0.02% of base)
27 total files with size differences (2 improved, 25 regressed), 103 unchanged.
Top method regessions by size (bytes):
          42 : System.Data.Common.dasm - SqlByte:op_Explicit(struct):struct (9 methods)
          40 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Equals(ref):bool:this
          39 : System.Data.Common.dasm - SqlBoolean:op_Explicit(struct):struct (9 methods)
          32 : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Combine(ref):ref
          21 : Microsoft.DotNet.ProjectModel.dasm - DependencyContextBuilder:GetCompilationOptions(ref):ref
Top method improvements by size (bytes):
         -94 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSignature(byref):struct:this (4 methods)
         -76 : System.Reflection.Metadata.dasm - CustomAttributeDecoder`1:DecodeValue(struct,struct):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeMethodSpecificationSignature(byref):struct:this (4 methods)
         -49 : System.Reflection.Metadata.dasm - SignatureDecoder`2:DecodeLocalSignature(byref):struct:this (4 methods)
         -38 : System.Net.HttpListener.dasm - <CloseNetworkConnectionAsync>d__50:MoveNext():this
297 total methods with size differences (16 improved, 281 regressed), 169409 unchanged.

The overall diff impact is still small and the regressions are mostly necessary diffs. Some of the new diffs happen in methods that return a byte-sized struct:

 ; ============================================================
 ; Assembly listing for method System.Reflection.Metadata.MethodSignature`1[__Canon][System.__Canon]:get_ReturnType():ref:this
 ; Emitting BLENDED_CODE for X64 CPU with AVX

 G_M15744_IG01:
 
 G_M15744_IG02:
-       8B4710               mov      eax, dword ptr [rdi+16]
+       480FBE4710           movsx    rax, byte  ptr [rdi+16]
 
 G_M15744_IG03:
        C3                   ret      
 
-; Total bytes of code 4, prolog size 0 for method System.Reflection.Metadata.MethodSignature`1[Int32][System.Int32]:get_Header():struct:this
+; Total bytes of code 6, prolog size 0 for method System.Reflection.Metadata.MethodSignature`1[Int32][System.Int32]:get_Header():struct:this

Previously bytes 1-3 might contain unrelated stuff.

@AndyAyersMS
Copy link
Member Author

Hmm, looks like a rebase is going to be needed. When I do that I'll also reorganize so all the code commits are first (and squashed) and the test commits second (likewise squashed). Let me see how this holds up in testing before doing any of that.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 22, 2018

The new CoreFX legs (checked, release) seem to be generally broken.

No diffs on windows over FX with both crossgen and PMI.
No new diffs for linux over FX via crossgen.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 22, 2018

The new _7 test failed for arm32/arm64. Investigating.
Arm64 seems to be writing 4 bytes after the call instead of 3:

            bl      GitHub_18522_7:M16():struct
            str     w0, [x19]

@AndyAyersMS
Copy link
Member Author

Looks like ARM64 / Windows needs similar "return via temp" logic for the 3, 5, 6, 7 size cases... so more cases to handle.

// we need to preserve small types.
useType = GetEightByteType(structDesc, 0);
howToReturnStruct = SPK_PrimitiveType;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct, you might want to add that for the other cases useType will remain TYP_UNKNOWN instead of changing to TYP_INT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment.

else
{
// Currently: 3, 5, 6, or 7 byte structs
assert(structSize <= genTypeSize(useType));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert should be less than:

assert(structSize < genTypeSize(useType));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Set 'useType' to the type of the first eightbyte item
useType = GetEightByteType(structDesc, 0);
assert(structDesc.passedInRegisters == true);
if (structDesc.eightByteClassifications[0] == SystemVClassificationTypeSSE)
Copy link

@briansull briansull Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add:

assert(structSize <= sizeof(double)));

As the code at 1031 which finalized the value of useType depends on this property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -4223,6 +4227,8 @@ class Compiler
{
SPK_Unknown, // Invalid value, never returned
SPK_PrimitiveType, // The struct is passed/returned using a primitive type.
SPK_EnclosingType, // Like SPK_Primitive type, but used for return types that
// are not a power of two byte size.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be more like:

// Like SPK_Primitive type,  but used for return types that 
// require a primative type temp that is larger that the struct size.
// Currently used for structs of sizes: 3,5,6 or 7 bytes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not sure if I'm keeping this -- let me work through arm32 / arm64 and see how I feel once that's fixed.

@AndyAyersMS
Copy link
Member Author

Rebased to get past test list conflicts...

@AndyAyersMS
Copy link
Member Author

BTW, I still expect arm32/arm64 to fail on some of the new tests -- in fact I need to understand why the 6 byte struct cases are not failing. Perhaps there is magic somewhere that I could extend to the 3 byte case.

@AndyAyersMS
Copy link
Member Author

Result is as expected: Arm64 fails all the new tests; Arm32 only fails the _7 test since that tests a 3 byte struct, not a 6 byte struct. No magic.

@AndyAyersMS
Copy link
Member Author

Ok, hopefully this is functionally complete. SPK_EnclosingType is still there and I've updated the comment per Brian's suggestion.

@AndyAyersMS
Copy link
Member Author

FMA test failed on Ubuntu -- no PMI diffs in this test, so am going to wager this was our old friend Random (see #10585).

@dotnet-jit retest Ubuntu x64 Checked Innerloop Build and Test

@AndyAyersMS
Copy link
Member Author

Re FMA failing: also #18467

Categories=EXPECTED_PASS
HostStyle=0

[GitHub_18522_6.cmd_11903]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_11906? Same below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually important?

Copy link
Member Author

@AndyAyersMS AndyAyersMS Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure -- but now I remember there is a tool to regenerate these, so I suppose we should just use that.

tests\scripts\lst_creator.py

It would be nice if the tool output included the command line used to create that output.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool is a little funky because it does a recursive discovery of the tests. Hopefully with the work @sbomer is doing this legacy stuff just goes away. I think the easier approach here is to just keep it as it is, the numbering is not extremely important, it is just output by the tool to make sure the tags are unique. GitHub_18522_6.cmd_11903 should be unique enough.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the tool will strip the number anyways, so this is not going to break its input either.

The jit was retyping all calls with small struct returns as power of two
sized ints when generating code for linux x64 and arm32/arm64. When the
results of those calls were assigned to fields the jit would use overly wide
stores which could corrupt neighboring fields.

The fix is to keep better track of the smallest power of two enclosing int type
size for the struct. If this exactly matches the struct size then the the
result of the call can be used in an arbitrary context. If the enclosing type is
larger, the call's result must be copied to a temp and then the temp can be
reinterpreted as the struct for subsequent uses.
Defer retyping inline candidates and tail call candidates.
Then handle deferred updates for cases where calls don't get inlined.
Unlike Windows x64, these ABIs return 3 (and 5,6,7) byte structs in a register.
So make the necessary transformations for those cases too.

Note the actual behavior change is triggered by the code in
`getPrimitiveTypeForStruct` so there are no ifdefs at the transform
points here.
jit must handle: callee is not an inline candidate, is an inline candidate
and gets inlined, or inline candidate that does not get inlined.

In the case where the callee gets inlined, handle the transitive cases
where the callee's returne value itself comes from a call.

Add a 3 byte test case to get coverage on arm32.

Add new tests to the arm32/arm64 test lists.
@AndyAyersMS
Copy link
Member Author

Rebased and reordered so all the code commits (unsquashed, will squash on merge) come first and all the test commits (now squashed) come second.

@AndyAyersMS
Copy link
Member Author

@briansull @CarolEidt would appreciate one more round of review (mainly those last two non-test commits) as this is a new part of the change that extends the handling to windows ABIs for arm.

#if FEATURE_MULTIREG_RET

// Did we record a struct return class handle above?
#if FEATURE_MULTIREG_RET || defined(UNIX_AMD64_ABI)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't need to add UNIX_AMD64_ABI to this ifdef
as line 556 of target.h defines this as 1 for UNIX_AMD64_ABI:

#define FEATURE_MULTIREG_RET 1 // Support for returning a single value in more than one register

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest version just has FEATURE_MULTIREG_RET.

@AndyAyersMS AndyAyersMS merged commit 2f2a9b1 into dotnet:master Jun 28, 2018
@AndyAyersMS AndyAyersMS deleted the Fix18522 branch June 28, 2018 08:00
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
The jit was retyping all calls with small struct returns as power of two
sized ints when generating code for linux x64 and arm32/arm64. When the
results of those calls were assigned to fields the jit would use overly wide
stores which could corrupt neighboring fields.

The fix is to keep better track of the smallest power of two enclosing int type
size for the struct. If this exactly matches the struct size then the the
result of the call can be used in an arbitrary context. If the enclosing type is
larger, the call's result must be copied to a temp and then the temp can be
reinterpreted as the struct for subsequent uses.

Defer retyping inline candidates and tail call candidates.
Then handle deferred updates for cases where calls don't get inlined.

Add test cases for 6 byte structs showing the various situations the
jit must handle: callee is not an inline candidate, is an inline candidate
and gets inlined, or inline candidate that does not get inlined.

Add a 3 byte test case to get coverage on arm32. Add new tests to the
arm32/arm64 test lists.


Commit migrated from dotnet/coreclr@2f2a9b1
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deterministic program outputs indeterministic results on Linux in release
5 participants